April 27, 2006
Jim Starkey: Introducing Falcon (Storage Engine for MySQL)
The last session of the MySQL UC is Jim Starkey giving an introduction to the new Falcon storage engine. Jim is an icon in the database field and was the creator of MVCC and the BLOB data type. There's a can of Falcon beer for anyone who asks a good question.
What is Falcon
- transactional MySQL storage engine
- based on Netfrastructure database engine
- engine has been in mission critical apps for more than 4 years
- extended and integrated into MySQL
Falcon is NOT
- an InnoDB clone
- a Firebird clone
- a standalone database management system
Jim's been at this for a long time, and there have been some changes since he wrote his first database at DEC:
- Uni-processors to multi-core
- CPU performance from 1.7 to 42,000 MIPS
- Address space from 32-bit to 64-bit
- Memory speed from 1 microsecond to 7 nanoseconds
- Memory from 2 MB to 4 GB
- Disk access has gone from 20 ms to 7 ms
- back then 100 users was big, now 100,000 users is a small application
- RDBMSs moved from decision support -> OLTP -> data mining -> Web
- DBAs are smarter and they are harder to find
- Application programmers' skill levels are down, using higher-level languages
- Expert design means appearance, not architecture
- Applications have much larger databases, more queries per human interaction, and fewer rows per result set; latency is more critical, BLOBs are much larger (and more important), and search happens without context.
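To put the hardware figures in the list above side by side, here is a quick back-of-the-envelope sketch that turns each before/after pair into an improvement factor. The numbers are the ones quoted in the talk; treating 4 GB as 4096 MB is my assumption, and the names in the code are just for illustration.

```python
# Before/after figures as quoted in the talk notes.
# "higher" means a bigger number is better, "lower" means a smaller one is.
figures = {
    "CPU performance": (1.7, 42000.0, "higher"),
    "memory access time (ns)": (1000.0, 7.0, "lower"),   # 1 microsecond = 1000 ns
    "memory size (MB)": (2.0, 4096.0, "higher"),         # assumes 4 GB = 4096 MB
    "disk access time (ms)": (20.0, 7.0, "lower"),
}


def improvement(old, new, better):
    """Return how many times better the new figure is than the old one."""
    return new / old if better == "higher" else old / new


factors = {name: improvement(*vals) for name, vals in figures.items()}

for name, factor in factors.items():
    print(f"{name}: about {factor:,.1f}x better")
```

CPU, memory speed, and memory size all come out at hundreds or thousands of times better, while disk access comes out at less than 3x. Relative to everything else in the machine, disks have fallen far behind.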
What Jim has learned
- CPUs and memory are faster
- disks are much slower
- MVCC (which was invented by Jim) works
- Record versions on disk are problematic
- Web applications are better and are the future (the attention span is low so you have to make good ones and are forced to constantly be making them better)
- People have more important things to do than tune databases - these days machines are powerful enough to be able to tune themselves, people shouldn't have to spend
Falcon is designed for the next 20 years. Jim is comfortable saying that what he's learned over the past 20 years in databases and has put into Falcon will be .
Goals of Falcon
- Exploit large memory for more than just a bigger cache
- Use threads and processors for data migration
- design to eliminate tradeoffs, minimize tuning
- scale gracefully to very heavy loads
Architectural Overview
Falcon is an incomplete in-memory database with backfill from disk, and it has two caches: the traditional LRU page cache for disk and a larger row cache with age-group scavenging. Falcon is multi-version in memory and single-version on disk. All transaction state is in memory, with automatic overflow to disk. Data and indexes live in a single file, plus log files. In the future Jim would like to create BLOB repositories where the data is stored off to the side, and he hopes to provide multiple page spaces.
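For readers who haven't seen one, here is a minimal sketch of what a "traditional LRU page cache" with backfill from disk looks like. This is a generic illustration, not Falcon's code: the class name, the `read_page` callback, and the fake disk reader are all hypothetical, and the row cache with age-group scavenging is not modeled.

```python
from collections import OrderedDict


class LRUPageCache:
    """Generic least-recently-used page cache (illustrative only)."""

    def __init__(self, capacity, read_page):
        self.capacity = capacity
        self.read_page = read_page   # hypothetical callback that reads a page from disk
        self.pages = OrderedDict()   # page_no -> page data, oldest first

    def get(self, page_no):
        if page_no in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        # Cache miss: backfill from disk, then evict the oldest page if over capacity.
        data = self.read_page(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return data


disk_reads = []


def fake_disk_read(page_no):
    """Stand-in for a real disk read; records which pages were fetched."""
    disk_reads.append(page_no)
    return f"page-{page_no}"


cache = LRUPageCache(capacity=2, read_page=fake_disk_read)
for page_no in (1, 2, 1, 3, 2):
    cache.get(page_no)

print(disk_reads)         # pages that had to come from "disk"
print(list(cache.pages))  # pages still cached, oldest first
```

In this run, page 2 gets evicted when page 3 arrives (page 1 was touched more recently), so the final request for page 2 goes back to disk.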
Falcon uses Btree indexes with prefix compression. There is no data except the key in the index.
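Prefix compression on sorted keys is easy to show in a few lines. The sketch below is the general technique (sometimes called front coding): each key is stored as the number of leading characters it shares with the previous key, plus the remaining suffix. Jim didn't go into Falcon's actual on-page format, so the function names and the tuple layout here are my own assumptions.

```python
def compress(sorted_keys):
    """Encode each key as (shared_prefix_length, suffix) relative to the previous key."""
    entries = []
    prev = ""
    for key in sorted_keys:
        n = 0
        limit = min(len(prev), len(key))
        while n < limit and prev[n] == key[n]:
            n += 1
        entries.append((n, key[n:]))
        prev = key
    return entries


def decompress(entries):
    """Rebuild the full keys by walking the entries in order."""
    keys = []
    prev = ""
    for n, suffix in entries:
        key = prev[:n] + suffix
        keys.append(key)
        prev = key
    return keys


keys = ["falcon", "falconer", "falcons", "firebird"]
entries = compress(keys)
print(entries)
# [(0, 'falcon'), (6, 'er'), (6, 's'), (1, 'irebird')]
```

Because index keys are kept in sorted order, neighbors tend to share long prefixes, which is why this pays off in a B-tree page.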
Uncommitted row data is staged in memory (and can overflow to a scratch file). Indexes are updated immediately. On commit, row data is copied to the serial log and written. After commit, a dedicated thread copies row data from serial log pages to data pages. The page cache is periodically flushed to disk. BLOB data is scheduled for write at creation.
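The write path described above can be sketched as a toy model: stage in memory, append to the serial log on commit, then copy from the log to data pages afterwards. Everything here is hypothetical naming on my part, and the dedicated post-commit thread is simulated by a plain method call to keep the example deterministic. Indexes, the scratch file, BLOBs, and the page cache flush are left out.

```python
class WritePathSketch:
    """Toy model of the staged-write / serial-log / data-page flow (not Falcon's code)."""

    def __init__(self):
        self.staged = {}      # txn_id -> {row_id: data}; uncommitted, memory only
        self.serial_log = []  # committed row images, in commit order
        self.data_pages = {}  # row_id -> data; the single on-disk version
        self.applied = 0      # number of log entries already copied to data pages

    def write(self, txn_id, row_id, data):
        # Uncommitted changes stay in memory.
        self.staged.setdefault(txn_id, {})[row_id] = data

    def commit(self, txn_id):
        # On commit, the transaction's row data is copied to the serial log.
        for row_id, data in self.staged.pop(txn_id, {}).items():
            self.serial_log.append((txn_id, row_id, data))

    def rollback(self, txn_id):
        # Nothing reached the log or data pages, so rollback just drops the staging.
        self.staged.pop(txn_id, None)

    def drain_log(self):
        # Stand-in for the dedicated thread that moves rows from log to data pages.
        while self.applied < len(self.serial_log):
            _, row_id, data = self.serial_log[self.applied]
            self.data_pages[row_id] = data
            self.applied += 1


db = WritePathSketch()
db.write(1, "row-a", "hello")
db.write(2, "row-b", "never committed")
db.commit(1)
db.rollback(2)
db.drain_log()
print(db.data_pages)
```

One nice property of this shape: a rolled-back transaction never touches the log or the data pages, which fits the "single version on disk" idea.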
Data reliability is protected by "careful write", where writes are sequenced to the disk so that it's always valid and consistent. There is a repair mechanism, but Jim's hope is that no one will ever have to use it.
Falcon has a do/redo/undo log in the serial log.
Jim's got a secret agenda of things he'd like to do in the database world, starting with MySQL:
- Jim wants to replace varchar with string; varchar is a throwback to punchcard technology
- Replace tiny, small, medium, and big integers with "number"
- Adopt a security model useful for the web
- Row-level security (filter sets)
- Teach the database world the merits of context-free search - why do you have to use a SELECT statement?
- Foreign keys - have them but not enforced
- Backups - Netfrastructure has two
- When can we see it - will get a beta in Q3 (from Robin)
- Filesystem storage - stored in one file broken into fixed-length pages from 4K to 32K
- Memory - everything except BLOBs comes into the cache
- Performance . . . 420 GB instance in Massachusetts that is pretty active and is only using 30% of the CPU
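On the filesystem storage point in the list above: with one file broken into fixed-length pages, finding a page is simple arithmetic. A minimal sketch, assuming pages are laid out back to back starting at byte 0 of the file (the talk didn't spell out the layout, and the function names are mine):

```python
MIN_PAGE_SIZE = 4 * 1024    # 4K, the low end of the range mentioned in the talk
MAX_PAGE_SIZE = 32 * 1024   # 32K, the high end


def page_offset(page_no, page_size):
    """Byte offset of a page, assuming pages start at byte 0 with no gaps."""
    if not MIN_PAGE_SIZE <= page_size <= MAX_PAGE_SIZE:
        raise ValueError("page size must be between 4K and 32K")
    if page_no < 0:
        raise ValueError("page number must be non-negative")
    return page_no * page_size


def locate(byte_offset, page_size):
    """Inverse mapping: which page holds a byte, and where inside that page."""
    return divmod(byte_offset, page_size)


print(page_offset(3, 4096))
print(locate(12300, 4096))
```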
Posted by mike at April 27, 2006 5:19 PM