July 27, 2004
MySQL Performance Workshop
Jeremy Zawodny is doing a tutorial about tuning MySQL at OSCON 2004. Kind of a cool approach to the tutorial, Jeremy puts up a list of Topics and is asking for audience input on how to proceed. I saw got quite a bit of information on tuning at MySQL 2004, but I'm still convinced there's a part of our system somewhere that needs tweaked. I haven't noticed any performance issues on our installation, but want to be educated about how to solve them should they arise.
Jeremy starts with a slew of questions to get familiar with the audience and then gets into the "how MySQL is used at Yahoo." 500-1000 installations, mostly run on FreeBSD 4.x on 32bit Intel (commodity hardware), anywhere from 2-30G of data. Lots of replication, load balancing. Alteon and Foundry load balancers used. Leaned in Alteon direction because you could have it execute a script which requested a custom php page from the database server which would run some checks and return a more sensible health check compared to a TCP/IP check on port 3306.
What things might affect performance
- Schema, Indexes, Table Type
- Hardware (CPU,RAM,Disks,Network)
- MySQL Configuration
- Application design
Things that don't matter
- Language - Perl, PHP, Python, Ruby all use a wrapper around the C library which will give similar performance to all languages
- Java JDBC is pure Java
Schemas, Indexes, Table Type
- Choosing good indexes
- InnoDB has a penalty for larger primary keys.
- MySQL only uses one index per table per query, may need to have compound indexes
Review of storage engines and features (MyISAM, HEAP, BDB, InnoDB, NDB, Archive). Archive table type is a new type in 4.1.
- Query cache - set aside a chunk of memory for queries
- Use bulk inserts where you can
- Perpared statements (4.1)
- Put hints in query
- slow query log
- tweak query cache, key buffer and InnoDB buffer pool
Disks and I/O
- IDE disks are less smart - SCSI can take multiple requests and grab them in the best order
- seektime is critical
- only expect 150 I/O operations/second on 10K SCSI drives
- more spindles is better, means more concurrent read, plus more data on outside of disks
- battery-backed write cache on disks is great because don't have to wait for disk write to complete transaction
- SAN and NAS storage machines are good solution because they add many spindles, can cost quite a bit
- use RAID 0 on slaves, very fast but not redundant in any way
- set stripe sizes larger, 16K to 32K to keep data in sequence
Application Design Issues
- Caching at application level - Amazon needs 300+ queries/page to generate front page, makes a lot of sense to cache that information
- Compress blobs
- hardware isn't the solution
- use EXPLAIN
- fewer joins are better
- indexes - be careful of using functions in the where clause, many times will make it so index isn't used
What tools does Jeremy use to find problems?
First thing is to determine where the bottleneck is. Start with using vmstat, sar, top, taskmanager to find CPU, memory, disk etc. Sometimes getting I/O can be difficult, iostat is a good start. Once beyond the OS level uses mytop to dig into MySQL.
Is hard to imagine replication being a burden on the master, reading the binlog is easy because it's typically in memory, so the only real threat for replication is network bottleneck. Important to realize that the slaves have to execute the same queries as the master.
Posted by mike at July 27, 2004 2:48 PM