April 26, 2007

Paul Tuckfield: Scaling MySQL at YouTube

Listening to Paul Tuckfield talk about YouTube's use of MySQL at the 2007 MySQL User Conference.

YouTube's web stuff is Python and Memcache. The database is MySQL with some serious replication. They hit 100M views in a day in July 2006, but according to a graph that Paul showed, it looks like that has more than doubled since then.

They started with a replication setup with a single master for writes and many slaves for reads. They then moved to a system where specific pages are pulled from specific replicas.
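
To make the "specific pages from specific replicas" idea concrete, here is a minimal sketch of how such routing might look in Python. Paul didn't show code, so everything here is my assumption: the host names, the page types, the `pick_host` helper, and the use of a CRC32 hash to spread keys within a pool are all made up for illustration.

```python
import zlib

# Hypothetical topology: one master for writes, replica pools per page type.
# All host names and page types below are invented for this sketch.
MASTER = "master-1"
REPLICA_POOLS = {
    "watch": ["replica-watch-1", "replica-watch-2"],
    "profile": ["replica-profile-1"],
}
DEFAULT_POOL = ["replica-general-1", "replica-general-2"]


def pick_host(page_type, key, is_write=False):
    """Return the host a query should go to.

    Writes always go to the single master. Reads go to the pool assigned
    to the page type, and the same key always maps to the same replica.
    """
    if is_write:
        return MASTER
    pool = REPLICA_POOLS.get(page_type, DEFAULT_POOL)
    # crc32 is stable across processes, unlike Python's built-in hash().
    index = zlib.crc32(key.encode("utf-8")) % len(pool)
    return pool[index]
```

Calling `pick_host("watch", "some-video-id")` always lands on the same replica in the hypothetical "watch" pool, while `pick_host("watch", "some-video-id", is_write=True)` goes to the master. My guess is that a payoff of this kind of scheme is that each replica only has to keep its own slice of pages hot in cache, but that is my reading, not something stated in the talk.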

One of the important lessons they learned: when upgrading to 5.0 from 4.1, some of the servers performed much better than others. It turned out that if they dumped and then reimported the data, the server performed much better. When moving to 5.0, tables that are rebuilt use a more compact data storage format and get better performance.

Paul spends some time talking about the RAID cache, filesystem cache, and database cache and making sure that they don't get in the way of performance. With a database cache in place, the RAID and filesystem caches mostly get in the way for both reads and writes.

Paul's presentation works its way up a graph of videos served over time. He highlights some of the points on the graph where the system plateaus or drops off, what was happening with the DB at the time, and how they got past it.

Note: Paul's presentation didn't end with the conclusion of the keynote. There is quite a "following" outside the conference hall listening to him go into more detail on the work at YouTube. I guess it continued beyond that, with the group migrating to the breakfast area.

Posted by mike at April 26, 2007 11:25 AM