
April 24, 2007

Technology at digg.com

Listening to Elliott White III (Eli) and Tim Ellis (Time) talk about technology at the 2007 MySQL Conference. As expected, a packed session.

Eli is the Senior PHP dude, Time is the DBA.

Initially started with Apache 1.3 and PHP 4.x. Used MyISAM and MySQL full-text search. Then moved to multiple servers, Apache 2.x, MySQL InnoDB, PHP 5, and memcached. Are now using Lucene with Solr for search. Currently around 100 machines, 30 databases and the rest serve other functions. 9 memcached machines running on the db slaves. Right now around 30 gigs of data. XFS for user interface stuff (millions of user images). ext3 for other things.

Many PHP servers behind a load balancer. Many MySQL slaves talking to a master. Connections from the PHP servers to MySQL are randomized across the database servers.
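A rough sketch of what randomized connection selection might look like. This is not digg's code (their application is PHP); the host names are made up, and sending writes to the master is my assumption based on how MySQL replication normally works, not something stated in the talk.

```python
import random

# Hypothetical host names, for illustration only.
MASTER = "db-master"
SLAVES = ["db-slave-1", "db-slave-2", "db-slave-3"]


def pick_host(is_write, slaves=SLAVES, master=MASTER, rng=random):
    """Return the database host a query should be sent to.

    Writes go to the master (assumption: standard replication setup).
    Reads go to a slave chosen at random, which spreads load without
    any shared state between the web servers.
    """
    if is_write or not slaves:
        # Fall back to the master if no slaves are configured.
        return master
    return rng.choice(slaves)
```

Each web server can make this choice independently per request, so no coordination is needed to spread reads across the slaves.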

Memcached is used heavily for caching chunks of content. You can't cache the entire page because it is almost entirely customized for the user. There are issues to consider with memcached data: how to fail over, and how to handle things like a memcached server going down and then coming back up with older data.
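One way to think about chunk caching and the stale-data concern, as a minimal sketch. The talk did not say how digg solves it; the version counter below is my own assumption (imagine it is bumped in the database whenever the underlying data changes), and `DictCache` is an in-memory stand-in so the example runs without a memcached server.

```python
class DictCache:
    """In-memory stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cached_chunk(cache, name, version, build):
    """Return one chunk of page content, rebuilding it when needed.

    `version` is a hypothetical counter for the underlying data. The
    cached entry stores the version it was built from, so an entry
    left over from before a change is detected and rebuilt.
    """
    entry = cache.get(name)
    if entry is not None and entry[0] == version:
        return entry[1]          # cache hit, still current
    value = build()              # miss or stale: regenerate the chunk
    cache.set(name, (version, value))
    return value
```

Usage would be something like `cached_chunk(cache, "sidebar:42", current_version, render_sidebar)`, where the key and function names are illustrative. The trade-off is that the current version has to come from somewhere cheaper than rebuilding the chunk.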

With MySQL, the first thing the DBA did was to bring in memcached to offload some of the work from MySQL. Then they started breaking databases into smaller ones (sharding) in order to get better performance. It makes it harder to do things in SQL because the data is in separate places.
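To make the "data is in separate places" point concrete, here is a small sketch of one possible scheme: sharding by user id with a modulo. My notes don't record which scheme digg actually uses, so the shard names and the modulo rule are assumptions for illustration.

```python
# Hypothetical shard names, for illustration only.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(user_id, shards=SHARDS):
    """Map a user id to the shard that holds that user's rows."""
    return shards[user_id % len(shards)]


def group_by_shard(user_ids, shards=SHARDS):
    """Group user ids by shard.

    A query touching several users can no longer be one SQL statement:
    the application runs one query per shard and merges the results.
    """
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid, shards), []).append(uid)
    return groups
```

A lookup for users 1, 2, 5 and 6 ends up as two separate queries (users 1 and 5 on one shard, 2 and 6 on another), and anything like a JOIN or ORDER BY across them has to be done in application code.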

Types of sharding:

This is like partitioning, but partitioning wasn't available at the time.

digg uses MySQL 5.0.32 on Debian. Their engine type is InnoDB (recovers very quickly after a crash).

Current challenges:

Posted by mike at April 24, 2007 12:01 PM