« MySQL Cluster Configuration, Tuning, and Maintenance | Main | Dual Core Processors are Cool »

April 24, 2006

MySQL Replication for Scaling and High Availability

Afternoon tutorial at MySQL Users Conference (2006) is Replication for Scaling and High Availaility by Jeremy Cole (not this jeremycole) from Yahoo!.

Jeremy starts with a general overview of replication and provides some detail on general things replication is and is not. There is no limit to the number of slaves, but there is a limit on the bandwidth that a server can serve so with too many slaves the master could be overloaded. If you need massive numbers of replicated machines it's better to not have them all pointed at one master. It's better to have a relay slave who's job it is to stand in the middle and serve data out to groups of machines.

Master to master replication is possible, but there are lots of problems. Primary keys can be carefully disignated but the bottom line is that when it fails it's hard to know what state things are in.

Jeremy's got this cool diagram that shows the flow of replication from the master through the IO thread to the SQL thread, through the replication configuration filters and into the slave data.


A few things that Jeremy says to be aware of:


A few things to not do:

Replication Commands

Jeremy goes through the replication-related SQL commands which include:

Configuration Options

The following options are used in configuring MySQL replication:

Building Replication Failover

The goal is to build a system that can failover. This is use *most* often in planned failovers, it's more likely that you'll need it for machine updates than you will in an actual failure (but it also works in a failure). DNS is not good because it's unreliable for change as it doesn't ensure clients notice the change.

You do not connect directly to the master, you connect to a proxy that connects you to the master. Proxy decides who the current master is based on the configuration in the proxy. Stunnel with a null-cipher would do the same.

The setup is multi-master, but you only write to one master at a time. Jeremy doesn't recommend automatic failover, there's a problem in determining when to failover. It could be that the network could be down, it could be just down from the view of the monitor.

(having to work on a somewhat important issue at work, missing a bit here)

Jeremy says that, as a Yahoo!, the purchase of InnoDB doesn't worry him.

Yahoo! uses two different camps of hardware. When having to use existing hardware it's typically 32-bit 1.x GHz dual CPU with a few gigs of RAM. If they can buy new it's AMD opteron dual 2.x GHz with 4G or 8G of RAM.

Yahoo! uses mostly version 4.0 still (5,000 on 4.0 and 1,000-2,000 on 4.1).

Posted by mike at April 24, 2006 4:42 PM