
April 20, 2005

MySQL Cluster Architecture

Listening to Max Mether present on the architecture of MySQL Cluster at MySQL 2005.

Data Distribution

Tables are horizontally partitioned and spread equally across the data nodes. Each piece of data has a primary data node and a secondary data node. If a node fails, its secondary data node takes over. If you lose too many nodes, Max says "it's game over for the cluster". Max suggests that you should associate one node with one computer; having multiple nodes on one computer doesn't help with redundancy, because if the computer fails all the nodes on that machine fail with it.
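To make the one-node-per-machine point concrete, here's a minimal config.ini sketch (the hostnames are made up) with two replicas and four data nodes, each on its own box:

  [NDBD DEFAULT]
  NoOfReplicas=2                  # each fragment gets a primary and a secondary copy

  [NDB_MGMD]
  HostName=mgmt.example.com       # management node

  # one data node per physical machine, so losing a box only loses one node
  [NDBD]
  HostName=data1.example.com
  [NDBD]
  HostName=data2.example.com
  [NDBD]
  HostName=data3.example.com
  [NDBD]
  HostName=data4.example.com

  # slots for the MySQL servers / API nodes
  [MYSQLD]
  [MYSQLD]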

Synchronous Replication

The cluster operates on the two-phase commit principle. In the first phase the update is applied on all the data nodes; if that phase succeeds on every node, the transaction is committed in the second phase, the commit phase.

Failure Detection

Two methods are used to determine whether a node has failed:
  1. communication loss with the node
  2. heartbeat failure - each node sends heartbeat information to the next node in the cluster. If a node isn't sending a heartbeat, the node that should be getting it will inform the others. Heartbeats are sent every 1.5 seconds by default, and a node is considered failed if it misses three consecutive heartbeats (see the config sketch below).
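If I've got the parameter names right, the heartbeat intervals are tunable in config.ini; 1500 ms matches the 1.5 second default Max mentioned:

  [NDBD DEFAULT]
  # heartbeat between data nodes, in milliseconds
  HeartbeatIntervalDbDb=1500
  # heartbeat between data nodes and MySQL servers / API nodes
  HeartbeatIntervalDbApi=1500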

Node Recovery

A failed node needs to be recovered ASAP. The restored node:
  1. rejoins the heartbeat circle
  2. copies the metadata (tables, indexes, cluster info)
  3. joins any ongoing transactions
  4. copies its data from the primary node
  5. regains its primary status

System Recovery

You want to avoid a complete system shutdown if possible. There are checkpoints that write the data to disk and can be used to recover, but it's a bit of work.

On each committed transaction a redo log entry is written; the redo log is flushed to disk on every global checkpoint, which happens every 2 seconds. There is also a local checkpoint that makes a complete copy of the data in the cluster. The local checkpoint takes a lot of time, so dirty data is written along with an undo log that can be applied to bring the data back to a consistent snapshot. Each node has its own backup, redo and undo logs.
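For what it's worth, the checkpoint behaviour is tunable in config.ini too; a sketch (2000 ms is the default global checkpoint interval, matching the 2 seconds above, and if I recall right the local checkpoint setting is based on write volume rather than time):

  [NDBD DEFAULT]
  # flush the redo log to disk every 2 seconds (global checkpoint)
  TimeBetweenGlobalCheckpoints=2000
  # local checkpoints are triggered by write volume, not a timer;
  # this is the base-2 log of the 4-byte words written (default 20, roughly 4MB)
  TimeBetweenLocalCheckpoints=20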

(I missed the rest, got caught up in a sysadmin issue dealing with a restore from backup back at the data center).

Posted by mike at April 20, 2005 2:16 PM