« Fourth Year of the Mike Kruckenberg Weblog | Main | Afternoon visit to the Harley Davidson Store »

March 10, 2006

Storing Binary Data in MySQL

The topic of storing binary data in MySQL rears it's head every so often, a debate always ensues about if and when it's appropriate, and fades out without any resolution. Sheeri posed the question again a few days ago.

We did this at Tufts, had ~1.5 million images (~35G) that were stored and served from a MyISAM binary_data table. I think the grass is always greener on the other side. The performance was decent, and we were able to use replication to spread load but we regularly talked about moving to an NFS.

I should mention that I presented about this at OSCON, and was fairly pro database for binary data. Since that presentation our image database has more than doubled in size, and we gained access to a fullly-reduntant, multi-terrabyte Network Appliance. Once we had some of our data on the netapp, we thought the grass was greener over in NFS land for our images for a few reasons:

  1. Database backups and snapshots are much easier without huge amounts of binary data. If you need to refresh a slave or sync data down to a development environment it adds a lot of extra time when the dump and restore includes the binary data. It took us many hours to get a snapshot and sync it down to another machine.
  2. The data center could back up and restore to the filesystem, but not a MyISAM table. If an image (or 10) got deleted, the filesystem backups could handle putting them back. With MySQL you had to drop the entire table, restore from snapshot and roll the binary logs forward. Just seemed like a lot of work to get images back.
  3. A corrupted disk with ~1.5million images is easier to deal with than a corrupted 35G MyISAM table. I feared the day that our binary_data table reported that it needed repair. I never had to run the repair on that table, but on tables a fraction of the size it was a painful wait.

Having said that, there were good things about having the images in the database. Being able to replicate and have images appear on many servers in near-real time was pretty cool. The load could be spread across several machines instead of having one server. We didn't use foreign keys, but having integrity with your binary files is nice.

To Sheeri's point about moving to MySQL because you don't want to get another NFS server to add more images, putting them in MySQL might not change that. A lot depends on how you design the system. If you need more space or horsepower at some point, you'll still need to swap in bigger drives or slice the data out onto another machine. There's a slide in my presentation that compares performance between filesystem, MyISAM and InnoDB retrieval. I did not compare inserts or updates.

We had a lengthy conversation about this on one of my last days at Tufts, will have to check in with the developers in a few months and see what they decided.

I do think the grass is often greener on the other side, if you're unhappy with NFS the database looks better, if you're unhappy with the database NFS is the answer.

Posted by mike at March 10, 2006 10:03 PM