April 12, 2004
rsync to Keep Machines in Sync
Today I'm working on a process to keep directories on load-balanced servers in sync. I don't have the energy to document the ongoing debate about why we're not using NFS for shared storage. Suffice it to say we're not using it (you may hear more on that later).
The solution we've been using is rsync. In the past the webservers have pulled files from a central machine, where all files needed on the web were dumped. With the upcoming release of a web-based authoring tool we now enter a more complex arrangement where any one of the webservers may receive newly uploaded files, which means updates have to flow in both directions.
Having each webserver be responsible for syncing to all the other webservers was more than anyone wanted to think about. Even having each webserver sync down to a central location was bringing up more questions than we wanted to answer. We've decided to try having one central machine handle all data synchronization. It will first rsync down from each webserver to a central directory, and after gathering all the new files will go back and push out to each machine. The current thinking is to have one script do this, run every minute from cron, with an internal check to exit if the previous run is still going.
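To make the plan above concrete, here is a rough sketch of what such a script might look like. It is not the script I'm writing, just an illustration in Python: the hostnames, paths, and lock file location are all made up, and using a non-blocking flock as the "is the last run still going" check is one assumption among several possible approaches.

```python
import fcntl
import subprocess
import sys

# Hypothetical values for illustration only; none of them come from the post.
WEBSERVERS = ["web1", "web2", "web3"]
CENTRAL_DIR = "/data/central/"
REMOTE_DIR = "/var/www/files/"
LOCK_PATH = "/tmp/websync.lock"
RSYNC_OPTS = ["-rltu", "-e", "ssh"]  # equivalent to the bundled form -rltue ssh


def acquire_lock(path):
    """Return an open, locked file handle, or None if another run holds the lock."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def pull_cmd(host):
    """rsync command that gathers new files from one webserver into the central dir."""
    return ["rsync"] + RSYNC_OPTS + [f"{host}:{REMOTE_DIR}", CENTRAL_DIR]


def push_cmd(host):
    """rsync command that pushes the gathered files back out to one webserver."""
    return ["rsync"] + RSYNC_OPTS + [CENTRAL_DIR, f"{host}:{REMOTE_DIR}"]


def main():
    lock = acquire_lock(LOCK_PATH)
    if lock is None:
        sys.exit(0)  # the previous run is still going, so skip this minute
    # Pull from every webserver first, then push, so each machine gets everything.
    for host in WEBSERVERS:
        subprocess.run(pull_cmd(host), check=False)
    for host in WEBSERVERS:
        subprocess.run(push_cmd(host), check=False)
    lock.close()  # closing the handle releases the lock
```

A cron entry running every minute would call main(). The lock is released when the handle is closed or the process exits, so a crashed run should not block later ones.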
Today I'm writing the script, and I came across this issue. There are currently 18 directories with ~50,000 files. We've determined that only 4 of these directories need to be synced every minute; the other ones can be done nightly. The 4 directories currently contain ~5,000 files.
The question: Is it faster to run 4 individual rsync processes, one for each sub-directory, or to use the --exclude option on one huge dir?
Running an rsync process for each sub-directory (with no files to sync) takes ~8 seconds. To run one process and --exclude the stuff I didn't want takes 2.7 seconds. That can be a pretty significant savings when trying to keep several machines up to date.
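For anyone wanting to try the same comparison, the two approaches can be sketched like this. The directory names and paths in the usage lines are placeholders, not the real ones, and anchoring each exclude pattern as /name/ is my assumption about how to skip only the top-level directories.

```python
RSYNC_OPTS = ["-rltu", "-e", "ssh"]  # equivalent to the bundled form -rltue ssh


def per_directory_cmds(src_root, dest_root, wanted):
    """One rsync command per wanted sub-directory (the slower approach above)."""
    return [
        ["rsync"] + RSYNC_OPTS + [f"{src_root}/{name}/", f"{dest_root}/{name}/"]
        for name in wanted
    ]


def single_cmd_with_excludes(src_root, dest_root, all_dirs, wanted):
    """One rsync command over the parent dir that excludes everything not wanted."""
    # A leading slash anchors the pattern at the top of the transfer and a
    # trailing slash restricts it to directories.
    excludes = [f"--exclude=/{name}/" for name in all_dirs if name not in wanted]
    return ["rsync"] + RSYNC_OPTS + excludes + [f"{src_root}/", f"{dest_root}/"]


# Placeholder names: two directories to sync every minute, one left for nightly.
all_dirs = ["images", "uploads", "archive"]
wanted = ["images", "uploads"]
many = per_directory_cmds("web1:/var/www/files", "/data/central", wanted)
one = single_cmd_with_excludes("web1:/var/www/files", "/data/central", all_dirs, wanted)
```

With ssh as the transport, each separate command has to set up its own connection, which is a plausible reason for the difference, though I haven't measured where the time goes.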
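For the curious, the options I'm using for rsync are: -rltue ssh (recurse into directories, copy symlinks as symlinks, preserve modification times, skip files that are newer on the receiving side, and use ssh as the remote shell).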
Posted by mike at April 12, 2004 6:08 PM