July 29, 2004
Managing 1,000 TicketMaster Servers In Your Spare Time
Notes from listening to Sean Lynch of Ticketmaster talk about their process for managing 1,000+ servers.
Background
- 800 1U xSeries servers running Linux 2.4
- 200 dev, qa, test systems
- 4 prod clusters - 2 in Chicago, 2 in LA - all 4 active and completely independent
- 6,000 Tickets/min @ 800Mb/s
- atypical traffic patterns - immense jumps in traffic
- Apache C/Perl Modules, very modular
- in 2001 did a complete rebuild, used to be WinNT but didn't scale - now all Linux/Apache/mod_perl
- strong push toward automation
- machine to sysadmin ratio is 300:1
- looked at cfengine, but it didn't meet their needs
- would like to open source the next gen
- using nagios for monitoring, ganglia is good too
- use MySQL for event data
All systems should be active
- True active using multi-master Oracle replication, Active/Active VRRP, Clustered NAS, DNS-based GSLB.
- utilize 100% of hardware investment
- forces low-level understanding
- better to decommission a "known" system than commission an "unknown" system
- forces symmetry
Utilize a Flexible System Organization
- operations handle - class-node#.product.cluster.sa
- class is a unique configuration - must be diligent about being
- classes include proxy, cache, database etc
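To make the naming scheme concrete, here is a minimal sketch of splitting an operations handle of the form class-node#.product.cluster.sa into its parts. The example hostname, the allowed characters in each field, and the names `Handle` and `parse_handle` are my assumptions for illustration, not anything shown in the talk.

```python
import re
from typing import NamedTuple


class Handle(NamedTuple):
    cls: str      # configuration class, e.g. proxy, cache, database
    node: int     # node number within the class
    product: str
    cluster: str


# class-node#.product.cluster.sa; the character sets per field are an assumption.
HANDLE_RE = re.compile(
    r"^(?P<cls>[a-z]+)-(?P<node>\d+)"
    r"\.(?P<product>[a-z0-9]+)"
    r"\.(?P<cluster>[a-z0-9]+)\.sa$"
)


def parse_handle(handle: str) -> Handle:
    """Split an operations handle into class, node number, product and cluster."""
    m = HANDLE_RE.match(handle)
    if m is None:
        raise ValueError(f"not an operations handle: {handle!r}")
    return Handle(m["cls"], int(m["node"]), m["product"], m["cluster"])


# Hypothetical hostname, made up for this example.
h = parse_handle("proxy-12.tickets.chi1.sa")
print(h.cls, h.node, h.product, h.cluster)
```

The point of a scheme like this is that automation can key everything off the class field without consulting a separate inventory.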
Automate Everything
- kickstart for OS install
- automation software keys from the class
- each class symlinks into a repository of hundreds of RPMs used in the build
- rsync m4 templates by class over root files
- m4 files include class-specific definitions
- system deployment is 100% automated, but has drift
- next generation config system is in development
- always running, decoupled from kickstart
- "actions" tackle config drift
- XML document has all node config
- is parsed by class and then specific node config object made available to action
- actions run in change or audit mode
- severity level specified to indicate how it might interrupt
- Template::Toolkit replaces m4
- APT replaces symlinks
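The next-gen design above (XML node config, parsed by class, handed to actions that run in change or audit mode) can be sketched roughly as follows. This is only my guess at the shape of it: the XML layout, the `Action` class, the numeric severity scale and the `max_severity` gate are all assumptions, and a plain dict stands in for the live state of a machine.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict

# Hypothetical config document: class-wide settings plus per-node overrides.
CONFIG_XML = """
<config>
  <class name="proxy">
    <setting key="max_clients">512</setting>
    <node id="1"><setting key="max_clients">256</setting></node>
    <node id="2"/>
  </class>
</config>
"""


def node_config(xml_text: str, cls: str, node_id: str) -> Dict[str, str]:
    """Parse by class, then overlay the settings of one specific node."""
    root = ET.fromstring(xml_text.strip())
    for class_el in root.findall("class"):
        if class_el.get("name") != cls:
            continue
        # findall("setting") only returns direct children, i.e. class-wide values.
        config = {s.get("key"): s.text for s in class_el.findall("setting")}
        for node_el in class_el.findall("node"):
            if node_el.get("id") == node_id:
                config.update(
                    {s.get("key"): s.text for s in node_el.findall("setting")}
                )
                return config
    raise KeyError(f"no node {node_id!r} in class {cls!r}")


@dataclass
class Action:
    key: str
    severity: int  # assumed scale: higher means more disruptive

    def run(self, desired, actual, mode, max_severity=0) -> str:
        """Audit mode only reports drift; change mode also corrects it."""
        if mode not in ("audit", "change"):
            raise ValueError(f"unknown mode: {mode!r}")
        want, have = desired[self.key], actual.get(self.key)
        if have == want:
            return "ok"
        # Report without touching anything in audit mode, or when the
        # action is more disruptive than the caller allows.
        if mode == "audit" or self.severity > max_severity:
            return f"drift: {self.key} is {have!r}, want {want!r}"
        actual[self.key] = want
        return f"changed: {self.key} set to {want!r}"


desired = node_config(CONFIG_XML, "proxy", "1")
live = {"max_clients": "512"}  # stand-in for state read from the machine
action = Action(key="max_clients", severity=1)
print(action.run(desired, live, mode="audit"))
print(action.run(desired, live, mode="change", max_severity=1))
```

Because the same action both detects and repairs drift, it can run continuously in audit mode and only be switched to change mode when an interruption of that severity is acceptable.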
Posted by mike at July 29, 2004 11:29 AM