July 23, 2007
Building Scalable Internet Architectures
This session is about how to build really big stuff. Theo talks fast and confesses that things will move fast...I'll catch the points that are interesting (to me).
Scalability is not performance. To Theo, scalability is:
How well a solution to some problem will work when the size of the problem increases.
Performance is the capabilities of a machine or product when observed under particular conditions.
Solutions for High Uptime
- parallel servers - simultaneous machines doing the same thing
- hot spare/standby - automatic failover
- warm spare/standby - failover that is not automated
- cold spare/standby - equipment and backups on hand to reestablish service
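The difference between a hot and a warm spare is whether failover is automated. A minimal sketch of what "automatic" means here, assuming hypothetical server names and a pretend health check (none of this is from the talk):

```python
def check_health(server, responses):
    """Pretend health check: look the server up in a dict of ping results."""
    return responses.get(server, False)

def choose_active(primary, hot_spare, responses):
    """Automatic failover: promote the hot spare the moment the primary dies."""
    if check_health(primary, responses):
        return primary
    if check_health(hot_spare, responses):
        return hot_spare
    return None  # cold-spare territory: a human restores from backups

# Normal operation: the primary answers, so it stays active.
print(choose_active("web1", "web2", {"web1": True, "web2": True}))   # web1
# Primary fails: the hot spare takes over with no operator action.
print(choose_active("web1", "web2", {"web1": False, "web2": True}))  # web2
```

A warm spare would be the same hardware, but the promotion step is a runbook an operator follows instead of code.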
Scalability also means that maintenance of the site isn't overbearing.
Procedures make for better scalability: code reviews, version control, and upgrade strategies.
- Understand the velocity of development, how much code is going in
- Understand administrative aspects
- Understand likelihood of failure - sometimes it's better to buy an extra box instead of a support contract
- Know how stable the software is - how often will the library be updated and how will it affect your application
If you're developing code faster than you can review it properly, you're going to end up with bugs and poorly architected code down the road.
Three Simple Rules
- optimize where it counts
- complexity has costs
- use the right tool
Clustered Image Serving Example
Goal is to serve static images at 120 MBs of throughput with 24x7 uptime across three geographically distributed sites.
Rather than have multiple dedicated HA/LB servers sitting on top of your systems as described in the "white paper", have all of your servers exposed and run the HA/LB as a process on each machine.
The solution is wackamole (Theo is an author), a peer-based HA solution. Uses commodity hardware, each machine being responsible for a set of IP addresses, but each machine capable of taking the IP address of the other machines if they fail.
Theo takes a tour of spread, which is a prerequisite for wackamole. Wackamole works internally by sending messages around announcing that servers are or are not part of the group. Externally, it works by responding to ARP requests from user machines.
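The idea behind the peer-based approach can be sketched as follows. This is a toy illustration of IP redistribution on failure, not wackamole's actual Spread-based protocol; the peer names and addresses are made up:

```python
def assign_vips(peers, vips):
    """Deterministically spread the virtual IPs across the live peers."""
    live = sorted(peers)
    if not live:
        return {}
    mapping = {p: [] for p in live}
    for i, vip in enumerate(sorted(vips)):
        mapping[live[i % len(live)]].append(vip)
    return mapping

vips = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
# Two live peers: each owns half of the virtual IPs.
before = assign_vips({"a", "b"}, vips)
# Peer "b" fails: "a" now answers ARP for all four addresses.
after = assign_vips({"a"}, vips)
```

The point is that there is no dedicated load-balancer box to fail: every commodity server runs the same process and can absorb a dead peer's addresses.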
Use clever DNS to get people to the set of IP addresses. First is server-side: have the application dynamically build pages to point to a specific site for images based on login or other credentials. Second is proximity-based DNS, though it's not good to rely on the internet DNS mechanism for this. Third solution is to use DNS on a shared IP (anycast and BGP).
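The first (server-side) approach might look like the sketch below: the application pins each user to one of the three sites when it builds image URLs. The site names and the checksum-based mapping are my assumptions for illustration, not anything from the talk:

```python
import zlib

# Hypothetical hostnames for the three geographically distributed sites.
SITES = ["img1.example.com", "img2.example.com", "img3.example.com"]

def image_host(username):
    """Stable user -> site mapping via a checksum of the login name."""
    return SITES[zlib.crc32(username.encode()) % len(SITES)]

def image_url(username, path):
    """Build an image URL pointing at the user's assigned site."""
    return "http://%s/%s" % (image_host(username), path.lstrip("/"))
```

Because the mapping is deterministic, the same user always hits the same site, which keeps that site's caches warm for their images.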
Theo doesn't like tiered systems. They are expensive technically and financially. Tiers are a good way to scale up, but they make it hard to scale down. They often lock you into a certain architecture and are difficult to troubleshoot. Database replication is hard, but it's needed if you're trying to break the system into functions.
If you don't want to tier, how do you replicate databases?
Theo walks through an introduction to database replication with some of the pros and cons of different solutions.
Posted by mike at July 23, 2007 1:31 PM