July 21, 2003

Getting More Than Zero Nines

Since the inception of our project, we have never bothered to keep track of downtime or to translate it into a percentage of uptime.

So, based on our lack of records, one could say either that we've never had uptime or that we've never had downtime. I'm determined to turn our attention to uptime, if for nothing else than to put a little pressure on myself (and a few team members) when considering one of those mid-day, hope-it-works changes to production. So I started a small log in which I record any downtime, planned or unplanned.

My first entry was 14 minutes of downtime for replacing a battery in our hardware RAID array. I guess if someone had asked at that point what our uptime statistics were, I would have had to say 0.001% uptime.

It's now been 42 days, and we've experienced an additional 45 seconds of downtime for Apache restarts (loading Perl libraries). Counting only those 45 seconds, I guess that works out to around 99.999%.
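For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes the window is exactly 42 days, and the function name `uptime_percent` is made up for illustration. It also shows what happens if the 14-minute battery swap is counted inside the same window, which is an assumption about how the log is read rather than something the log states.

```python
def uptime_percent(window_seconds, downtime_seconds):
    """Percentage of the window during which the service was up."""
    return 100.0 * (1 - downtime_seconds / window_seconds)

# Assumption: the measurement window is exactly 42 days.
WINDOW = 42 * 24 * 60 * 60   # 3,628,800 seconds
RESTARTS = 45                # Apache restarts, in seconds
BATTERY = 14 * 60            # RAID battery replacement, in seconds

print(f"restarts only:     {uptime_percent(WINDOW, RESTARTS):.4f}%")
print(f"including battery: {uptime_percent(WINDOW, RESTARTS + BATTERY):.4f}%")
```

With the restarts alone the result rounds to 99.999%. Folding in the battery swap drops it to about 99.976%, so a single planned outage costs most of a nine.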

I have seen this in a few places; most recently, Jim documented the numbers on the nines.
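As a rough illustration of what those numbers look like (this is not Jim's table, just the standard arithmetic), the sketch below converts a count of nines into a yearly downtime budget. It assumes a 365-day year, and `downtime_budget_seconds` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year

def downtime_budget_seconds(nines, period_seconds=SECONDS_PER_YEAR):
    """Allowed downtime for an availability of N nines (3 -> 99.9%)."""
    return period_seconds * 10 ** -nines

for n in range(1, 6):
    budget = downtime_budget_seconds(n)
    # Show n nines as a percentage, e.g. n=3 -> 99.9%
    availability = f"{100 * (1 - 10 ** -n):.{max(n - 2, 0)}f}%"
    print(f"{availability:>8}  {budget / 60:10.1f} minutes per year")
```

Five nines leaves a little over five minutes of downtime per year, which is less than the single battery replacement above.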

Somehow, any number of nines doesn't make me feel any better. There's a lot more to an application than a boolean 0 or 1. What about the users who are experiencing bugs in the application? To them, the system might as well be down.

Posted by mike at July 21, 2003 4:57 PM