April 19, 2005

Building Redundancy into the Human System

Listening to Andrew Cowie talk about operations professionalism. I attended the pre-conference tutorial.

We need a systematic way to learn from experiences, bridging between structure and flexibility.

The goal of the presentation: awareness of operations professionalism, and a reminder that we need to take pride in what we do.

In IT we have a tendency to wing it. True professionalism is being respectful enough to follow the procedures every time so no mistakes are made. When something does go wrong, take notes and document the problem.

Andrew suggests that in operations we could learn something about good practices from extreme programming. Having two people sit together to document and run operations will prevent numerous problems.

When the system breaks the programmers say, "Well, systems break. What did you expect?" The programmers have no direct role in operations, even though they are running things that . Build teams to include systems administrators, database administrators and programmers.

You can't always hire good systems administrators. They might be experienced in some technology, but to make them good in your environment it takes time and training. The faster you can transfer knowledge to new people the faster they get up to speed.

Andrew spends some time talking about the decision-making process, using the OODA loop as an example.

The scientific method is rarely used in the IT world. Performance tuning is an example. Do we change one thing at a time to see how the performance changes, or do we change a lot of stuff at once and then aren't sure why the performance improved or declined?
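
To make the "one thing at a time" idea concrete, here is a minimal sketch of my own (not something from Andrew's talk). Each candidate change is tried alone against an untouched baseline, so any difference can be attributed to that single change. The setting names (cache_mb, log_level) and the fake_latency_ms function are hypothetical stand-ins for a real system and a real benchmark.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def one_change_at_a_time(
    baseline: Config,
    candidates: Config,
    measure: Callable[[Config], float],
) -> List[Tuple[str, float]]:
    """Try each candidate setting alone and report its effect versus the baseline."""
    base_score = measure(baseline)
    results = []
    for name, new_value in candidates.items():
        trial = dict(baseline)   # copy, so the baseline itself is never modified
        trial[name] = new_value  # change exactly one setting per trial
        results.append((name, measure(trial) - base_score))
    return results


# Hypothetical benchmark: returns latency in milliseconds, lower is better.
# A real measurement would run an actual workload, ideally several times.
def fake_latency_ms(cfg: Config) -> float:
    return 100.0 - 0.5 * cfg["cache_mb"] + 2.0 * cfg["log_level"]


baseline = {"cache_mb": 16, "log_level": 1}
candidates = {"cache_mb": 32, "log_level": 3}

deltas = one_change_at_a_time(baseline, candidates, fake_latency_ms)
for name, delta in deltas:
    print(f"{name}: {delta:+.1f} ms")
```

Had both settings been changed in one go, the only number available would be the combined difference, with no way to tell which change helped and which hurt.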

Posted by mike at April 19, 2005 1:31 PM