March 31, 2003

Weekend at the Datacenter

Just completed a 38-hour saga restoring services to our machines, including hours and more hours with Sun (on the phone and in person).

Saturday afternoon I got a page from one of our machines: HTTP service was down. I looked at the machine, which was completely seized up. Some of the typical commands used to determine what was going on were hanging (df, ls). Our console server couldn't get a prompt, so I tried to shut down the machine. Shutdown hung as well. Time for a trip to the datacenter . . .

Once I got to the datacenter I rebooted the machine, but it hung after finding the boot disk. Tried several boot commands without success and finally decided to call Sun.

3 hours and dozens of reboots later they decided it's a hardware issue. By this time it's midnight and we've been working on the issue for 8 hours. Told the engineer to meet me at 6am and went home to sleep for a bit.

6am on Sunday the engineer shows up with a main board, CPU, memory and a SCSI backplane. Within 5 minutes he's figured out it's the controller in the A1000 RAID array and the hardware he brought is of no use.

For the next 14 hours we work with the Sun guy while he orders a controller, installs it, realizes it's got old firmware, orders another controller, installs it, spends several hours trying to set it up, gets stuck, orders a third controller and new hard drives, and at last finishes the project using a GUI tool running on our laptop via X11. During the process our data gets wiped.

The Sun guy had warned us midday that we might need a restore from backups. The data center operator said he wasn't sure how to do a restore on our box, but showed us how he did it on another box. We gasped when he logged into an NT box and showed us where on the Start menu he needed to click to restore data to the box.

So, I worked with the data center operator and we (I) figured out how to push a restore from the backup server (which runs Solaris) out to our box. We were delighted to find that all the tapes were on-site. The operator got all the tapes mounted into the backup machine (backups and restores are done with Legato over the network) long before our A1000 was fixed. It was nice to know that the second the drives were ready we could start feeding the data onto them.

38 hours after the service went down the restore had completed and we were fully running. This has been our first major failure and the first time we've called on Sun to exercise our service contract. It took much longer than we had thought to get the A1000 up and running, and we didn't expect to have our data wiped. That may make it sound like Sun really sucked, but we enjoyed working with the engineer and felt he had technical expertise and made good decisions. Most of the time spent waiting was because hardware wasn't quite right or some instructions the engineer had weren't quite right (and he always kicked himself for following the Sun recommended path instead of the one he knew).

It was a good thing it was over the weekend when few students needed the system. Even though we spent a lot of time there, it was really the Sun guy who was sweating bullets. It sucked to be at the datacenter all weekend, but then again that means I've got good reason to have not gone in today (still debating about tomorrow) and be taking a nap in a few minutes . . .

Posted by mike at 1:50 PM

March 28, 2003

My Mistake Finishing Bad Disk Fix

Even though completing yesterday's disk work seemed simple, I goofed up. I had one disk with all the data on it, and a disk that I had repaired (which had become unformatted during the repair).

Unfortunately when I created the mirror and attached the first submirror disk I attached the unformatted disk. Thus when I attached the second submirror disk it synced up to be identical to the first disk. So I ended up with a mirror of unformatted disks, unusable.
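The ordering problem can be shown with a toy model. This is only a sketch in Python, not how the volume manager works internally: each disk is a plain dict, and the helper names `attach_submirror` and `build_mirror` are made up for illustration. The one behavior it assumes is the one described above, that the first submirror attached defines the mirror's contents and later ones are synced to match.

```python
def attach_submirror(mirror, disk):
    """Attach a disk to a mirror, modelled as a list of disks.

    Each disk is a dict of block number -> contents. The first disk
    attached defines the mirror's contents; any later disk is
    overwritten to match it (a toy stand-in for a resync).
    """
    if mirror:
        disk.clear()
        disk.update(mirror[0])
    mirror.append(disk)


def build_mirror(first, second):
    """Build a two-way mirror by attaching `first`, then `second`."""
    mirror = []
    attach_submirror(mirror, first)
    attach_submirror(mirror, second)
    return mirror


if __name__ == "__main__":
    good = {0: "superblock", 1: "media files"}
    blank = {}

    # Wrong order: the blank disk goes in first, so the good data is wiped.
    print(build_mirror(dict(blank), dict(good)))

    # Right order: the disk holding the data goes in first.
    print(build_mirror(dict(good), dict(blank)))
```

In the wrong order both disks end up empty, which matches what happened here.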

The data on most of our machines is rsynced up from a common machine, so it wasn't the worst case scenario. I removed the mirror and submirrors, newfs'd the drives, recreated the mirror and attached the two submirrored disks. It took a few hours to get all the data transferred back over (this machine happens to be our media server, Real, Zoom etc).

In the end it appears that doing a repair on the disk has, at least temporarily, fixed the issue. Will be interesting to see how long it lasts.

Posted by mike at 11:33 AM

March 27, 2003

Another Bad Disk

Started getting messages yesterday from Solstice DiskSuite about a problem on a mirrored disk on one of our machines. I think this is the third disk that has had problems.

Last time this happened I raced over to the data center and did a complete swap of the mirrored drive pair. This time I thought I'd try something else first just to see.

Steps this time:
1. wait until system is quiet
2. unmount metadevice
3. detach drives from mirror
4. clear mirror
5. mount up single disk (the good one)
6. use format->analyze->repair on bad disk (says the disk is OK)

I'm waiting for the next quiet period and then I plan to:
7. unmount single disk
8. recreate metadevice and submirrors
9. change vfstab
10. mount up md
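
Step 8 is the one where order matters, so here is a rough dry-run sketch of it in Python. It only builds the command strings and runs nothing. It assumes the DiskSuite `metainit` and `metattach` commands; the metadevice names (d10, d11, d12), the slice names and the function name `recreate_mirror_plan` are hypothetical, and the exact invocations should be checked against the man pages before use.

```python
def recreate_mirror_plan(mirror, data_sub, data_slice, other_sub, other_slice):
    """Return the commands for step 8 as strings, in order.

    Nothing is executed. The submirror that holds the data is put
    into the mirror first, and the repaired disk is attached after
    it so that it is the one that gets synced.
    """
    return [
        # Define each submirror as a simple one-slice metadevice.
        f"metainit {data_sub} 1 1 {data_slice}",
        f"metainit {other_sub} 1 1 {other_slice}",
        # Create a one-way mirror from the submirror holding the data.
        f"metainit {mirror} -m {data_sub}",
        # Attach the second submirror last.
        f"metattach {mirror} {other_sub}",
    ]


if __name__ == "__main__":
    # Hypothetical names: d10 is the mirror, d11 sits on the good disk.
    for cmd in recreate_mirror_plan("d10", "d11", "c0t0d0s0", "d12", "c0t1d0s0"):
        print(cmd)
```

Printing the plan before the quiet period makes it easy to double-check which disk goes in first.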

We'll see if the metacheck reports problems, if so I'll be over to the datacenter to swap out the disks.

Posted by mike at 10:55 AM

Hanging Out with Pete - Open Source Themed Visit

I've been in Utah for the past few days hanging out with Pete. Spent a great deal of time discussing technology, open source initiatives, work, home etc.

Several good things came out of the trip. During our traditional trip to DQ we struck up a conversation about taking vs. giving to the open source community. It's been on my mind lately and the conversation prompted me to start thinking of ways I could contribute to the community. Presenting at OSCON is a start. I wondered if I could conjure up the writing skills I developed at school and during a short stint as a marketing dude and put together an article or two. I also thought about taking the few open source projects we use most at work and giving a little time to coding, testing etc. How else does one contribute?

Another good thing was the implementation of Gallery on the kruckenberg family site. It was fun getting it all set up and customizing it to work within our framework, but the real reward was having good tools to share pictures with others. I used Pete's digital camera quite a bit to capture activities and then was fairly religious about posting them quickly to the gallery.

Posted by mike at 10:33 AM

March 19, 2003

Tricking the SpamAssassin Filter

I got a spam today that said this (both lines were linked):

A nice lady wants to correspond with you. check her out.

Let me know and I won't write you again. Thanks

We use SpamAssassin filtering, and it does a pretty good job. Very little spam gets through, so when one does I like to view the headers and see how close the spam got to being diverted. This particular piece of mail had a score of 3.1 (5.0 required to divert). I noticed when I turned the headers on (Pine) that the body of the message actually looks like this:

A ni<!--z[@Fi@An,@F8OFA 0,sz-->ce lady wa<!--z[@Fi@An,@F8OFA 0,sz-->nts to corr<!--z[@Fi@An,@F8OFA 0,sz-->espond with you.<!----> check her out

Le<!--z[@Fi@An,@F8OFA 0,sz-->t me kn<!--z[@Fi@An,@F8OFA 0,sz-->ow an<!--z[@Fi@An,@F8OFA 0,sz-->d I won't wri<!--z[@Fi@An,@F8OFA 0,sz-->te y<!--z[@Fi@An,@F8OFA 0,sz-->ou again.<!----> Than<!--z[@Fi@An,@F8OFA 0,sz-->ks

Maybe this has been happening for a while, but I had never seen it. Pretty sneaky to split up all the words so a matching algorithm would fail. Of course it wouldn't be that hard to add a rule that blocks messages once they reach a certain number of comments.
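
Here is a rough sketch of that idea in Python. It is not a SpamAssassin rule, just an illustration of the check: strip the HTML comments so the words join back up, and count how many were removed. The function names and the threshold of 3 are made up for the example.

```python
import re

# Matches an HTML comment, including the empty <!----> form.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(html):
    """Return (text with comments removed, number of comments removed)."""
    return COMMENT_RE.subn("", html)


def looks_obfuscated(html, threshold=3):
    """Flag a body with more than `threshold` comments (arbitrary cutoff)."""
    _, count = strip_comments(html)
    return count > threshold


if __name__ == "__main__":
    # First line of the spam body quoted above.
    sample = (
        "A ni<!--z[@Fi@An,@F8OFA 0,sz-->ce lady wa<!--z[@Fi@An,@F8OFA 0,sz-->"
        "nts to corr<!--z[@Fi@An,@F8OFA 0,sz-->espond with you.<!----> check her out"
    )
    text, count = strip_comments(sample)
    print(text)
    print(count, looks_obfuscated(sample))
```

With the comments removed, ordinary word matching works again on the cleaned text, and the comment count by itself is a decent hint that something is being hidden.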

Posted by mike at 8:18 PM

March 15, 2003

Overnight Server Migration

Last night I worked with the folks at Rackspace to upgrade the Woodland server. I do all the programming, sysadmin and design for Woodland, a small mail-order company. I chose Rackspace as our hosting company coming up on two years ago after being disappointed with CIHost.

The migration went pretty well. Rackspace had built my machine with Red Hat 6.2 and were willing to upgrade to 7.3 at no cost. I shut down my services around 9pm. They took down the machine, stuck in a new disk, installed 7.3, and then mounted up the old 6.2 disk so I could move data. They were competent and quick. I finished around 1:30 in the morning (after having fallen asleep twice).

Rackspace claims they are fanatical about their support and I have to agree. They were right on the money with this migration, with good communication and technical competence. I haven't had to interact with them that much in the past two years. The only other time was when I took a survey and marked some shortcomings, and they were very serious about getting in touch with me to see how they could improve.

Posted by mike at 4:43 PM

March 13, 2003

Rebooting an Airplane - Thoughts Provoked by McConnell Essay

I was pointed to a collection of essays by Steven McConnell which are on their way into the 2nd edition of his book After the Gold Rush.

I found his essay on process vs commitment oriented development interesting. Wondering exactly how my development efforts fit into his categories.

Things are extremely flexible in our department. We have a lack of process, but there really aren't any telltale signs that would peg us as commitment oriented. Of course we're not doing anything revolutionary here, so maybe it's OK to be using a little bit of each approach.

The primary motivator in my work is being a part of a small team (3 developers) that makes its own decisions about technologies to use and methods to solve problems (I've been meaning to make an entry breaking down what makes my job valuable). That doesn't mean I'm up until 3am every night hacking up something new, but it does mean that I sometimes come in early or spend the evening looking into some new technology because I'm excited about how it might tie into a project.

The intro chapter to After the Gold Rush 2nd Ed. starts with a funny story about rebooting an airplane.

Posted by mike at 2:56 PM

US Propaganda dropped on Iraq

Flipping through the BBC's collection of US propaganda being dropped on Iraq, primarily targeted at soldiers.

I feel sick when I think about our country and how it bullies other people around.

Posted by mike at 8:40 AM

Centrino Missing the Mark

Today I was reading this writeup on this week's unveiling of mobile processors, including Intel, AMD and Transmeta.

Intel has decided to include wireless capabilities in its Centrino chipset. Not sure they've really thought this whole thing through.

"There's a groundswell of desire" for wireless networking, said Intel CEO Craig Barrett.

Intel's wireless chip uses the 802.11b standard, which transmits and receives data at 11 megabits per second. Rival chip-making firms offer alternatives that are compatible with the faster 802.11a and g standards, which offer 54 megabit-per-second speed.

What I don't understand is why they chose 802.11b. The chips aren't even available yet and the technology is outdated. There are plenty of manufacturers on the 802.11g bandwagon: Apple, Linksys and D-Link are offering access points and network cards using the standard.

Then again, maybe this will drive users who want the current standard to look to something other than the Centrino chipset.

Posted by mike at 7:59 AM

March 9, 2003

Top Ranking O'Reilly Hate Site

I was looking at one of my previous entries about O'Reilly books and decided to run the Google search again. Now if you search for "oreilly books suck", "oreilly books bad", or "hate oreilly books" my weblog entry is the first or second match. I didn't really intend to become the foremost expert on dislike of O'Reilly books (and I suppose this entry won't help things). Proof that there really isn't much out there in the way of dislike for the books.

Posted by mike at 8:54 PM

March 7, 2003

The Ideal: Unlimited Bandwidth and a Captive Audience

I was reading Pete's entry about bandwidth, which caused me to reflect on two things I appreciate about where I work.

1. We rarely face bandwidth issues. The users of our service at Tufts are med students, often spending the entire day on campus and using our site in a classroom or computer lab (which are all connected by either a 100Mbit or gigabit line to our servers). In addition, a high percentage of them have good connections (DSL, cable modem) in their apartments. We occasionally get a dial-up user who has issues with downloading a large PDF filled with high-res images, but 85% of what we serve up is HTML so even on a slower connection most of the information is available.

2. We have a captive audience. Professors put up their images, video clips, PDFs, quizzes, evaluations and syllabi, and the students have no alternative but to come to us to get the information they need. Promotion is not something I enjoy, and it's nice to know that we're not at the whims of search engine rankings.

Maybe I'm being shortsighted. One could probably argue that we are better off paying attention to both issues, bandwidth and marketing, even though they might be different from the next guy's internet service.

Posted by mike at 9:25 AM

March 6, 2003

O'Reilly Books Don't Suck

I have a friend who, when trying to find a technical book about something, seems to avoid O'Reilly books, buying them when there is no other alternative.

So I got to wondering, am I not in the know about some audience who dislikes O'Reilly? When looking for a technical book my default is to start with O'Reilly. I typically review the chapter titles or sections to see if it addresses the issues I need before purchase. I often don't look elsewhere unless O'Reilly doesn't have a book on the subject.

I did a light search this morning to see if I could find anything to support my friend's trend to avoid O'Reilly. I did these searches:

Glad that I wasn't missing some boat. I really do like O'Reilly books and wouldn't have changed anyway, but it still has me puzzled.

Posted by mike at 8:43 AM

March 5, 2003

OSCON Presentation Accepted

Got this notification today:

You have been accepted as a presenter at the O'Reilly Open Source Software Convention 2003

To be honest, I'm surprised. At the time I submitted the proposal, the proposal was the end, not the means to an end. I hadn't thought much about what it would mean if I actually had to present. I've done a fair amount of public speaking, but not much to technical audiences. This will be a good challenge for me.

Title of presentation: Transforming XML for Web and Print
Length: 45 minutes

I'm also excited. Of all the conferences out there, I find that OSCON pulls together the people, tutorials and presentations most relevant to what I'm working on. Now that I know I'm going I'm looking forward to seeing what tutorials will be offered and details on the tracks.

Posted by mike at 3:07 PM

March 4, 2003

Cancelling DirecTV

We got a notice the other day saying that AT&T was raising its internet rates for customers who didn't subscribe to their cable TV service. We have DirecTV, but hadn't watched it much in months so I was already thinking about cancelling.

So I call DirecTV . . .

I told the truth to the first person: I was cancelling because I'm going to get cheaper TV through AT&T. She listens and then says "OK, I'm going to transfer you to the customer retention department." Why you would mention the name of that department is beyond me. Rather than battle with the retention people I hung up and called back.

I got a new rep and I lied, saying that I couldn't afford the service anymore. He was very sympathetic and said no problem. Apparently they still have to run you through the wringer so I got transferred. He was smart enough to know not to name the department I was being transferred to for cancellation.

The woman who was to process my cancellation had every offer imaginable. She first offered a cheaper service, then offered to credit my account for two months worth of free service. I continued to refuse. She pulled out the big guns and started talking about slashing my service fees in half for a year while we got back on our feet (they are normally $37). As a last ditch effort she offered free premium channels for 6 months (didn't seem as impressive as her second to last offer). Finally she caved and agreed to cancel the account.

I should have called DirecTV a while ago and given them that story and then "caved in" to get reduced rates.

Posted by mike at 10:00 PM

Using Laptop Fulltime

I've recently become fairly frustrated with the number of computers I use in the course of a week and the extra work it takes to move between machines. I've got three under my desk at the office, four at home, and a dozen or so servers around the US that I'm on at some point during the week. I got fed up with having to maintain so many machines so I've decided to try two things:

  1. Use my laptop exclusively for a desktop. I've been trying this for two weeks now, using a monitor and keyboard when at my desk. Here's what I was hoping to accomplish.
    • Minimize redundancy in downloading, installing, transferring files etc

    • Avoid unreachable relevant information. I can't count the times I've been at the datacenter with my laptop and realized the stuff I really want is at home or work (takes too much work to get laptop online).

  2. Develop a standard for filing/organizing source code, packages, documents, applications, notes, music, photos etc. With this I'm hoping:
    • Familiar tree navigation on any machine

    • Backup or move all files with assurance nothing is left behind

    • Cross-machine comparisons easier

I'm sure as I continue this the list will grow or shrink. The one con:

Posted by mike at 5:19 PM