
March 8, 2005

Upgrade Disks in 3 Servers

I'm sitting in the Tufts University data center, waiting for a ufsdump | ufsrestore to complete in order to finish off a day dedicated to hardware.

For a long time we've been able to get by with 36G drives, primarily because the data is spread across a few classes of servers. A while back, one of the webservers (a SunFire v210) started triggering Nagios alerts that it was running out of disk space. We'd dealt with it (in a temporary kind of way) by rotating out large Apache logs and cleaning out cached data. By Monday we'd reached the point where, even after everything we could do, the 36G disks were at 94%.

I'd ordered new disks a few weeks back, and they'd been sitting at my desk for a few days, so I set aside today to spend with the servers and get the disks upgraded.

We run Solaris DiskSuite for disk mirroring on all of our machines, which is great for keeping machines with failed disks up, but adds an extra layer of work to replacing disks.

Development /data Disks
First up were our development server's /data disks. I started there because it didn't involve touching the boot disks (which adds an additional level of complexity and stress). Let's see if I can get all the steps into one sentence. I broke the disk mirror, took one of the disks out (c0t3d0), put the larger one in, formatted and newfs'd the new disk, ufsdumped the data from the old disk to the new, unmounted the old, mounted the new disk at the /data mount point, pulled out the other small disk (c0t2d0), stuck in the second larger disk and set up the mirror.

Yeah, that's not a very good sentence.
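For the record, here's roughly what that looked like as commands. This is a sketch from memory, not a transcript: the metadevice names (d10, d12, d13), the s0 slice, and the /mnt staging mount point are all illustrative.

    # Detach and clear the submirror on the disk that's coming out (c0t3d0)
    metadetach d10 d13
    metaclear d13

    # (physically swap c0t3d0 for the new, larger disk)

    # Partition the new disk and build a fresh filesystem on it
    format c0t3d0
    newfs /dev/rdsk/c0t3d0s0

    # Copy /data (still mounted from the surviving half of the mirror)
    # onto the new disk
    mount /dev/dsk/c0t3d0s0 /mnt
    ufsdump 0f - /data | (cd /mnt && ufsrestore rf -)

    # Swap mount points: old mirror out, new disk in at /data
    umount /mnt
    umount /data
    mount /dev/dsk/c0t3d0s0 /data

    # Tear down the rest of the old mirror, swap c0t2d0 for the second
    # new disk, partition it to match, then rebuild the mirror
    metaclear d10
    metaclear d12
    metainit -f d13 1 1 c0t3d0s0
    metainit d12 1 1 c0t2d0s0
    metainit d10 -m d13
    # (point the /data line in /etc/vfstab back at /dev/md/dsk/d10,
    #  remount /data from the mirror, then attach the second half)
    metattach d10 d12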

Replace Disk in Unbuilt Server
By far, the easiest server to upgrade tonight is another v210 that's in another data center. It's part of our disaster recovery plan, and hasn't been jumpstarted yet. That means there's no data on the boot disks yet, and the new disks can be swapped in before the OS is installed on the machine.

To make that even better, I don't even have to go over to the other data center, because an operator is willing to take the disks up there and do the swap (since there's no data migration involved).

Webserver Boot Disks
The last machine to swap disks on was one of our 1U v210 servers, which has only one pair of disks serving as both boot and data disks.

The pain here is that you have to deal with DiskSuite's metadb, there isn't just one mount point but several (/, /usr, /var, /local, /data), and every change you make to /etc/vfstab could kill your chances of getting the machine to come back up.

For instance, the meta-devices (/dev/md/dsk/d1, d3, d4) supposedly correspond to the disk partitions (/dev/dsk/c0t0d0s1, s3, s4). After clearing out the metadb I made the changes to /etc/vfstab to make sure it was pointing at the disk slices, not the meta-devices. However, upon boot things were all screwy until I realized that /dev/md/dsk/d2, which is /var, is actually a mirror of /dev/dsk/c0tXd0s4. So the system was very confused on reboot until I went back, edited the mount points, and got everything in the right place.
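In case that's too abstract, here's a sketch of the two pieces involved. The s7 replica slices and the exact vfstab fields are illustrative, not lifted from the real machine; the point is that the metadevice numbers and the slice numbers don't have to line up the way you'd guess.

    # Delete the DiskSuite state database replicas so the box will boot
    # without looking for its metadevices (-f is needed for the last ones)
    metadb -d c0t1d0s7
    metadb -d -f c0t0d0s7

    # Then edit /etc/vfstab to mount the raw slices instead of the
    # metadevices. The obvious guess for /var would have been:
    #   /dev/dsk/c0t0d0s2   /dev/rdsk/c0t0d0s2   /var   ufs   1   no   -
    # ...but d2 is actually a mirror of s4, so the line that works is:
    #   /dev/dsk/c0t0d0s4   /dev/rdsk/c0t0d0s4   /var   ufs   1   no   -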

There is one good thing to say about this: the server is part of a load-balanced pool, so having the machine down doesn't add a ton of time pressure from folks waiting on it or from a closing maintenance window.

In fact, although this seems to be going pretty well, it's a nice thought that if something gets completely screwed up I can still go home and sleep and tackle the problem tomorrow.

The wait for the ufsdump is over, the metadb stuff went off without a hitch, and I've been able to boot just fine from the new disk. I'm going home while the disk mirrors continue to sync up; I'll try one more reboot in the morning before calling it good.
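(For anyone following along, watching that sync is just a matter of running metastat on the mirrors; the device names and percentage below are illustrative.)

    # A resyncing mirror shows up in metastat something like this:
    #   d0: Mirror
    #       Submirror 1: d20
    #         State: Resyncing
    #       Resync in progress: 37 % done
    metastat d0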

Did I mention I'm not a systems administrator?

Posted by mike at March 8, 2005 8:47 PM