January 30, 2003

Statistics on the Nation's Health

I stumbled into the National Center for Health Statistics site today after a speculative conversation about suicide rates. I find these stats really interesting to peruse. The FastStats page makes it easy to browse through various types of health statistics.

To highlight a few:
- there are nearly 62 million cases of the common cold annually
- 1,665 people die from the flu each year
- 2.1 million annual visits for schizophrenic disorders
- in 1965, 42% of the population smoked; by 2000 it was down to 23%
- 4.2 million emergency room visits for work-related injuries in 2000

Are we a healthy country? I'd speculate there are much healthier ones . . . but I haven't stumbled into a Worldwide Center for Health Statistics site.

Posted by mike at 9:56 PM

January 28, 2003

XSL-FO & FOP: trouble with special characters

I've been digging into XSL-FO for the past few days.

About two years ago we started a massive migration, moving our collection of HTML documents to HSCML (Health Sciences Curricular Markup Language). For the most part these documents were print documents converted to HTML for use in medical, dental and veterinary courses. We urged people to move to XML primarily so their documents would have meaningful structure, but knew there were many advantages to having documents in XML.

One advantage is XSL-FO, providing a way for any XML document in our database to be rendered as PDF (the users were excited about this).

My battle these past few days was with special characters. For the most part users are creating the XML on PCs. We've provided a Unicode font with a fairly complete set of glyphs; the characters are stored as named entities in the XML. The problem I came across: when transforming the XML to XSL-FO and running it through FOP on Solaris, the special characters weren't rendering (the PDF shows a # for each problem character).

I pinpointed two issues:
1) The entity mappings we were using in the DTD were PC-specific. I created a new set of entity files mapping the named entities to numeric character references, wrapping each in a tag that results in FOP using a different font for those characters.
2) The fonts on Solaris (some included with FOP) weren't as comprehensive as the Unicode font we make available to our users. After searching high and low for a complete Unicode font for Solaris, I took a shot in the dark and copied the TrueType font from a PC to our server. The commands to import the font into the FO processor returned no complaints. I haven't done a ton with fonts, but that surprised me.
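
The entity remapping in issue 1 boils down to a textual substitution. A minimal sketch using sed; the entity names and code points here are standard ISO ones, but the actual HSCML entity files cover far more characters:

```shell
# Map a few named entities to numeric character references.
# Illustrative subset only; the real entity files are much larger.
remap_entities() {
  sed -e 's/&mdash;/\&#8212;/g' \
      -e 's/&alpha;/\&#945;/g' \
      -e 's/&dagger;/\&#8224;/g'
}

printf '%s\n' 'A&mdash;B uses &alpha; and &dagger;' | remap_entities
# → A&#8212;B uses &#945; and &#8224;
```

In practice the mappings live in the DTD's entity files rather than a sed pass, but the name-to-number substitution is the same.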

Researching glyph problems: 5 hrs
Building new entity files: 1 hr
Converting the font: 30 min
Installing font and entity files: 30 min
Rerendering PDFs again and again: 2 hrs
Displaying 200+ glyphs in PDF: priceless

Posted by mike at 3:38 PM

January 24, 2003

XML Schemas: converting DTDs with a tool?

I keep hearing about XML schemas and the advantages they provide. Having used DTDs for a while and knowing some of their limitations, I'm excited to step into this alternate world of XML validation. I've been thinking this for a year, though, so why haven't I moved to XML Schema?

It started last year at OSCON when I attended An Introduction to XML Schemas by Eric van der Vlist. I came out more confused and intimidated than ever, and have continued to vocally support XML Schema while fearing actually doing something about it.

Slashdot had a posting, DTD vs XML schema. I poked through the links and found this very basic article. I guess that's where I need to start: very basic. Also referenced in the post is a tool for converting DTDs to XML Schema; I'll definitely give that a shot. We have around 3,000 definition lines across 50 DTDs, so if the tool converts them accurately we will save a ton of time.
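
For a sense of the shape of the conversion, here is a minimal element declaration in both syntaxes (illustrative only, not taken from HSCML):

```xml
<!-- DTD: a section holds a title and one or more paragraphs -->
<!ELEMENT section (title, para+)>

<!-- Rough XML Schema equivalent of the same content model -->
<xs:element name="section">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="title"/>
      <xs:element ref="para" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

The verbosity is obvious even at this size, which is part of why an automated converter matters for 3,000 lines of declarations.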

Posted by mike at 7:22 AM

January 23, 2003

Concert Riders

The Smoking Gun has put together a collection of documents sent to concert venues outlining requirements of the performing band. It appears that in the music industry this document is called a rider. Some of them are humorous:

The Rolling Stones travel with their own furniture and require a snooker table, 5 video games and a ping-pong table.

Both Ozzy Osbourne and John Mellencamp require a doctor to be present during the show.

Creed asks venues to avoid Asian, vegetarian and exotic dishes.

Moby asks for 10 pairs of white underwear?

Third Eye Blind asks for a different kind of cereal each day of the week.

And of course, Britney Spears says that if the rider isn't kept strictly confidential there may be "irreparable harm which may not be compensable in monetary damages."

Posted by mike at 9:07 AM

January 22, 2003

MySQL Recovery

Today's urgent message . . . "Um, I accidentally deleted 500 rows from db_x.table_y". Amazingly, the first time in my 2 years at Tufts. Also the first time our backup/restore protocol was exercised in production (well tested in dev).

When I first started here we did a mysqldump of modified records every hour, but it became noticeable to the users, so I redesigned our backup plan.

Our backup/restore protocol:
Each night we mysqlhotcopy the database into a directory named with the date. Prior to the backup we move the binlog into the previous day's dir, keeping the snapshot and all of that day's changes in one place. We keep four days' worth of backups.
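
A sketch of that nightly rotation. The paths and the binlog name are invented for illustration, and `cp -r` stands in for the actual `mysqlhotcopy` call; the real script also flushes logs and handles errors:

```shell
# Nightly backup rotation (sketch). BACKUP_ROOT, DATADIR and
# binlog.001 are invented names; cp -r stands in for mysqlhotcopy.
BACKUP_ROOT=${BACKUP_ROOT:-/tmp/backups}
DATADIR=${DATADIR:-/tmp/mysql-data}
TODAY=$(date +%Y-%m-%d)
YESTERDAY=$(date -d yesterday +%Y-%m-%d 2>/dev/null || date -v-1d +%Y-%m-%d)

mkdir -p "$BACKUP_ROOT/$TODAY" "$DATADIR"

# Move the binlog into the previous day's dir so each dir holds
# a snapshot plus all changes made after that snapshot.
[ -f "$DATADIR/binlog.001" ] && mkdir -p "$BACKUP_ROOT/$YESTERDAY" \
  && mv "$DATADIR/binlog.001" "$BACKUP_ROOT/$YESTERDAY/"

# Snapshot the data directory (mysqlhotcopy in the real script).
cp -r "$DATADIR" "$BACKUP_ROOT/$TODAY/snapshot"

# Keep only the four most recent daily directories.
ls -1d "$BACKUP_ROOT"/*/ | sort | head -n -4 | xargs -r rm -rf
```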

To restore, the db is stopped long enough to copy the necessary table files from that morning's snapshot into the mysql data directory. Once the database is started we use a perl script I wrote to run the logs:

hot_restore --file=<mysql log file> [--database=<database>] [--table=<table>] [--stoptime=<mysql log timestamp>]

This script outputs the relevant updates from the log file(s), allowing the user to specify a timestamp where the script will end printing statements (most likely immediately before the statement that did the damage).

Today's issue was resolved with the database down just long enough to copy the table files over, plus the few seconds it took to run the day's log file against the table.
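
hot_restore is our own Perl script, but MySQL's bundled mysqlbinlog can do something similar. A sketch that builds (and prints, rather than executes) the replay command; the binlog file, database name and timestamp are invented, and note that mysqlbinlog can filter by database but not by table, which is one thing hot_restore adds:

```shell
# Build, but do not run, a mysqlbinlog replay command, roughly
# what hot_restore automates. --database and --stop-datetime are
# real mysqlbinlog options; the names below are invented examples.
restore_cmd() {
  file=$1; db=$2; stoptime=$3
  printf 'mysqlbinlog --database=%s --stop-datetime="%s" %s | mysql %s\n' \
    "$db" "$stoptime" "$file" "$db"
}

restore_cmd binlog.001 db_x '2003-01-22 14:05:00'
```

Printing the command first lets you eyeball the stop time before replaying anything against a live table.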

Doesn't mean I wasn't nervous through the whole thing.

Posted by mike at 10:02 PM

January 21, 2003

Favorite Open Source Project?

This has been on my mind. Last year at OSCON I attended the Internet Quiz Show. One of the questions they asked participants was to name their favorite open source project. Most of them made up some abstract comment that had little to do with open source projects (like "Jon Orwant"), but the question is interesting.

Just trying to get a sense of how many projects I interact with during a given day is a challenge. It must be somewhere in the hundreds; just booting into Linux goes through a slew.

So I've been asking myself that question, "What is my favorite open source project?" I'd have to rule out projects that currently enable my Linux environment but I'm largely unaware of. Sorry to those folks . . . from time to time I have to dig down and end up learning about some module, but for the most part it's the applications, libraries and tools that are in my toolbox, and thus on my mind.

Understanding that "the favorite" is apt to change without notice, my current favorite is XML::Twig by Michel Rodriguez, a Perl module that subclasses XML::Parser. It makes parsing, walking through, adding to, changing and building an XML structure extremely simple. I've become well-versed in its methods through repeated use across a wide array of data migrations.

XML::Twig has been at the top for a while (two months) now, but I continue to ask myself the question.

Posted by mike at 10:27 PM

January 19, 2003

Typing Test

Today I wondered how well I type. Last time I checked I was around 45 WPM, but that was right out of college (1997). I found a typing test site; my score was 59 WPM with 100% accuracy. Not too shabby . . . I didn't know programming would increase my typing skills, though no doubt email and IM have made bigger contributions.

Posted by mike at 9:04 PM

January 18, 2003

Value of Pipe

I had an interesting experience yesterday. A number of people complained that they received blank email messages from our server. We run sendmail, but there are no mail accounts on our machine (and very few shell accounts); we use sendmail purely for reminders from the application (and believe we have secured the scripts that send them). Our firewall blocks any inbound mail.

A little worrisome . . . an open mail relay? Cracked? Obviously we needed to get to the bottom of it.

Now, I'm barely a sendmail novice and had little idea where to start except to go to /var/log/syslog and see what could be determined. I'm also aware there may be ways to do the following in fewer commands (which some may think more noble), but I wasn't shooting for any awards here, just the necessary information.

I started by grabbing sma and pulling out the mail logs (I used sma's CustomLog to get address and date only). I found that a spike of messages had been sent on two different days between specific periods of time, so I grepped based on date and time. Then I sorted so the addresses were all in order (needed before piping to uniq); alternatively, in sma I could configure the date to come first and sort on time to see start and stop times. After this sort I determined there were a number of admin email addresses I wanted left out, so I did a grep -v to exclude any line matching a handful of strings. Once I had all the mail addresses in order (and after a wc to get that stat), I piped to uniq -c for a listing of how many messages had been sent to each unique user. Of course, having that in order required another pipe to sort. I switched between piping to more and to wc to determine the number of unique users and to report the actual addresses to the requesting helpdesk person.

After sharing the information with the helpdesk we determined that an admin user had triggered a reminder email from the application but failed to fill in a subject or body.

The list of pipes: sma | grep | sort | grep | uniq -c | sort | (wc or more)
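
The same shape of pipeline, run against a few fake "address date" lines standing in for sma's output (the addresses and dates are invented):

```shell
# Fake sma output: one "address date time" line per message.
log() {
  printf '%s\n' \
    'alice@example.edu Jan 17 02:11' \
    'bob@example.edu Jan 17 02:12' \
    'alice@example.edu Jan 17 02:15' \
    'admin@example.edu Jan 17 02:16' \
    'alice@example.edu Jan 18 03:01'
}

# Keep the spike window, drop admin addresses, count messages per
# recipient, most frequent first.
log | grep 'Jan 1[78]' | awk '{print $1}' | grep -v '^admin@' \
  | sort | uniq -c | sort -rn
```

Which prints alice with a count of 3 at the top, bob with 1 below: exactly the per-user message counts handed to the helpdesk.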

Just in case you had forgotten how great piping is . . .

Posted by mike at 4:55 PM

January 16, 2003

Momentum

Last night, on my way upstairs, I pushed a door closed; when I got to the top of the stairs I looked back down and saw the door just closing. I reflected on what a wonderful thing momentum is. I gave the door a push and the momentum carried it closed while I attended to something completely different.

It made me pause and reflect on momentum, and how far-reaching its effects are. I bike from time to time, and momentum is what makes a bike ride good: put some initial energy into it, then coast along and enjoy the ride. The same is true for almost anything that moves, a bowling ball, an avalanche, the moon. With some initial energy the object continues on its course by law of nature, without human intervention. Momentum, it's wonderful.

Then there's social momentum. Put a little energy into planning a community service project and stand back as individuals execute well beyond expectations. Social momentum abounds at Red Cross blood donation centers after a local or national crisis.

And what about mental momentum? Those incredible hours or days when you get in a groove and each thing builds to the next. Hours fly by and you are accomplishing so much without thought.

The list goes on . . .

Momentum, I'll take a cup of that please.

Posted by mike at 11:38 PM

January 15, 2003

Replacing Bad Disk in Disk Mirror

For the past two years I've relied heavily on a consultant when it came to Sun hardware. Recently a disk went bad in our webserver and, for whatever reason, I felt inspired to tackle the project myself. The most intimidating piece of the puzzle was the disk mirroring, primarily because I hadn't used Sun's DiskSuite before. In hindsight there isn't much to DiskSuite; a little knowledge removed all the uncertainty.

I should also say that my understanding of DiskSuite is that mirrored disks must be identical, down to the exact number of cylinders. I couldn't find an exact match for the bad disk, so I had to install two new disks, which actually made the process easier. We use an external MultiPack to house the data (non-boot) disks.

I was surprised at how clear and navigable the Sun docs for Solstice DiskSuite were. I couldn't find them with a search on the SunSolve page (which supports this recommendation), but a Google search turned them right up (and yes, they were on the SunSolve site).

Here was my process:
1. Unmount the metadevice
2. Break and remove the DiskSuite mirror (metadetach <mirror> <submirror>)
3. Alter /etc/vfstab to point at the one remaining good disk
4. Shut down the machine
5. Install the two new disks
6. Bring up the machine
7. Format the new disks
8. newfs the new disks
9. Create the new metadevice and submirrors (wait for resync)
10. Mount the metadevice
11. sudo ufsdump 0f - /dev/rdsk/c1t6d0s0 | (cd /datanew; sudo ufsrestore xf -)
12. Alter /etc/vfstab to point at the metadevice
13. Shut down the machine
14. Pull the old disk pair (marking the bad one)
15. Bring up the machine
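
The metadevice half of those steps can be sketched as a dry run; run() only echoes, so the sequence can be reviewed before touching real disks. The d-numbers and c1t*d0s0 devices below are invented stand-ins, not our actual layout:

```shell
# Echo-only dry run of the DiskSuite commands behind steps 2 and 9.
# Remove the echo in run() to execute for real (as root); the
# metadevice and disk names here are invented examples.
run() { echo "$@"; }

run metadetach d0 d1            # step 2: break the mirror
run metaclear d0 d1             # ...and remove the old metadevices

run metainit d1 1 1 c1t5d0s0    # step 9: submirrors on the new disks
run metainit d2 1 1 c1t6d0s0
run metainit d0 -m d1           # one-sided mirror on d1
run metattach d0 d2             # attach d2; DiskSuite resyncs
```

The echo wrapper is deliberate: with disk surgery it's worth reading the full command sequence once before any of it runs.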

Posted by mike at 7:46 AM

January 14, 2003

Network Computing Weblog

Allow myself to introduce . . . my brother (and his weblog). He's a network engineer and visionary at the Utah Education Network (I believe somehow connected with the University of Utah). His weblog contains a nice mix of technical, social and managerial commentary.

Posted by mike at 2:29 PM

Object Oriented PHP

Many years ago I made my entrance into the web scripting world by using PHP on a very small site. It was a typical configuration, PHP with MySQL enabled to grab information about products for sale. It didn't take long for me to realize that the pages were static enough to be pre-generated, with a token amount of PHP to track session information.

Since everything was already written in PHP, I created an admin suite that used PHP to pull from the database and write out a complete set of PHP files, which were primarily static HTML. Big gain in speed (I was running this on a server which hosted 800 other sites).

Recently the site broke into four different sub-sites and I had the challenge of modifying my code to create four different sets of pages. At first this was easy: I created a site table and looped over each site, using its data to plug in things like name, meta tags, path, etc.

It was more of a challenge when the owner wanted to tweak specific pieces of functionality on individual sites. I needed a default template which would create PHP/HTML pages as they currently looked and behaved, but have a way for any piece of that page to be overridden.

Having experience with XML and XSLT, I was tempted to rewrite everything. But the amount of work to convert the existing output into a defined XML format and then create stylesheets for the different pages was more than I wanted to commit to (and more than the company wanted to spend).

I also looked at PHP's FastTemplate. I could have gotten some mileage out of it, but the differences between pages went well beyond HTML structure (e.g. pull a list of products from the database and, in one instance, create a dropdown with a form action to the URL; in another, create a column of hrefs on a different part of the page). Some might argue that FastTemplate was the answer; it didn't seem quite right to me.

Enter object-oriented PHP. With all my current output generated by PHP scripts, and much of it via functions, I decided OO PHP would most easily get me where I needed to be. PHP's object support seems primitive, but it had everything I needed. I created a class full of functions that output the PHP/HTML for each piece of my default page, then a class extending it that replaces certain functions with completely different functionality and layout.

The result: a simple way to use inheritance to generate a number of pages with some chunks shared and other parts completely different.
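
The same override trick, sketched here in shell rather than PHP: a default set of page-fragment functions, plus a per-site file that redefines only the pieces that differ (the fragments and site names are invented; in the real code this is a PHP class extending the default class):

```shell
# Default "class": one function per page fragment.
header() { echo '<h1>Store</h1>'; }
product_list() { echo '<ul><li>widget</li></ul>'; }

# Per-site "subclass": redefine only what differs.
# (The real version is `class SiteB extends DefaultSite`.)
site_b_overrides() {
  product_list() { echo '<select><option>widget</option></select>'; }
}

page() { header; product_list; }

page                  # default site: header + list
site_b_overrides      # switch to site B's overrides
page                  # same header, different product list
```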

Posted by mike at 2:02 PM

moving from stunnel to private network

The project I've been working on for the past two years at Tufts University ran on one Sun Ultra 60 when I arrived (March 2001). In January 2002 we moved to a multi-server configuration. We split the functionality of one machine into three; web server, streaming server and db server. We run Apache (with mod_ssl and mod_perl) on a U60, stream RealVideo and FlashPix from a second U60, and run MySQL on an E250.

Doing this added some complexity, as data for the webserver needed to be pulled across the network from the db server. We were informed by our NOC that our datacenter wasn't on a private subnet and that all traffic on the network was public. We tried both tunneling through ssh and running an stunnel daemon, and went with stunnel. Over the course of the year we cringed from time to time thinking of how much overhead we were spending to encrypt all that data (we store almost everything in MySQL, including images).

During the fall we secured a Cisco Catalyst 2950 and set it aside waiting for a rainy day to install it. Last week, when our load increased to back-breaking levels we decided it was time to put in the private subnet.

I worked all night on this, starting with the network cards. I battled with a Quad Fast Ethernet card for a while and ended up pulling it and using a SunSwift card. After the hardware was recognized it was as simple as adding an /etc/hostname.hme1 on each machine and an entry in /etc/hosts with the IP address and hostname. We didn't want to run an internal DNS server, so we assigned the machines 10.0.0.x addresses and will live with using IP addresses in our MySQL connections from the webserver.

With the three machines up and the private network installed, I discovered the SunSwift cards were misnegotiating to 100Mbit *half-duplex*, and that traffic over the internal network was slower than over stunnel. It was 4am and we desperately needed something working soon. As a temporary solution, while I figured out the proper way to force the second NIC to full-duplex, I stuck a crossover cable between the webserver and the db machine and found traffic moving at twice the speed it did through stunnel over the public network.

With the pressure off for a bit, I can focus some attention on configuring the switch (which we'd been doing an injustice by running as configured out of the box) and the hme1 interfaces to force the correct speed and duplex.
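
For the record, forcing an hme interface to 100Mbit full-duplex on Solaris is an ndd sequence along these lines (shown echo-only here since it's Solaris-specific; these are the standard hme driver parameters, and the same settings can go in /etc/system to survive reboots):

```shell
# Echo-only sketch: force hme1 to 100Mbit full duplex on Solaris.
# Drop the echo in run() to execute for real (as root).
run() { echo "$@"; }

run ndd -set /dev/hme instance 1           # select hme1
run ndd -set /dev/hme adv_autoneg_cap 0    # disable autonegotiation
run ndd -set /dev/hme adv_100fdx_cap 1     # advertise 100 full duplex only
run ndd -set /dev/hme adv_100hdx_cap 0
run ndd -set /dev/hme adv_10fdx_cap 0
run ndd -set /dev/hme adv_10hdx_cap 0
```

The switch port has to agree, of course; forcing one end while the other autonegotiates is exactly how you end up at half-duplex.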

Story developing . . .

Posted by mike at 12:04 PM

January 13, 2003

tweaking mod_perl

This past week our Solaris server slowed to a crawl. We're running Apache 1.3.27 with mod_perl and mod_ssl compiled into the binary. Usage always peaks at the start of a semester here at Tufts University, but this was abnormal.

While scouring our mod_perl libraries and making some tweaks, we found an interesting problem with $Apache::SizeLimit::MAX_PROCESS_SIZE. We had it set to 29M, which was adequate for our Apache processes a year ago. In the past year we had moved much of our application to an XML base, transformed into PDF, SVG and HTML with XSLT and FOP.

To accommodate the expansion, our Apache processes were starting up at around 42M. Essentially, after one request each Apache process was being recycled, a costly measure. We increased the limit to 50M and immediately saw a twofold speed increase.
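
A quick way to see what your Apache children actually weigh before picking a size limit: an awk pass over ps output. Shown here against canned sample data (the RSS numbers are invented); in production you'd pipe something like `ps -eo rss,comm` through the same awk:

```shell
# Report count, average and max RSS (KB) of httpd processes.
# ps_sample stands in for `ps -eo rss,comm`; the numbers are
# invented for illustration.
ps_sample() {
  printf '%s\n' \
    '  RSS COMMAND' \
    '43012 httpd' \
    '42388 httpd' \
    '51200 httpd' \
    ' 1200 sshd'
}

ps_sample | awk '$2 == "httpd" { n++; sum += $1; if ($1 > max) max = $1 }
  END { printf "procs=%d avgKB=%d maxKB=%d\n", n, sum/n, max }'
# → procs=3 avgKB=45533 maxKB=51200
```

If the max sits above $Apache::SizeLimit::MAX_PROCESS_SIZE, every child is being killed after a handful of requests, which is exactly what bit us.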

Posted by mike at 5:31 PM

Here Goes

My first weblog. Where will this go? I'm a web designer who got snagged by a deeper, more fulfilling path as a technology jack-of-all-trades. Back in the web-crazed 90s it was an easier switch. My toolbox then contained Dreamweaver, Photoshop and an ftp client. I gradually shifted from static HTML to scripting languages just one step above it, like ColdFusion, PHP and Perl. I've kept some of that in the toolbox, but have expanded into a skillset that includes managing firewalls, setting up secure tunnels, pinpointing memory and CPU usage issues, replacing disks in RAID configurations, designing databases and setting up console servers.

My origins aren't as noble as the CS major who worked 72 hours straight to finish the final project. I spent one all-nighter doing the layout on a brochure for a local company and building a complementary site. Yet I have something to offer . . .

I have come a long way since those days, almost 9 years of experience with design and usability, 5 years of scripting and database design, and 3 years of combining web-services programming with Linux/Unix system administration. It's been a rewarding road.

This weblog serves as a place to comment on learning experiences as an IT professional who is pulling together a foundation which is being built dynamically as time requires.

Posted by mike at 5:16 PM