January 14, 2004

Using IO::Handle & IPC::Open2

Solved a small problem today that got me back up to speed on a few things I haven't looked at in a while, namely Tidy, IO::Handle and IPC::Open2.

We clean up HTML with Tidy before attempting to convert it to our flavor of XML. When we started using Tidy in 2000, it was only available as a command-line program. That has since changed, and it is now also available in library form. Tidy does an excellent job of turning bad HTML into good HTML.

We wrote a Perl module, HTML::Tidy (it looks like someone else has since had the same idea). The module is simply a convenient central point for cleaning up HTML files or strings.

The puzzle today was that our module seemed to have stopped working: it was returning an empty result. On further inspection I found that a section of code like this was causing the problem:

use IO::Handle;
use IPC::Open2;

my ($read_fh, $write_fh) = (IO::Handle->new(), IO::Handle->new());
open2($read_fh, $write_fh, $tidy_command);
$write_fh->print($untidy_html);
my $tidy_html = do { local $/; <$read_fh> };   # problem: tidy hasn't seen EOF yet
$write_fh->close();
$read_fh->close();
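
The Perl Cookbook example is very similar, and it didn't seem to work either. After some poking around and experimenting with the code, I discovered that nothing arrives on the read filehandle until the write filehandle has been closed. Tidy has to parse the whole document before it can write anything, and it only knows the document is complete when it sees end-of-file on its standard input, which happens when we close the write handle. A very slight change, closing the write handle before reading, and we're back in business:
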
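use IO::Handle;
use IPC::Open2;

my ($read_fh, $write_fh) = (IO::Handle->new(), IO::Handle->new());
my $pid = open2($read_fh, $write_fh, $tidy_command);
$write_fh->print($untidy_html);
$write_fh->close();                            # send EOF so tidy can finish and write its output
my $tidy_html = do { local $/; <$read_fh> };   # slurp all of the output, not just the first line
$read_fh->close();
waitpid($pid, 0);                              # reap the tidy process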
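To see the close-before-read ordering outside of Tidy, here is a small self-contained sketch. It wraps the corrected pattern in a helper called `run_filter`, which is a hypothetical name and not part of our HTML::Tidy module, and it uses the Unix `sort` command as a stand-in for Tidy because `sort` also has to read all of its input before it can print anything. It assumes a Unix-like system with `sort` on the PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;
use IPC::Open2;

# Hypothetical helper: pipe $input through an external command and return
# everything the command printed. Suitable for programs that, like tidy,
# read all of their input before writing any output.
sub run_filter {
    my ($input, @command) = @_;
    my ($read_fh, $write_fh) = (IO::Handle->new(), IO::Handle->new());
    my $pid = open2($read_fh, $write_fh, @command);

    $write_fh->print($input);
    $write_fh->close();                        # EOF lets the child finish reading and start writing

    my $output = do { local $/; <$read_fh> };  # slurp the whole reply in one read
    $read_fh->close();
    waitpid($pid, 0);                          # reap the child so it doesn't linger as a zombie
    return $output;
}

# sort(1) stands in for tidy: it can't print anything until it has seen
# all of its input, so the same close-before-read ordering applies.
my $sorted = run_filter("pear\napple\nfig\n", 'sort');
print $sorted;   # prints apple, fig, pear on separate lines
```

The `waitpid` call is worth keeping in real code: IPC::Open2 does not reap the child process for you, so without it each call leaves a zombie process behind until the script exits.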

This was one of those "how did this ever work?" situations. The calling script hadn't been run in many months, maybe even a year. In that time it's quite possible that Tidy, IO::Handle and IPC::Open2 had all been updated, and our HTML::Tidy module broke on an interface change in one or more of them.

Posted by mike at January 14, 2004 6:10 PM