January 16, 2004

XML - Cleaning up Special Characters

I've written about using special characters in FOP transformations before, but haven't delved into the continuous headache that special characters pose in our XML authoring and web-rendering environment.

I'm setting out to solve this problem, starting today (although it's 5pm on Friday of a three-day weekend so it will really be on Tuesday). The gist is that we get input from many places, very few of which make sure the XML is well-formed. In attempting to transform the XML we end up with a parser error, caused when our XML parser cannot get past the invalid characters.

Having badly formed XML has caused us a lot of headaches: we are continually cleaning up data *after* someone has reported a 500 error. The better solution is obviously to have a library somewhere that cleans the data up front.
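To make the idea concrete, here is a minimal Python sketch of what such a cleanup step might look like. It is not what we run today; the function name `clean_xml_text` and both patterns are hypothetical, and it assumes the input is already decoded text. It does two things: drops characters outside the XML 1.0 `Char` range, and escapes ampersands that don't start one of the five predefined entities or a numeric character reference.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# An ampersand that does not begin a predefined entity or a numeric reference.
_BARE_AMPERSAND = re.compile(
    r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def clean_xml_text(raw: str) -> str:
    """Hypothetical helper: remove illegal characters, escape stray '&'."""
    without_invalid = _INVALID_XML_CHARS.sub("", raw)
    return _BARE_AMPERSAND.sub("&amp;", without_invalid)


if __name__ == "__main__":
    # A vertical tab (\x0b) and a bare ampersand would both stop the parser.
    dirty = "<p>Tom & Jerry\x0b</p>"
    cleaned = clean_xml_text(dirty)
    print(cleaned)
    print(ET.fromstring(cleaned).text)
```

One design choice to be aware of: named entities other than the five predefined ones (for example `&nbsp;`) are treated as literal text and get their ampersand escaped, since a plain XML parser without a DTD would reject them anyway. Stray `<` characters and mismatched tags are a separate problem this sketch does not touch.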

I'm starting with an exploration of existing documents on the subject:

A lot of these documents are more than a year old, so I guess we're a little behind. We started storing our documents in XML back in 2001, but after the initial work to get the authoring environment and rendering done, we've been focusing on other things.

Posted by mike at January 16, 2004 6:04 PM