January 28, 2003

XSL-FO & FOP: trouble with special characters

I've been digging into XSL-FO for the past few days.

About two years ago we started a massive migration, moving our collection of HTML documents to HSCML (Health Sciences Curricular Markup Language). For the most part these were print documents that had been converted to HTML for use in medical, dental and veterinary courses. We urged people to move to XML primarily so their documents would have meaningful structure, but we knew there were many other advantages to having documents in XML.

One of those advantages is XSL-FO, which provides a way to render any XML document in our database as PDF (the users were excited about this).

My battle these past few days was with special characters. For the most part, users create the XML on PCs. We've provided a Unicode font with a fairly complete set of glyphs, and the special characters are stored as named entities in the XML. The problem: when I transformed the XML to XSL-FO and ran it through FOP on Solaris, the special characters didn't render (the PDF shows a # for each problem character).
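When the PDF shows # in place of a character, the first step is knowing exactly which code points the document uses. Below is a minimal diagnostic sketch, assuming the text has already been pulled out of the XML with entities expanded. The function name and the optional `supported` set (the code points a given font is believed to cover) are hypothetical helpers, not part of FOP.

```python
import unicodedata
from typing import List, Optional, Set, Tuple


def non_ascii_chars(
    text: str, supported: Optional[Set[int]] = None
) -> List[Tuple[str, str]]:
    """List the distinct non-ASCII characters in `text` as (U+XXXX, name).

    If `supported` is given (hypothetical: the code points a target font
    covers), only characters outside that set are reported.
    """
    chars = sorted({ch for ch in text if ord(ch) > 127})
    if supported is not None:
        chars = [ch for ch in chars if ord(ch) not in supported]
    # Use a placeholder for code points that have no Unicode name.
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in chars]


if __name__ == "__main__":
    print(non_ascii_chars("37 \u00b0C, \u03b1-helix"))
```

For the sample string this reports the degree sign (U+00B0) and Greek small alpha (U+03B1), which are exactly the kind of characters to check against the fonts FOP has available.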

I pinpointed two issues:
1) The entity mappings we were using in the DTD were for PCs. I created a new set of entity files that mapped the named entities to numeric character references and wrapped each one in a tag, so that FOP would use a different font for the entities.
2) The fonts on Solaris (some included with FOP) weren't as comprehensive as the Unicode font we made available to our users. After searching high and low for a complete Unicode font for Solaris, I took a shot in the dark and copied the TrueType font from a PC to our server. The commands to import the font into the FO processor returned no complaints. I haven't done a ton with fonts, but that surprised me.
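The entity files from item 1 can be generated rather than written by hand. Here is a minimal sketch, assuming a simple table of entity names and Unicode code points; the wrapper element name `glyph` and the two-entry sample table are hypothetical, since the actual HSCML entity set and tag name aren't given here. Each declaration's replacement text is an element containing a numeric character reference, so the XSL-FO stylesheet can match that element and switch fonts.

```python
from typing import Dict

# Hypothetical wrapper element name; an XSL-FO stylesheet would match it
# and emit an fo:inline with the font that actually has these glyphs.
WRAPPER = "glyph"


def entity_declaration(name: str, codepoint: int, wrapper: str = WRAPPER) -> str:
    """Map one named entity to a numeric character reference inside `wrapper`."""
    return f"<!ENTITY {name} '<{wrapper}>&#{codepoint};</{wrapper}>'>"


def build_entity_file(entities: Dict[str, int], wrapper: str = WRAPPER) -> str:
    """Return DTD text with one declaration per entity, sorted by name."""
    lines = [
        entity_declaration(name, cp, wrapper)
        for name, cp in sorted(entities.items())
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample table (hypothetical): entity name -> Unicode code point.
    sample = {"deg": 0x00B0, "alpha": 0x03B1}
    print(build_entity_file(sample), end="")
```

With these declarations, `&alpha;` in a document expands to `<glyph>&#945;</glyph>`. If documents are validated, the DTD also has to declare the wrapper element and allow it wherever the entities can appear.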

Researching glyph problems: 5 hrs
Building new entity files: 1 hr
Converting the font: 30 min
Installing font and entity files: 30 min
Rerendering PDFs again and again: 2 hrs
Displaying 200+ glyphs in PDF: priceless

