
August 29, 2006

Regular Expression Problem Solved

Just this morning I posted a description of a problem I was trying to solve using regular expressions. Writing it out and then taking a break was just what I needed. I got some inspiration shortly thereafter and now have a much better solution than the one I was headed toward.

As a refresher, I was trying to use a template like this:

$template = "Approval: {{user}} approved the {{object}} for {{project}} on {{date}}";

to convert a message like this:

$sentence = "Approval: Jim Johnson approved the invoice for new computers on 01/01/2006";

into this (text marked up to be translated, but keywords preserved):

Approval: {{Jim Johnson}} approved the {{invoice}} for {{new computers}} on {{01/01/2006}}

I had a fairly long chunk of code that attempted to split up the template and then apply it to the sentence. Then it occurred to me that in every case a keyword is either surrounded by two regular-language words or sits at the end of the sentence, preceded by a regular English word. Rather than parsing and understanding the template and then applying it to the sentence, I got the idea to use a global regex that just looks at a preword, the keyword, and a postword.

So here it is, using the variables as defined above:

## loop through template text, grab a preword, the keyword, and a postword (or end of line)
while ($template =~ /(\S+\ )\{\{(.+?)\}\}(\ \w+|$)/g) {
    # move the position back one word in case there's only one word separating keywords
    pos($template) = $-[3];
    # replace the preword, keyword, and postword with themselves, put keyword in brackets
    # (\Q...\E keeps any punctuation in the preword and postword literal)
    $sentence =~ s/(\Q$1\E)(.+)(\Q$3\E)/$1\{\{$2\}\}$3/;
}
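
To try the loop end to end, here is a minimal, self-contained sketch that wraps it in a sub. The name mark_keywords is hypothetical (it does not appear in the original code), and the escaped braces and \Q...\E quoting are defensive choices assumed here rather than requirements of the approach.

```perl
use strict;
use warnings;

# mark_keywords is a hypothetical helper name used only for this sketch.
sub mark_keywords {
    my ($template, $sentence) = @_;

    # Find each placeholder together with the word before it and the
    # word (or end of string) after it. Literal braces are escaped.
    while ($template =~ /(\S+\ )\{\{(.+?)\}\}(\ \w+|$)/g) {
        my ($pre, $post) = ($1, $3);

        # Rewind to the start of the postword so it can serve as the
        # preword of the next placeholder.
        pos($template) = $-[3];

        # \Q...\E keeps characters such as ":" in the anchor words literal.
        $sentence =~ s/(\Q$pre\E)(.+)(\Q$post\E)/$1\{\{$2\}\}$3/;
    }
    return $sentence;
}

my $template = "Approval: {{user}} approved the {{object}} for {{project}} on {{date}}";
my $sentence = "Approval: Jim Johnson approved the invoice for new computers on 01/01/2006";

# Prints: Approval: {{Jim Johnson}} approved the {{invoice}} for {{new computers}} on {{01/01/2006}}
print mark_keywords($template, $sentence), "\n";
```

One caveat with this sketch: the substitution anchors on the first occurrence of the preword in the sentence, so a message whose values happen to contain a preword followed by a space could be marked up in the wrong place.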

Hopefully the documentation is good enough to help it make sense. With a regex you can cram a ton of functionality into an extremely cryptic, compact few lines. It feels great having come up with a concise solution to the problem, but I'm sure it will take some work to get my mind back around it down the road.
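
One way to make a pattern like this easier to revisit later is Perl's /x modifier, which ignores unescaped whitespace in the pattern and allows comments inside it. The sketch below rewrites only the template-scanning pattern in that style; the variable names $placeholder and @names are assumptions made for illustration.

```perl
use strict;
use warnings;

# The same template pattern, spread out with /x so each piece is labeled.
my $placeholder = qr/
    ( \S+ \  )          # group 1: preword, including its trailing space
    \{\{ ( .+? ) \}\}   # group 2: placeholder name between double braces
    ( \ \w+ |$)         # group 3: space plus postword, or end of string
/x;

my $template = "Approval: {{user}} approved the {{object}} for {{project}} on {{date}}";

my @names;
while ($template =~ /$placeholder/g) {
    push @names, $2;
    # Rewind so the postword can act as the next placeholder's preword.
    pos($template) = $-[3];
}

print join(", ", @names), "\n";   # user, object, project, date
```

The behavior is meant to be the same as the one-line pattern; only the layout changes.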

Posted by mike at August 29, 2006 12:45 PM