[om-list] pattern conversions

Mark Butler butlerm at middle.net
Sat Sep 1 12:17:41 EDT 2001


Tom & Chris,

  Avoiding the overhead of multiple formats is a large part of the movement to
standardize on XML.  XML is a rather wasteful format, but the principle is a
good one.

I assume that you have to deal with several formats from outside sources that
you do not have control over. The general problem of format translation goes
as follows:

1. Use the source grammar to tokenize and parse the input
   into an abstract syntax tree (AST)
2. Use a tree transformation specification to transform
   the input tree into an output tree
3. Use the destination grammar to serialize the output AST
   into the output format
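
To make the three steps concrete, here is a minimal sketch in Python. The input format ("key = value" lines), the output format (flat XML elements), and the names `Node`, `parse`, `transform`, `emit` and `translate` are all hypothetical and chosen only for illustration. A real translator would derive `parse` and `emit` from the two grammars rather than hand-code them.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional
from xml.sax.saxutils import escape


@dataclass
class Node:
    """Minimal AST node: a kind tag, an optional value, and child nodes."""
    kind: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


# Step 1: source grammar (hypothetical format: one "key = value" per line).
LINE = re.compile(r"\s*(\w+)\s*=\s*(.*?)\s*$")


def parse(text: str) -> Node:
    doc = Node("doc")
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # blank lines carry no data
        m = LINE.match(line)
        if m is None:
            raise SyntaxError(f"line {lineno}: expected 'key = value'")
        doc.children.append(Node("pair", m.group(1), [Node("text", m.group(2))]))
    return doc


# Step 2: tree transformation (here: each pair becomes an element whose
# tag is the upper-cased key).
def transform(tree: Node) -> Node:
    return Node("doc", children=[
        Node("element", pair.value.upper(), pair.children) for pair in tree.children
    ])


# Step 3: destination grammar (hypothetical format: flat XML elements).
def emit(tree: Node) -> str:
    body = "".join(
        f"<{el.value}>{escape(el.children[0].value)}</{el.value}>"
        for el in tree.children
    )
    return f"<doc>{body}</doc>"


def translate(text: str) -> str:
    return emit(transform(parse(text)))


print(translate("name = Mark\nlist = om-list\n"))
# prints <doc><NAME>Mark</NAME><LIST>om-list</LIST></doc>
```

Keeping the three stages as separate functions means either format can be swapped by replacing one stage while the others stay untouched.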

The first and third steps are relatively easy to automate for any format
based on a formal (usually context-free) grammar, which most are.  The second
step is where the difficulty lies: to be sufficiently flexible, it requires
the equivalent of a general-purpose programming language.
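
Part of what makes step one mechanical is that the lexical layer of most formats is regular and can be written down as a table. The Python sketch below builds a tokenizer from such a table using the standard `re` module; `TOKEN_SPEC`, `make_tokenizer` and the token names are hypothetical. It covers tokenizing only, so nested structure still needs a parser on top of it.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    pos: int


# Hypothetical lexical specification: (token name, regex) pairs, tried in
# order. Supporting a different format means editing this table, not the code.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/();]"),
    ("SKIP", r"\s+"),
]


def make_tokenizer(spec):
    """Build a tokenizer function from a (name, regex) table."""
    master = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in spec))

    def tokenize(text: str) -> Iterator[Token]:
        pos = 0
        while pos < len(text):
            m = master.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
            if m.lastgroup != "SKIP":  # whitespace is matched but not emitted
                yield Token(m.lastgroup, m.group(), pos)
            pos = m.end()

    return tokenize


tokenize = make_tokenizer(TOKEN_SPEC)
print([(t.kind, t.text) for t in tokenize("x = 3.5 * (y + 2);")])
```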

Most tools in this class transform input grammars into parser source code that
must be compiled.  Step two is usually handled by letting the user write
transformation routines in a conventional programming language.
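
A sketch of what such a user-written routine might look like in Python: a generic bottom-up `rewrite` helper plus one transformation callback. The `Node` class, `rewrite`, and the date-conversion rule are hypothetical examples, not the API of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    kind: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def rewrite(node: Node, rule: Callable[[Node], Node]) -> Node:
    """Rebuild the tree bottom-up, letting `rule` replace each node.
    The input tree is not modified."""
    new_children = [rewrite(c, rule) for c in node.children]
    return rule(Node(node.kind, node.value, new_children))


# Hypothetical user routine: convert US-style "M/D/YYYY" date nodes to
# ISO 8601 "YYYY-MM-DD" and leave every other node alone.
def us_dates_to_iso(node: Node) -> Node:
    if node.kind == "date" and node.value:
        month, day, year = node.value.split("/")
        return Node("date", f"{year}-{int(month):02d}-{int(day):02d}")
    return node


record = Node("record", children=[Node("name", "Mark"), Node("date", "9/1/2001")])
print(rewrite(record, us_dates_to_iso))
```

The traversal is written once; each new translation only supplies a different `rule` function, which is then compiled (or byte-compiled) along with everything else.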

Of course, if you want to do all of this at run time, you probably need some
sort of fully interpreted language for step two.  The only serious drawback of
such a strategy is run-time performance compared to a compiled translator.
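
For the run-time case, one option is to keep the transformation specification as data and interpret it. In this Python sketch, the rule format (a JSON object with hypothetical `rename` and `drop` keys) and `apply_rules` are invented for illustration. Every node visit goes through dictionary lookups on the rule set rather than compiled code, which gives a small-scale picture of where the interpretive overhead comes from.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


# Hypothetical rule language, read at run time so the translation can be
# changed without recompiling anything.
RULES_JSON = '{"rename": {"pair": "element"}, "drop": ["comment"]}'


def apply_rules(node: Node, rules: dict) -> Optional[Node]:
    """Interpret the rule set over the tree; returns None for dropped nodes."""
    if node.kind in rules.get("drop", []):
        return None
    mapped = (apply_rules(child, rules) for child in node.children)
    kept = [c for c in mapped if c is not None]
    kind = rules.get("rename", {}).get(node.kind, node.kind)
    return Node(kind, node.value, kept)


rules = json.loads(RULES_JSON)
tree = Node("doc", children=[
    Node("comment", "generated"),
    Node("pair", "name", [Node("text", "Mark")]),
])
print(apply_rules(tree, rules))
```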

The best reference that I know of for "traditional" parser generators is the
website for a tool called ANTLR at http://www.antlr.org.  The inventor of
ANTLR (Terence Parr) almost single-handedly shifted the conventional wisdom
about the best parsing strategy from LALR(1) (used by yacc) back to LL(k)
(used by recursive-descent parsers).
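
To show what the LL(k) side looks like, here is a hand-written recursive-descent parser in Python for a small LL(1) expression grammar; the grammar and the `Parser` class are hypothetical examples. Each nonterminal becomes one procedure, and a single token of lookahead decides which branch to take. Left recursion is replaced by loops, which keeps the operators left-associative.

```python
import re
from typing import List, Optional, Union

Tree = Union[int, tuple]  # a number, or (operator, left, right)


class Parser:
    """Recursive-descent parser for the LL(1) grammar
         expr   -> term (("+" | "-") term)*
         term   -> factor (("*" | "/") factor)*
         factor -> NUMBER | "(" expr ")"
    """

    def __init__(self, text: str):
        if re.search(r"[^\d\s+\-*/()]", text):
            raise SyntaxError("unexpected character in input")
        self.tokens: List[str] = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        """The one token of lookahead (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self) -> Tree:
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def expr(self) -> Tree:
        tree = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            tree = (op, tree, self.term())
        return tree

    def term(self) -> Tree:
        tree = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            tree = (op, tree, self.factor())
        return tree

    def factor(self) -> Tree:
        if self.peek() == "(":
            self.take("(")
            tree = self.expr()
            self.take(")")
            return tree
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


print(Parser("2 * (3 + 4) - 5").parse())
# prints ('-', ('*', 2, ('+', 3, 4)), 5)
```

A yacc-generated LALR(1) parser would instead drive a shift/reduce table; the appeal of the recursive-descent form is that the code mirrors the grammar and is easy to read and debug.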

- Mark



