Friday, 21 November 2008

Power of JFlex and CUP in Combination: LALR PArsers for Java

As a Java open source junkie you will have almost certainly have heard of JFlex and CUP but not really delved deeply into it, right? Here's a rapid sommaire of the two technologies.
  • JFlex (a rewrite of JLex) is known as the "fast scanner generator for Java".
  • CUP (no, not Cambridge University Press, but "Construction of Useful Parsers") is an LALR parser generator for Java. Rusty on ye olde Dragon Book basics? To remind you, LALR is lookahead LR parsing. LR parsing means the parser reads the input Left to right, and produces a Right-most derivation (LR is a general form of shift-reduce parsing, starting from the leaves of a tree and working up to the root - in computer science trees grow downwards from the root!). When referring to LALR(k) the k is the lookahead value. LALR is quite popular since it results in smaller parsing tables than canonical LR tables (Dragon Book Ch4, Syntax Analysis).
CUP user's manual states it is very similar to YACC, only it is written in Java, uses specifications with embedded Java and produces Java-implemented parsers. Think of CUP as YACC with a Java jacket and Java insoles. YACC was created by Stephen Johnson of AT&T, an article by the inventor can be found on the dinosaur.compilertools.net webpage. Parser generators like YACC are described in 4.9 of Dragon Book. Worth reading just as background to using CUP. YACC stands for "Yet Another Compiler Compiler" reflecting the parser generator mania of the 1970s.

Classgen is a related technology from the Technical University of Munich. It generates classes in the Visitor style. This page gives a good idea of what specifications look like in Classgen and what it can do. The whole area of creating computer programs based on computer specifications is extremely fascinating. Let the computers do all the hard programming and let humans customize the color schemes of the resulting software.

No comments: