Parsing Expression Grammars: A Recognition-Based Syntactic Foundation

Bryan Ford

To appear at Principles of Programming Languages (POPL04), Venice, Italy, January 14-16, 2004


For decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of languages and protocols that we design. The power of generative grammars to express ambiguity is crucial to their original purpose of modelling natural languages, but this very power makes it unnecessarily difficult both to express and to parse machine-oriented languages using CFGs. Parsing Expression Grammars (PEGs) provide an alternate, recognition-based formal foundation for describing machine-oriented syntax, which solves the ambiguity problem by not introducing ambiguity in the first place. PEGs address frequently felt expressiveness limitations of CFGs and REs, simplifying syntax definitions and making it unnecessary to separate their lexical and hierarchical components. A linear-time parser can be built for any PEG, avoiding both the complexity and fickleness of LR parsers and the inefficiency of generalized CFG parsing. While PEGs provide a rich set of operators for constructing parsing expressions and grammars, they are reducible to either of two minimal recognition schemas developed around 1970, TS/TDPL and gTS/GTDPL, each of which allows only two rule forms and two constants. By extension we prove these two existing schemas equivalent in power, a previously unknown and unexpected result.

Server START Conference Manager
Update Time 19 Sep 2003 at 17:40:43
Start Conference Manager
Conference Systems