Sunday, November 15, 2009

Choosing a Parser

Here's the problem. I need a parser for a python-like language, but written in Irken. Of course there are no parser generators for Irken, yet.

My first sneaky plan was to generate a set of parsing tables, and then implement the parser engine in Irken. [This is similar to my approach with the lexer - the lexer tables are generated by Python code, and emit Irken code]. I like dparser, but after looking at the parsing engine, it looked a bit intense for my purposes. I then started playing with Earley parsers. You can write an Earley parser in an amazingly small amount of code.

But Earley parsers aren't really table-driven, and it sounds like performance will be an issue.

Then I discovered Jason Evans' "Parsing.py" module. It's perfect. It generates a bog standard set of LR(1) tables. However, there was still a big gap between Python/Grammar/Grammar and the input needed by Parsing.py (which consists of a class for each terminal and non-terminal, and a method for each reduction - annotated with docstring-based reduction rules).

To fill that gap, I needed two things:
  1. A parser for the python meta-grammar syntax.
  2. A translator from that grammar into a Parsing.py parser.
You'll find both in my python/parsing subdirectory, enjoy!

[Thanks to John Plevyak for assistance with dparser]

1 comment: