Saturday, November 21, 2009

Google's book search

Google's book search just came in very handy... I came across a phrase in ATTPL using the term 'dominated', and I wasn't completely sure of its definition. And of course, I couldn't remember where in 100 pages it was defined.

Fine, I'll search the Postscript version of the chapter that I have on hand. Wait, no. That doesn't work. PS != PDF. Sigh.

So I typed in a unique phrase from the book: that found the book on Google Books. Then I simply searched for the term. Wow, it immediately told me exactly what I wanted to know.

Wednesday, November 18, 2009

Typing Twice

I was having trouble propagating type information through the inlining pass. The source nodes have types attached to them, and you can carefully copy it when instantiating each function body... but in some cases the inlined types should be more accurate than they are.

Easy solution, run the type solver on the node tree again!

This should also help find bugs in the inliner... if something passes muster the first pass, it should definitely go through after inlining.

Monday, November 16, 2009

The Resurrected Lexer

I have resurrected the lexer, after reworking lots of supportive library code, including vectors, symbols, the symbol table, lists, etc...

The new variant datatypes seem very comfortable to me. If you'd like to see the lexer working, compile and execute tests/t20.scm:

[rushing@dark ~/src/irken]$ tests/t20
{u8 newline ""}
{u8 keyword "def"}
{u8 whitespace " "}
{u8 ident "thing"}
{u8 whitespace " "}
{u8 lparen ""}
{u8 ident "x"}
{u8 rparen ""}
{u8 colon ""}
{u8 newline ""}
{u8 whitespace " "}
{u8 ident "return"}
{u8 whitespace " "}
{u8 ident "x"}
{u8 whitespace " "}
{u8 addop "+"}
{u8 whitespace " "}
{u8 number "5"}
{u8 newline ""}
#t
{total ticks: 28843224 gc ticks: 0}


Note that "{u8 ...}" is how the printer outputs the (:token kind value) constructor. Each variant constructor is assigned a unique integer tag. You'll see the same (somewhat confusing) output when playing with lists - I need to either pre-allocate (and thus reserve) certain constructor names - or quit using the builtin printer code (which is in C). I'm not sure that it's practical to teach the C printer how to print out variants, since the user can use them in different ways for different purposes.

Sunday, November 15, 2009

Choosing a Parser

Here's the problem. I need a parser for a python-like language, but written in Irken. Of course there are no parser generators for Irken, yet.

My first sneaky plan was to generate a set of parsing tables, and then implement the parser engine in Irken. [This is similar to my approach with the lexer - the lexer tables are generated by Python code, and emit Irken code]. I like dparser, but after looking at the parsing engine, it looked a bit intense for my purposes. I then started playing with Earley parsers. You can write an Earley parser in an amazingly small amount of code.

But Earley parsers aren't really table-driven, and it sounds like performance will be an issue.

Then I discovered Jason Evans' "Parsing.py" module. It's perfect. It generates a bog standard set of LR(1) tables. However, there was still a big gap between Python/Grammar/Grammar and the input needed by Parsing.py (which consists of a class for each terminal and non-terminal, and a method for each reduction - annotated with docstring-based reduction rules).

To fill that gap, I needed two things:
  1. A parser for the python meta-grammar syntax.
  2. A translator from that grammar into a Parsing.py parser.
You'll find both in my python/parsing subdirectory, enjoy!

[Thanks to John Plevyak for assistance with dparser]