Tuesday, March 12, 2013

C backend re-architected to SSA form.

Well, one year later, I've managed to arrive at the CPS output from a completely different direction.  The rewrite began as an experiment, triggered by the completely unpredictable behavior of new versions of clang when compiling Irken's output.  [I also found a gcc bug.]  I thought I'd try to generate 'real' CPS - i.e., separate C functions that only make tail calls.  This turned out to be much easier than my attempts last year, which started from the other end.

After only about 3-4 days of work, we now have output like this:


(let loop ((n 1000000))
  (if (zero? n)
      "done"
      (loop (- n 1))))



static void FUN_loop_17_0 (void) {
  O r10 = ((object*) lenv) [2];
  if PXLL_IS_TRUE(PXLL_TEST(unbox(r10)==0)) {
    O r12 = (object*) &constructed_0;
    PXLL_RETURN(12);
  } else {
    O r16 = (object *) 3;
    O r17 = ((object*) lenv) [2];
    O r15 = box((pxll_int)unbox(r17)-unbox(r16));
    lenv[2] = r15;
    FUN_loop_17_0();
  }
}

A few things to note: the loop is achieved with a tail call in C.  This requires that the C compiler perform tail call optimization, or the stack will explode rather quickly.
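
If a C compiler refused to do the optimization, the usual fallback would be a trampoline.  Here's a hand-written sketch of the general technique - this is not what Irken emits, and none of these names are real:

/* Trampoline sketch: instead of tail-calling, each function *returns*
   the next function, and a small driver loop makes the "calls", so
   the stack never grows.  (The function-pointer/void* casts are a
   common POSIX-ism, not strictly portable C.) */
typedef void *(*kont) (void);

static long n;                       /* stand-in for the lenv slot */

static void *FUN_done (void) { return NULL; }

static void *FUN_loop (void) {
  if (n == 0)
    return (void *) FUN_done;        /* "tail call" FUN_done */
  n -= 1;
  return (void *) FUN_loop;          /* "tail call" ourselves */
}

int main (void) {
  n = 1000000;
  kont k = FUN_loop;
  while (k)                          /* constant stack depth */
    k = (kont) k ();
  return 0;
}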

The 'registers' are now emitted as local variables, with a fresh name for every assignment. This output is very, very close to the SSA form needed for LLVM itself, and is practically begging for an LLVM backend.

There is no longer any dependency on C extensions - the address-of-label extension was never a happy citizen in LLVM land, anyway - and the lexical functions extension (needed by Irken circa 3 years ago) is a distant memory.
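
For anyone who never saw the old output, here's a hand-written sketch (not actual Irken output) of the address-of-label style that's now gone:

/* Hand-written sketch of the old style, using gcc's address-of-label
   extension: the whole program was one function, and "calls" were
   computed gotos through label addresses.  clang accepts this too,
   but lowers it to LLVM's indirectbr, which brings its own problems. */
void interp (void) {
  long acc = 0;
  void *k = &&step;     /* && takes the address of a label */
step:
  acc += 1;
  if (acc < 10)
    goto *k;            /* indirect jump through a variable */
}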

The output of the compiler runs at about the same speed, but the compilation time itself is much more predictable.  clang -O3 takes about 30 seconds, and gcc -O3 around a minute.  gcc actually seems to do a better job of optimizing the output.  I need to play around with clang's and gcc's options to figure out the best flags for fast edit-compile-link turnaround - I'm missing the 4-second compiles I used to have with clang and the old compiler.

Other great possibilities include separate compilation, JIT, and (possibly) closure conversion / passing args in registers.  An LLVM backend might open up the possibility of efficient floating-point support.

However, the next step is probably to write a real garbage collector.  I'm seriously considering using Irken for some real server work soon, and to survive in that world a two-space collector just isn't going to cut it.  I've read over the "Beltway" paper several times and I think I grok it well enough to implement it.
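
For reference: a two-space collector is essentially the classic Cheney algorithm.  Here's a from-scratch sketch of the idea - none of these names come from the Irken runtime, and allocation, space flipping, and immediate values are all omitted:

/* Minimal Cheney-style two-space copying collector (sketch). */
#include <stddef.h>
#include <string.h>

typedef struct obj {
  size_t nslots;          /* how many pointer slots follow       */
  struct obj *forward;    /* set once the object has been copied */
  struct obj *slot[];     /* the slots themselves                */
} obj;

static char *tospace, *next;   /* bump pointer into the empty half */

static size_t obj_size (obj *o) {
  return sizeof (obj) + o->nslots * sizeof (obj *);
}

static obj *copy (obj *o) {
  if (o->forward)              /* already moved: follow the */
    return o->forward;         /* forwarding pointer        */
  obj *n = (obj *) next;
  memcpy (n, o, obj_size (o));
  next += obj_size (o);
  n->forward = NULL;
  o->forward = n;              /* leave a forwarding pointer */
  return n;
}

static void collect (obj **roots, size_t nroots) {
  char *scan = next = tospace;
  for (size_t i = 0; i < nroots; i++)
    roots[i] = copy (roots[i]);
  while (scan < next) {        /* Cheney's queue: scan what we've */
    obj *o = (obj *) scan;     /* copied, copying its children    */
    for (size_t i = 0; i < o->nslots; i++)
      if (o->slot[i])
        o->slot[i] = copy (o->slot[i]);
    scan += obj_size (o);
  }
}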

Tuesday, March 6, 2012

LLVM, CPS

I've done quite a bit of work these past two months on writing an LLVM backend.
Rather than directly translate the current design (one giant function), I'm trying to map Irken functions to LLVM functions one-to-one, and use LLVM's support for tail calls to get arguments-in-registers working.

This requires a 'real' CPS transform, and one that's up near the front of the chain of passes, not the last thing before the backend.

The CPS transform introduces 'continuation functions' - these are similar to normal functions, except they take a single argument.  They are targeted by SSA 'terminators' like branch, jump, etc...

A continuation function corresponds to a 'basic block' in SSA - think of the phi insn as if it took the single argument and moved it into the correct register.

If every call were a tail call, we could emit all the code as basic blocks with branches of various kinds. Sadly, some calls are not tail calls.  This requires that we render the continuation function as a real function, and that all 'jumps' to it are tail calls.
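
To make that concrete, here's a hand-written C sketch (not Irken output, and all names invented) of a reified continuation function:

/* A continuation function takes a single argument - think of a phi
   moving that argument into the right register as control enters a
   basic block. */
typedef long O;

static O k_after_call (O v) {   /* reified continuation: a real  */
  return v * 2;                 /* C function, because the call   */
}                               /* in g below is not a tail call  */

static O f (O x, O (*k) (O)) {
  return k (x + 1);             /* every "jump" to a continuation
                                   is a tail call */
}

static O g (O n) {
  /* computing (* (f n) 2): f's result is still needed afterwards,
     so f's continuation cannot be a basic block of g - it has to
     be the real function k_after_call. */
  return f (n, k_after_call);
}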

Needless to say, this is a pretty major rewrite of Irken.  I've already made three failed swipes at it, but I hope to find the time to get it working within the next few months.

Monday, February 13, 2012

pattern matching with regexps

This is just a reminder to myself to think about extending pattern matching to regular expressions on strings.  It'd be a great feature, though with the famous caveat about regular expressions.  [I have a problem.  I think I can solve it with regular expressions.  Now I have two problems.]



(define sample
  "(?P<x>[0-9]+)+(?P<y>[0-9]+)" -> (+ (string->int x) (string->int y))
  "[^x]+(?P<nx>x+)[^x]+" -> (string-length nx)
  )


I still remember looking through the ejabberd source code, trying to find their xml parser.  When it finally dawned on me that it was automagically hidden in the pattern matching, I was impressed.

Saturday, January 21, 2012

Some notes on continuation-passing style

I've been going over the CPS transform lately, and wrote some notes on how to use the transform to compile to C, stackless style.

dark.nightmare.com/rushing/factcps/

Monday, January 9, 2012

thinking about an llvm backend, again

I've been very happy with my other project that uses LLVM as the backend for a compiler.
I've also learned a lot about LLVM's IR design.

I can't help but think that it might take only a week or two to write an LLVM backend for Irken.
However, there are a few issues.

  1. My CPS output assumes that each insn has a result that is put into a register.  This doesn't fit LLVM's model, where arguments to insns can be either an immediate or a register - in fact, LLVM does not allow you to put an immediate into a register.
  2. LLVM doesn't like gcc's goto-label feature.  I think they implemented it reluctantly, because it doesn't fit the SSA model very well.  The majority of funcalls by Irken are to known functions, which translate to a branch to a known label.  However, whenever a function is called indirectly - through a variable rather than a known label - this leads to something like "goto *r0;" in C, which turns into an indirectbr insn in LLVM.  LLVM implements this using a gigantic phi/indirectbr table at the very end of the function.  Maybe this isn't really a problem - it just looks bad.
  3. The extremely useful %cexp primitive would go away!  There's something to be said for being able to refer to OS constants, OS functions, macros, etc... with such a simple mechanism.  I'd probably just have to let it go, and force users to write stubs.
I think #1 can be dealt with by adding an extra pass that translates between the two representations.  (maybe I should revisit the CPS==SSA paper?)
Another approach might be to use an annoying alloca/store/load sequence and hope that mem2reg does a good job?
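
As a sketch of that second idea, here's roughly what it looks like through the llvm-c API (the calls below are ones I believe existed as of LLVM 3.0; treat this as an illustration, not working backend code):

/* Model "move immediate into register rN" as alloca/store/load,
   and let the mem2reg pass turn the slots back into SSA values. */
#include <llvm-c/Core.h>

LLVMValueRef emit_imm_move (LLVMBuilderRef b, long imm, const char *name) {
  /* one stack slot per abstract register */
  LLVMValueRef slot = LLVMBuildAlloca (b, LLVMInt64Type (), name);
  LLVMBuildStore (b, LLVMConstInt (LLVMInt64Type (), imm, 0), slot);
  /* every later use of the register loads from the slot */
  return LLVMBuildLoad (b, slot, name);
}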

Monday, January 2, 2012

Irken status, LLVM bitcode

This is just a quick note that the Irken project has not died, only gone into temporary hibernation.
When in the course of history it becomes necessary to earn a living...

Anyway, I have some tangentially related comments about LLVM.
My current employer is interested in maybe doing a little LLVM JIT work, wherein my experience with compilers may prove useful.  A nice side effect for me is that I finally get to play around a bit with LLVM, something which I carefully avoided while working on Irken.

My first reaction was to pick up llvm-py, which looked like a fairly complete interface to the various libraries that make up the llvm system.

Big Mistake.  It doesn't build any more.  It's only a little over a year old.  The last release of llvm-py was for LLVM 2.8rc2.  I am now running 3.0.

Here's the problem: the LLVM APIs are constantly changing.  And that's a good thing, really.  But since those C++ APIs are the only well-supported interface to the system, it's a moving target.

There are three main options for a compiler writer who wants to target LLVM.  The first is to use the C++ API.  (Or you could use the completely undocumented and incomplete C interface.)

The second option is to write llvm assembly.  This is not a bad option, but will be slower because of the need to parse the assembly.

The third is to write llvm 'bitcode'.  I thought this was probably the right approach: bitcode is a binary representation of the LLVM IR language, so it seemed likely to change much more slowly than the library interfaces.

The problem is that the person who designed the bitcode file format was on meth at the time.  This has got to be one of the most evil formats I've ever seen.  I think they were trying to make the resulting file as small as possible, even at the cost of counting every bit.  (Perhaps an early LLVM project flew on a space probe?)  Just to give you an idea, 3- and 4-bit integers (not aligned on any boundary) are a common field type.  And not just that, but some of the fields are variably sized, starting at 2 bits and going up.  Symbols are represented using 6-bit characters.  Also, the concept of 'bitstream' is taken very literally: the bits in the file are read as if the whole file were one large integer, starting from the LSB.  This is actually an elegant approach, but it is completely different from every other file format or network protocol, and I had to continually resist using common parsing patterns with it.  There's also a compression-like method of 'abbreviating' common record types.  And after all that work to save bits, at the end of every block the bit stream is aligned to 32 bits!
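
For the curious, here's a sketch from my notes (not from the LLVM sources) of the two primitives everything else builds on - fixed-width fields read LSB-first, and the variable-width 'VBR' integers:

/* The bitstream is read LSB-first, as if the whole file were one
   enormous integer. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
  const uint8_t *buf;
  size_t pos;                 /* position in *bits* */
} bitreader;

static uint32_t read_fixed (bitreader *br, unsigned nbits) {
  uint32_t r = 0;
  for (unsigned i = 0; i < nbits; i++) {
    size_t p = br->pos++;
    r |= (uint32_t) ((br->buf[p >> 3] >> (p & 7)) & 1) << i;
  }
  return r;
}

/* VBR: chunks of 'width' bits; the high bit of each chunk means
   "another chunk follows"; the data bits accumulate low-end first. */
static uint64_t read_vbr (bitreader *br, unsigned width) {
  uint64_t r = 0;
  unsigned shift = 0;
  uint32_t chunk, hibit = 1u << (width - 1);
  do {
    chunk = read_fixed (br, width);
    r |= (uint64_t) (chunk & (hibit - 1)) << shift;
    shift += width - 1;
  } while (chunk & hibit);
  return r;
}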

Ok, got that off my chest.

My sense is that nobody uses the bitcode format because of this (the only other implementation I could find was in Haskell, and was also impenetrable because Everything Is A Monad, You Know).  Most people just succumb to the pressure to write against the C++ API, and will then spend the rest of their lives reading the llvm-dev list to keep track of the changes.

I have mostly decoded the format, and I think I could actually output it for a compiler.
But I think I'll split the difference and start by outputting llvm assembly.  If the day comes when I need to shave some of the 10MB off the resulting .so file, and maybe speed things up, I'll revisit the bitcode issue.

In the meanwhile I've had a whack at using Cython's C++ support to make a minimal interface to LLVM where I can feed either llvm assembly or bitcode to the JIT and run it.

Monday, May 23, 2011

git, github, clang

I've imported my svn history into git, and published the whole thing on github:

https://github.com/samrushing/irken-compiler/

Still trying to grok git and github, so there'll probably be a stall for a week or so.

In the meanwhile, I have interesting news:  I was able to take out all the 'nested functions' which relied on a gcc extension.  I resisted this for a long time because early testing had shown that declaring the 'registers' as local register variables in the main function was a huge win.

After a quick test the other day, I found that moving everything out to global variables has no visible impact on performance.  I don't know if this is because of improvements in gcc & llvm, or if I just wasn't trying a large enough program.
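
Roughly, the change looks like this (a hand-written before/after sketch, not actual Irken output):

/* Before: one giant function.  The VM "registers" were local
   register variables, and gcc's nested-function extension let the
   generated code see them directly.  clang won't compile this. */
void toplevel_old (void) {
  register long r0, r1, r2;
  void fun_17 (void) {        /* gcc nested function      */
    r0 = r1 + r2;             /* sees the register locals */
  }
  fun_17 ();
}

/* After: the registers live at file scope, so the generated code
   can be ordinary top-level C functions - no extension needed. */
static long r0, r1, r2;
void fun_17_new (void) { r0 = r1 + r2; }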

That means I'm now able to compile the output with clang, rather than relying on llvm+dragonegg, which can be much more annoying to build, install, and use.

The good news continues: although the performance of the generated code is identical (maybe a tiny bit faster), the compilation time has been cut quite significantly.  On my machine an LLVM -O2 compile dropped from 18m to 8m.  The -O0 compile time is even sweeter: about 3s, which is actually faster than Irken itself, which hovers right around 5s to self-compile.  So a roughly 10s edit-compile cycle will definitely speed up the development of Irken.

Now the bad news.  The no-nested-funs version causes 'as' on snow leopard to take a lunch break.  It actually nails the CPU for 10 or 15 minutes before finally emerging as if nothing was wrong.  No problems on FreeBSD 7.3 with gcc-4.5.3, though.  This could be a significant annoyance, since I expect many people trying out Irken would prefer to do so with a stock compiler.

More bad news:  clang-2.9 crashes when fed Irken's self-compiled output.  I was going to file a bug against it, but when I checked out the svn version and built it the problem was gone.  Sigh.

This is why I'm trying to figure out how to make branches on github, since it'd be nice to maintain both versions for a while until these issues are ironed out.  I have made a local branch called 'no-nested-funs', I just haven't figured out how to publish it on github.