Tuesday, March 12, 2013

C backend re-architected to SSA form.

Well, one year later, I've managed to back into the CPS output via a completely different direction.  The rewrite began as an experiment, triggered by the completely unpredictable behavior of new versions of clang when compiling Irken's output.  [I also found a gcc bug.]  I thought I'd try to generate 'real' CPS - i.e., separate C functions that only make tail calls.  This turned out to be much easier than my attempts last year starting from the other end.

After only about 3-4 days of work, we now have output like this:


(let loop ((n 1000000))
  (if (zero? n)
      "done"
      (loop (- n 1))))



static void FUN_loop_17_0 (void) {
  O r10 = ((object*) lenv) [2];
  if PXLL_IS_TRUE(PXLL_TEST(unbox(r10)==0)) {
    O r12 = (object*) &constructed_0;
    PXLL_RETURN(12);
  } else {
    O r16 = (object *) 3;
    O r17 = ((object*) lenv) [2];
    O r15 = box((pxll_int)unbox(r17)-unbox(r16));
    lenv[2] = r15;
    FUN_loop_17_0();
  }
}
A few things to note: the loop is achieved with a tail call in C.  This requires that the C compiler perform tail call optimization or the stack will explode rather quickly.

The 'registers' are now emitted as local variables, with a fresh name for every line. This output is very, very close to the SSA form needed for LLVM itself, and is practically begging for an LLVM backend.

There is no longer any dependency on C extensions - the address-of-label extension was never a happy citizen in LLVM land, anyway - and the lexical functions extension (needed by Irken circa 3 years ago) is a distant memory.

The output of the compiler runs about the same speed, but the compilation time itself is much more predictable.  clang -O3 takes about 30 seconds, and gcc -O3 around a minute.  gcc actually seems to do a better job.  I need to play around with clang and gcc's options in order to figure out the best flags for fast edit-compile-link turnaround - I'm missing the 4-second compiles I used to have with clang and the old compiler.

Other great possibilities include separate compilation, JIT, and (possibly) closure conversion / passing args in registers.  An LLVM backend might open up the possibility of efficient floating-point support.

However, the next step is probably to write a real garbage collector.  I'm seriously considering using Irken for some real server work soon, and to survive in that world a two-space collector just isn't going to cut it.  I've read over the "Beltway" paper several times and I think I grok it well enough to implement it.