Monday, April 26, 2010

tak20 benchmark runs in the VM

My favorite benchmark now runs in the VM, since I have all three types of procedure invocation implemented: tail calls, tail-recursive calls, and normal invocation. The last is the most complex, because it actually involves building a 'stack frame' and saving/restoring registers:

(datatype vmcont
  (:nil)
  (:k (vmcont) (lenv) int (vector (object)))
  )

Take a look at the current VM implementation, in Irken.

So a 'VM continuation' consists of the pointer to the next frame, the lexical environment, the PC, and a vector of saved registers.

The 'INVOKE' CPS insn is turned into two VM insns, 'CALL' and 'POP'.
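
Roughly, CALL builds a vmcont:k frame and jumps to the callee, and POP tears the frame back down once the callee's value comes back. Here's a sketch of the CALL side in Irken-flavored code; it is illustrative only — the continuation register 'k', the operand layout, and the 'copy-regs' helper are made-up names for this sketch (lenv, pcd, vm-data, and next-insn are the globals used by the real insns shown further down):

(define (insn-call)
  (let ((nregs  vm-data[(+ pcd 1)])    ;; how many live registers to save
        (target vm-data[(+ pcd 2)]))   ;; entry point of the called procedure
    ;; push a frame: previous frame, lexical env, return PC, saved registers
    (set! k (vmcont:k k lenv (+ pcd 3) (copy-regs nregs)))
    (set! pcd target)
    ((next-insn))))

POP does the inverse: it restores lenv and the saved registers from the topmost frame, pops it, and resumes at the saved PC.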

Even without any serious tweaking, the tak20 benchmark seems to run about the same speed as in python. But I plan to make specialized versions of CALL and POP for different numbers of saved registers. This will eliminate loops in the insns and thus get rid of allocation for the insn closures.
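
For instance, a hypothetical CALL2 could save exactly two registers with straight-line code; since the register count is baked into the insn, there's no loop left that needs a closure. Everything here (the names, the operand layout, the 'make-save-area' helper) is an illustrative guess, not the planned implementation:

(define (insn-call2)
  (let ((saved (make-save-area 2)))    ;; hypothetical: allocate a 2-slot save vector
    (set! saved[0] regs[0])            ;; unrolled saves, no loop
    (set! saved[1] regs[1])
    ;; one fewer operand: the register count is implicit in the insn itself
    (set! k (vmcont:k k lenv (+ pcd 2) saved))
    (set! pcd vm-data[(+ pcd 1)])
    ((next-insn))))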

It's very rewarding to finally bring all this work together. Putting together a functioning VM is the very tip of an iceberg I've spent 2+ years building.

I'm even thinking about documenting the Irken language. Yeah, I know, crazy thoughts.

Next step: start building the python-like language. This would involve rewriting the byte-code compiler in Irken. OR, I could continue the nearly trivial task of supporting Irken as a byte-coded language, and start rewriting everything in Irken. As we all know, only the most fashionable and respectable languages are written in themselves. It's tempting!

Thursday, April 22, 2010

dragon egg

If you've ever tried to even read the instructions for building llvm-gcc (much less actually build it), you understand why this is good news.

http://dragonegg.llvm.org/

Sounds like gcc has gained a plugin architecture, and the llvm folks are putting together a plugin, 'DragonEgg', that uses LLVM as the back end.

Why do I care about this?

1) Clang still does not support lexical functions, which are critically important to Irken's performance.
2) The version of llvm-gcc distributed with Apple's XCode uses an older version of llvm that implements the address-of-label extension with a hack so horrible I would be embarrassed to describe it to you.

first loop in the vm!

I've implemented enough instructions in the VM that I can now do a loop (a.k.a. a tail-recursive function call).
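
At the source level, such a loop is just a function whose recursive call is in tail position. A minimal sketch (illustrative only, not necessarily the exact program behind the trace below):

(define (countdown n)
  (if (= n 0)
      0
      (countdown (- n 1))))

The self-call in tail position shows up as the tr_call insn in the trace, so the VM executes it as a jump instead of building a frame.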

I still need to work on the VM insns: there's a lot of allocation taking place that shouldn't be, and there's range-checking on each and every vector access, even including stuff like the register file. Given those caveats, it's only about 4X slower than a similar python loop. I'm pretty sure I can make it faster than python; we'll see.

--- cps ---
   0 = new_env [] 1
   -   push_env [0] None
   1 = close [] 'loop_0'
       0 = lit [] 3
       1 = varref [] ((0, 0), False, {n_1})
       0 = primop [0, 1] ('%=',)
       0 = test [0] None
           0 = lit [] 1
               return [0] None
           0 = varref [] ((0, 0), False, {n_1})
           1 = lit [] 2
           0 = primop [0, 1] ('%-',)
               tr_call [0] (1, )
           return [0] None
   -   store_tuple [1, 0] (0, 1, 1)
   0 = new_env [] 1
   1 = lit [] 0
   0 = store_tuple [1, 0] (0, 1, 1)
   1 = varref [] ((0, 0), True, {loop_0})
       invoke_tail [1, 0] 

Monday, April 5, 2010

stirrings of a vm

I'm finally beginning to think seriously about a VM. It's been a long road.

My approach, as usual, is maximally lazy. For now, I've pushed the parser to the side while I figure out what the byte code compiler will put out. That of course depends on what the VM will look like.

To begin, I'll simply re-use the Irken compiler, which already compiles a lambda-calculus-based language into a register-based CPS instruction stream.

Here's what it looks like so far.

This bytecode is consumed by the VM, which works in two phases. First, it loads the bytecode into two vectors, one holding instructions, one holding data. (Note: the insn vector is typed as "vector((->undefined))"). Then it simply invokes the first insn. See vm/vm.scm for details.

The vector of insns allows us to build a 'threaded' VM: each insn is implemented as a no-argument function. The last thing each insn does is fetch and execute the next insn.

(define (insn-literal)
  ;; vm-data[pcd] names the target register; vm-data[pcd+1] holds the literal
  (set! regs[vm-data[pcd]] vm-data[(+ pcd 1)])
  ;; advance past both operands, then dispatch to the next insn
  (set! pcd (+ pcd 2))
  ((next-insn)))

"((next-insn))" compiles to a bit of C that looks like this:
lenv = r0[2]; goto *r0[1];

This has the potential to be very fast.

My current stumbling block is where/how to straddle the divide between the typed and untyped universe. The VM is written in a strongly typed language, but I want it to implement an untyped language. In order to do that I need to be able to store arbitrary objects into the VM registers. I'm thinking of a couple of different approaches right now.

1) Add an 'object' type to the Irken compiler - this will act like a normal declared datatype, but when you pattern match or vcase it, the compiler will emit code to detect our run-time tags instead. I'll probably need some way of casting things to the object type.

or

2) *Remove* run-time tagging from Irken. This would make Irken even more like OCaml. The advantages should be obvious. The disadvantage is that I would have to implement some kind of stack map and teach the garbage collector how to use it. Also, Irken would become much more difficult to debug. But I could then implement the VM using fully declared and 'normal' datatypes, without adding an extra layer of type tagging.