Monday, April 5, 2010

stirrings of a vm

I'm finally beginning to think seriously about a VM. It's been a long road.

My approach, as usual, is maximally lazy. For now, I've pushed the parser to the side while I figure out what the byte code compiler will put out. That of course depends on what the VM will look like.

To begin, I'll simply re-use the Irken compiler, which already compiles a lambda-calculus based language into a register-based CPS instruction stream.

Here's what it looks like so far.

This bytecode is consumed by the VM, which works in two phases. First, it loads the bytecode into two vectors, one holding instructions, one holding data. (Note: the insn vector is typed as "vector((->undefined))"). Then it simply invokes the first insn. See vm/vm.scm for details.

The vector of insns allows us to build a 'threaded' VM: each insn is implemented as a no-argument function. The last thing each insn does is fetch and execute the next insn.

(define (insn-literal)
  (set! regs[vm-data[pcd]] vm-data[(+ pcd 1)])
  (set! pcd (+ pcd 2))

"((next-insn))" compiles to a bit of C that looks like this:
lenv = r0[2]; goto *r0[1];

This has the potential to be very fast.

My current stumbling block is where/how to straddle the divide between the typed and untyped universe. The VM is written in a strongly typed language, but I want it to implement an untyped language. In order to do that I need to be able to store arbitrary objects into the VM registers. I'm thinking of a couple of different approaches right now.

1) Add an 'object' type to the Irken compiler - this will act like a normal declared datatype, but when you pattern match or vcase it, the compiler will emit code to detect our run-time tags instead. I'll probably need some way of casting things to the object type.


2) *Remove* run-time tagging from Irken. This would make Irken even more like OCaml. The advantages should be obvious. The disadvantage would be I would have to implement some kind of stack map and teach the garbage collector how to use it. Also, Irken would become much more difficult to debug. But I could then implement the VM using fully declared and 'normal' datatypes without adding an extra layer of type tagging.

1 comment:

  1. I might be able to compromise between the two approaches. How about extending 'datatype' to automatically detect and 'optimize' the tagging of single-immediate-objects in sum types?

    For example,

    (datatype number (:int int) (:complex int int))

    In the first alternative, rather than treating that like a product of one element, the back end will just treat it like a normal tagged integer.

    The only time this would *not* work would be if you had more than one alternative with the same immediate type:

    (datatype number (:int0 int) (:int1 int))

    Since there would be no way of distinguishing the two without a tag.