Thursday, January 20, 2011

The datatype + record idiom

One of the difficult decisions when working on a compiler is how to represent nodes.  There's a strain between the pure-functional and object-oriented styles that can be really painful.  I'm certainly not happy with the way I did it in Irken - a combination of pure-functional class with a bunch of attribute hacks.

While working to solve the same problem in self-hosted Irken, I've come across a pretty good idiom that captures the best of both worlds; giving me exactly the right combination of strong typing, mutability, and flexibility that I need.

The trick is to use a record as the node representation, but inside it put a slot that holds all the strongly-typed metadata that you need.  Here's what it looks like so far:

;; node type holds metadata related to the node,
;;  but sub-nodes are held with the record.
(datatype node
  (:varref symbol)
  (:varset symbol)
  (:literal literal)
  (:cexp type string)
  (:nvcase symbol (list symbol))
  (:sequence)
  (:if)
  (:function symbol (list symbol)) ;; name formals
  (:call)
  )

(define node-counter (make-counter 0))

;; given a list of nodes, add up their sizes (+1)
(define (sum-size l)
  (fold (lambda (n acc) (+ n.s acc)) 1 l))

(define (node t subs)
  {t=t subs=subs s=(+ (sum-size subs) 1) id=(node-counter.inc) type=(type:base '?)}
  )

(define (node/varref name)
  (node (node:varref name) '()))

(define (node/varset name val)
  (node (node:varset name) (LIST val)))

(define (node/literal lit)
  (node (node:literal lit) '()))

(define (node/cexp type template args)
  (node (node:cexp type template) args))

(define (node/sequence subs)
  (node (node:sequence) subs))

Note how the node constructor initializes the other slots in the record.  The interesting one for me today is the type field.   This is later filled in by the typer, succinctly:

(define (type-of exp tenv)
  (let ((t (type-of* exp tenv)))
    (set! exp.type t)
    t))

(define (apply-subst-to-program n)
  (set! n.type (apply-subst-to-type n.type))
  (for-each apply-subst-to-program n.subs))

(define (type-program node)
  (let ((t (type-of node (alist/make))))
    (apply-subst-to-program node)
    t))

We've still preserved the strong typing of nodes, though: here's what a typical node-visiting function looks like:

(define (compile tail? exp lenv k)

  ;; override continuation when in tail position
  (if tail?
      (set! k (cont (k/free k) gen-return)))
  
  (match exp.t with
    (node:literal lit)           -> (c-literal lit k)
    (node:sequence)              -> (c-sequence tail? exp.subs lenv k)
    (node:if)                    -> (c-conditional tail? exp lenv k)
    (node:function name formals) -> (c-function name formals (car exp.subs) lenv k)
    (node:varref name)           -> (c-varref name lenv k)
    (node:varset name)           -> (c-varset name (car exp.subs) lenv k)
    (node:cexp sig template)     -> (c-cexp sig template exp.subs lenv k)
    (node:call)                  -> (c-call tail? exp lenv k)
    _ -> (begin (pp-node exp 0) (error1 "NYI" exp))
    )
  )

Note how metadata that's specific to each node type is stored in the datatype (exp.t), but generic info (like the list of sub-nodes, or the size) is stored in the record.

No comments:

Post a Comment