Monday, January 31, 2011

simple approach to OO and exploding equirecursive types

While trying to port the graph algorithms (Kosaraju's algorithm / topological sort) I got to a point where I really wanted some imperative set/map objects, so I combined a 'class' using alists (for map) and lists (for set) with a few 'methods' put into records.

After playing with it for a while, I found a fairly neat way to do method calls and object layouts:

(define (set-class)

  (define (in self x)
    (let loop ((l self.list))
      (match l with
        () -> #f
        (hd . tl) -> (if (eq? hd x)
                         #t
                         (loop tl)))))
  (define (add self x)
    (if (self::in x)
        #u
        (set! self.list (list:cons x self.list))))
  
  (define (get self)
    self.list)

  (let ((methods {in=in add=add get=get}))
    (define (new l)
      {o=methods list=l})
    new
    ))

This leads to an efficient object layout very similar to Python's: one slot identifies the object type, the other slots hold instance variables.  To support this, I added a reader hack that translates


(var::method arg0)

into

(var.o.method var arg0)

The result is something that I'm pretty sure will act just like Python's duck typing... for example a generic output function may call the 'write' method on an object, and its type will reflect that need: for a write method of a particular signature.  Instead of 'o' I'd probably use a slot name like '__methods__'... this suggests standard repr/str interfaces, and the other duck-typed protocols that Python makes such good use of.

Pretty cool, huh?

Well, as soon as I tried actually using this code, the inferencer fell to its knees. I'm not kidding. It went from a 30 second compile to a 20 minute compile, eating up over a GB of memory.  Hmmmm... looks like my quick-and-easy implementation of μ-types has problems, surprise.  I think I was 'asking for it' by calling each method with a 'self' argument... each method call has to unify the object type twice.  And storing objects inside of other objects doesn't help, either.

Back to the textbook.  I'm hoping that reading this paper again will help.

Wednesday, January 26, 2011

self-hosting dilemma

I have nearly the whole compiler rewritten in itself.  I've saved one of the nastiest bits for last, the inliner.  I've been planning on a new design for the inliner - rather than just copying the Python code.  So of course, what happens in this late stage?  A bug in the inliner!

The program types correctly if inlining is turned off, but fails if it's turned on.  This is a great example of the gains to be had by typing both before and after phases of the compiler - but unfortunately in this case it means I have to fix a bug in some code that I intend to throw away as soon as it works!

Theoretically I could just turn off inlining until I'm done, and then write the new inliner in the self-hosted environment - but I can't be absolutely sure that it's not a more complex interaction between the typer and the inliner.

Thursday, January 20, 2011

bloated C compilers

I've been testing Irken out on my netbook (a Samsung NC10), and discovered a big problem with feeding Irken's output to gcc.

When I tried to compile the self-hosted Irken with -O, the netbook fell on its knees.  After about 15 minutes, I decided to look into it... why hadn't the compile finished yet?

The input Irken source was about 60kB, which exploded into a 920kB C file.  The C output is actually pretty low-level stuff, so it's probably comparable to a 300kB hand-written C file.  But let's just call it 1MB and move on...

The gcc (version 4.4) process had eaten 1.4GB of virtual memory, so it was paging on the netbook.

That's 1600X the size of the C input file.  It's nearly 25,000X the size of the Irken input.  Wow.

So as a test I installed dragonegg and gcc-4.5.  Quite a difference.  The dragonegg compile held at 143MB for most of the compile, and peaked at around 200MB.  Only 226X the size of the C input, and a mere 3400X the size of the Irken input.

But what's an order of magnitude between friends?

The datatype + record idiom

One of the difficult decisions when working on a compiler is how to represent nodes.  There's a strain between the pure-functional and object-oriented styles that can be really painful.  I'm certainly not happy with the way I did it in Irken - a combination of pure-functional class with a bunch of attribute hacks.

While working to solve the same problem in self-hosted Irken, I've come across a pretty good idiom that captures the best of both worlds; giving me exactly the right combination of strong typing, mutability, and flexibility that I need.

The trick is to use a record as the node representation, but inside it put a slot that holds all the strongly-typed metadata that you need.  Here's what it looks like so far:

;; node type holds metadata related to the node,
;;  but sub-nodes are held with the record.
(datatype node
  (:varref symbol)
  (:varset symbol)
  (:literal literal)
  (:cexp type string)
  (:nvcase symbol (list symbol))
  (:sequence)
  (:if)
  (:function symbol (list symbol)) ;; name formals
  (:call)
  )

(define node-counter (make-counter 0))

;; given a list of nodes, add up their sizes (+1)
(define (sum-size l)
  (fold (lambda (n acc) (+ n.s acc)) 1 l))

(define (node t subs)
  {t=t subs=subs s=(+ (sum-size subs) 1) id=(node-counter.inc) type=(type:base '?)}
  )

(define (node/varref name)
  (node (node:varref name) '()))

(define (node/varset name val)
  (node (node:varset name) (LIST val)))

(define (node/literal lit)
  (node (node:literal lit) '()))

(define (node/cexp type template args)
  (node (node:cexp type template) args))

(define (node/sequence subs)
  (node (node:sequence) subs))

Note how the node constructor initializes the other slots in the record.  The interesting one for me today is the type field.   This is later filled in by the typer, succinctly:

(define (type-of exp tenv)
  (let ((t (type-of* exp tenv)))
    (set! exp.type t)
    t))

(define (apply-subst-to-program n)
  (set! n.type (apply-subst-to-type n.type))
  (for-each apply-subst-to-program n.subs))

(define (type-program node)
  (let ((t (type-of node (alist/make))))
    (apply-subst-to-program node)
    t))

We've still preserved the strong typing of nodes, though: here's what a typical node-visiting function looks like:

(define (compile tail? exp lenv k)

  ;; override continuation when in tail position
  (if tail?
      (set! k (cont (k/free k) gen-return)))
  
  (match exp.t with
    (node:literal lit)           -> (c-literal lit k)
    (node:sequence)              -> (c-sequence tail? exp.subs lenv k)
    (node:if)                    -> (c-conditional tail? exp lenv k)
    (node:function name formals) -> (c-function name formals (car exp.subs) lenv k)
    (node:varref name)           -> (c-varref name lenv k)
    (node:varset name)           -> (c-varset name (car exp.subs) lenv k)
    (node:cexp sig template)     -> (c-cexp sig template exp.subs lenv k)
    (node:call)                  -> (c-call tail? exp lenv k)
    _ -> (begin (pp-node exp 0) (error1 "NYI" exp))
    )
  )

Note how metadata that's specific to each node type is stored in the datatype (exp.t), but generic info (like the list of sub-nodes, or the size) is stored in the record.

Tuesday, January 18, 2011

making the match compiler smarter

The match compiler developed a little problem.  It's amazing that it didn't show up earlier.  As described in chapter 5 of Peyton-Jones' book, "The Implementation of Functional Programming Languages", certain 'obvious' or 'optimized' pattern matches actually compile to worse code.

The following function is from the book (pg 88), and is similar to an implementation of map2.

(define demo
  () ys -> (:A ys)
  xs () -> (:B xs)
  (x . xs) (y . ys) -> (:C x xs y ys)
  )

The second match clause will actually compile more efficiently if you use (x . xs) rather than a variable, because the match compiler knows in this case you're matching a cons cell. (it avoids the 'mixture rule' and uses only the 'constructor rule').

Regardless of this optimization, I ran into a more significant issue.  If I use xs, I get an 'incomplete match' error, because the code generated for this match has to examine the first list twice - in the first test it covers only the nil case.  In the second test, it covers the cons case, but has an 'else' clause that generates a match error.

Here's what the node tree looks like:

290 L    primapp [33] ('%%fatbar', None) ? 
262 L      nvcase [5] ('list', ['nil']) ? 
258 L        varref [1] m0_7_i0 ? 
260 L        primapp [2] ('%vcon/A/1', None) ? 
259 L          varref [1] m1_8_i1 ? 
261 L        primapp [1] ('%%fail', None) ? 
289 L      primapp [27] ('%%fatbar', None) ? 
267 L        nvcase [5] ('list', ['nil']) ? 
263 L          varref [1] m1_8_i1 ? 
265 L          primapp [2] ('%vcon/B/1', None) ? 
264 L            varref [1] m0_7_i0 ? 
266 L          primapp [1] ('%%fail', None) ? 
288 L        nvcase [21] ('list', ['cons']) ? 
268 L          varref [1] m0_7_i0 ? 
286 L          let_splat [18] [{m2_9_i0}, {m3_10_i0}] ? 
270 L            primapp [2] ('%nvget/list/cons/0', None) ? 
269 L              varref [1] m0_7_i0 ? 
272 L            primapp [2] ('%nvget/list/cons/1', None) ? 
271 L              varref [1] m0_7_i0 ? 
285 L            nvcase [13] ('list', ['cons']) ? 
273 L              varref [1] m1_8_i1 ? 
283 L              let_splat [10] [{m4_11_i0}, {m5_12_i0}] ? 
275 L                primapp [2] ('%nvget/list/cons/0', None) ? 
274 L                  varref [1] m1_8_i1 ? 
277 L                primapp [2] ('%nvget/list/cons/1', None) ? 
276 L                  varref [1] m1_8_i1 ? 
282 L                primapp [5] ('%vcon/C/4', None) ? 
278 L                  varref [1] m2_9_i0 ? 
279 L                  varref [1] m3_10_i0 ? 
280 L                  varref [1] m4_11_i0 ? 
281 L                  varref [1] m5_12_i0 ? 
284 L              primapp [1] ('%%match-error', None) ? 
287 L          primapp [1] ('%%match-error', None) ? 

...and in a pseudo-python:
if nil(m0):
  A()
elif nil(m1):
  B()
else:
  if cons(m0):
    if cons(m1):
      C()
    else:
      raise Error
  else:
    raise Error

The problem is that the match is actually exhaustive, because the nil case was examined already.  %%match-error in Irken is actually a kind of sentinel, the back end doesn't know how to compile it, so you can't actually compile code that's not exhaustive.

To fix this, I added an extra pass to the analysis phase that builds environments of 'already tested alternatives' for each nvcase variable. This actually killed two birds with one stone. In many cases, the innermost nvcase test is completely unnecessary - for this example, we've already tested for nil, and actually testing for a cons is redundant. In this case we simply remove the nvcase test (and the erroneous match-error), and return the body.

The code then generated for map is closer to what a human might write:

pseudo-python:

if nil(m0):
  A()
elif nil(m1):
  B()
else:
  C()

RTL:

0 = constructed [] 1
   1 = constructed [] 0
   2 = fatbar 'fatbar_0' []
       2 = move [0, None] 'm0_7_i0'
       2 = nvcase 'list' [2]
           2 = move [1, None] 'm1_8_i1'
           2 = primop [2] ('%make-tuple', 'A', 0)
               return [2] None
               fail [] ('fatbar_0', 0)
           return [2] None
       2 = fatbar 'fatbar_1' []
           2 = move [1, None] 'm1_8_i1'
           2 = nvcase 'list' [2]
               2 = move [0, None] 'm0_7_i0'
               2 = primop [2] ('%make-tuple', 'B', 1)
                   return [2] None
                   fail [] ('fatbar_1', 0)
               return [2] None
           2 = move [0, None] 'm0_7_i0'
           2 = primop [2] ('%vget', '0')
           3 = move [0, None] 'm0_7_i0'
           3 = primop [3] ('%vget', '1')
           4 = move [1, None] 'm1_8_i1'
           4 = primop [4] ('%vget', '0')
           5 = move [1, None] 'm1_8_i1'
           5 = primop [5] ('%vget', '1')
           6 = move [2, None] 'm2_9_i0'
           7 = move [3, None] 'm3_10_i0'
           8 = move [4, None] 'm4_11_i0'
           9 = move [5, None] 'm5_12_i0'
           6 = primop [6, 7, 8, 9] ('%make-tuple', 'C', 2)
               return [6] None
           return [2] None
       return [2] None

Friday, January 14, 2011

progress on self-hosting

Making excellent progress on self-hosting.  Today the baby irken compiler spat up its first real code:

beast:irken rushing$ self/backend

-- reader --
(((lambda (x y) x) 3 4))

-- macros --
((function lambda (x y) x) 3 4)

-- node tree --

5 call
  2 function lambda (x y)
    1 varref x
  1 literal {u1 3}
  1 literal {u1 4}

-- RTL --

0 env 2 
1 lit {u1 3}
- stor 1 1 0 0 
1 lit {u1 4}
0 stor 1 1 0 1 
1 close lambda
  0 ref 0 0 
  - ret 0
0 tail 1 0 
- ret 0

-- C output --
  r0 = allocate (TC_TUPLE, 3);
  r1 = (object *) 7;
  r0[2] = r1;
  r1 = (object *) 9;
  r0[3] = r1;
  // def lambda
  goto L0;
  FUN_lambda:
    r0 = varref (0, 0);
    PXLL_RETURN(0);
  L0:
  r1 = allocate (TC_CLOSURE, 2);
  r1[1] = &&FUN_lambda; r1 = lenv;
  r0[1] = r1[2]; lenv = r0; goto *r1;
  PXLL_RETURN(0);
#u
{total ticks: 1870544 gc ticks: 0}

Thursday, January 13, 2011

format macro

How cool is this?

;; -*- Mode: Irken -*-

(include "lib/core.scm")
(include "lib/pair.scm")
(include "lib/string.scm")

(defmacro format
  (format)                       -> (list:nil)
  (format (<int> n) item ...)    -> (list:cons (int->string n) (format item ...))
  (format (<char> ch) item ...)  -> (list:cons (char->string ch) (format item ...))
  (format (<bool> b) item ...)   -> (list:cons (bool->string b) (format item ...))
  (format (<string> s) item ...) -> (list:cons s (format item ...))
  (format x item ...)            -> (list:cons x (format item ...))
  )

(print-string (string-join (format "testing" (int 1) (bool #t) (char #\A) "\n") " "))

beast:irken rushing$ tests/t_format
testing 1 #t A 
#u
{total ticks: 154666 gc ticks: 0}

Let me explain what's going on here.  First, this is a macro definition.  Macros are specified using a pattern language.  To the left of each arrow is an input pattern - on the right is how that input will be transformed; before the code gets to the compiler  (in other words, textual substitution).

Symbols surrounded by angle brackets are keywords - i.e., constant symbols. This is to distinguish them from variables. Every other name in the input pattern is bound to a variable, which can be used in the output form.

So the final line is transformed into this expression:

(print-string
 (string-join
  (list:cons "testing"
             (list:cons (int->string 1)
                        (list:cons (bool->string #t)
                                   (list:cons (char->string #\A)
                                              (list:cons "\n" (list:nil))))))
  " "))

This is actually a very simple macro, the patterns could be much more sophisticated, with embedded list structures and other keywords. But it creates the illusion of a variable-arity, variably-typed format function in a language that is strictly, statically typed and does not support variable arity! And of course other macros may be invoked. I'm sure I'll be expanding it as I work more on Irken's self-hosted back end.

Usually string formatting is an ugly part of any language - one of the classic problems with C is that the format language is ill-specified and dangerous.  Only recently have compilers become smart enough to even do rudimentary checking...

I'm only just beginning to see what's possible with this macro system.  I think the pattern matching syntax combines very nicely with the general pattern matching in the language.

Saturday, January 8, 2011

record type declarations

I finally needed to declare record types in some complex data structures, so I spent about 10 minutes extending the type parser to support it. The syntax is IMHO obvious.

I was pleasantly surprised to see how easy it was to add this feature, and all the testing went quickly and as expected. I think this is all a 'side-effect' (pun intended) of a relatively complete and sane type system.

Here's what the syntax looks like:

;; -*- Mode: Irken -*-

(include "lib/core.scm")
(include "lib/pair.scm")
(include "lib/string.scm")

;; declare a new datatype <thing> with only one alternative <t>
;;   which consists of an open record type... i.e., a record that
;;   may contain other unknown fields.
(datatype thing
  (:t {x=int y=char ...})
  )

(define (test1)
  (thing:t {x=3 y=#\b z=9})
  )

(define (test2)
  (thing:t {x=4 y=#\c z=#\a a=#t b=#f}))

;; make sure it still works with *no* extra fields
(define (test3)
  (thing:t {x=4 y=#\a}))

(printn (test1))
(printn (test2))
(printn (test3))