Saturday, December 19, 2009

Thinking about a new inliner...

I recently found a problem with the inliner. After adding the code to unpack the data attached to variant types - I deliberately left the unpacker in a form like this:

((lambda (f0 f1 f2...) <body>) x.f0 x.f1 x.f2)

Something like that might be generated for a vcase expression like this:

      (define (pop)
(vcase stack
((:nil) (error "stack underflow"))
((:elem item _ tail)
(set! stack tail)

I then made the inliner recognize ((lambda () ...) ...) applications and inline them when it could. This revealed a major problem with my inliner, most easily demonstrated like this:

(let ((n 10))
((lambda (x) (set! n 5) x) n))

This should return 10, not 5. In the stack example above, instead of returning the popped item, it returns the new top of the stack.

Some kind of assignment analysis is needed.

While digging around for a good reference on the subject, I found a really good paper on inlining that seems to cover this and several other issues. It looks pretty straightforward to implement, so there may be another pause-to-work-on-irken phase.

Sunday, December 13, 2009

avoiding address range checks in the copying collector

I added this hack years ago, in another project; so I don't remember if I came up with the idea myself or borrowed it from elsewhere.

The Cheney algorithm leaves a 'forwarding' pointer in fromspace to identify objects which have been evacuated to tospace. When looking at a pointer in fromspace, you need to do an address range check on it to figure out which space it points to. This is relatively expensive. What I do instead is store a sentinel in place of the object header, and store the forwarding address immediately after it. Then the scan can identify forwarded objects by just looking for the sentinel.

Unfortunately, this breaks for zero-element tuples - there's nowhere to store the forwarding pointer.

Normally, zero-element tuples shouldn't exist - it's more efficient to represent such an object by an immediate object (like 'nil'). This works for everything - except for vectors. Vectors are by their nature run-time beasts that must support zero length.

So I've added support for a new immediate type - TC_EMPTY_VECTOR, which will have to be checked for at runtime, for example in "vector-length".

Saturday, December 12, 2009

More speedups for the solver

When fed large chunks of pre-initialized data (in this case parser tables), the type solver fell over. It was taking on the order of minutes to solve about 1000 lines of parser table (which is nowhere near the size of the full parser).

The problem was that the split/unname phase of the solver doesn't kick in until the type of a particular node is completely solved - and this type was running to over 10,000 type variables. This was triggering a lot of nasty N² behavior in the solver, which shunts equations in and out of the unifier as it fuses/decomposes them.

The fix I found is to trigger an 'extra' unname phase whenever 1000 tvars have been added to an s_let() frame. This cut the time down from 1:20 to 8 seconds.

Another thing I'm considering - in order to avoid the 'overly specific type' problem - which really goes crazy when fed literal lists... I may try to type variant constructors as a funcall instead.

One last possibility - to add datatype declarations back, and just give up on the row-based variants. I don't want to do that, because I think they have a very pythonic feel to them.

p.s. I threw up a new tarball a few minutes ago.

Monday, December 7, 2009

Typed call-with-current-continuation?

I ran into an interesting problem today. In certain use cases, make-generator fails to type correctly:

(define (make-generator producer)
(let ((ready #f)
;; just holding useless continuations
(caller (call/cc id))
(saved-point (call/cc id)))

(define (entry-point)
(lambda (k)
(set! caller k)
(if ready
(saved-point #f)
(producer yield)))))

(define (yield v)
(lambda (k)
(set! ready #t)
(set! saved-point k)
(caller v))))

Note the call (saved-point #f). This is returning #f to a normally-ignored continuation:

(define (make-int-generator n)
(lambda (consumer)
(let loop ((n 0))
(consumer n)
(loop (+ n 1))))))

Depending on where you place a call to your consumer function, the continuation may or may not be ignored. I got different results depending on whether I called an external function to produce values, or an inline loop. I thought maybe by rewriting make-generator using lower-level prims putcc and getcc, I could make the problem go away.

Although it has simplified make-generator, the problem hasn't really gone away:

(define (make-generator producer)
(let ((ready #f)
;; holding useless continuations
(caller (getcc))
(saved-point (getcc))
(define (entry-point)
(set! caller (getcc))
(if ready
(putcc saved-point #u)
(producer yield)))
(define (yield v)
(set! saved-point (getcc))
(set! ready #t)
(putcc caller v))

I made the problem go away by changing the type declaration for putcc:

(define (putcc k r)
(%%cexp (continuation 'a -> 'b) "(k=%s, %s)" k r))

It used to read (continuation 'a -> 'a), and if you look at the tiny bit of C code there that's exactly what it does. However. We're talking about continuations here. It's very difficult to wrap my head around what it does 'in the real world'.

I'm sure that my getcc/putcc are not new, they're just the obvious way to save and store continuations. Have I typed this correctly, or have I waved my hand? How do the other typed languages deal with this problem?

One solution I tried was to pass make-generator an extra argument, of the generated type, and use that for the call to saved-point. It feels wrong, but ultimately may be the cleanest approach.

Wednesday, December 2, 2009

Status Update

I've spent some time trying to speed up the type solver. My most complicated test is probably 't20.scm', which tests a generated lexer. Since it uses symbols it includes the entire library, including red-black trees.

I was able to trim the type reconstruction down from 50 seconds to about 7. I did this mainly by 'pruning' the set of equations making up a solved type scheme - keeping only those directly or indirectly referenced by the root type variables for the scheme. This seems to have lost me some of the outrageous type specificity, and seems similar to what Pottier's mini-prototype code does.

The 7 seconds for this is actually closer to 3.5s, since I run the typer twice. I think I can live with this speed for a while... that's to compile about 65KB of raw scheme code.

Next up: implement the LR(1) parse engine, then put the file I/O code, lexer, and parser all together for the first time.