Wednesday, February 2, 2011

recursive types, unification, sharing

I'm having trouble with recursive types in my type reconstruction (type inference) algorithm, some kind of combinatorial explosion.  I think there are multiple triggers for this: at each call to unify(), I'm running a complete apply-subst(), rather than a level at a time.  For the massive types generated by my OO code, this means lots of wasted, repeated work.

But of course there's even more going on.  I think my code that detects recursive types is not working as well as it should, and I think the cause is the lack of 'sharing'.  Remy/Pottier use a different style of unification, that involves preserving shared elements of types, and I think this where I'm screwing up: because parts of types are being copied, I'm missing links between identical parts of the graph.

I found a nice presentation from Remy describing "Core ML", that explains this in (accented) English, rather than a relentless flurry of LaTeX: "Using, Understanding, and Unraveling The OCaml Language: From Practice to Theory and vice versa".

The constraint-oriented solver gave me recursive types for free, (because its output types are naturally graphs), but I'm really hoping to avoid going back to that code, because it's much slower and I frankly don't understand it as well as I'd like.  Maybe I can pull out just the multi-equation unification stuff, and get back recursive types.


  1. Quoth Remy: "In order to obtain maximum sharing, non-variable terms should never be copied. Hence, rule Decompose requires that one of the two terms to be decomposed is a small term—which is the one used to preserve sharing."

    I think this means to always leave a tvar attached to a term. That variable acts like a rendezvous for all the identical uses of that type - including recursive ones.

  2. Found an excellent reference for this, that gives Huet's quai-linear algorithm, *and* handles 'infinite unification'.

    So Huet's algorithm, along with path compression, runs in 'quasi-linear' time which is O(nα(n)), where α is the inverse of Ackerman's function. Never thought I'd see a real-world appearance of Ack.

  3. Ok, I've implemented Huet's algorithm, and -- fingers crossed -- it appears to work, and to correctly handle infinite types. The fledgling self-hosted Irken types in 11s now, so the performance is also looking good. The extra scrutiny was good, I found a few subtle problems that are now fixed.