Sunday, May 15, 2011

nested datatypes, 2-3 trees, numerical encoding

This paper by Hinze & Paterson discusses the use of 'nested datatypes' to implement 2-3 trees, but in a way that uses the type system to automatically handle the invariants.  A blog post from Mark Chu-Carroll is a bit easier to follow.  Here's his example of a sequence encoded as a 2-3 tree (in Haskell):


data Tree t = Zero t | Succ (Tree (Node t))
data Node s = Node2 s s | Node3 s s s

Each level of the tree contains sub-trees that are 'one less' via the numeric encoding scheme.

This sent me back to Okasaki's book, where I'm finally starting to grok those last two chapters.  This bizarre mingling of numerical encoding and data structures lights a fire in the parts of my brain dedicated to data structures.  If feel as if I could automatically see the relationship between red-black trees, 2-3 trees, and binary number representations.... if only I could double my IQ.



I have this cool puzzle called the 'Spin-Out', which is actually a physical demonstration of the properties of Gray Codes.  It seems to me that Gray Codes and balanced trees should be a natural fit.


Unfortunately, I can't experiment with any of these ideas in Irken (or OCaml, or SML) because the nested datatypes require polymorphic recursion.

Type inference with polymorphic recursion is undecidable.  Haskell allows it only if the user provides a manual type signature.  I'm guessing this is impossible (i.e., unsafe) with SML/OCaml because of the presence of assignment?  Could such a feature safely be used with functions that the compiler can verify are pure?





No comments:

Post a Comment