Monday, January 12, 2009

The mess with variable ids.

A variable identifier (Id) can carry one of four kinds of information. In Haskell we would write it like this:

data Id = Empty         -- Unused binding, e.g. '\ _ -> ...'.
        | Etherial Int  -- Internal variable. Only used when type-checking.
        | Anonymous Int -- Anonymous variable created by the compiler.
        | Named Name    -- Named variable created by the user.

However, in LHC this data structure was unrolled and packed into a single Int. The encoding went as follows:

Empty = zero
Etherial = negative numbers
Anonymous = even, positive numbers
Named = odd, positive numbers, used as keys in a global hash table.
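To make the packing concrete, here is a rough sketch of what encoding and decoding could look like. This is not LHC's actual code: `Name` is a stand-in newtype around a hash-table index, the names `encode` and `decode` are made up for illustration, and I assume every payload is at least 1 so that nothing collides with zero.

```haskell
-- Hypothetical stand-in for the real name type: just an index (>= 1)
-- into the global hash table.
newtype Name = Name Int
  deriving (Eq, Ord, Show)

data Id
  = Empty
  | Etherial Int
  | Anonymous Int
  | Named Name
  deriving (Eq, Ord, Show)

-- Pack an Id into an Int following the scheme above.
-- Assumes every payload n satisfies n >= 1.
encode :: Id -> Int
encode Empty            = 0
encode (Etherial n)     = negate n   -- negative numbers
encode (Anonymous n)    = 2 * n      -- even, positive numbers
encode (Named (Name k)) = 2 * k - 1  -- odd, positive numbers

-- Recover the Id from its packed form.
decode :: Int -> Id
decode 0 = Empty
decode i
  | i < 0     = Etherial (negate i)
  | even i    = Anonymous (i `div` 2)
  | otherwise = Named (Name ((i + 1) `div` 2))
```

Under those assumptions `decode . encode` is the identity, but nothing in the type `Int` records which of the four cases a given number belongs to.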

This encoding gives us very fast operations on Sets and Maps, but it also punishes mistakes with a vengeance: every Id is just an Int, so the type checker cannot tell one kind of variable from another. The increased performance is definitely not worth it, and we've been working on untangling the Ids since day one.
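As an illustration of the kind of mistake that slips through (a made-up example, not a bug taken from the LHC history), consider generating a fresh anonymous id from an existing one. The helper names `kindOf`, `badFresh` and `goodFresh` are hypothetical.

```haskell
-- Which of the four cases a packed id falls into.
data Kind = KEmpty | KEtherial | KAnonymous | KNamed
  deriving (Eq, Show)

kindOf :: Int -> Kind
kindOf i
  | i == 0    = KEmpty
  | i < 0     = KEtherial
  | even i    = KAnonymous
  | otherwise = KNamed

-- Looks reasonable and type-checks, but adding 1 to an even number
-- gives an odd one: the "fresh anonymous" id is now a Named id whose
-- hash-table key may not exist.
badFresh :: Int -> Int
badFresh i = i + 1

-- Stepping by 2 preserves parity, so the result stays Anonymous.
goodFresh :: Int -> Int
goodFresh i = i + 2
```

Both functions have the same type, so the compiler has no way to reject the first one. With the ADT, turning an `Anonymous` into a `Named` requires writing a different constructor, which is much harder to do by accident.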

As of today, I'm glad to say that we've finally restored the beautiful ADT and we can now hack without fear of segfaulting.

2 comments:

  1. At least, not at compile time ;-P.

  2. I feel your pain. I made the same mistake in Happy and I'm still regretting it. Someday I should change it back.
