Friday, September 4, 2009

Yet another unfair benchmark.

A lot of things have happened in LHC over the last couple of weeks. With the inclusion of Integer and IEEE float support, LHC is finally usable enough for simple benchmarks.

I've excavated the old 'nobench' benchmark and pitted four Haskell implementations against each other. It should be noted that these benchmark numbers are even more unreliable than usual. UHC's C backend doesn't work on x86-64 and thus it compiles to bytecode. All in all, you should trust benchmarks as much as you trust politicians.

The benchmark results can be found here: http://darcs.haskell.org/~lemmih/nobench/x86_64/results.html.

The results are updated frequently.

The benchmark source can be found here: http://nobench.seize.it/

Saturday, August 15, 2009

Status update: New Integer implementation.

We've finally gotten around to replacing our Integer type with a real bignum implementation. The bignum code was written by Isaac Dupree in wonderfully pure Haskell, and it was a snug fit for our needs. After stripping the Prelude dependency and hooking it up to the Integer interface, it worked without a hitch.

Let's try it out:


david@desktop:lhc$ cat HelloWorld.hs
module Main where
main = do print (2^150)
          print (3*10^13*299792458)
david@desktop:lhc$ ./HelloWorld
1427247692705959881058285969449495136382746624
8993773740000000000000
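As an aside, the flavour of a pure-Haskell bignum can be sketched in a few lines. The toy below is my own illustration (not Dupree's code): it stores numbers as little-endian lists of base-10 digits, whereas a real implementation would use a much larger base and far more operations.

```haskell
-- Toy bignum: little-endian digit lists, base 10.
-- 'add' threads a carry through both lists, like pencil-and-paper addition.
type Big = [Int]

add :: Big -> Big -> Big
add = go 0
  where
    go 0 [] ys = ys
    go 0 xs [] = xs
    go c xs ys =
      let (x, xt) = uncons' xs
          (y, yt) = uncons' ys
          s = x + y + c
      in s `mod` 10 : go (s `div` 10) xt yt
    -- Treat a missing digit as zero.
    uncons' []     = (0, [])
    uncons' (d:ds) = (d, ds)

main :: IO ()
main = print (add [9,9] [1])  -- 99 + 1 = 100, i.e. [0,0,1]
```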

Monday, June 8, 2009

New backend.

The new C backend has been pushed to the repository and it seems to work without a hitch. No particular effort has been directed at making it efficient (and none will be, since this backend is only a temporary measure). Initial testing shows it to be around 40-50 times faster than the interpreter.
Writing this backend was surprisingly easy; low-level GRIN (LHC's intermediate language) can be directly pretty-printed as C code. By far the hardest part was giving up on LLVM and settling for C.

Future development will focus on grin-to-grin optimizations and a native code generator.

Thursday, May 21, 2009

New release: LHC 0.8

It's been about 5 months, but finally a new release of LHC has been born and is on Hackage - so you should get it now!

This new release has been a lot of hard work, especially on David's behalf, and we've spent the past day or two working out a lot of installation issues on my MacBook and elsewhere. But the result is looking really nice, even if premature. There are still some bugs to work out, but for the most part all our installation issues are fixed, and development can steam ahead on more interesting stuff.

Perhaps the biggest change in this release is that LHC is now a backend for GHC instead of its own codebase. Amongst other things, this pretty much means that LHC already has support for all of GHC's language extensions. Also, it shares the exact same command line options (and a few more), so it's pretty similar to GHC under the hood.

The code base is very small and simple: there is no garbage collection, no exceptions, no threading, and so on. Everything is slow right now and the heap etc. are dummies. The result already works well though, and so we're releasing it now for your pleasure.

There are full installation instructions for LHC + libraries HERE.

Enjoy.

Monday, May 4, 2009

Constructor specialization and laziness.

Edward recently publicised some experiments with constructor specialization and the state monad. You can find the sources here.

What he did was basically to remove laziness and polymorphism from the state monad using a fancy new GHC feature called indexed type families. Benchmarking the different implementations was done by calculating the Fibonacci sequence and printing the 400,000th element.
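For readers unfamiliar with indexed type families, here is a minimal sketch of the idea (my own toy example, not Edward's code): an associated data family lets each element type choose its own, possibly strict and unpacked, representation of the same abstract structure.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Each instance of 'Adaptive' picks its own concrete representation.
class Adaptive a where
  data Pair a
  mkPair :: a -> a -> Pair a
  fstP   :: Pair a -> a

-- Ints get strict fields that GHC can unbox...
instance Adaptive Int where
  data Pair Int = PairInt !Int !Int
  mkPair = PairInt
  fstP (PairInt x _) = x

-- ...while Bools keep an ordinary lazy representation.
instance Adaptive Bool where
  data Pair Bool = PairBool Bool Bool
  mkPair = PairBool
  fstP (PairBool x _) = x

main :: IO ()
main = print (fstP (mkPair (3 :: Int) 4))  -- prints 3
```

Note that every specialized combination needs its own handwritten instance, which is exactly the enumeration problem such adaptive types run into.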

There are quite a number of such adaptive data types. They range from lists to maps to monads, but they all share two fundamental drawbacks: (1) all usage combinations must be explicitly enumerated, and (2) laziness must be eliminated. Fortunately for LHC, using whole-program optimization solves both problems (by static analysis and unboxing at a higher granularity).

I believe it's important to realise that polymorphism and laziness are two sides of the same coin. Destroy one and you are likely to inadvertently destroy the other. 'Is this a bad thing?' you might ask. Well, the short answer is "yes!". Laziness, when used correctly, is incredibly powerful. Let's have another look at the State monad benchmark.

The following program implements the above mentioned benchmark. It is 10 times faster and uses 50 times less memory than the most efficient strict version.

{-# OPTIONS_GHC -O2 -XBangPatterns #-}
import Control.Monad
import Control.Monad.State

main = print . last . fst $ fib 400000

fib n = runState (fibN n) (0,1::Int)
fibN n = replicateM' n oneFib
oneFib = get >>= \(m,n) -> put (n,m+n) >> return m

replicateM' 0 fn = return []
replicateM' n fn
  = do !e <- fn
       r <- replicateM' (n-1 :: Int) fn
       return (e:r)


So, in conclusion: Constructor specialization is an interesting technique but its full power can only be realised as an interprocedural optimization pass.

Friday, April 10, 2009

A new beginning.

The LHC project has finally resumed development after a few weeks of inactivity. Things have taken big steps in a new direction, however, and nearly everything except the name has changed.
We're no longer a fork of JHC. Maintaining a complete Haskell front-end was too much of a hassle, especially considering we're only interested in optimization on the GRIN level. For this reason, LHC has reinvented itself as an alternative backend to the Glorious Glasgow Haskell Compiler.

The lack of testability was a major problem in the previous version of LHC, but hopefully we've learned from our mistakes. The new development efforts will be structured around a gradually decreasing reliance on a GRIN evaluator. In other words, we want to run the full testsuite between each and every significant code transformation. That no transformation should change the external behaviour of a GRIN program is a very simple invariant.

The current toolchain looks as follows:

david@desktop:basic$ cat Args.hs
import System

main :: IO ()
main = do
  as <- getArgs
  mapM_ putStrLn as
david@desktop:basic$ ghc -fforce-recomp -O2 -fext-core -c Args.hs
david@desktop:basic$ lhc compile Args.hcr > Args.lhc
david@desktop:basic$ ./Args.lhc One Two Three
One
Two
Three


The file 'Args.lhc' contains unoptimized GRIN code. It is not by any means efficient or fast, but it serves its purpose.

Development will now focus on creating GRIN transformations that reduce the need for the RTS (our GRIN evaluator serves as the RTS).

Wednesday, April 8, 2009

Hello world!

After weeks of development, lhc is finally able to interpret Hello World!

david@desktop:lhc$ cat HelloWorld.hs
module Main where
main = putStr "Hello world\n"
david@desktop:lhc$ ghc -O2 -fext-core HelloWorld.hs -c
david@desktop:lhc$ lhc build HelloWorld.hcr > HelloWorld.grin
Parsing core files...
Tracking core dependencies...
Translating to grin...
Removing dead code...
Printing grin...
david@desktop:lhc$ wc -l HelloWorld.grin
8054 HelloWorld.grin
david@desktop:lhc$ lhc eval HelloWorld.hcr
Parsing core files...
Tracking core dependencies...
Translating to grin...
Removing dead code...
Hello world
Node (Aliased 251 "ghc-prim:GHC.Prim.(#,#)") (ConstructorNode 0) [Empty,HeapPointer 263]


Supported primitives include: indexCharOffAddr#, newPinnedByteArray#, *MutVar, *MVar.

Exceptions are currently ignored and the heap is never garbage collected. However, since I'm evaluating the GRIN (as opposed to translating it to LLVM or C), adding these features should be a piece of cake.

Sunday, March 22, 2009

Ease of implementation.

Developing a usable compiler for a high-level language such as Haskell isn't a trivial thing to do. Any effort to trade developer time against CPU time is likely to be a wise choice. In this post I will outline a few attempts to deal with the complexity of LHC in high-level ways. Hopefully the end result won't be too slow.


Case short-circuiting.
Since case expressions in GRIN do not force the evaluation of the scrutinized value, they are usually preceded by a call to 'eval'. Then, after the 'eval' calls have been inlined, case-of-case patterns like this are very common:

do val <- case x of
            []         -> unit []
            CCons x xs -> unit (CCons x xs)
            Ffunc a    -> func a
   case val of
     []         -> jumpToNilCase
     CCons x xs -> jumpToConsCase x xs

This is obviously inefficient since the Nil and Cons values will be scrutinized twice. In the GRIN paper, Boquist deals with this by implementing a case short-circuiting optimization after the GRIN code has been translated to machine code. However, dealing with optimizations on the machine code level is quite tricky, and I'd much rather implement this optimization in GRIN. By making aggressive use of small functions we can do exactly that:

do case x of
     []         -> jumpToNilCase
     CCons x xs -> jumpToConsCase x xs
     Ffunc a    -> do val <- func a; checkCase val

checkCase val
  = case val of
      []         -> jumpToNilCase
      CCons x xs -> jumpToConsCase x xs



Register allocation and tail calls.
Using a fixed calling convention is not necessary for whole-program compilers like LHC. Instead, we choose to create a new calling method for each procedure (this is easier than it sounds).
This has the obvious consequence of requiring the convention for return values to be identical for procedures that invoke each other with tail-calls. This was deemed an unacceptable restriction in the GRIN paper, and all tail-calls were subsequently removed before register allocation took place. Afterwards, another optimization step reintroduced tail-calls where possible.
I believe this is too much trouble for too little gain. The possible performance hit is outweighed by the ease of implementation and the guarantee of tail-calls.


Simple node layout.
An unevaluated value is represented simply by a function name (or tag) and a fixed number of arguments. This value is then overwritten once it has been evaluated. However, the new value may be bigger than what was allocated to represent the unevaluated function.
One way to deal with this is to have two different node layouts: a fixed size node for small values, and a variable size node for big values. This is the approach taken in the GRIN paper and it understandably adds quite a bit of complexity.
Another method is to use indirections. This trades smaller average node size and ease of implementation against more heap allocations.
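A hypothetical model of the indirection scheme, using an IntMap as a stand-in for the heap (names and layout invented for illustration; this is not LHC's actual RTS):

```haskell
import qualified Data.IntMap as IM

data Node = Thunk String [Int]   -- suspended function: tag plus argument pointers
          | Value String [Int]   -- evaluated constructor node
          | Indirection Int      -- forwards to a node allocated elsewhere
  deriving (Eq, Show)

type Heap = IM.IntMap Node

-- Follow indirections until a real node is reached.
follow :: Heap -> Int -> Node
follow h p = case h IM.! p of
               Indirection q -> follow h q
               n             -> n

-- Update a thunk with its value: allocate the value in a fresh cell and
-- overwrite the old cell with an indirection, so existing pointers stay valid
-- even when the value is bigger than the original cell.
update :: Int -> Node -> Heap -> Heap
update p v h =
  let fresh = fst (IM.findMax h) + 1
  in IM.insert fresh v (IM.insert p (Indirection fresh) h)

main :: IO ()
main = do
  let h0 = IM.fromList [(0, Thunk "Fupto" [1,2])]
      h1 = update 0 (Value "CCons" [3,4]) h0
  print (follow h1 0)  -- prints: Value "CCons" [3,4]
```

The extra hop through 'Indirection' is the cost; the gain is that an update never needs to fit inside the original cell.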

Thursday, February 5, 2009

Grin a little.

It has come to my attention that we are not using GRIN to its fullest. More specifically, it seems that the 'eval' and 'update' operations are handled by the RTS. This has unfortunate consequences for both the optimizer and the backend code.
Without an explicit control-flow graph (given by inlining eval/apply), many of our more important transformations cannot be performed. Even worse than the lost optimization opportunities is the increased complexity of the RTS. Dealing with issues of correctness is an annoying distraction from the more enjoyable endeavour of applying optimizations.

Moving away from the magical implementation of 'update' means we have to start thinking about our memory model. The GRIN paper suggests using a fixed node size with a tail pointer for additional space if necessary. With this scheme we can update common-case nodes without allocating more heap space. However, since we're most likely to encounter complications with respect to concurrency and certain forms of garbage collection, I think a simpler approach is more apt.
Replacing nodes with indirections is very easy to implement, it doesn't clash with any optimizations (the original GRIN approach interferes with fetch movement), and it opens the door for advanced features such as concurrency.

So this is what I'll be working on in the near future. All magic has to be purged from the kingdom so logic and reason can reign supreme.

Tuesday, January 27, 2009

Release notes.

Version 0.6.20090126 has been released. It has been more than a month since our last release and we've made a lot of progress. The code is available from Hackage and can be installed as such:

cabal install lhc -fwith-base

Here's our changelog:
  • Fixed type classes.
  • Better variable ids.
  • Base library reorganization.
  • Better support for non-Linux systems.
  • Removed tagging on Int and Word.
  • Got Control.Arrow and Control.Applicative working by improving the handling of (->) as well as fixing type classes.
  • More extensive testsuite.
  • Lots of code clean-up.
Future effort will be directed at adding Integer support in the base library, improving the efficiency of LHC and restoring control-flow analysis.

Cheers,
The LHC Team.

Sunday, January 25, 2009

Thoughts on a new code generator

Code selection through object code optimization by Davidson and Fraser describes a compiler architecture in which, instead of optimizing an AST and then generating code for a target machine, we immediately generate worst-case instructions from the AST, optimize those, and then emit assembly. The instructions we generate and optimize are called register-transfer lists, or RTLs.

RTLs are quite simple - a solitary RTL is just a single abstract machine instruction. A sequence of RTLs might look like so:
t[1] := reg[12]
t[2] := mem[8] * reg[0]
t[3] := t[1] + reg[3]
t[4] := t[2] * 10
An RTL must satisfy a simple property - the machine invariant, which states that any RTL maps to a single instruction on the target machine.

The compilation process is like so: the input language (for example C--, GRIN, or any AST for that matter) is fed to the code expander, which generates a simple sequence of RTLs representing the AST. No attention is paid at this point to the quality of the generated code, which keeps the code expander's job simple and easy. The code expander is, however, a machine-dependent part of the backend - it requires knowledge of the target machine so that it may establish the machine invariant.

After expansion, the code is optimized using e.g. peephole optimization, common subexpression elimination and dead code elimination. These optimizer parts are machine-independent. Every optimization must make sure it does not violate the machine invariant.
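To make this concrete, here is a hypothetical miniature RTL language with a single machine-independent peephole rule, copy propagation (all names and shapes invented for illustration). Note that the rewrite replaces one RTL with one RTL, so the machine invariant is preserved.

```haskell
data Operand = Temp Int | Reg Int | Mem Int | Lit Int
  deriving (Eq, Show)

data RTL = Assign Operand Operand       -- dst := src
         | Add Operand Operand Operand  -- dst := a + b
  deriving (Eq, Show)

-- Peephole: propagate a copy into the immediately following add.
-- The copy itself is kept; a later dead-code pass may remove it.
peephole :: [RTL] -> [RTL]
peephole (Assign d s : Add d' a b : rest)
  | a == d = Assign d s : peephole (Add d' s b : rest)
peephole (i : rest) = i : peephole rest
peephole []         = []

main :: IO ()
main = mapM_ print (peephole [ Assign (Temp 1) (Reg 12)
                             , Add (Temp 3) (Temp 1) (Reg 3) ])
```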

After optimization, we perform code emission, the only other machine-dependent part of the compiler.

In application to LHC, it is quite obvious that a new backend is necessary, and I believe this compilation method is applicable and preferable. Combined with whole-program analysis and my plans for as small and simple a runtime as possible, it makes the compiler easier to retarget - and I wish for LHC to be a cross-compiler as well. It's obvious C is not the way to go, so minimizing the burden of retargeting seems like a good strategy. My eventual plan is for you to be able to take a Haskell application, compile it with LHC, and get nice assembly code for whatever architecture you wish.

The overall structure of the new backend I am thinking of is something like this (drawn very badly using Project Draw):

We will perform as many optimizations as possible on the whole-program while it is in GRIN form, leaving the backend to do the task of peephole optimization/DCE/etc. in a machine-independent way, and then emit assembly code.

On the note of implementing an optimization engine to perform these operations, the approach described in An Applicative Control-flow Graph based on Huet's Zipper seems promising as well. Garbage collection also needs definite addressing before we can get LHC to compile truly serious programs; this will take a little more time to think about and discuss with David. Eventually, we may even look into using Joao Dias' thesis research to automatically generate compiler backends, as I believe GHC HQ wants to do.

Thursday, January 22, 2009

Typeclasses are working, now we're missing a bunch of instances...

Well, I finally figured out why the only two test cases that were working were HelloWorld and Kleisli. The compiler had been implicitly generating a lot of hard-wired instances for Int,Word,CInt, and all their numerous cousins -- but it was generating the methods too late (during conversion from HsSyn language to the E intermediate language) for the methods to be properly associated with their classes (which FrontEnd.Class does before typechecking even properly begins). Since none of us much liked the idea of having all this hardwired into the compiler, we decided that rather than try to adjust the machinery to work with the new handling of methods, we'd rather implement the instances in the library. So, that is what we have to do for every type in the following list:
  • Lhc.Prim.Int
  • Lhc.Basics.Integer
  • Data.Int.Int8
  • Data.Int.Int16
  • Data.Int.Int32
  • Data.Int.Int64
  • Data.Int.IntMax
  • Data.Int.IntPtr
  • Data.Word.Word
  • Data.Word.Word8
  • Data.Word.Word16
  • Data.Word.Word32
  • Data.Word.Word64
  • Data.Word.WordMax
  • Data.Word.WordPtr
  • Foreign.C.Types.CChar
  • Foreign.C.Types.CShort
  • Foreign.C.Types.CInt
  • Foreign.C.Types.CUInt
  • Foreign.C.Types.CSize
  • Foreign.C.Types.CWchar
  • Foreign.C.Types.CWint
  • Foreign.C.Types.CTime

So bear with us if this takes a while to iron out. We have managed to get mini-base to build and many of the tests to run now, though getArgs apparently doesn't compile yet.

Monday, January 19, 2009

Functions in Haskell.

Function calls in Haskell are typically far more numerous than in more traditional languages. This is in part due to laziness. Being lazy means that functions in Haskell do as little as possible to return a result. So to get all the data you need, you often have to call the functions multiple times.

Consider the following snippet of code:

upto :: Int -> Int -> [Int]
upto from to
  = if from > to
      then []
      else from : upto (from+1) to

Here's what the compiled function would look like:

-- Arguments and results are (usually) kept in registers.
-- We generate a new calling convention for each function.
upto from to
  = case from `gtInt` to of
      1 -> do -- Return a single tag representing '[]'.
              return [CNil]
      0 -> do -- Allocate 5 heap cells.
              heap <- allocate 5
              -- CInt is the constructor for Int.
              heap[0] := CInt
              heap[1] := from
              -- Fupto represents a suspended call to 'upto'.
              heap[2] := Fupto
              heap[3] := from+1
              heap[4] := to
              -- Return a node as three separate pieces.
              -- &heap[0] is the head of the list and &heap[2] is the tail.
              return [CCons, &heap[0], &heap[2]]

As we can see, calling this function will only give us a single node (the node in this case is either a CNil or a CCons with two arguments). We will have to call it again to get more information out of it. For example, fully computing 'upto 1 10' requires 11 calls to 'upto' (10 CCons nodes and 1 CNil).
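This call-on-demand pattern is also what makes laziness pay off: a consumer that needs only part of the list triggers only that many calls. A quick source-level illustration (plain Haskell, not GRIN):

```haskell
upto :: Int -> Int -> [Int]
upto from to
  = if from > to
      then []
      else from : upto (from+1) to

-- Even with an astronomically large 'to', taking three elements
-- calls 'upto' only three times; the suspended tail is never forced.
main :: IO ()
main = print (take 3 (upto 1 (maxBound :: Int)))  -- prints [1,2,3]
```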

Looking at the steps in 'upto' shows us that it isn't doing a whole lot. All variables (even arguments and results) are in registers and the data can easily fit in the cache. We could almost say that calling this function is as fast as looping in C. Let's add a bit more code and see what happens:

main = showMyList (upto 1 10)

showMyList [] = return ()
showMyList (x:xs)
  = do print x
       showMyList xs

The same code, now compiled to our intermediate language:

main = do heap <- allocate 3
          heap[0] := Fupto
          heap[1] := 1
          heap[2] := 10
          showMyList &heap[0]

showMyList lst
  = do -- Read the arguments to 'upto' from the 'lst' pointer.
       Fupto from to <- fetch lst
       -- Call 'upto'. The results are kept in registers.
       -- 'a' and 'b' are undefined if 't' is CNil.
       [t, a, b] <- upto from to
       -- Inspect the node tag.
       case t of
         CNil  -> return [CUnit]
         CCons -> do -- Call print on 'a'. This call might invoke the garbage collector.
                     print a
                     -- Recurse on the tail.
                     showMyList b


This looks rather good. We could be proud of ourselves if this were the end. However, there is one thing that we haven't considered: garbage collection. The garbage collector may run when we call 'print', and if it does, the pointer we have in 'b' will no longer be valid.
A common solution is to push 'b' onto a stack (which the GC can walk and modify) and reload it after 'print' has returned. However, a stack allocation costs about as much as the call to 'upto' and hence incurs an overhead of nearly 50%.

Fortunately there's a way around this. We can "simply" have the garbage collector update all registers that contain heap pointers. Doing so isn't exactly easy but it does allow us to keep pointers in registers and to avoid all unnecessary stack allocations.
The details of how to accomplish this will have to wait for another time.

Saturday, January 17, 2009

LLVM is great.

It seems that the LLVM crowd have mistaken my blog posts for criticism of LLVM. Let me make it clear that I have nothing but respect for LLVM.

I've previously mentioned that LLVM doesn't support zero-overhead garbage collection. Big deal. It's about the same as saying LLVM doesn't answer whether P=NP. I apparently failed in conveying that this is an unsolved problem in computer science.

Solving this problem isn't as simple as putting heap pointers in registers (although it is required). The real beef lies in determining which registers are heap pointers when it isn't known statically. Determining at run-time which registers are heap pointers is intimately tied to the data model of the high-level language. Doing this well in an agnostic way is an unsolved problem. (Note that pointer tagging is generally avoided).


Several people apparently took it personally when I mentioned writing yet another NCG. Let it be clear that I'd never rewrite any of LLVM nor wish to belittle the effort it takes to write a general compiler. I was merely talking about writing a non-optimizing translator from an extremely limited IR to machine code.


So:
  • LLVM is great. I do not wish to criticise any part of it.
  • What you guys have created is impressive. I do not wish to belittle your efforts.
  • LLVM is not a silver bullet. It does not solve all open questions in the academic world and no one expects it to.

Why LLVM probably won't replace C--.

It seems that I didn't do a good job at explaining why LHC won't be using LLVM. In this post I will elaborate on why some people think C-- has more promise than LLVM as a substrate for lazy, functional languages.

Let me start by making one thing clear: LLVM does have support for garbage collectors. I am not disputing that. However, as Henderson has shown, so does C and every other language. The question we have to ask is not "Does this environment support garbage collection?" but rather "How efficiently does this environment support garbage collection?".

To recap, Henderson's technique involves placing root pointers (the set of pointers which can be followed to find all live data) on a shadow stack. Since we manage this stack ourselves, it shouldn't be a problem for the GC to walk it.
In short, each heap allocation incurs an unnecessary stack allocation and heap pointers are never stored in registers for long.

Now what does this mean for languages like Haskell? Well, unlike programs written in more traditional languages, a Haskell application might very well do between 10 and 20 million heap allocations per second.
Writing Haskell programs is more about producing the correct data stream than it is about performing the right side-effects. It's common for functions in Haskell to manipulate data without executing any side-effects. (Think spreadsheets.)
This way of computing obviously requires a very cheap method of allocation. Performing 10 million unnecessary stack allocations per second would severely hurt performance, and not having heap pointers in registers could easily be equally devastating.

So what about LLVM? Shouldn't the built-in GC support in LLVM be more efficient than any cheap hack? Well, it turns out it isn't. The conflict between garbage collection and optimizations hasn't changed, and neither has the solution: disabling or bypassing optimizations. This in turn means unnecessary stack allocations and sub-optimal use of registers.

That LLVM hasn't solved the problem of zero-overhead garbage collection isn't too surprising. Solving this while staying agnostic of the data model is an open question in computer science.
It is here C-- differs from LLVM. C-- is a research project that aims at solving difficult problems such as supporting efficient GCs and cheap concurrency. LLVM, on the other hand, is an engineering project.

In conclusion: garbage collection in LLVM incurs unacceptable overhead, and while C-- and LLVM do have some overlap, the problems they're trying to solve are quite different.

Thursday, January 15, 2009

The case against C/LLVM.

Many people have pointed out that garbage collection can be implemented in a portable fashion in uncooperative environments such as C or LLVM. Specific details of how to do this can be found in Fergus Henderson's paper "Accurate garbage collection in an uncooperative environment".

Henderson's technique is simply to put each pointer to newly allocated data on the stack. This bypasses all concerns about which variables will be put in registers and which will be spilled to the stack. However, the overhead of never using registers and performing unnecessary stack allocations can be quite significant.

Since the goal of LHC is to generate efficient executables, shoehorning our execution model into a "hostile" environment like C or LLVM isn't an option. Writing a simplistic NCG won't be too much trouble, hopefully.

Wednesday, January 14, 2009

Typeclass Blues

Well, I've been having some trouble with the typeclass implementation. I got rid of the long-standing bug where ill-typed rules were generated in E.FromHs, and I've gotten FrontEnd.Class to generate rules for method implementations which appear in instance declarations, but I haven't figured out how to implicitly fall back to the default implementation. The one approach I actually tried was to generate method implementations like:
Lhc.Order./= = Instance@.iLhc.Order./=.default
which would quickly be rewritten to:
Instance@.iLhc.Order./=.Lhc.IO.IOError = Instance@.iLhc.Order./=.default
but unfortunately, the typechecker stated that Instance@.iLhc.Order./=.default was not in the type-checking environment, and I couldn't figure out why :-(.

Resources.

You can follow the LHC development using these resources:

Mailing list: http://projects.haskell.org/cgi-bin/mailman/listinfo/lhc
Wiki: http://lhc.seize.it/
Bug tracker: http://trac.haskell.org/lhc/report/1
IRC channel: irc.freenode.net#lhc-compiler

What is LHC?

It might be difficult to get a sense of how feature-complete LHC is just by looking at the wiki and the mailing list. In this post I will try to answer the common questions of what LHC supports and where LHC is going.

Let me start by introducing LHC itself. The LHC Haskell Compiler (LHC for short) was born out of JHC on a cold Tuesday morning, November 18th, 2008. Her birth was necessitated by a conflict of opinion between John Meacham (the author of JHC) and myself. We simply couldn't agree on which build system would be the most appropriate: make or cabal. Since then we've made many changes that are unlikely to be merged back into JHC.

As of this writing, LHC differs from JHC in the following ways:
  • Cabalized build system instead of Make.
  • Extensive use of Haskell libraries. We have six more dependencies than JHC.
  • We use 'derive' instead of the unwieldy 'DrIFT' package. This, among other things, makes it possible to generate HPC information.
  • We support both GHC-6.8.x and GHC-6.10.x.
  • We're available on Hackage.
  • We've eliminated some usage of unsafe IO. On a project of this magnitude, getting segfaults instead of type-errors isn't acceptable.
  • We're very liberal in granting commit access. It's easier to undo bad patches than to make people contribute.
Although we are not JHC, we are still similar enough to share many of the same disorders. Most importantly, we don't have support for exceptions, garbage collection or proper type-classes.

The issues with type-classes are purely due to bugs in our code, and hopefully we can resolve them without too much headache. It should be noted that since Monad is a type-class, there are very few programs that LHC can compile at this stage.

Exceptions and garbage collection are more fundamental problems, though. We simply cannot implement those features efficiently in C. Using C-- is an obvious solution but unfortunately there are no usable C-- compilers. Using LLVM[1] is out of the question for the same reasons as C. We will probably end up writing our own native code generator.

All in all, our future plans are like this:
  1. Disable features until we can correctly compile Haskell98 programs.
  2. Extend our testsuite to cover most of the codebase.
  3. Address performance issues. We are about 5-10 times slower than GHC and there are no fundamental reasons for this.
  4. Write a code generator that supports exceptions and garbage collection.

[1]: On the surface it looks like LLVM has decent support for exceptions and garbage collection. However, the documentation shows that it is essentially no better than C.

Monday, January 12, 2009

The mess with variable ids.

Variable identification tags can contain four different types of information. In Haskell we would write it as such:

data Id = Empty          -- Unused binding. E.g.: '\ _ -> ...'.
        | Etherial Int   -- Internal variable. Only used when type-checking.
        | Anonymous Int  -- Anonymous variable created by the compiler.
        | Named Name     -- Named variable created by the user.

However, in LHC this data structure was unrolled and packed into an Int. The encoding went as follows:

Empty     = zero
Etherial  = negative numbers
Anonymous = even, positive numbers
Named     = odd, positive numbers, used as keys in a global hash table.
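For illustration, the packing can be sketched as follows. This is a hypothetical reconstruction: the real code also managed a global hash table for Named ids, which is elided here (Named simply carries its table key).

```haskell
data Id = Empty          -- unused binding
        | Etherial Int   -- internal type-checking variable
        | Anonymous Int  -- compiler-created variable
        | Named Int      -- user-created variable (key into a hash table)
  deriving (Eq, Show)

-- Pack an Id into one Int. Assumes the Int payloads are >= 1,
-- so nothing collides with the zero reserved for Empty.
encode :: Id -> Int
encode Empty         = 0
encode (Etherial n)  = negate n     -- negative numbers
encode (Anonymous n) = 2 * n        -- even, positive numbers
encode (Named k)     = 2 * k + 1    -- odd, positive numbers

decode :: Int -> Id
decode n
  | n == 0    = Empty
  | n < 0     = Etherial (negate n)
  | even n    = Anonymous (n `div` 2)
  | otherwise = Named (n `div` 2)

main :: IO ()
main = print (map (decode . encode) [Empty, Etherial 3, Anonymous 8, Named 7])
```

The round trip works, but nothing in the type system stops an encoded Id from being confused with any other Int - which is exactly how this scheme punishes mistakes.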

This encoding gives us very fast operations on Sets and Maps, but it also punishes mistakes with a vengeance. The increased performance is definitely not worth it, and we've been working on untangling the Ids since day one.

As of today, I'm glad to say that we've finally restored the beautiful ADT and we can now hack without fear of segfaulting.