Wednesday, January 14, 2009

What is LHC?

It might be difficult to get a sense of how feature-complete LHC is just by looking at the wiki and the mailing list. In this post I will try to answer the common questions of what LHC supports and where LHC is going.

Let me start by introducing LHC itself. The LHC Haskell Compiler (LHC for short) was born out of JHC on a cold Tuesday morning, November 18th, 2008. Her birth was necessitated by a conflict of opinion between John Meacham (the author of JHC) and myself. We simply couldn't agree on which build system would be the most appropriate: make or cabal. Since then we've made many changes that are unlikely to be merged back into JHC.

As of this writing, LHC differs from JHC in the following ways:
  • Cabalized build system instead of Make.
  • Extensive use of Haskell libraries. We have six more dependencies than JHC.
  • We use 'derive' instead of the unwieldy 'DrIFT' package. This, among other things, makes it possible to generate HPC information.
  • We support both GHC-6.8.x and GHC-6.10.x.
  • We're available on Hackage.
  • We've eliminated some usage of unsafe IO. On a project of this magnitude, getting segfaults instead of type-errors isn't acceptable.
  • We're very liberal in granting commit access. It's easier to undo bad patches than to make people contribute.
Although we are not JHC, we are still similar enough to share many of the same disorders. Most importantly, we don't have support for exceptions, garbage collection or proper type-classes.

The issues with type-classes is purely due to bugs in our code, and hopefully we can resolve them without too much headache. It should be noted that since Monad is a type-class, there are very few programs that LHC can compile at this stage.

Exceptions and garbage collection are more fundamental problems, though. We simply cannot implement those feature efficiently in C. Using C-- is an obvious solution but unfortunately there are no usable C-- compilers. Using LLVM[1] is out of the question for the same reasons as C. We will probably end up writing our own native code generator.

All in all, our future plans are like this:
  1. Disable features until we can correctly compile Haskell98 programs.
  2. Extend our testsuite to cover most of the codebase.
  3. Address performance issues. We are about 5-10 times slower than GHC and there are no fundamental reasons for this.
  4. Write a code generator that supports exceptions and garbage collection.

[1]: On the surface it looks like LLVM has decent support for exceptions and garbage collecting. However, the documentation shows that it is essentially no better than in C.


  1. By 'code generator', do you mean "native code" generator?

  2. I'm not sure what counts as "efficient" but I used C as the backend language for DDC. Garbage collection can be implemented by keeping a shadow stack containing the pointers / GC roots. Exceptions can be implemented by using setjmp/longjmp.

    I'm not sure how to do true tail calls in C, but you can do tail recursion just by adding a label at the top of the function and using goto in the recursive call.. depends on how you handle laziness though..

  3. Roman: Yes, I meant native code generator.

    Ben: Yeah, that's not efficient enough for us. We want to take full advantage of CPU registers and only spill arguments to the stack if absolutely necessary.

  4. What needs to be added to LLVM to make it a suitable target for LHC?

  5. 23Skidoo:
    We need to mark variable as GC roots and be able to modify them when garbage collecting. Adding all gc roots on the stack is not an option.

  6. If I understand correctly, this will make the IR not valid SSA.
    Anyway, I think you should post your suggestions on the llvm-dev mailing list, the people there will surely be interested.

  7. 23Skidoo: what will make the IR not SSA?

  8. 23Skidoo:
    I don't know what to suggest. This is a hard problem in computer science. If I knew how to go about solving it, I'd write a paper and collect my degree.

    Variables that point to heap objects may be changed by the garbage collector. This may or may not violate SSA since you shouldn't be able to tell the difference between the old address and the new. They still point to the same object so in some sense they haven't changed.