Saturday, May 29, 2010

The new interface

After hacking away for a little bit, I've finally gotten the new user interface for LHC working!

a ~/code/lhc/test $ lhc --help
The LHC Haskell Compiler, v0.11, (C) 2009-2010 David Himmelstrup, Austin Seipp

lhc [FLAG] [FILE]
Compile Haskell code

-? --help[=FORMAT]        Show usage information (optional format)
-V --version              Show version information
-v --verbose              Higher verbosity
-q --quiet                Lower verbosity
   --llvm                 Use LLVM backend
   --ghc-opts=VALUE       Give GHC frontend options
-i --install-library      Don't compile; install modules under a library
-b --build-library        Used when compiling a library (cabal only)
-O =VALUE                 Set optimization level (default=1)
   --numeric-version      Raw numeric version output
   --supported-languages  List supported LANGUAGE pragmas
-c                        Do not link, only compile
-o =VALUE                 output file for binary (default=a.out)
   --src-dir=VALUE        source code directory
a ~/code/lhc/test $ lhc HelloWorld.hs
[1 of 1] Compiling Main ( HelloWorld.hs, HelloWorld.o )
.....................
Found fixpoint in 7 iterations.
Lowering apply primitives... done in 0.09s
Heap points-to analysis... ...........................done in 0.95s
HPT fixpoint found in 27 iterations.
..................................................................................
Found fixpoint in 11 iterations.
Compiling C code... done in 0.11s
a ~/code/lhc/test $ ./HelloWorld
Hello, world!
a ~/code/lhc/test $

The changes should be landing shortly; they will require a patch to Cabal. There is also a bug in cabal/cmdargs that I have not yet tracked down, which makes installing cabal packages with this new scheme difficult, although still possible.

Edit 6-2-2010: all of the necessary patches have been pushed to both LHC and Cabal to make the new user interface work. Try it out (install using 'cabal install -fwith-libs', provided you have the darcs HEAD version of Cabal) and tell us of any corner cases on IRC (#lhc-compiler on freenode)!

Thursday, May 27, 2010

A new user interface for LHC

The current user interface for LHC is pretty unwieldy - it requires you to invoke lhc twice: once to generate an external core file, and again to generate the executable from it.

There are a couple of problems with this:
  1. It requires -you- to keep track of the generated .hcr files, which is a PITA.
  2. It makes the test suite complicated. I would like to use Simon Michael's excellent shelltestrunner library, but the two-step compilation process would make the test files nastier than they need to be, so we currently maintain our own regression tool to handle things like #1.
  3. It makes some of LHC's code very gross: we basically copied GHC's "Main.hs" file and stuck it in our source tree with some modifications, because we need to be able to accept all GHC options, even "insert arbitrary ghc option here" (for general usage and cabal install support). This was, as you could guess, incredibly fragile in terms of maintenance and forwards/backwards compatibility.
So now I've devised a new approach: we will instead run GHC in the background, twice. The first time, we will call GHC to compile your code with your provided options, always sticking something like '--make -fext-core -c' onto the command line to generate external core. The second time, we will call GHC again, but with the '-M' flag, which tells GHC to generate a Makefile describing the dependency information between modules. Running it on Tom Hawkins's atom project, you get something like this:

# DO NOT DELETE: Beginning of Haskell dependencies
Language/Atom/Expressions.o : Language/Atom/Expressions.hs
Language/Atom/Elaboration.o : Language/Atom/Elaboration.hs
Language/Atom/Elaboration.o : Language/Atom/Expressions.hi
Language/Atom/Analysis.o : Language/Atom/Analysis.hs
Language/Atom/Analysis.o : Language/Atom/Expressions.hi
Language/Atom/Analysis.o : Language/Atom/Elaboration.hi
Language/Atom/Scheduling.o : Language/Atom/Scheduling.hs
Language/Atom/Scheduling.o : Language/Atom/Elaboration.hi
Language/Atom/Scheduling.o : Language/Atom/Analysis.hi
Language/Atom/Language.o : Language/Atom/Language.hs
Language/Atom/Language.o : Language/Atom/Expressions.hi
Language/Atom/Language.o : Language/Atom/Elaboration.hi
Language/Atom/Language.o : Language/Atom/Elaboration.hi
Language/Atom/Common.o : Language/Atom/Common.hs
Language/Atom/Common.o : Language/Atom/Language.hi
Language/Atom/Code.o : Language/Atom/Code.hs
Language/Atom/Code.o : Language/Atom/Scheduling.hi
Language/Atom/Code.o : Language/Atom/Expressions.hi
Language/Atom/Code.o : Language/Atom/Elaboration.hi
Language/Atom/Code.o : Language/Atom/Analysis.hi
Language/Atom/Compile.o : Language/Atom/Compile.hs
Language/Atom/Compile.o : Language/Atom/Language.hi
Language/Atom/Compile.o : Language/Atom/Elaboration.hi
Language/Atom/Compile.o : Language/Atom/Scheduling.hi
Language/Atom/Compile.o : Language/Atom/Code.hi
Language/Atom.o : Language/Atom.hs
Language/Atom.o : Language/Atom/Language.hi
Language/Atom.o : Language/Atom/Common.hi
Language/Atom.o : Language/Atom/Compile.hi
Language/Atom.o : Language/Atom/Code.hi
# DO NOT DELETE: End of Haskell dependencies

This tells us where all the generated object files are. GHC puts external core files next to the object files (in all cases; you cannot redirect the output location of external core files). So we can just parse this simple Makefile, remove duplicates, and substitute '.hcr' for '.o' in each target. LHC takes care of the rest.
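As a sketch of that parsing step (the helper name is hypothetical; the real LHC code may well differ), extracting the unique object-file targets from a 'ghc -M' Makefile and swapping the extension fits in a few lines of Haskell:

```haskell
import Data.List (isSuffixOf, nub)

-- Hypothetical sketch: given the text of a 'ghc -M' Makefile, return the
-- external core (.hcr) files we expect GHC to leave next to each object file.
coreFiles :: String -> [FilePath]
coreFiles = map toHcr . nub . concatMap target . lines
  where
    -- A dependency line looks like "Foo/Bar.o : Foo/Bar.hs"; keep the target.
    -- Comment lines ("# DO NOT DELETE ...") and blank lines fall through.
    target l = case words l of
      (obj : ":" : _) | ".o" `isSuffixOf` obj -> [obj]
      _                                       -> []
    -- Substitute the '.hcr' extension for '.o'.
    toHcr obj = take (length obj - 2) obj ++ ".hcr"

-- e.g. coreFiles "A.o : A.hs\nA.o : B.hi\n" == ["A.hcr"]
```

Running this over the atom Makefile above would yield one .hcr path per module, with the duplicated targets collapsed by nub.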

This of course covers the case where you want to compile an executable. Compiling a library is mostly the same, except that when we parse the Makefile we simply store the file list for later.

But what about "obscure ghc option"? No fear! We'll just provide something like a --ghc-options flag whose value gets passed on to GHC's invocations. LHC can then have its own, more general command line interface to control the various whole-program stages (on this note, Neil Mitchell's cmdargs library is amazing for this stuff!)
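The pass-through idea can be sketched with GetOpt from base (the real driver uses cmdargs, and the flag and function names here are hypothetical): the driver only needs to collect the --ghc-options value and splice its words into each GHC invocation.

```haskell
import System.Console.GetOpt

-- Hypothetical flags; only the pass-through mechanism is being illustrated.
data Flag = GhcOptions String | Verbose
  deriving (Eq, Show)

options :: [OptDescr Flag]
options =
  [ Option []    ["ghc-options"] (ReqArg GhcOptions "OPTS")
      "options passed straight through to both GHC invocations"
  , Option ['v'] ["verbose"]     (NoArg Verbose)
      "higher verbosity"
  ]

-- Everything destined for GHC's command line, split into individual words.
ghcArgs :: [Flag] -> [String]
ghcArgs flags = concat [words opts | GhcOptions opts <- flags]
```

So an invocation like lhc --ghc-options='-O2 -fvia-C' Foo.hs would parse to [GhcOptions "-O2 -fvia-C"], and ghcArgs turns that into extra arguments for both the -fext-core run and the -M run.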

For default options to GHC, I think we should stick to the Haskell 2010 standard - that is, by default, LHC will run GHC with the language options needed to compile compliant Haskell 2010 code without any OPTIONS_GHC or LANGUAGE pragmas. GHC's optimization level can be implied by LHC's own optimization level, or set explicitly via --ghc-options.

Comments are always welcome.

Monday, May 24, 2010

Limited release.

This release of lhc-0.10 marks the move to GHC-6.12 and hopefully a more stable build infrastructure. As it stands, lhc-0.10 still lacks support for several important features, such as floating point values and large parts of the FFI.

To install LHC you need the development versions of Cabal and cabal-install. They can be fetched from these darcs repositories:

darcs get --lazy http://darcs.haskell.org/cabal
darcs get --lazy http://darcs.haskell.org/cabal-install

Once you've installed both Cabal and cabal-install, lhc-0.10 can be installed with the following command:

cabal install lhc-0.10


Here's how to use LHC once it has been successfully installed:

lhc -c SourceFile.hs # This compiles SourceFile.hs to SourceFile.hcr
lhc compile SourceFile.hcr # This compiles SourceFile.hcr to the executable SourceFile.
./SourceFile


Happy Hacking.

Tuesday, May 18, 2010

Laziness and polymorphism.

This may be obvious to some but I truly didn't grok the relationship between laziness and polymorphism before I started work on LHC.

The Haskell language has two distinguishing features: laziness and parametric polymorphism. At a glance, these two may not seem to have much in common. However, laziness can be seen as a form of implicit polymorphism (and it tends to be implemented as such). Consider a function with the following type signature:

f :: Integer -> Integer

One could say this function is polymorphic in its argument: the argument can either be an actual Integer, or something that merely evaluates to an Integer. When we look at laziness as a form of polymorphism, it becomes clear that eliminating polymorphism will also eliminate laziness.
This is largely irrelevant for the average Jane Doe hacker. But if you're working on optimizations that improve time or space characteristics by eliminating "unwanted" polymorphism, it becomes important to keep laziness in mind. (A hint for where this matters: adaptive containers.)
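A small, contrived example makes this concrete: since the argument is only a promise of an Integer, we can pass a thunk that would crash if forced, and nothing happens as long as it is never demanded.

```haskell
-- The caller may pass an actual Integer, or any unevaluated thunk that
-- merely promises one; f never forces its argument, so even a bottom
-- value is accepted.
f :: Integer -> Integer
f _ = 42

-- f 5                      evaluates to 42
-- f (error "never forced") also evaluates to 42
```

If an optimizer specialized f to accept only fully-evaluated machine integers, the second call would suddenly crash - eliminating the polymorphism eliminated the laziness.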

Well, to make a short story even shorter: laziness and polymorphism are two sides of the same coin. If you optimize away polymorphism, you will (perhaps inadvertently) also squash laziness.

All this is obvious in retrospect but I didn't get it until it was right in front of me.