Tuesday, October 19, 2010

Rough organizational overview.

The exact details are constantly changing but here's a rough overview of the LHC pipeline.
  1. External Core.
    We've designed our compiler to use GHC as its frontend. This means that GHC will handle the parsing and type-checking of the Haskell code in addition to some of the optimization (GHC particularly excels at high-level local optimizations). LHC benefits greatly from automatically supporting many of the Haskell extensions offered by GHC.
    Notable characteristics: Non-strict, local functions, complex let-bindings. Pretty much just Haskell code with zero syntactic sugar.
    Example snippet:

    base:Data.Either.$fShowEither :: ghc-prim:GHC.Types.Int =
    ghc-prim:GHC.Types.I# (11::ghc-prim:GHC.Prim.Int#);

  2. Simple Core.
    Since External Core isn't immediately ready to be processed into GRIN code, we first translate it to Simple Core by removing or simplifying a few features. The most noticeable feature of External Core is locally scoped functions, which simply do not fit the GRIN model. When translating to Simple Core, we hoist all local functions to the top level.
    Notable characteristics: Non-strict, no local functions, simplified let-bindings.
  3. Grin Stage 1.
    Let me start by introducing GRIN: GRIN (Graph Reduction Intermediate Notation) is a first-order, strict, (somewhat) functional language.
    The purpose of this first stage of GRIN code is to encode the laziness explicitly. It turns out that you can translate a lazy language (like Simple Core) to a strict language (like GRIN) using only two primitives: eval and apply. The 'eval' primitive takes a closure, evaluates it if need be, and returns the resulting object. The 'apply' primitive simply adds an argument to a closure. Haskell compilers such as GHC, JHC and UHC all use this model for implementing laziness.
    Notable characteristics: Strict, explicit laziness, opaque closures.
    Example snippet:

    base:Foreign.C.Types.@lifted_exp@ w ws =
    do x2508 <- @eval ws
    case x2508 of
    (Cbase:GHC.Int.I32# x#)
    -> do x2510 <- unit 11
    base:GHC.Show.$wshowSignedInt x2510 x# w

  4. Grin Stage 2.
    At the time of writing, each of the mentioned compilers stops at the previous stage (or at what would be its equivalent of that stage).[1] LHC follows in the footsteps of the original GRIN compiler and applies a global control-flow analysis to eliminate/inline all eval/apply primitives. In the end, a lazy/suspended function taking, say, two arguments simply becomes a data constructor with two fields.
    Notable characteristics: Strict, transparent closures.
    Example snippet:

    base:Foreign.Marshal.Utils.toBool1_caf =
    do [x2422] <- constant 0
    [x2423] <- @realWorld#
    [x2424 x2425] <- (foreign lhc_mp_from_int) x2422 x2423
    [x2426] <- constant Cinteger-gmp:GHC.Integer.Type.Integer
    unit [x2426 x2425]

  5. Grin Stage 3.
    Things are starting to get fairly low-level already at stage 2. However, stage 2 is still a bit too high-level for some optimizations to be easily implemented. Stage 3 breaks the code into smaller blocks that can easily be moved, inlined and short-circuited. The code is now sufficiently low-level that it can be pretty-printed as C.
    Notable characteristics: Functions are broken down into functional units. Otherwise the same as stage 2.
    Example snippet:

    base:GHC.IO.Encoding.Iconv.@lifted@_lvl60swYU38 rb3 rb4 =
    do [x21578] <- @-# rb4 rb3
    case x21578 of
    0 -> constant Cghc-prim:GHC.Bool.False
    () -> constant Cghc-prim:GHC.Bool.True

  6. Grin--.
    Grin-- is the latest addition to the pipeline and not much is known about it for certain. It is even up for debate whether it belongs to the GRIN family at all, since it diverges from the SSA style.
    The purpose of Grin-- is to provide a vessel for expressing stack operations.
    Notable characteristics: Operates on global virtual registers, enables explicit stack management.
    Example snippet:

    do x21578 := -# rb4 rb3
    case x21578 of
    0 -> do x88175 := Cghc-prim:GHC.Bool.False
    () -> do x88175 := Cghc-prim:GHC.Bool.True

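To make the eval/apply story from stages 3 and 4 concrete, here is a toy Haskell model (our own illustration, not LHC's actual representation): a heap node is either an evaluated value, a suspended call, or a partial application, and 'eval'/'apply' dispatch on the shape they are given.

```haskell
-- Toy model of GRIN's eval/apply primitives (illustrative only).
-- A heap node is an evaluated constructor, a suspended saturated call
-- (a thunk), or a partial application still missing some arguments.
data Node
  = CInt Int                          -- evaluated value
  | Thunk ([Node] -> Node) [Node]     -- suspended saturated call
  | PAp ([Node] -> Node) Int [Node]   -- function missing n more arguments

-- 'eval' forces a closure until an evaluated object remains.
eval :: Node -> Node
eval (Thunk f args) = eval (f args)
eval n              = n

-- 'apply' adds one argument to a closure, calling through on saturation.
apply :: Node -> Node -> Node
apply (PAp f 1 args) x = f (args ++ [x])
apply (PAp f n args) x = PAp f (n - 1) (args ++ [x])
apply _              _ = error "apply: not a partial application"

-- A sample two-argument function in this model.
plus :: [Node] -> Node
plus [CInt a, CInt b] = CInt (a + b)
plus _                = error "plus: bad arguments"
```

In stage 4 terms, the control-flow analysis figures out exactly which suspensions can reach each call to 'eval', so a node like 'Thunk plus [a, b]' can be replaced by a plain two-field data constructor and the dispatch inlined at each use site.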
Feel free to ask if you have any questions on the how and why of LHC.

[1] UHC does have the mechanics for lowering the eval/apply primitives but it is not enabled by default.

Saturday, October 16, 2010

Accurate garbage collection.

So, let's talk about garbage collection. Garbage collection is a very interesting topic because it is exceedingly simple in theory but very difficult in practice.

To support garbage collection, the key thing a language implementor has to do is to provide a way for the GC to find all live heap pointers (called root pointers). This sounds fairly easy to do but can get quite complicated in the presence of aggressive optimizations and register allocation. A tempting (and often used) solution would be to break encapsulation and make the optimizations aware of the GC requirements. This of course becomes harder the more advanced the optimizations are and with LHC it is pretty much impossible. Consider the following GRIN code:

-- 'otherFunction' returns an object of type 'Maybe Int' using two virtual registers.
-- If 'x' is 'Nothing' then 'y' is undefined.
-- If 'x' is 'Just' then 'y' is a root pointer.
= do x, y <- otherFunction; ....

The above function illustrates that it is not always straightforward to figure out if a variable contains a root pointer. Sometimes determining that requires looking at other variables.

So how might we get around this hurdle, you might ask. Well, if the code for marking roots resides in user code instead of in the RTS, then it can be as complex as it needs to be. This fits well with the GRIN ideology of expressing as much in user code as possible.

Now that we're familiar with the problem and the general concept of the solution, let's work out some of the details. Here's what happens when a GC event is triggered, described algorithmically:
  1. Save registers to memory.
    This is to avoid clobbering the registers and to make them accessible from the GC code.
  2. Save stack pointer.
  3. Initiate temporary stack.
    Local variables from the GC code will be placed on this stack.
  4. Jump to code for marking root pointers.
    This will peel back each stack frame until the bottom of the call stack has been reached.
  5. Discard temporary stack.
  6. Restore stack pointer.
  7. Restore registers.
Using this approach for exceptions involves stack cutting and a more advanced transfer of control, which will be discussed in a later post.
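Step 4 is the heart of the scheme, so here is a small Haskell sketch of it (the names and types are ours, purely hypothetical): each frame carries the root set that its call site's generated marking code would report, and walking the stack is just peeling frames until the bottom.

```haskell
-- Hypothetical sketch of per-frame root marking. In the real scheme the
-- marking code is generated per call site in user code; here each frame
-- simply records which of its slots hold live heap pointers.
type Ptr = Int

data Frame = Frame
  { frameRoots :: [Ptr]        -- roots this frame's marking code reports
  , frameCaller :: Maybe Frame -- caller's frame; Nothing at the bottom
  }

-- Peel back frames to the bottom of the stack, collecting every root.
markRoots :: Frame -> [Ptr]
markRoots f = frameRoots f ++ maybe [] markRoots (frameCaller f)
```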

In conclusion, these are the advantages of native-code stack walking:
  • Allows for objects to span registers as well as stack slots.
  • Separates the concerns of the optimizer, the garbage collector and the code generator.
  • Might be a little bit faster than dynamic stack walking since the stack layout is statically encoded.

A few updates.

Not much has been put up on this blog lately but work is still going on under the hood. The most significant changes in the pipeline are proper tail-calls and a copying garbage collector.

As it stands now, LHC uses the C stack extensively but this is obviously not ideal, as it makes garbage collection, exceptions and tail-calls nigh impossible to implement. Since the ideal solution of using a third-party target language isn't available (neither LLVM nor C-- supports arbitrary object models), I've decided to slowly inch closer to a native code generator for LHC. It is fortunate that I find Joao Dias' dissertation nearly as interesting as the GRIN paper.

The first step would be to make the stack layout explicit in the GRIN code. This is necessary but not sufficient for tail-calls (some register coalescing is also required. More on this later). More importantly, accurate garbage collection now becomes a possibility. The way I want to implement garbage collection (and exceptions for that matter) is through alternative return points. This is one of three methods discussed in a C-- paper by Norman Ramsey and Simon Peyton Jones for implementing exceptions. I believe this method is versatile enough for garbage collection as well.

The concept revolves around using specialized code at each call site that knows enough about the stack layout to mark root pointers and to jump to the next stack frame. I will describe the details in another blog post. An interesting point is that the garbage collectors could be written in user-code instead of in the RTS.

So, to recap: Accurate garbage collection is just around the corner and proper tail-calls will follow on its heels. These two missing features are the reason that so many of the benchmarks fail to run for LHC.

Sunday, July 25, 2010

The Great Haskell Compiler shootout

David has been working very hard recently on improving LHC and the nobench benchmark suite, in order to actively benchmark LHC against several other Haskell compilers: GHC, JHC and UHC.

I figured I'd pop in and announce the benchmarks now that the latest HEAD version of LHC can compile many of them and we can get some meaningful numbers. You can see the results here. Note there are two columns for JHC: one using John's new garbage collector (via `-fjgc`) and one without, because naturally it tends to affect performance characteristics quite a bit. Some of the numbers are quite interesting: in particular, JHC is nearly 400x faster(!) than GHC on the atom benchmark, and UHC dominates the calendar benchmark currently (we haven't investigated why GHC isn't winning here). There are also some interesting variations in JHC, where the garbage collector wins in some cases and loses in others, but in some makes the difference between actually running and running out of memory (as you would expect).

Perhaps somewhat regrettably, LHC loses in every benchmark. But we're only just beginning to implement real optimizations now that things are somewhat stabilized and the interface isn't arcane - prior to this we had very few optimizations outside of dead code elimination, simplification and a bit of inlining. David's begun implementing some more exotic optimizations recently, so it'll be interesting to see the results soon.

Now we'll be able to more actively see the progress of optimizing Haskell compilers. Should be fun!

Friday, June 11, 2010

Mirroring Boquist's GRIN papers

Urban Boquist's PhD thesis, Code Optimization Techniques for Lazy Functional Languages, and its accompanying website have gone offline. Because these papers describe GRIN, the intermediate language we use in LHC, I have decided to mirror Boquist's 3 GRIN-related papers on my own server here.

If someone inside Chalmers could explain where his papers moved and if we could get them back online, that would be great!

Saturday, May 29, 2010

The new interface

After hacking away for a little bit, I've finally gotten the new user interface for LHC working!

a ~/code/lhc/test $ lhc --help
The LHC Haskell Compiler, v0.11, (C) 2009-2010 David Himmelstrup, Austin Seipp

lhc [FLAG] [FILE]
Compile Haskell code

-? --help[=FORMAT] Show usage information (optional format)
-V --version Show version information
-v --verbose Higher verbosity
-q --quiet Lower verbosity
--llvm Use LLVM backend
--ghc-opts=VALUE Give GHC frontend options
-i --install-library Don't compile; install modules under a library
-b --build-library Used when compiling a library (cabal only)
-O =VALUE Set optimization level (default=1)
--numeric-version Raw numeric version output
--supported-languages List supported LANGUAGE pragmas
-c Do not link, only compile
-o =VALUE output file for binary (default=a.out)
--src-dir=VALUE source code directory
a ~/code/lhc/test $ lhc HelloWorld.hs
[1 of 1] Compiling Main ( HelloWorld.hs, HelloWorld.o )
Found fixpoint in 7 iterations.
Lowering apply primitives... done in 0.09s
Heap points-to analysis... ...........................done in 0.95s
HPT fixpoint found in 27 iterations.
Found fixpoint in 11 iterations.
Compiling C code... done in 0.11s
a ~/code/lhc/test $ ./HelloWorld
Hello, world!
a ~/code/lhc/test $

The changes should be landing shortly. It will require a patch to Cabal. There is also a bug in cabal/cmdargs that I have not yet tracked down which makes installing cabal packages difficult, although still possible, with this new scheme.

Edit 6-2-2010: all of the necessary patches have been pushed to both LHC and Cabal to make the new user interface work. Try it out (install using 'cabal install -fwith-libs' provided you have the darcs HEAD version of Cabal,) and tell us of any corner cases on IRC (#lhc-compiler on freenode)!

Thursday, May 27, 2010

A new user interface for LHC

The current user interface for LHC is pretty unwieldy - it requires you to invoke lhc twice: once to generate an external core file, and another to generate the executable with LHC itself.

There are a couple of problems with this:
  1. It requires -you- to keep track of the generated .hcr files, which is a PITA.
  2. It makes the test suite complicated. I would like to use Simon Michael's excellent shelltestrunner library, but the two-step compilation process would make the test files nastier than they need to be, and so we currently maintain our own regression tool to handle things like #1.
  3. It made some of LHC's code very gross: we basically copied GHC's "Main.hs" file and stuck it in our source tree with some modifications, because we need to be able to accept all GHC options, even "insert arbitrary ghc option here" (for general usage, and cabal install support). This was, as you could guess, incredibly fragile in terms of maintenance and forwards/backwards compatibility.
So now I've devised a new approach. We will instead run GHC in the background, twice: first, we call GHC to compile your code with your provided options, always sticking something like '--make -fext-core -c' onto your command line to generate external core. The second time, we call GHC with the '-M' command line flag, which asks GHC to generate a Makefile describing the dependency information between modules. Running it on Tom Hawkins' atom project, you get something like this:

# DO NOT DELETE: Beginning of Haskell dependencies
Language/Atom/Expressions.o : Language/Atom/Expressions.hs
Language/Atom/Elaboration.o : Language/Atom/Elaboration.hs
Language/Atom/Elaboration.o : Language/Atom/Expressions.hi
Language/Atom/Analysis.o : Language/Atom/Analysis.hs
Language/Atom/Analysis.o : Language/Atom/Expressions.hi
Language/Atom/Analysis.o : Language/Atom/Elaboration.hi
Language/Atom/Scheduling.o : Language/Atom/Scheduling.hs
Language/Atom/Scheduling.o : Language/Atom/Elaboration.hi
Language/Atom/Scheduling.o : Language/Atom/Analysis.hi
Language/Atom/Language.o : Language/Atom/Language.hs
Language/Atom/Language.o : Language/Atom/Expressions.hi
Language/Atom/Language.o : Language/Atom/Elaboration.hi
Language/Atom/Language.o : Language/Atom/Elaboration.hi
Language/Atom/Common.o : Language/Atom/Common.hs
Language/Atom/Common.o : Language/Atom/Language.hi
Language/Atom/Code.o : Language/Atom/Code.hs
Language/Atom/Code.o : Language/Atom/Scheduling.hi
Language/Atom/Code.o : Language/Atom/Expressions.hi
Language/Atom/Code.o : Language/Atom/Elaboration.hi
Language/Atom/Code.o : Language/Atom/Analysis.hi
Language/Atom/Compile.o : Language/Atom/Compile.hs
Language/Atom/Compile.o : Language/Atom/Language.hi
Language/Atom/Compile.o : Language/Atom/Elaboration.hi
Language/Atom/Compile.o : Language/Atom/Scheduling.hi
Language/Atom/Compile.o : Language/Atom/Code.hi
Language/Atom.o : Language/Atom.hs
Language/Atom.o : Language/Atom/Language.hi
Language/Atom.o : Language/Atom/Common.hi
Language/Atom.o : Language/Atom/Compile.hi
Language/Atom.o : Language/Atom/Code.hi
# DO NOT DELETE: End of Haskell dependencies

This tells us where all the generated object files are. GHC puts external core files next to these object files (in all cases, as you cannot redirect the output location of external core files). So we can just parse this simple Makefile, remove duplicates, and substitute '.hcr' for '.o' in the file names. LHC takes care of the rest.
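That parsing step might look something like the following Haskell sketch (the function names are ours and purely illustrative; LHC's actual code may differ):

```haskell
import Data.List (isSuffixOf, nub)
import Data.Maybe (mapMaybe)

-- Parse GHC's '-M' output: pick out the '.o' targets, drop duplicates,
-- and swap the '.o' suffix for '.hcr'. Comment lines ("# DO NOT DELETE"
-- etc.) fall through the pattern match and are ignored.
hcrFiles :: String -> [FilePath]
hcrFiles = map toHcr . nub . mapMaybe objTarget . lines
  where
    objTarget line = case words line of
      (obj : ":" : _) | ".o" `isSuffixOf` obj -> Just obj
      _                                       -> Nothing
    toHcr file = take (length file - 2) file ++ ".hcr"
```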

This is of course in the event you want to compile an executable. If you want to compile a library, it's mostly the same, only when we parse the files we just store them for later.

But what about "obscure ghc option"? No fear! We'll just provide something like a --ghc-options flag which will get passed on to GHC's invocations. LHC can then have its own, more general command line interface to control various options in the whole-program stages (on this note, Neil Mitchell's cmdargs library is amazing for this stuff!)

For default options to GHC, I think we should perhaps stick to the Haskell 2010 standard - that is, by default, LHC will run GHC with language options to enable compilation of compliant Haskell 2010 code without any OPTIONS_GHC or LANGUAGE pragmas. Optimization levels for GHC can be implied by LHC's supplied optimization level or explicitly via --ghc-options.

Comments are always welcome.

Monday, May 24, 2010

Limited release.

This release of lhc-0.10 marks the move to GHC-6.12 and hopefully a more stable build infrastructure. As it stands, lhc-0.10 still lacks support for several important features, such as floating point values and large parts of the FFI.

To install LHC you need the development versions of Cabal and cabal-install. They can be fetched from these darcs repositories:

darcs get --lazy http://darcs.haskell.org/cabal
darcs get --lazy http://darcs.haskell.org/cabal-install

Once you've installed both Cabal and cabal-install, lhc-0.10 can be installed with the following command:

cabal install lhc-0.10

Here's how to use LHC once it has been successfully installed:

lhc -c SourceFile.hs # This compiles SourceFile.hs to SourceFile.hcr
lhc compile SourceFile.hcr # This compiles SourceFile.hcr to the executable SourceFile.

Happy Hacking.

Tuesday, May 18, 2010

Laziness and polymorphism.

This may be obvious to some but I truly didn't grok the relationship between laziness and polymorphism before I started work on LHC.

The Haskell language has two very distinguishing features: Laziness and parametric polymorphism. At a glance, these two features may not seem to have that much in common. However, laziness can be seen as a form of implicit polymorphism (and it tends to be implemented as such). Consider the function with the following type signature:

f :: Integer -> Integer

One could say this function is polymorphic in the first argument: The argument can either be an actual Integer or it can be something that evaluates to an Integer. When we look at laziness as a form of polymorphism, it becomes clear that eliminating polymorphism will also eliminate laziness.
This is largely irrelevant for the average Jane Doe hacker. But if you're working on optimizations aimed at improving the time or space characteristics by eliminating "unwanted" polymorphism, it becomes important to keep laziness in mind. The hint here is for adaptive containers.
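To see the "implicit polymorphism" spelled out, here is a small sketch (a common textbook encoding, not LHC's internals): a lazy argument is really one of two shapes, and forcing it is a dispatch on which shape it got.

```haskell
-- Laziness made explicit as polymorphism over two "shapes" of argument:
-- an already-evaluated value, or a suspension that produces one.
data Lazy a = Evaluated a | Suspended (() -> a)

force :: Lazy a -> a
force (Evaluated x) = x
force (Suspended g) = g ()

-- A signature like 'f :: Integer -> Integer' secretly means something
-- closer to this: the argument may be a value or a suspension.
f :: Lazy Integer -> Integer
f x = force x + 1
```

Specializing away the 'Lazy' wrapper (eliminating the polymorphism) forces a choice of one shape, which is exactly why it also squashes the laziness.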

Well, to make a short story even shorter: Laziness and polymorphism are different sides of the same coin. If you optimize away polymorphism, you will (perhaps inadvertently) also squash laziness.

All this is obvious in retrospect but I didn't get it until it was right in front of me.