Notes on the LHC

Saturday, September 10, 2016

Haskell Suite: Type inference.

Disclaimer: I am not an academic. The following post is akin to a plumber's guide to brain surgery.

Type inference is complex and has evolved over time. In this post, I will try to explain how I see the landscape and where LHC and other Haskell compilers fit into this landscape.

The beginning: The Hindley–Milner type system. 1982.
The typing rules here are quite simple and every Haskeller seems to learn them intuitively. They include things like: if 'f :: A → B' and 'a :: A' then 'f a :: B'.
In this system, types are always inferred and there must always be a single, most general type for every expression. This becomes a problem when we want higher ranked types because here a single, most general type cannot be inferred. There may be many equally valid type solutions and it has to be up to the programmer to select the appropriate one. But this cannot happen in plain HM as type signatures are only used to make inferred types less general (e.g. [a] was inferred but the programmer wanted [Int]).
Omitting the type signature in the following code can show us what plain HM would be like:
In GHC, the snippet will run fine with the type signature but not without it.
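A minimal sketch of the kind of program this describes, assuming GHC with the RankNTypes extension. The name 'both' is made up for illustration. With the signature, GHC accepts the definition; delete the signature and 'f' is treated as monomorphic, so it cannot be applied to both a Char and a Bool.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The argument 'f' must itself be polymorphic, which makes this a
-- rank-2 type. Plain HM has no way to infer it, so the signature
-- below is required.
both :: (forall a. a -> a) -> (Char, Bool)
both f = (f 'x', f True)

main :: IO ()
main = print (both id)  -- prints ('x',True)
```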

Version two: Bidirectional type system. 2000.
People realised that checking the correctness of a given type is much easier than inferring a correct type. Armed with this knowledge, a new type checker was born that had two modes, usually called 'up' and 'down'. The 'up' mode lifts a new correct type up from an expression and the 'down' mode checks an expression against a given type. Because of these two modes, this kind of system was called bidirectional and it deals with higher ranked types quite well.
LHC currently implements this.
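The two modes can be sketched for a tiny simply typed lambda calculus. This is an illustration only, not LHC's type checker: it has no polymorphism at all, and every name in it ('Expr', 'infer', 'check', and so on) is made up for the example. The point is the shape: 'infer' is the 'up' mode, 'check' is the 'down' mode, and an annotation is where one hands over to the other.

```haskell
-- Types and terms of a tiny simply typed lambda calculus.
data Type = TInt | TBool | TFun Type Type
  deriving (Eq, Show)

data Expr
  = Var String
  | IntLit Int
  | BoolLit Bool
  | Lam String Expr   -- unannotated lambda: can only be checked
  | App Expr Expr
  | Ann Expr Type     -- annotation: switches from 'up' to 'down'
  deriving Show

type Env = [(String, Type)]

-- 'up' mode: lift a type out of an expression.
infer :: Env -> Expr -> Either String Type
infer env (Var x)     = maybe (Left ("unbound variable: " ++ x)) Right (lookup x env)
infer _   (IntLit _)  = Right TInt
infer _   (BoolLit _) = Right TBool
infer env (Ann e t)   = check env e t >> Right t
infer env (App f a)   = do
  tf <- infer env f
  case tf of
    TFun arg res -> check env a arg >> Right res
    _            -> Left "applying a non-function"
infer _   (Lam _ _)   = Left "cannot infer a lambda without an annotation"

-- 'down' mode: check an expression against a known type.
check :: Env -> Expr -> Type -> Either String ()
check env (Lam x body) (TFun arg res) = check ((x, arg) : env) body res
check _   (Lam _ _)    t              = Left ("lambda checked against " ++ show t)
check env e            t              = do
  t' <- infer env e
  if t == t' then Right () else Left ("expected " ++ show t ++ ", got " ++ show t')

main :: IO ()
main = do
  let idInt = Ann (Lam "x" (Var "x")) (TFun TInt TInt)
  print (infer [] (App idInt (IntLit 1)))  -- prints Right TInt
  print (infer [] (Lam "x" (Var "x")))     -- a Left: nothing to check against
```

Note how the lambda never needs its argument type guessed: the type flows down from the annotation.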

Version three: Boxy types. 2006.
At this point it had become apparent that higher ranked types didn't really play well with higher order functions. People often found themselves in situations where slight, seemingly innocent changes caused the type-checker to reject their programs. An example of this can be seen in this gist:
Impredicative polymorphism is required for the above code and boxy types is a stab in that direction. Bidirectional type checking was a big improvement over plain HM but it lacked granularity. Types are either 100% inferred or 100% checked with no middle ground. What if you wanted to check parts of a type and infer the rest? Well, boxy types solves exactly that problem. Boxes are added (internally, we're not making changes to Haskell here) to types and they signify an unknown that should be inferred. Now parts of types can be checked while the boxes are inferred and we're left with the best of both worlds. This is what JHC implements, btw. Boxy types was also implemented in GHC but was deemed to be too complicated.
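To make the "check parts, infer the rest" idea concrete, here is a toy model, heavily simplified and not the algorithm from the boxy types paper: it has no polymorphism and no unification variables. The 'Box' constructor and the 'fill' function are names invented for this sketch. The known parts of a partially known type are checked against an actual type, and each box is filled with whatever is found in its place.

```haskell
-- A type in which unknown parts are marked with Box.
data Type = TInt | TBool | TFun Type Type | Box
  deriving (Eq, Show)

-- Compare a partially known type (first argument) with a fully known
-- type (second argument). Known parts must agree; boxes are filled in.
fill :: Type -> Type -> Maybe Type
fill Box        t            = Just t
fill (TFun a b) (TFun a' b') = TFun <$> fill a a' <*> fill b b'
fill t          t'
  | t == t'   = Just t
  | otherwise = Nothing

main :: IO ()
main = do
  -- The argument type is checked, the result type is inferred.
  print (fill (TFun TInt Box) (TFun TInt TBool))   -- prints Just (TFun TInt TBool)
  -- The checked part disagrees, so the whole thing is rejected.
  print (fill (TFun TInt Box) (TFun TBool TBool))  -- prints Nothing
```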

Version four: FPH, First-class Polymorphism for Haskell. 2008.
Impredicative polymorphism, second attempt from the authors of boxy types. Improvements were made but the problem is still not solved.

Version five: OutsideIn(X). 2011.
GHC is a hotbed for experimentation in type checkers. GADTs, multi-parameter type classes, type families. These are just some of the features that make the type-checker the largest and most complicated component of GHC. To deal with all of this, researchers came up with OutsideIn, described in a paper longer than all the previous papers put together. The algorithm is relatively simple, but, for practical reasons, implementations must reject some programs that are valid according to the specification.
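A small GADT example hints at why these features are hard on a type checker. The sketch below assumes GHC with the GADTs extension, and the names are made up. Each pattern match teaches the checker something new about the type variable 'a' (in the 'IntE' branch it is Int, in the 'BoolE' branch it is Bool), and that knowledge is only valid inside the branch. GHC asks for the signature on 'eval'; it does not try to guess it.

```haskell
{-# LANGUAGE GADTs #-}

-- A typed expression language: the index records the result type.
data Expr a where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  If    :: Expr Bool -> Expr a -> Expr a -> Expr a

-- Matching on a constructor refines 'a' locally, so each branch
-- may return a different concrete type.
eval :: Expr a -> a
eval (IntE n)   = n
eval (BoolE b)  = b
eval (Add x y)  = eval x + eval y
eval (If c t e) = if eval c then eval t else eval e

main :: IO ()
main = print (eval (If (BoolE True) (Add (IntE 1) (IntE 2)) (IntE 0)))  -- prints 3
```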

Friday, September 9, 2016

Haskell Suite: Scoping.

This post answers why I created 'haskell-scope' even though there's already another library that addresses the same problem.

There are two libraries for resolving references in Haskell source code on Hackage: haskell-names and haskell-scope. Of the two, haskell-names is the older, the more feature complete, and the more ambitious. It uses a very innovative scheme that allows the scope to be inspected at any point in the syntax tree. You can read more about it in the linked article. Unfortunately, all this innovation comes at a price of complexity.

Here's the complete list of extensions used by haskell-names: CPP, ConstraintKinds, DefaultSignatures, DeriveDataTypeable, DeriveFoldable, DeriveFunctor, DeriveTraversable, FlexibleContexts, FlexibleInstances, FunctionalDependencies, GADTs, GeneralizedNewtypeDeriving, ImplicitParams, KindSignatures, MultiParamTypeClasses, NamedFieldPuns, OverlappingInstances, OverloadedStrings, RankNTypes, ScopedTypeVariables, StandaloneDeriving, TemplateHaskell, TupleSections, TypeFamilies, TypeOperators, UndecidableInstances, and ViewPatterns.

A total of 27 extensions and many of them will never be implemented by LHC. If LHC is to compile itself one day, this obviously won't do. Enter haskell-scope: a library more plain than bread without butter. Give it an AST and it will annotate all the references. Nothing more, nothing less.
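For a feel of what "annotate all the references" means, here is a toy resolver written in plain Haskell with no extensions. It is not the haskell-scope API: the AST, the 'Origin' type, and 'resolve' are all invented for this sketch. Every variable reference is tagged with either the distance to its binder or a marker saying it is free.

```haskell
-- A toy AST, parameterised over the annotation on each reference.
data Expr a
  = Var a String
  | Lam String (Expr a)
  | App (Expr a) (Expr a)
  deriving (Eq, Show)

-- Where a reference points: the number of binders between the
-- reference and its own binder, or nowhere in this expression.
data Origin = Bound Int | Free
  deriving (Eq, Show)

-- Walk the tree with the list of names in scope, innermost first.
resolve :: Expr () -> Expr Origin
resolve = go []
  where
    go scope (Var () x) = Var (maybe Free Bound (indexOf x scope)) x
    go scope (Lam x b)  = Lam x (go (x : scope) b)
    go scope (App f a)  = App (go scope f) (go scope a)

    -- Position of the innermost binder with this name, if any.
    indexOf x = lookup x . flip zip [0 ..]

main :: IO ()
main = print (resolve (Lam "x" (App (Var () "x") (Var () "y"))))
-- prints Lam "x" (App (Var (Bound 0) "x") (Var Free "y"))
```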

Monday, December 29, 2014

Nursery sizes.

Intel i5-3210M CPU, 3072 KB L3 cache. Not sure why the CPU stalls with the tiny nurseries.

Friday, December 12, 2014

Test suite for Haskell2010

To keep track of progress and to ward off regressions, the test suite now has a section for Haskell2010 compatibility checks:

# runhaskell Main.hs -t Haskell2010 --plain | tail -n 4
         Test Cases  Total
 Passed  0           0
 Failed  6           6
 Total   6           6

The tests only cover a small part of the Haskell2010 specification and none of them pass yet.

Thursday, December 4, 2014

Compiling to JavaScript.

Lots of very interesting things are possible when everything (including the runtime system) is translated to LLVM IR. For example, compiling to JavaScript becomes trivial. Consider this ugly version of Hello World:

{-# LANGUAGE MagicHash #-}
module Main (main) where

import LHC.Prim

putStrLn :: List Char -> IO Unit
putStrLn msg = putStr msg `thenIO` putStr (unpackString# "\n"#)

main :: IO Unit
main = putStrLn (unpackString# "Hello World!"#)

entrypoint :: Unit
entrypoint = unsafePerformIO main

Notice the 'List' and 'Unit' types, and the 'thenIO' and 'unpackString#' functions. There's no syntactic sugar in LHC yet. You can get everything sugar-free these days, even Haskell compilers.

Running the code through the LLVM dynamic compiler gives us the expected output:

# lli Hello.ll
Hello World!

Neato, we have a complete Haskell application as a single LLVM file. Now we can compile it to JavaScript without having to worry about the garbage collector or the RTS; everything has been packed away in this self-contained file.

$ emcc -O2 Hello.ll -o Hello.js # Compile to JavaScript using
                                # emscripten.
$ node Hello.js                 # Run our code with NodeJS.
Hello World!

$ ls -lh Hello.js               # JavaScript isn't known to be
                                # terse but we're still smaller
                                # than HelloWorld compiled with GHC.
-rw-r--r--  1 lemmih  staff   177K Dec  4 23:33 Hello.js

Friday, November 28, 2014

The New LHC.

What is LHC?

The LLVM Haskell Compiler (LHC) is a newly reborn project to build a working Haskell2010 compiler out of reusable blocks. The umbrella organisation for these blocks is the haskell-suite. The hope is that with enough code reuse, even the daunting task of writing a Haskell compiler becomes manageable.

Has it always been like that?

No, LHC got started as a fork of the JHC compiler. A bit later, LHC was reimagined as a backend to the GHC compiler.

Can LHC compile my code?

LHC can only compile very simple programs for now. Stay tuned, though.

Where's development going next?

Better support for Haskell2010.
Reusable libraries for name resolution and type-checking.
Human-readable compiler output. With LLVM, optimisations are less important. We instead focus on generating pretty code.

Tuesday, November 25, 2014

Very minimal Hello World.

The LLVM Haskell Compiler is finally coming together. From Haskell parser to name resolution to type checker to desugarer to LLVM backend to GC. Everything is held together with duct tape but it feels great to finally compile and run Hello World.

# cat Hello.hs
{-# LANGUAGE MagicHash #-}
module Main (main) where

import LHC.Prim

main :: IO Unit
main =
  puts "Hello Haskell!"# `thenIO`
  return Unit

entrypoint :: Unit
entrypoint = unsafePerformIO main

Compiling the above file yields a single LLVM program, containing user code and the RTS.

# lli Hello.ll
Hello Haskell!