Sunday, March 22, 2009

Ease of implementation.

Developing a usable compiler for a high-level language such as Haskell isn't a trivial thing to do. Any effort to trade developer time against CPU time is likely to be a wise choice. In this post I will outline a few attempts to deal with the complexity of LHC in high-level ways. Hopefully the end result won't be too slow.

Case short-circuiting.
Since case expressions in GRIN do not force the evaluation of the scrutinized value, they are usually preceded by a call to 'eval'. Then, after the 'eval' calls have been inlined, case-of-case patterns like this are very common:

do val <- case x of
[] -> unit []
CCons x xs -> unit (CCons x xs)
Ffunc a -> func a
case val of
[] -> jumpToNilCase
CCons x xs -> jumpToConsCase x xs

This is obviously inefficient since the case for Nil and Cons will be scrutinized twice. In the GRIN paper, Boquist deals with this by implementing a case short-circuiting optimization after the GRIN code has been translated to machine code. However, dealing with optimizations on the machine code level is quite a tricky thing to do and I'd much rather implement this optimization in GRIN. By making aggressive use of small functions we can do exactly that:

do case x of
[] -> jumpToNilCase
CCons x xs -> jumpToConsCase x xs
Ffunc a -> do val <- func a; checkCase val

checkCase val =
case val of
[] -> jumpToNilCase
CCons x xs -> jumpToConsCase x xs

Register allocation and tail calls.
Using a fixed calling convention is not necessary for whole-program compilers like LHC. Instead, we choose to create a new calling method for each procedure (this is easier than it sounds).
This has the obvious consequence of requiring the convention for return values to be identical for procedures that invoke each other with tail-calls. This was deemed an unacceptable restriction in the GRIN paper, and all tail-calls were subsequently removed before register allocation took place. Afterwards, another optimization step reintroduced tail-calls where possible.
I believe this is too much trouble for too little gain. The possible performance hit is out-weighed by the ease of implementation and the guarantee of tail-calls.

Simple node layout.
An unevaluated value is represented simply by a function name (or tag) and a fixed number of arguments. This value is then overwritten once it has been evaluated. However, the new value may be bigger than what was allocated to represent the unevaluated function.
One way to deal with this is to have two different node layouts: a fixed size node for small values, and a variable size node for big values. This is the approach taken in the GRIN paper and it understandably adds quite a bit of complexity.
Another method is to use indirections. This trades smaller average node size and ease of implementation against more heap allocations.