Consider the following snippet of code:
upto :: Int -> Int -> [Int]
upto from to
= if from > to
else from : upto (from+1) to
Here's what the compiled function would look like:
-- Arguments and results are (usually) kept in registers.
-- We generate a new calling convention for each function.
upto from to
= case from `gtInt` to of
1 -> do -- Return a single tag representing ''.
0 -> do -- Allocate 5 heap cells.
heap <- allocate 5
-- CInt is the constructor for Int.
heap := CInt
heap := from
-- Fupto represents a suspended call to 'upto'.
heap := Fupto
heap := from+1
heap := to
-- Return a node as three separate pieces.
-- &heap is the head of the list and &heap is the tail.
return [CCons, &heap, &heap]
As we can see, calling this function will only give us a single node (the node in this case is either a CNil or a CCons with two arguments). We will have to call it again to get more information out of it. For example, fully computing 'upto 1 10' requires 11 calls to 'upto' (10 CCons nodes and 1 CNil).
Looking at the steps in 'upto' shows us that it isn't doing a whole lot. All variables (even arguments and results) are in registers and the data can easily fit in the cache. We could almost say that calling this function is as fast as looping in C. Let's add a bit more code and see what happens:
main = showMyList (upto 1 10)
showMyList  = return ()
= do print x
The same code, now compiled to our intermediate language:
main = do heap <- allocate 3
heap := Fupto
heap := 1
heap := 10
= do -- Read the arguments to 'upto' from the 'lst' pointer.
Fupto from to <- fetch lst
-- Call 'upto'. The results are kept in registers.
-- 'a' and 'b' are undefined if 't' is CNil.
[t, a, b] <- upto from to
-- Inspect the node tag.
case t of
CNil -> return [CUnit]
CCons -> do -- Call print on 'a'. This call might invoke the garbage collector.
-- Recurse on the tail.
This looks rather well. We could be proud if this was the end. However, there is one thing that we haven't considered: garbage collection. The garbage collector may run when we call 'print' and if it does then the pointer we have in 'b' will no longer be valid.
A common solution is to push 'b' to a stack (which the GC can walk and modify) and reload it after 'print' has returned. However, a stack allocation cost about as much as the call to 'upto' and hence incurs an overhead of nearly 50%.
Fortunately there's a way around this. We can "simply" have the garbage collector update all registers that contain heap pointers. Doing so isn't exactly easy but it does allow us to keep pointers in registers and to avoid all unnecessary stack allocations.
The details of how to accomplish this will have to wait for another time.