To support garbage collection, the key thing a language implementor has to do is to provide a way for the GC to find all live heap pointers (called root pointers). This sounds fairly easy to do but can get quite complicated in the presence of aggressive optimizations and register allocation. A tempting (and often used) solution would be to break encapsulation and make the optimizations aware of the GC requirements. This of course becomes harder the more advanced the optimizations are and with LHC it is pretty much impossible. Consider the following GRIN code:
-- 'otherFunction' returns an object of type 'Maybe Int' using two virtual registers.
-- If 'x' is 'Nothing' then 'y' is undefined.
-- If 'x' is 'Just' then 'y' is a root pointer.
= do x, y <- otherFunction; ....
The above function illustrates that it is not always straightforward to figure out if a variable contains a root pointer. Sometimes determining that requires looking at other variables.
So how might we get around this hurdle, you might ask. Well, if the code for marking roots resides in user-code instead of in the RTS then it can be as complex as it needs be. This fits well with the GRIN ideology of expressing an much in user-code as possible.
Now that we're familiar with the problem and the general concept of the solution, let's work out some of the details. Here's what happens when a GC event is triggered, described algorithmically:
- Save registers to memory.
This is to avoid clobbering the registers and to make them accessible from the GC code.
- Save stack pointer.
- Initiate temporary stack.
Local variables from the GC code will be placed on this stack.
- Jump to code for marking root pointers.
This will peel back each stack frame until the bottom of the call graph has been reached.
- Discard temporary stack.
- Restore stack pointer
- Restore registers.
Using the approach for exception involves stack cutting and a more advanced transfer of control which will be discussed in a later post.
In conclusion, these are the advantages native-code stack walking:
- Allows for objects to span registers as well as stack slots.
- Separates the concerns of the optimizer, the garbage collector and the code generator.
- Might be a little bit faster than dynamic stack walking since the stack layout is statically encoded.