Thursday, January 15, 2009

The case against C/LLVM.

Many people have pointed out that garbage collection can be implemented in a portable fashion in uncooperative environments such as C or LLVM. Specific details of how to do this can be found in Fergus Henderson's paper "Accurate garbage collection in an uncooperative environment".

Henderson's technique is simply to put each pointer to newly allocated data on the stack. This bypasses all concerns about which variables will be put in registers and which will be spilled to the stack. However, the overhead of never using registers and performing unnecessary stack allocations can be quite significant.

Since the goal of LHC is to generate efficient executables, shoehorning our executive model into a "hostile" environment like C or LLVM isn't an option. Writing a simplistic NCG won't be too much trouble, hopefully.

17 comments:

  1. I must admit that I know very little about JVM. However, I doubt that it is well suited for lazy, functional languages. We have some quite atypical requirements. Function calls and allocations must be fairly cheap to mention a few.

    ReplyDelete
  2. You don't really make much of any case against LLVM here. What makes LLVM an uncooperative environment as a target?

    ReplyDelete
  3. The approach in the paper sounds slightly over-naive though. For one, you only need to make sure your objects are rooted while calling a function that actually does allocation (and thus may GC) - keep the pointers in local variables when working on them and if you need to, update the pointer from the thing that keeps the roots. And if your GC can trace arbitrary objects just given a pointer to them - just keep the pointers in a stack of gc roots rather than maintain a linked list of frames.

    Also: what precisely is it that you would do in the NCG that you can't do with a C backend?

    ReplyDelete
  4. chanson:
    LLVM requires you to put GC roots on the stack (just like C does). This is unacceptable for a language like Haskell.

    olsner:
    Sure, it might be optimized slightly. It would still lead to horrible register usage, though. Especially since almost all Haskell functions allocate heap.

    By writing our own NCG we can create a GC system that incurs absolutely zero penalty on the mutator. The GC can reconstruct nodes from registers and the stack by looking at information attached at each function call site. This means that we only have to use the stack when we actually run out of registers. Unfortunately, this technique is specific to our memory model and is hence unlikely to be implemented in LLVM.

    ReplyDelete
  5. I posted a link to your article on the llvm-dev mailing list and it generated some interesting discussion:

    http://thread.gmane.org/gmane.comp.compilers.llvm.devel/17728

    ReplyDelete
  6. Chris Lattner of the LLVM team commented on this on the LLVM mailing list:

    http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-January/019460.html

    I do not understand the assertion that LLVM is uncooperative. The direction LLVM takes is driven entirely by contributors. I suggest you embrace this and implement the necessary GC support in LLVM. The devs would likely be happy to help out with any problems; the team is *very* helpful. Furthermore, that support would open the door to implementing other similar functional languages in LLVM, rather making more isolated code islands. In the long run, LHC will win *big* by having that same code used by others (and tested, and expanded.)

    There are many things for which it is reasonable to have NIH (Not Invented Here syndrome). In 2009, a fast code generator is not one of them.

    ReplyDelete
  7. I am not sure to buy the criticism.

    Assuming a copying collector (like I did in Qish or the MELT branch of GCC), what disallow you to grossly have the following allocation routine in pseudo code

    if (enough space in birth region) {
    increment store pointer
    use its previous value as the newly allocated stuff
    fill it appropriately
    don't bother with registers
    } else /* rare case, not enough space */
    {
    spill all live registers to known location in the current call frame
    call the garbage collector
    reload all live registers appropriately
    increment store pointer
    use its previous value as the newly allocated stuff
    fill it appropriately
    }

    The above pseudocode would be a template in the code generator, for every allocation primitive (eg Lisp's cons, etc...)

    Regards.

    Basile Starynkevitch http://starynkevitch.net/Basile/

    ReplyDelete
  8. 23Skidoo & Keir:
    It saddens me that you think I'm attacking or criticising LLVM. I'm really not. The problem I'm describing has not been solved yet. It's like infinite scalability, the halting problem or P=NP. (ok, it may be less difficult than those problem but it is still unsolved). I'm truly sorry that I didn't explain that properly.

    ReplyDelete
  9. Basile_S:
    Determining which registers contain root pointers is harder than it seems. Especially since it isn't known at compile-time.

    ReplyDelete
  10. David:

    Whether or not the problem is unsolved is not the issue. The issue is reinventing a code generator, which is not a trivial task. Why not extend LLVM for your needs, even if it means writing a bit of C++? If you use LLVM, you will get nice stuff like PowerPC, ARM, x86 and x86-64 support for free.

    If the problem is unsolved, then how is it any better to build your own native code generator? I find it hard to believe that the shortest path to a maintainable compiler is to invent your own code generator.

    ReplyDelete
  11. Keir:

    It's unsolved in the academic sense of the word. Solving it requires research and not engineering. If I knew how to solve it, I definitely would add it to LLVM.

    It's only unsolved in the general case. I doubt, however, that LLVM is interested in my specific data model (which is in a state of flux, even).

    ReplyDelete
  12. Disclaimer: I used to have the desk next to Fergus.

    Mercury's solution to this problem circa 1995 was quite clever. The virtual machine had a large number of "registers" (about 1024, IIRC; since gaining multithreading it's probably fewer), the top few of which are implemented using GCC's global register variables. You then just ensure that the more important a variable is, the lower the register number that it's stored in.

    ReplyDelete
  13. David Himmelstrup:
    LLVM requires you to put GC roots on the stack (just like C does). This is unacceptable for a language like Haskell.

    LLVM only requires you to put the live heap pointers on the stack when you're at a safe point. To mimic ghc you'd only use safe points at call returns (loops would be a good idea too).
    So I wonder why you think this is unacceptable for Haskell?

    You could very well be right, but you can't make such claims without empirical evidence.

    ReplyDelete
  14. augustss:

    It's obviously acceptable for Haskell. I apologies unreservedly if I said otherwise. It probably doesn't make much of a difference unless all function calls are known at compile-time.
    However, I do believe that making function calls as cheap as possible is important in LHC.

    I'm not making any bold claims. I'm merely pointing out that unnecessarily pushing data to the stack is suboptimal. This is true even in the absence of empirical data. Why everyone makes such a big fuss out of it goes over my head.

    ReplyDelete
  15. Pushing unnecessarily is suboptimal, no doubt about that. But if that's one extra push per 1000 instructions it doesn't matter, if it's one extra push per 10 it does.

    So then the question becomes, how easy is it to write a code generator that is as good as LLVM's when you take those extra pushes into account?

    It's hard to say without making the experiment. You should write a NCG if you want, but you should do it because you think it's fun, and not because you aim to do better than LLVM. :)

    ReplyDelete
  16. augustss:

    Let's make it interesting. If my NCG beats LLVM by the next ICFP, you buy dinner. If not, I buy dinner. (:

    We will use nofib as our benchmarking suite. The testing machine will be an AMD64 with sufficient memory.

    ReplyDelete