It seems that I didn't do a good job at explaining why LHC won't be using LLVM. In this post I will elaborate on why some people think C-- has more promise than LLVM as a substrate for lazy, functional languages.
Let me start by making one thing clear: LLVM does have support for garbage collectors. I am not disputing that. However, as Henderson has shown, so does C and every other language. The question we have to ask is not "Does this environment support garbage collection?" but rather "How efficiently does this environment support garbage collection?".
To recap, Henderson's technique involves placing root pointers (the set of pointers which can be followed to find all live data) on a shadow stack. Since we manage this stack ourself, it shouldn't be a problem for the GC to walk it.
In short, each heap allocation incurs an unnecessary stack allocation and heap pointers are never stored in registers for long.
Now what does this mean for languages like Haskell? Well, unlike programs written in more traditional languages, a Haskell application might very well do between 10 and 20 million heap allocations per second.
Writing Haskell programs is more about producing the correct data stream than it is about performing the right side-effects. It's common for functions in Haskell to manipulate data without execuing any side-effects. (Think spreadsheets.)
This way of computing obviously requires a very cheap method of allocation. Performing 10 million unnecessary stack allocations per second would severely hurt performance, and not having heap pointers in registers could easily be equally devastating.
So what about LLVM? Shouldn't the built-in GC support in LLVM be more efficient than any cheap hack? Well, it turns out it isn't. The conflict between garbage collection and optimizations haven't changed, and neither have the solution: disabling or bypassing optimizations. This in turn means unnecessary stack allocations and sub-optimal use of registers.
That LLVM haven't solved the problem of zero-overhead garbage collection isn't too surprising. Solving this while staying agnostic of the data model is an open question in computer science.
It is here C-- differs from LLVM. C-- is a research project that aims at solving difficult problems such as supporting efficient GCs and cheap concurrency. LLVM, on the other hand, is an engineering project.
In conclusion: garbage collection in LLVM incurs unacceptable overhead, and while C-- and LLVM do have some overlap, the problems they're trying to solve are quite different.