Why compiler technology matters – Compilation speed
When discussing the pros and cons of programming languages, or when picking the technologies for a project, one thing that is often overlooked is the compiler itself. Since the differences between two compilers are hard to see from language documentation alone, we don't always notice how much the compiler affects us. Yet the compiler is ultimately the tool that translates your code into an executable program, and arguably one of the most important tools in your workflow. Just like any other tool (the IDE you are working with, a new LLM that acts as your quick documentation, a linter, an auto-formatter, etc.), it has an impact on the quality and efficiency of your coding. And, like all the other tools, compiler and programming-language technology keeps growing: new ideas are conceived and put into practice all the time.
In this post I will discuss, from the user's point of view, one aspect of compiler technology that is dictated not so much by a given programming language as by the compiler itself: compilation speed.
Compilation Speed
Say you downloaded some large project and now want to compile it. You follow the steps and then… you wait. Unfortunately, you may be left waiting for a very long while. To give you some perspective, the Duckling toolset currently takes over 15 minutes to compile from scratch[1] on what I believe to be decent hardware. And it's not even that big a project in terms of lines of code[2]. When projects get even larger, compile times can easily stretch into hours, especially on older hardware. This has a direct impact on how quickly changes to a program can be made.
On top of that, as compilers and programming languages develop, we keep adding features, static checks, optimizations, and so on, all of which take more and more time to compute. Fortunately, we are not helpless against this problem. There are two main ways to get around long compile times. One is to design a language that can be compiled very quickly and build a highly optimized compiler around it. There are projects going in that direction; the problem with this approach, from Duckling's perspective, is that it limits which language features you can introduce. The other is to introduce techniques that cleverly speed up or cut compilation times. I will discuss three such techniques that have grown in popularity in recent years.
Incremental Compilation
One way to cut compilation time is incremental compilation. The technique boils down to the following: we want small changes to the code to be recompiled quickly. Why this is useful becomes clear when you consider a typical programmer's workflow: change code, compile, test, repeat. If the changes are small and the tests are quick, a lot of time is wasted waiting for recompilation to finish. The less time is wasted, the faster the feedback, and the sooner the desired changes land. Generally speaking, incremental compilation techniques rely on saving some of the compiler's outputs in memory or on disk and reusing them later, so that only a small part of the codebase needs to be recompiled.
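The core idea of reusing saved outputs can be sketched in a few lines. This is a hypothetical, language-agnostic illustration (in Python, since Duckling code isn't shown here): compiled artifacts are cached under a hash of their input, so an unchanged input is never compiled twice. The class and names are invented for the example.

```python
# Minimal sketch: cache compiler outputs keyed by a hash of the input,
# so unchanged inputs are served from the cache instead of recompiled.
import hashlib

class IncrementalCache:
    def __init__(self):
        self._store = {}        # input hash -> compiled artifact
        self.recompilations = 0

    def compile(self, source: str) -> str:
        key = hashlib.sha256(source.encode()).hexdigest()
        if key not in self._store:
            self.recompilations += 1
            # Stand-in for the actual (expensive) compilation work.
            self._store[key] = f"object({key[:8]})"
        return self._store[key]

cache = IncrementalCache()
cache.compile("fn main() {}")
cache.compile("fn main() {}")     # unchanged: served from the cache
cache.compile("fn main() { 1 }")  # changed: recompiled
print(cache.recompilations)       # 2
```

Real systems differ mainly in what the cache key covers (compiler flags, dependencies, environment) and where the artifacts live, but the principle is the same.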
Incremental Compilation in the case of C++
This technique is not new, and some of you may already be shouting: C++ has had incremental compilation for decades! And yes, that's true. Under the definition I provided, C++ does have incremental compilation, based on caching the object files compiled from each source file. Projects like cmake or ccache can even automate it for you.
In fact, many compilers support incremental compilation in a way similar to what C++ does (not always at the file level; sometimes, for example, at the module level). The difference between such systems and the incremental compilation systems appearing in newer compilers and in the scientific literature lies in two critical details, which I can illustrate with the decisions we made for Duckling's compiler:
- Granularity of recompilation. We don't want to recompile whole files; rather, we want to recompile each function separately.
- Precision of dependency tracking / dependency detection. We don't want to infer code dependencies from whole files or modules. Rather, we want a clever dependency-tracking mechanism inside the compiler that lets us recompile a function only when its body, or the code it truly depends on, changes. Being 100% free of false positives here is impossible from a practical point of view, but high accuracy is well within reach.
Some of you may know that one compiler to introduce advanced incremental compilation in recent years is rustc, the Rust compiler. Here are a few relevant links for those interested[3]:
- The Request for Comments, started in 2015, about adding incremental compilation to Rust
- A blog post from 2016 announcing the alpha version of incremental compilation in Rust
- The 1.24.0 Rust release notes, to the best of my knowledge the first version to enable incremental compilation by default
- The current rustc developer guide on incremental compilation
Modern incremental compilation models can, in principle, cut compilation times by orders of magnitude for a small change to the code, especially in debug builds[4]. Waiting a few minutes after that single interface change versus waiting a few seconds does make a difference.
In Duckling, we are focusing on incremental compilation very early on: we are already working on it, before many of the language's important features are finished.
Multithreaded Compilation
A more direct way to speed up compilation is to use multiple CPU cores. In a perfect world, you would use N cores and speed compilation up N times. Unfortunately, it's not always that simple. For many reasons, many modern compilers, including Duckling's, are designed to process large parts of the project (or even the entire project) in a single process[5]. As a result, the real challenge lies in so-called intra-process parallelism: the ability to use multiple CPU cores within a single compilation process. A side effect is that we might benefit from multithreaded compilation even when compiling a small project, perhaps even a single file, or when recompiling small parts of a much bigger project. Unfortunately, this also makes it much harder to get a decent speedup from multiple threads, because you need to synchronize the threads, schedule their work, and share data between them. This is also where the room for improvement lies: how much speedup do we actually get? It depends greatly on the multithreading computation model being used[6].
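To make the synchronization and scheduling challenge concrete, here is a toy sketch of intra-process parallelism: compilation units run on a thread pool inside one process, and a unit may start only once everything it depends on has finished. The unit names and dependency graph are invented, and blocking a worker on a condition variable is deliberately naive (a real scheduler would hand out only ready units, and would deadlock-proof itself against having fewer workers than blocked tasks).

```python
# Toy intra-process parallel compilation: four units compiled by a thread
# pool, each waiting until its dependencies are done.
from concurrent.futures import ThreadPoolExecutor
import threading, time

units = {"core": set(), "io": {"core"}, "net": {"core"}, "app": {"io", "net"}}

done = set()
cond = threading.Condition()

def compile_unit(name):
    with cond:
        # Block until every dependency of this unit has been compiled.
        cond.wait_for(lambda: units[name] <= done)
    time.sleep(0.01)  # stand-in for real compilation work
    with cond:
        done.add(name)
        cond.notify_all()

# One worker per unit, so no worker starves while waiting.
with ThreadPoolExecutor(max_workers=len(units)) as pool:
    for f in [pool.submit(compile_unit, n) for n in units]:
        f.result()

print(done == set(units))  # True: all units compiled in dependency order
```

Even this toy version shows where the time goes: `core` serializes everything behind it, and the shared lock is touched by every unit, which is exactly the kind of coordination cost that erodes the ideal N-fold speedup.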
We aren't working on parallel compilation in Duckling right now, but we are thinking about it when designing the compiler architecture, and we have done some preliminary research in that area already.
It's also worth mentioning that in most cases there is a clear distinction between parallelizing the compiler front-end and parallelizing the back-end. Parallelizing the back-end, responsible for things like code generation and optimizations, is typically much simpler and more effective.
Lazy compilation
Lazy compilation is the simplest technique of the three. It can be implemented in many different ways, but in general it means that we don't compile what isn't needed. Imagine you downloaded some large library but only need a few functions from it. Those functions might depend on some other code, but it's unlikely the entire library has to be compiled. In some situations this means that not only do we skip compiling certain fragments of the program, we don't even read all the source files. Less work means less time spent on compilation. In some cases it can be a double-edged sword, as showcased in a GitHub issue in the Zig compiler repo from 2019, where some compile-time errors were hidden because the offending function was never used.
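A minimal sketch of the idea, with an invented call graph: starting from the entry point, only reachable functions are compiled, and the rest of the "library" is never touched. This also makes the Zig caveat visible, since any error inside the unreached code would go unnoticed.

```python
# Lazy compilation sketch: compile only what is reachable from `main`.
call_graph = {
    "main": ["parse", "report"],
    "parse": ["tokenize"],
    "tokenize": [],
    "report": [],
    # Library code that `main` never reaches:
    "render_3d": ["shaders"],
    "shaders": [],
}

def lazily_compile(entry):
    compiled, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn in compiled:
            continue
        compiled.add(fn)              # stand-in for compiling `fn`
        stack.extend(call_graph[fn])
    return compiled

print(sorted(lazily_compile("main")))
# ['main', 'parse', 'report', 'tokenize']; 'render_3d' is never compiled,
# so an error in its body would never be reported.
```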
The design of the Duckling compiler does not impose any concrete lazy compilation model, but it efficiently allows the compiler to do only the work you ask of it. This means it should be possible (perhaps even relatively easy) to add some form of lazy compilation in the future.
Final thoughts
It's worth keeping in mind that improving compilation speed is just one of many aspects of the development of compiler technology. Even within the topic of compilation speed, this post covers only a few of the areas we are working on right now in the Duckling compiler. We plan to write more on the topic of compiler technology, especially in relation to the Duckling toolset.
1. Measured on a single thread. ↩
2. As an interesting note, newer compilers seem to be relatively small, below 1 million lines of code. This is partially thanks to the development of tools like LLVM (which itself has over 30 million lines of code) that can "outsource" a lot of the logic, especially the support for a multitude of backends and hardware targets, and optimizations. ↩
3. As a side note, I'll add that it (understandably) took Rust quite some time to settle on the incremental compilation system it uses today, partly because the process involved iterating through a few approaches. For example, here is an issue from a 2017 roadmap about adopting a new approach. The current system is strongly connected to so-called demand-driven compilation, which I'll definitely write more about, since it has a lot of very interesting implications for Duckling. ↩
4. Skipping some optimizations, like function inlining, loosens the dependencies between the compilation outputs of individual functions. ↩
5. This might not be 100% true, but the general idea holds. Among other things, it allows the compiler to relatively easily read and compile everything exactly once. ↩
6. Note that, due to Amdahl's law, reaching large performance improvements forces you to introduce parallelism into most of the compiler. And since the computations a compiler performs are quite complex and generally not independent of one another, this can get really tricky. ↩