diff options
Diffstat (limited to 'content/fasterMes/00_intro.md')
-rw-r--r-- | content/fasterMes/00_intro.md | 215 |
1 files changed, 215 insertions, 0 deletions
diff --git a/content/fasterMes/00_intro.md b/content/fasterMes/00_intro.md new file mode 100644 index 0000000..d1b4193 --- /dev/null +++ b/content/fasterMes/00_intro.md @@ -0,0 +1,215 @@ +Title: Intro to the GNU Mes interpreter speedup effort +Date: 2025-03-30 +Category: +Tags: GNU Mes interpreter speedup +Slug: fasterMes0 +Lang: en +Summary: Announcing my next effort in the bootstrapping chain + +It has been recent news that Nlnet and NGI0-Core decided to support me for +working on GNU Mes but I didn't really share the motivation behind this +proposal and my personal goals with it yet. Let's do it here. + +### The problem + +In my previous work in [Bootstrapping GCC in RISC-V][bootstrap] there was a lot +of frustration due to the time the whole process needed. The slowest point in +the process is not really in the GCC compilation step, but surprisingly earlier +in the chain when TinyCC is built. This step is performed by GNU Mes. + +For you to understand how slow this step is, I've been running things at night, +while I was sleep, to get some results the next day, and when I was working on +real RISC-V hardware like the VisionFive2 my friend set up for me[^1], the +process took several days to finish. This supposed a huge delay on the process, +and it forced me often to push hard at night, trying to make sure I could run a +test build so I could have something ready for the next day[^2]. + +It is obvious then that making this very step faster is nothing but good for +the process and even small improvements in speed would mean hours of time +saved. This alone is enough to justify the effort. + +But you are not here just for me telling you that making this faster is good. +So let's get a little bit more into the details. + +During the work we did, we realized that the VisionFive2 was not that slow, +comparing with my laptop, when building with GCC or TinyCC as it was with GNU +Mes. The GNU Mes step is slow in my laptop, don't get me wrong, but in the +VisionFive2 it's **significantly slower**. That's what made us think, and +Andrius suggested that might happen because small Single Board Computers (SBC) +like the VisionFive2 have a very slow memory access, comparing with a laptop or +a desktop computer. + +That was enough for me to trigger. + + +### GNU Mes + +But first we need to understand what we are dealing with. + +The goal of GNU Mes as a whole here is to connect M2-Planet, a compiler that is +able to build a small subset of C, with TinyCC, a fully-featured and fast C +compiler, that is able to build GCC 4. + +GNU Mes has three main ingredients: + +- `mes`: a Scheme interpreter that is compatible with Guile. This is written in + a subset of C that M2-Planet is able to build. +- `mescc`: a C compiler written in Scheme that is mutually-hosted with `mes`. +- `meslibc`: a minimal C standard library. + +The bootstrapping process goes like this: `mes` is compiled by M2-Planet. Once +`mes` is built, it runs `mescc` to build its standard library, and then it +builds TinyCC. Running `mescc` is the part that is really slow. + +`mes` is a tree-walk interpreter. It loads scheme code in an AST and then runs +it directly. Macros are expanded as they are found, and code is processed as it +comes. + +`mescc` uses the NYACC C parser that gives the C AST as a tree structure made +by lists. `mescc`, written in the functional style that schemers like, +processes that tree. + + +### The hypothesis + +With all this in mind, my hypothesis is GNU Mes step is slow is because `mes`, +the scheme interpreter, due to its design, makes the memory bottleneck obvious +in SBCs. Changing to a more memory-compact design that exploits the cache would +make `mes` way faster. + +When I say _"due to its design"_ I mean two things: first that a tree-walk +interpreter requires jumping around the memory, as the scheme AST is not a +compact structure, and, second, generate a huge amounts of lists, the basic +scheme data structure. These two are also happening in `mescc`, which builds +the C AST using scheme lists, and them processes them. + +Having a compact data structure would increase the cache usage, reducing the +access to memory, and making the program faster. + + +### The project + +The project aims to improve the performance of GNU Mes' scheme interpreter, +rewriting it as a **bytecode interpreter**, while keeping it as simple and +readable as it is (or even more!). This would enable faster execution of the +Mes C Compiler (`mescc`) for faster build times, making the bootstrapping chain +more accessible, specially in small single-board computers where memory access +is more expensive. This speedup could also lead to a reduction of steps in the +bootstrapping chain, making it simpler and easier to maintain. + +On the other hand, our work is constrained by the C constructs that M2-Planet +is able to deal with. The project includes a review on the new additions that +have been done to M2-Planet in the most recent releases, that allow us to make +Mes more readable and faster. + + +### Some numbers + +The first step of the project is to create a benchmark that would become useful +later during the rewrite of the core of the interpreter. I didn't finish that +yet but it has already led me to some interesting numbers that talk a little +bit about the feasibility of the project. + +If the list is "the" scheme data structure we could try a simple program and +put it to test: + +``` scheme +(define (fib n) + (if (< n 2) + n + (+ (fib (- n 1)) + (fib (- n 2))))) +(display (fib 35)) +``` + +Here is the wall time of several runs of that file: + +- **Guile:** 0.35s — run as `guile fib.scm` +- **Guile:** 3.33s — run as `guile --no-auto-compile fib.scm` (after cleaning + the cache) +- **Guile:** 8.51s — run as `GUILE_JIT_THRESHOLD=-1 guile --no-auto-compile fib.scm` +- **Chibi:** 1.47s — run as `chibi-scheme -m scheme.small fib.scm` +- **Mes:** 23.33s — when built with GCC +- **Mes:** 56.03s — when built with M2-Planet + +Of course, this is not the only test I have to make but it is quite +self-explanatory. We'll take a deeper look to the numbers in later posts, but I +think there's room for improvement. + + +### Some thoughts + +I don't think I say anything stupid here, and I think the numbers support the +hypothesis. Of course, that's nothing new: bytecode interpreters are known for +long, and the reason why most of modern interpreters are bytecode interpreters +(in the numbers above, both Guile and Chibi are) is exactly what I'm mentioning +here. + +Obviously, bytecode interpreters are not the perfect solution for everything, +and just that is not the reason why things are fast or slow. If we had, say, a +very slow list creation function, we would still be slow. + +Being as slow as GNU Mes currently is, I think just a non-pessimization +strategy should be enough (that's what a bytecode interpreter is to some +degree) to obtain a significant speed improvement. In any case, even small +improvements can be very beneficial when we are talking about long processes. + +I hope to make it around 10x faster. But we'll see if I'm able to achieve that. + +### Personal goals + +Of course my life is not always controlled only by the things I consider +useful. A project has to have something else for me to prepare a proposal and +spend a year or more on it. It has to be interesting and has to be aligned +with my goals. + +I'm not a compiler engineer and I'm not an optimization expert, and I'm not +specially interested in becoming any of them. I'm interested in interpreters +and programming languages as they are tools that I use every day, and I tried +several times to make a programming language implementation but I didn't have +the time or the resources to do so. I have other things to do, after all. + +I have also to admit I have other goals for this project that go a little bit +further than just making GNU Mes faster. I think I should make my changes +*readable* because I think it's a great opportunity for others to learn. This +also means documenting my decisions and design goals, and writing that down +inside of the project's documentation. We are going to learn together, and my +process is going to be there for others to follow. + +In summary, this project is a great example of something I think *I can do*, +that I think *is useful* for the bootstrapping and aligns with my long term +goals of *learning about languages*, specifically about Lisp implementations, +and making the *knowledge more accessible* to others. + +I never did this before, but my whole career has been me doing things I never +did before, so I guess it would be fine. It always happens to be. + +We'll see. + +Thanks for the support, and for believing that I could do this, + +[bootstrap]: /tag/bootstrapping-gcc-in-risc-v.html + +[^1]: I never asked him to, but I happen to have the best friend that one + could possibly have. We'll talk about him another day. + +[^2]: I've been going to bed way past midnight for too long, and I don't + encourage you to do the same. + +--- + +<style> +.container{ + display: flex; + flex-flow: row wrap; + justify-content: center; + gap: 40px; +} +.no-side-margin{ + margin: 0px; +} +</style> +<div class="container"> +<img class="no-side-margin" src="{attach}/fasterMes/nlnet.svg" width=200px> +<img class="no-side-margin" src="{attach}/fasterMes/NGI0Core.svg" width=200px> +</div> |