1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
|
Title: Intro to the GNU Mes interpreter speedup effort
Date: 2025-03-30
Category:
Tags: GNU Mes interpreter speedup
Slug: fasterMes0
Lang: en
Summary: Announcing my next effort in the bootstrapping chain
It has been recent news that Nlnet and NGI0-Core decided to support me for
working on GNU Mes but I didn't really share the motivation behind this
proposal and my personal goals with it yet. Let's do it here.
### The problem
In my previous work in [Bootstrapping GCC in RISC-V][bootstrap] there was a lot
of frustration due to the time the whole process needed. The slowest point in
the process is not really in the GCC compilation step, but surprisingly earlier
in the chain when TinyCC is built. This step is performed by GNU Mes.
For you to understand how slow this step is, I've been running things at night,
while I was sleep, to get some results the next day, and when I was working on
real RISC-V hardware like the VisionFive2 my friend set up for me[^1], the
process took several days to finish. This supposed a huge delay on the process,
and it forced me often to push hard at night, trying to make sure I could run a
test build so I could have something ready for the next day[^2].
It is obvious then that making this very step faster is nothing but good for
the process and even small improvements in speed would mean hours of time
saved. This alone is enough to justify the effort.
But you are not here just for me telling you that making this faster is good.
So let's get a little bit more into the details.
During the work we did, we realized that the VisionFive2 was not that slow,
comparing with my laptop, when building with GCC or TinyCC as it was with GNU
Mes. The GNU Mes step is slow in my laptop, don't get me wrong, but in the
VisionFive2 it's **significantly slower**. That's what made us think, and
Andrius suggested that might happen because small Single Board Computers (SBC)
like the VisionFive2 have a very slow memory access, comparing with a laptop or
a desktop computer.
That was enough for me to trigger.
### GNU Mes
But first we need to understand what we are dealing with.
The goal of GNU Mes as a whole here is to connect M2-Planet, a compiler that is
able to build a small subset of C, with TinyCC, a fully-featured and fast C
compiler, that is able to build GCC 4.
GNU Mes has three main ingredients:
- `mes`: a Scheme interpreter that is compatible with Guile. This is written in
a subset of C that M2-Planet is able to build.
- `mescc`: a C compiler written in Scheme that is mutually-hosted with `mes`.
- `meslibc`: a minimal C standard library.
The bootstrapping process goes like this: `mes` is compiled by M2-Planet. Once
`mes` is built, it runs `mescc` to build its standard library, and then it
builds TinyCC. Running `mescc` is the part that is really slow.
`mes` is a tree-walk interpreter. It loads scheme code in an AST and then runs
it directly. Macros are expanded as they are found, and code is processed as it
comes.
`mescc` uses the NYACC C parser that gives the C AST as a tree structure made
by lists. `mescc`, written in the functional style that schemers like,
processes that tree.
### The hypothesis
With all this in mind, my hypothesis is GNU Mes step is slow is because `mes`,
the scheme interpreter, due to its design, makes the memory bottleneck obvious
in SBCs. Changing to a more memory-compact design that exploits the cache would
make `mes` way faster.
When I say _"due to its design"_ I mean two things: first that a tree-walk
interpreter requires jumping around the memory, as the scheme AST is not a
compact structure, and, second, generate a huge amounts of lists, the basic
scheme data structure. These two are also happening in `mescc`, which builds
the C AST using scheme lists, and them processes them.
Having a compact data structure would increase the cache usage, reducing the
access to memory, and making the program faster.
### The project
The project aims to improve the performance of GNU Mes' scheme interpreter,
rewriting it as a **bytecode interpreter**, while keeping it as simple and
readable as it is (or even more!). This would enable faster execution of the
Mes C Compiler (`mescc`) for faster build times, making the bootstrapping chain
more accessible, specially in small single-board computers where memory access
is more expensive. This speedup could also lead to a reduction of steps in the
bootstrapping chain, making it simpler and easier to maintain.
On the other hand, our work is constrained by the C constructs that M2-Planet
is able to deal with. The project includes a review on the new additions that
have been done to M2-Planet in the most recent releases, that allow us to make
Mes more readable and faster.
### Some numbers
The first step of the project is to create a benchmark that would become useful
later during the rewrite of the core of the interpreter. I didn't finish that
yet but it has already led me to some interesting numbers that talk a little
bit about the feasibility of the project.
Let's try a simple program and put it to test:
``` scheme
(define (fib n)
(if (< n 2)
n
(+ (fib (- n 1))
(fib (- n 2)))))
(display (fib 35))
```
Here is the wall time of several runs of that file:
- **Guile:** 0.35s — run as `guile fib.scm`
- **Guile:** 3.33s — run as `guile --no-auto-compile fib.scm` (after cleaning
the cache)
- **Guile:** 8.51s — run as `GUILE_JIT_THRESHOLD=-1 guile --no-auto-compile fib.scm`
- **Chibi:** 1.47s — run as `chibi-scheme -m scheme.small fib.scm`
- **Mes:** 23.33s — when built with GCC
- **Mes:** 56.03s — when built with M2-Planet
Of course, this is not the only test I have to make but it is quite
self-explanatory. We'll take a deeper look to the numbers in later posts, but I
think there's room for improvement.
### Some thoughts
I don't think I say anything stupid here, and I think the numbers support the
hypothesis. Of course, that's nothing new: bytecode interpreters are known for
long, and the reason why most of modern interpreters are bytecode interpreters
(in the numbers above, both Guile and Chibi are) is exactly what I'm mentioning
here.
Obviously, bytecode interpreters are not the perfect solution for everything,
and just that is not the reason why things are fast or slow. If we had, say, a
very slow list creation function, we would still be slow.
Being as slow as GNU Mes currently is, I think just a non-pessimization
strategy should be enough (that's what a bytecode interpreter is to some
degree) to obtain a significant speed improvement. In any case, even small
improvements can be very beneficial when we are talking about long processes.
I hope to make it around 10x faster. But we'll see if I'm able to achieve that.
### Personal goals
Of course my life is not always controlled only by the things I consider
useful. A project has to have something else for me to prepare a proposal and
spend a year or more on it. It has to be interesting and has to be aligned
with my goals.
I'm not a compiler engineer and I'm not an optimization expert, and I'm not
specially interested in becoming any of them. I'm interested in interpreters
and programming languages as they are tools that I use every day, and I tried
several times to make a programming language implementation but I didn't have
the time or the resources to do so. I have other things to do, after all.
I have also to admit I have other goals for this project that go a little bit
further than just making GNU Mes faster. I think I should make my changes
*readable* because I think it's a great opportunity for others to learn. This
also means documenting my decisions and design goals, and writing that down
inside of the project's documentation. We are going to learn together, and my
process is going to be there for others to follow.
In summary, this project is a great example of something I think *I can do*,
that I think *is useful* for the bootstrapping and aligns with my long term
goals of *learning about languages*, specifically about Lisp implementations,
and making the *knowledge more accessible* to others.
I never did this before, but my whole career has been me doing things I never
did before, so I guess it would be fine. It always happens to be.
We'll see.
Thanks for the support, and for believing that I could do this,
[bootstrap]: /tag/bootstrapping-gcc-in-risc-v.html
[^1]: I never asked him to, but I happen to have the best friend that one
could possibly have. We'll talk about him another day.
[^2]: I've been going to bed way past midnight for too long, and I don't
encourage you to do the same.
---
<style>
.container{
display: flex;
flex-flow: row wrap;
justify-content: center;
gap: 40px;
}
.no-side-margin{
margin: 0px;
}
</style>
<div class="container">
<img class="no-side-margin" src="{attach}/fasterMes/nlnet.svg" width=200px>
<img class="no-side-margin" src="{attach}/fasterMes/NGI0Core.svg" width=200px>
</div>
|