Title: Milestone – RISC-V support in Mes's bootstrappable TinyCC
Date: 2022-09-22
Category:
Tags: Bootstrapping GCC in RISC-V
Slug: bootstrapGcc6
Lang: en
Summary:
    Bringing RISC-V support to the bootstrappable TinyCC Mes forked. Some
    problems and a look into the future.

In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC,
TinyCC, Mes and Mes's TinyCC fork that is designed to be bootstrappable. In
this post we are going to deal with the latter, explain how we made it work for
RISC-V and the challenges we encountered.

### The non-bootstrappable nature of TinyCC

As we introduced in the previous post, TinyCC cannot be compiled by very simple
compilers like Mes's `mescc`. So the Mes project decided to make a [fork that
`mescc` was able to compile](https://gitlab.com/janneke/tinycc). Mes calls it a
*bootstrappable tinycc*.

> There's an uninteresting philosophical debate about what *bootstrappable*
> means, which leads to many errors and
> misunderstandings[^misunderstandings]. Many compilers call themselves
> bootstrappable if they can be compiled with themselves. When **we** talk
> about this, we are looking for *full-source bootstrappability*, that is,
> that the compilers can be compiled from *source*, or from a *full-source
> bootstrappable* compiler.

TinyCC is supposed to be compilable by itself, but who compiles the version
that compiles TinyCC? Another TinyCC? And who compiles that?

The yogurt problem we always get: how do you make yogurt? Take yogurt, mix with
milk and in some hours you'll get yogurt. See the problem?

If you are a culinary maniac, as I am, you can stretch this metaphor further.
If you know what you are doing, you can obtain yogurt from raw milk[^kefir].

That's what our project is doing: make yogurt from raw milk at some point.

So the compilers normally only care about the latest yogurt, but we, the
saviors of the ancient milk, those who can acidify the raw pureness, can make
yogurt starter with raw milk.

That's the kind of magic nobody cares about, neither in the compiler world nor
in real life.

The yogurt starter does not make the best yogurt, by the way, it needs
generations and generations of yogurts to make the best. That's what our
project does: start simple (stage-0 and Mes) and go enriching the product
(TinyCC) until reaching a mature yogurt (GCC).

TinyCC does not really care about this bootstrappability concept. They only
want to be compilable with themselves. Nothing else.

That's why [Jan](http://joyofsource.com/), the inventor of this metaphor I just
stretched to infinity, had to fork the project. He had another choice:
simplify TinyCC's code upstream so it could be compiled from a simpler step,
but his ideas were rejected and some weird animosity I don't understand
started. More on that later.

[^misunderstandings]: I've run into many misunderstandings about my project too.
Some people have told me all this work is worthless because you can always
bootstrap from an x86_64 machine and then continue the bootstrapping effort in
your RISC-V. And so on. That's why this blog doesn't have a comment section.
People insist on believing that other people's work is worthless or that they
could do it more simply with no effort. I won't claim that my explanations are
the best, but I can claim to be the laziest person I know, and I'd never spend
time on something that isn't worth the effort.

[^kefir]: With kefir you are fucked. We don't know where it comes from. Luckily
  we harvested a lot and it's easy to grow.

### The RISC-V support

When the previous blogpost was written, TinyCC had a RV64 backend, but the
TinyCC fork did not have RISC-V support.

My job here was to take the backend from the official TinyCC and bring it to
the bootstrappable one, Jan's fork. I can say that is done. Good for me.

#### The process

I followed the cross-compiler trick again, in order to make this process easier
on my computer and because Mes doesn't support RISC-V output yet. Making a
TinyCC for my x86_64 machine that had RISC-V output sounded more than
reasonable to me. Later I could always move to a full RISC-V machine, once I
was sure that the backend was working.

So first I made a guix package for upstream [TinyCC cross-compiler (for
RISC-V)](https://github.com/ekaitz-zarraga/tcc/blob/guix_package/guix.scm#L85)
with GCC. This wasn't really obvious, because there were some variables to set
correctly. Then I tested whether everything compiled and worked as expected.
Apart from a couple of issues later corrected upstream, it did.

Next, I made a guix package for [the forked TinyCC with
GCC](https://github.com/ekaitz-zarraga/tcc/blob/riscv-mes/guix.scm#L83). This
also needed some changes, as the fork is based on quite an old version of
TinyCC. The build needs a `libtcc1.a` here, which can be empty when building
with GCC (`libgcc` provides that functionality), but nothing in the compilation
process mentions this, and coming up with that by yourself is hard.

Now that the project was compilable, it was time to code. You can see this part
in the `riscv-mes` branch:

<https://github.com/ekaitz-zarraga/tcc/commits/riscv-mes>

I took the backend from upstream and inserted it in the fork. Of course, it
didn't compile. Many internal structures and APIs had changed, so after trying
to stitch everything together myself, I headed to the mailing list. At the
beginning I wanted to think the answers I was getting were because I wasn't
explaining my doubts properly or something, but what was actually happening was
that the animosity towards our fork (a decision I didn't take) appeared and
someone tried to ridicule me on the mailing list for no reason at all.

The funny thing is I'd never have needed to contact the mailing list if the
project were as well written as they claim it to be. It's full of functions and
variables with one-character names, the code is mixed together in a very
aggressive way... It's supersmall, tiny even, but really hard to read. Also,
the commits are not very descriptive for anyone who is not the main maintainer,
who, surprise! is the same person that gives aggressive answers on the mailing
list... I hope it's only my perception and they are nice with their friends and
family, but the interaction made me feel uncomfortable and I don't want to
touch this code again.

It was a sad moment, I must admit. But I decided I was going to do this with
help or without it. And I think I did it. I removed references here and there
and finally it looks like I got somewhere.

There are some differences to point out. One of the commits that made me ask on
the mailing list was a huge change in the way conditionals are handled in
TinyCC. Our fork didn't have that, so I needed to split the code into several
pieces, and the benefits from that commit (some instruction optimization) are
lost in the backport. The branching and jumping is still correct, just less
optimal. Not bad.

Code added and compiled, it was time for testing. I made a little script (I
didn't share it, but it's not really relevant either) and a small set of simple
C test files, and compiled (not linked) them with the upstream version of the
compiler and the forked one. Then I disassembled the results and compared the
differences.

You can try it yourself: build the upstream TinyCC and the fork, make them
compile (`-c`) some files, then use `objdump --disassemble` and compare the
results. It's not really hard to test. Here you have an example of a program
you can build:

``` clike
// Example file to build
int main (int argc, char *argv[]){
    int a = 19, b = 90;
    if (a && b){
        return 1;
    } else {
        return 45 + 90 << 8;
    }
}
```

And the result it should give in both versions, optimized (upstream) and
unoptimized (our fork):

``` text
OPTIMIZED VERSION                              || UNOPTIMIZED VERSION
===============================================||==================================================
0000000000000000 <main>:                       || 0000000000000000 <main>:
   0:	fd010113	addi	sp,sp,-48          ||    0:	fd010113	addi	sp,sp,-48
   4:	02113423	sd	ra,40(sp)              ||    4:	02113423	sd	ra,40(sp)
   8:	02813023	sd	s0,32(sp)              ||    8:	02813023	sd	s0,32(sp)
   c:	03010413	addi	s0,sp,48           ||    c:	03010413	addi	s0,sp,48
  10:	00000013	nop                        ||   10:	00000013	nop
  14:	fea43423	sd	a0,-24(s0)             ||   14:	fea43423	sd	a0,-24(s0)
  18:	feb43023	sd	a1,-32(s0)             ||   18:	feb43023	sd	a1,-32(s0)
  1c:	0130051b	addiw	a0,zero,19         ||   1c:	0130051b	addiw	a0,zero,19
  20:	fca42e23	sw	a0,-36(s0)             ||   20:	fca42e23	sw	a0,-36(s0)
  24:	05a0051b	addiw	a0,zero,90         ||   24:	05a0051b	addiw	a0,zero,90
  28:	fca42c23	sw	a0,-40(s0)             ||   28:	fca42c23	sw	a0,-40(s0)
  2c:	fdc42503	lw	a0,-36(s0)             ||   2c:	fdc42503	lw	a0,-36(s0)
  30:	00051463	bnez	a0,38 <main+0x38>  ||   30:	00051463	bnez	a0,38 <main+0x38>
  34:	0180006f	j	4c <main+0x4c>         ||   34:	01c0006f	j	50 <main+0x50>
  38:	fd842503	lw	a0,-40(s0)             ||   38:	fd842503	lw	a0,-40(s0)
  3c:	00051463	bnez	a0,44 <main+0x44>  ||   3c:	00051463	bnez	a0,44 <main+0x44>
  40:	00c0006f	j	4c <main+0x4c>         ||   40:	0100006f	j	50 <main+0x50>
  44:	0010051b	addiw	a0,zero,1          ||   44:	0010051b	addiw	a0,zero,1
  48:	0100006f	j	58 <main+0x58>         ||   48:	0140006f	j	5c <main+0x5c>
  4c:	00008537	lui	a0,0x8                 ||   4c:	0100006f	j	5c <main+0x5c>
  50:	7005051b	addiw	a0,a0,1792         ||   50:	00008537	lui	a0,0x8
  54:	00000033	add	zero,zero,zero         ||   54:	7005051b	addiw	a0,a0,1792
  58:	02813083	ld	ra,40(sp)              ||   58:	00000033	add	zero,zero,zero
  5c:	02013403	ld	s0,32(sp)              ||   5c:	02813083	ld	ra,40(sp)
  60:	03010113	addi	sp,sp,48           ||   60:	02013403	ld	s0,32(sp)
  64:	00008067	ret                        ||   64:	03010113	addi	sp,sp,48
                                               ||   68:	00008067	ret

```

On the right you can see there is a duplicated `j` instruction (the one at `4c`
repeats the one at `48`), but that's not supposed to be a problem: the rest of
the addresses are calculated properly, and the duplicate is never going to be
reached.


#### Last step

So the code is added to the fork and it seems to work. That's what I promised
to do, but I wanted to go a little bit further and test if Mes was able to
handle the code I added to the TinyCC fork.

In order to do that I made another branch in the project where I changed the
package and some configuration in order to compile the forked TinyCC using Mes.

You can see what I did here:

<https://github.com/ekaitz-zarraga/tcc/commits/mes-package>

Turns out that I managed to build the thing, using Mes for my x86_64 machine
choosing RISC-V as the backend, but it doesn't work at all.

The resulting compiler generates empty files that have no permissions and fails
instantly.

At least we tested that `mescc` is ok with the C constructs we used in the
backport of the RISC-V support. But there are still many things to test and
this isn't easy at all.

Let me give you some examples on how tricky this process is.

This line in the `guix.scm` file[^line]:

``` clike
    "--extra-cflags=-Dinline= -DONE_SOURCE=1"
```

Does two crazy preprocessor tricks, inserted as C flags. It's equivalent to
adding these macros in the top level of the sources:

``` clike
#define inline 
#define ONE_SOURCE 1
```

The first one removes the word `inline` from the source code, because `mescc`
does not support it. The second defines `ONE_SOURCE` to a value, because if
it's only defined, without a value, like the makefile does by default, it is
not matched properly by the `#ifdef`s. Finding this is not obvious.

[^line]: <https://github.com/ekaitz-zarraga/tcc/blob/mes-package/guix.scm#L196>

That's of course not the only thing; we found many others. I spent a couple of
weeks making the build process work for `mescc`, and when I thought it was
working the result was a broken binary. Pretty fun.

And why all this trouble, you might think?

Jan's fork is not compiled using the `configure` and the `Makefile` the project
comes with; he wrote some shell scripts to build everything. I wanted to try to
build the project directly as it came for several reasons: the scripts are
prepared for native compilers and not for the cross compiler I was building,
they use Mes from source but I just needed to use the upstream one, and I
thought integrating all this in the normal building process would be an extra
win.

I lost this time though.

The compilation process might be missing some libraries, or some stubs might be
in use instead of the real code... Maybe the problem is I'm using the x86_64
version of Mes, which is not thoroughly tested... But using the i386 version is
not possible because I'm building for 64bit RISC-V and the i386 doesn't know
how to deal with 64 bit words... Honestly, I don't know what to do.

### Something cool to say

Mes does not compile following the classic process. Mes is integrated with some
tools from the stage-0 project so it uses the M1 macro system, hex0 and all
that kind of things to build the programs.

During the process I found that some of the M1 instructions Mes was generating
were not defined in M1, so I had to add a few extra instructions to the M1
macro definitions for Mes. Here's the diff (a little bit simplified) I had to
make:


``` diff
diff --git a/lib/x86_64-mes/x86_64.M1 b/lib/x86_64-mes/x86_64.M1
index 9ffbbf15..64997c55 100644
--- a/lib/x86_64-mes/x86_64.M1
+++ b/lib/x86_64-mes/x86_64.M1
@@ -147,6 +148,10 @@ DEFINE mov____0x8(%rbp),%rsp 488b65
 DEFINE mov____0x8(%rdi),%rax 488b47
 DEFINE mov____0x8(%rdi),%rbp 488b6f
 DEFINE mov____0x8(%rdi),%rsp 488b67
+DEFINE mov____(%rax),%si 668b30
+DEFINE mov____(%rax),%sil 408a30
+DEFINE mov____%si,(%rdi) 668937
+DEFINE mov____%sil,(%rdi) 448837
 DEFINE movl___%eax,0x32 890425
 DEFINE movl___%edi,0x32 893c25
 DEFINE movl___%esi,(%rdi) 8937

base-commit: aa5f1533e1736a89e60d2c34c2a0ab3b01f8d037
```

Now, with those instructions added, my package got a little bit more complex:
I had to extend the Mes package with my patch until that change is accepted
upstream. But this is great! Using software and improving it while you use it
is the best feeling in life![^choco]

[^choco]: Chocolate and hot coffee too.

Let me use this point to show you a little bit how this macro system works. You
can see this `x86_64.M1` file has three columns: `DEFINE`, some text, and some
number in hex. This is kind of an assembler description. There's the M1 program
that receives a file written with instructions that look like the text in the
second column in the `.M1` file and converts them one by one to the numbers in
the third. In short, the `.M1` file is a reference that tells the M1 program
how to do the conversion.

M1 is just a text replacement tool that makes the conversion based on the
definitions it gets from the `.M1` file. It helps us write instructions in a
way that looks like they have a meaning (that's what an assembler is after
all).

Later, those numbers are converted to binary, using Hex0 or another, slightly
more sophisticated tool.

All these tools are written in a way that can be audited (Hex0 is written in
Hex0...) and they are executed from source at their very beginning.

This is how we make yogurt directly from milk. Cool huh?
Props to <http://bootstrappable.org/>

### Conclusions

Back to the project, considering the fact that I didn't manage to build a fully
working TinyCC with a RISC-V backend using Mes, is this a failure?

I wouldn't say so.

The new RISC-V backend is added and tested in the forked TinyCC, using GCC as a
compiler. That's a big chunk of the work.

On the other hand, I can compile the forked TinyCC with `mescc`. Even if the
result didn't work, I can say the code I added was processed, so it was
technically acceptable for `mescc`. Not bad, but we'll still need to see how
true this is.

In the end, these kinds of small steps make progress, and having everything
documented here and in the commits in the git repositories helps others
continue with what I just did.

Now, I'm going to leave this as finished, as the code is supposed to work. All
the dots are more or less drawn. Now it's time for another project, one that
connects all the dots of the RISC-V full source bootstrap: from `mescc`
(already has some RISC-V support) to the forked TinyCC (I added the RISC-V
support), next to the mainline TinyCC (has RISC-V support) or/and GCC 4.6.4 (I
added RISC-V support) and from one of those to GCC 7.5 (the first one with
RISC-V support) and then to the world.

My work in this project left all the breadcrumbs in the forest, ready for
anyone to follow[^breadcrumbs].

[^breadcrumbs]: I hope someone follows them before the birds eat them.

That person can be me, anyone else or even a group of people. All I can say is
I won't forget this project, I'll always be reachable for advice and I'll try
to help as much as I can. As I always do.

These days I'll keep giving this a couple of tries and I may reach something
else, but I won't be as busy on it as I've been. I think I gave everything I
could in this project. There's still a lot to do, but what's left is not
something I can do alone.

Until next time.