Title: Milestone — Source to Binary RISC-V support in GCC 4.6.4
Date: 2022-06-20
Category:
Tags: Bootstrapping GCC in RISC-V
Slug: bootstrapGcc4
Lang: en
Summary:
    Description of the changes that took us from a minimal compiler that runs
    and generates assembly to something that is actually able to compile,
    interacting with binutils and having a working libgcc.

In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC
and shared how I backported the RISC-V support in the GCC core to GCC 4.6.4.
Now it's time to finish what we left half-done and actually introduce a *full*
RISC-V compiler.

### Where we left off last time

On Tuesday, 7th of April, I tagged a commit as `minimal-compiler`. That commit
contains all the work we had done until then, and its tag message describes
how to build a compiler that is only able to generate RISC-V assembly.

As we already explained in previous posts, GCC is a driver program that calls
other programs to do its work. The GCC core compiles the code to assembly
language and then calls binutils to do the rest of the work: assembling and
linking.

At that point, we had to call binutils by hand.


### The changes

The changes applied at the time of writing are available in the
[`working-compiler`][working-compiler] tag. As the tag message describes, they
were split into two branches: the `guix-package` branch and the `riscv`
branch.

[working-compiler]: https://github.com/ekaitz-zarraga/gcc/releases/tag/working-compiler

The `guix_package` branch is merged into the `riscv` branch, but this split
lets us tell which changes are related to the compiler itself and which are
related to the tooling around it. That way we'll be able to easily choose what
to do with the commits in the future. We'll probably need to rearrange some
stuff.


### The context is everything: Guix package part

The `guix_package` branch contains all the commits that make the Guix tooling
around the project work. This includes a reproducible definition of the
compilation process, the environment setup and all that.

As the `working-compiler` tag message describes, this is the way you can
currently make this compiler work and play with it:

``` bash
$ guix shell -m manifest.scm
$ source PREPARE_FOR_COMPILATION.sh riscv64-linux-gnu
 # This second command will prepare the PATH and other environment
 # variables to make GCC find libraries and executables
```

> If you use this in the future and it fails, it might be because Guix changed
> some of the core packages used here between the time this post was written
> and the time you read it. You can always use the `time-machine` utility to
> make sure you use everything as it was when this post was written:  
> `guix time-machine --channels=channels.scm -- shell -m manifest.scm`


From this point you can run the compiler directly. It needs the `--sysroot`
option to find the `crt*` files, but that's not something I'm worried about
right now: we'll fix it when we integrate this in the bootstrapping process.

Run the compiler like this now:

``` bash
$ riscv64-linux-gnu-gcc --sysroot=$GUIX_ENVIRONMENT [-static]  ...
```

#### Notable changes on the Guix side

The most notable change on the Guix side is the addition of the `manifest.scm`
and `PREPARE_FOR_COMPILATION.sh` files. With the help of my man Janneke, I
realized my problems came from calling the compiler with the wrong environment,
so it was unable to find the linker and the assembler. Yes, this kind of thing
happens a lot in Guix if you are not careful (and I am *not* careful at all).
These files let me prepare a working environment where the compiler finds and
calls the assembler and the linker properly.

This change also includes some interesting extras: the glibc added to the
manifest also contains the static version, so we can generate static binaries
that are easier to test in an emulated environment without having to deal with
the dynamic linker. Important stuff.

Also, the compilation process now relies on a newer Guix version, which removed
the `-unknown` part from the triplets (actually *quadruplets*) like
`riscv64-unknown-linux-gnu`. That was a bit of a pain: one day I tried to
compile everything and it failed, and in the end the cause was just that small
change. I decided to bump the required Guix version to keep up with current
Guix, so I don't need to run `guix time-machine` every time. It's better like
this.

If you want to read more about the change and see how fast the Guix people
helped me understand what was going on, [see this mailing list
thread][ml-unknown][^guix]. I should also mention that I needed a small change
in my GCC so it works when the `-unknown` part is missing from the triplet:
adding `riscv` to `config.sub` was enough for that.

[^guix]: Some people also spent time with me on IRC. Thanks to everyone who
  helped!

[ml-unknown]: https://lists.gnu.org/archive/html/bug-guix/2022-06/msg00092.html

I also fixed a couple of other things, but they are not really relevant here.
Having a working environment is a nice milestone by itself, but we also did
some work on the GCC side!


### Road to a working compiler: The GCC part

The `riscv` branch contains several commits. Most of them are small, but they
are really important. I have to say this is full of details I don't really
understand, so I'll try to focus on the ones I do. The rest are simply things
that happened to work in the end. You know, this is pretty old software and the
project is too complex to understand it all...

#### Memory models and fences

First of all, as we mentioned in the previous post, the memory models were
something we needed to review. We knew this because the code related to memory
models was used in a couple of places in the RISC-V code we copied from the GCC
7.5 codebase, but it was not available in GCC 4.6.4. That API simply did not
exist back then.

The commit [`71dc25d`][memmodels] removes the memory models from the code (they
were already commented out but not solved), taking the most conservative
approach: always add the `.aq` flag and the `fence` instruction. This is not
optimal, but the performance penalty is negligible and it doesn't affect
functionality.

[memmodels]: https://github.com/ekaitz-zarraga/gcc/commit/71dc25d08354dead26180bd552c0c3e299b012cb

I did not come up with this myself: as I mentioned in the previous post, I
asked the maintainer of the RISC-V support in GCC (who is also one of the big
names in RISC-V) about this, and he gave me this solution.

I also had to change the optabs a little bit, using `memory_barrier` instead of
one of the more recent optabs. For this I just compared the code of the MIPS
architecture and checked how it changed from 4.6.4 to 7.5, as I did for many
other parts of this work. Easy-peasy.
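To make the conservative choice more concrete, here is a minimal sketch of the kind of C code that goes through these patterns in GCC 4.6.4. Back then the only atomic builtins were the legacy `__sync_*` family (the `__atomic_*` builtins with explicit memory models arrived in GCC 4.7), and the GCC manual describes most of them as full barriers. Always emitting strong ordering therefore costs some performance but never makes the code wrong. The function names are hypothetical. The comments about which instructions get emitted are assumptions about the backport, not something checked against its generated assembly.

```c
/* Hypothetical examples using the legacy __sync builtins, the only
   atomic builtins available in GCC 4.6.4. */

static long counter;

/* Atomically increments the counter and returns its previous value.
   __sync_fetch_and_add is documented as a full barrier, so the backend
   must not emit anything weaker: hence the conservative lowering. */
long bump_counter(void)
{
    return __sync_fetch_and_add(&counter, 1);
}

/* Changes *flag from 0 to 1; returns 1 only if this call did it. */
int try_claim(int *flag)
{
    return __sync_bool_compare_and_swap(flag, 0, 1);
}

/* Stores a value, then issues a full memory barrier. In GCC 4.6 this
   builtin is expanded through the target's memory_barrier pattern when
   one exists (on this backport, presumably a fence). */
void publish(int *slot, int value)
{
    *slot = value;
    __sync_synchronize();
}
```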

#### Wrong arguments in the assembler call

As I mentioned in the Guix part, GCC was unable to find the assembler. This
means we didn't discover that the assembler call was broken until we actually
put the assembler in the `PATH` and GCC tried to call it.

The commit [`7030067`][as-call] shows the small changes I needed to make to the
way GCC calls the assembler so that it is called correctly.

[as-call]: https://github.com/ekaitz-zarraga/gcc/commit/7030067e6aa54b44a2f2447d4e706e76bc88f696

This issue was easy to fix, but not that easy to catch. First I found the
assembler was complaining because it didn't understand the `-k-march` option.
It took me some time to realize that those were two options merged together
because of a missing space. Yes, the space at the end of the line **is
relevant**.

I simply removed the `-k` option from the `ASM_SPEC` because my assembler
considered it ambiguous. I don't remember where I copied this from, but it
works and I don't want to think about it ever again.
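A missing space matters because `ASM_SPEC` is a C macro made of several string literals, usually one per line. The C compiler concatenates adjacent string literals with nothing in between. Here is a minimal sketch with made-up spec contents; the real `ASM_SPEC` in the RISC-V headers is longer and uses spec syntax such as `%{...}`. The macro names and `count_options` are hypothetical.

```c
#include <string.h>

/* Made-up spec fragments, one string literal per line, mimicking how a
   target header defines ASM_SPEC. Adjacent literals are concatenated
   with nothing in between. */
#define BROKEN_ASM_SPEC \
  "-k"                  \
  "-march=rv64gc"

#define FIXED_ASM_SPEC  \
  "-k "                 \
  "-march=rv64gc"

const char broken_spec[] = BROKEN_ASM_SPEC;   /* "-k-march=rv64gc"  */
const char fixed_spec[]  = FIXED_ASM_SPEC;    /* "-k -march=rv64gc" */

/* Counts space-separated options, i.e. what the assembler would see. */
int count_options(const char *spec)
{
    int count = 0;
    int in_word = 0;

    for (; *spec != '\0'; spec++) {
        if (*spec == ' ') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }
    return count;
}
```

With the broken version, the assembler receives a single unknown `-k-march=...` option instead of two separate ones.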



#### Libgcc: the core of this change

The biggest thing in this set of changes was the addition of `libgcc`, which is
mandatory if you want to link programs compiled with GCC. `libgcc` is a library
GCC uses for complex operations: instead of generating the assembly code for
them directly, GCC generates calls to `libgcc`, where those operations are
defined. You can read more about those operations, but they are not really
relevant for this post: what matters is that we need `libgcc` in order to have
a working compiler.

The GCC codebase has different folders for its different blocks, so it's not
surprising to see there's a folder called `gcc` for the core and a folder
called `libgcc` for `libgcc`. Anyone would expect that just cherry-picking the
commit that added the RISC-V `libgcc` support to GCC 7.5 would be enough to
have the backport ready.

Sadly, life is a little bit harder than that.

##### Cherry-picking the libgcc support

The first and easiest thing to do is to cherry-pick the commit
[`72add2f`][libgcc-commit] and pray. It looked plausible that it would work:
if you look at the changes it makes, they are pretty well contained in the
`libgcc/config/riscv/` folder, plus a couple of lines in `libgcc/config.host`
that make it find the `riscv` folder.

[libgcc-commit]: https://github.com/ekaitz-zarraga/gcc/commit/72add2fa4c354af4bf8db0b8dcb50c5b076b3ae5

The contents of the commit are pretty clear:

1. Some assembly files that implement some operations
2. Some header files and C code that implement other things
3. Some weird files called `t-something`

We can think of the first two types of files as the body of the `libgcc`
support: the juice. The `t-something` files are what are called Makefile
Fragments.

The Makefile Fragments are the basis of the GCC build system. Files like
`config.host`, also part of the commit, set a variable, `tmake_file`, that
lists all the `t-something` files, so the build framework knows how to build
everything according to the rules described in them.

That's how the GCC buildsystem works. Now let's talk about the problems.

##### LIB2ADD iteration is broken

The first thing I noticed after cherry-picking the `libgcc` support was that
the whole thing did not build anymore. There was a crazy issue here.

We are not going to talk about the `LIB2ADD` variable yet, but this small
change, [`b9c7f39`][lib2add], affects it. The main issue was that the makefile
system (the `*.mk` files in `libgcc`) was iterating over the values of the
variable incorrectly, because the `libgcc` support commit appended values to
`LIB2ADD` instead of setting it. The main makefiles set `LIB2ADD` to an empty
value, and appending to it left an empty entry, so the iteration tried to
compile an empty value.

[lib2add]: https://github.com/ekaitz-zarraga/gcc/commit/b9c7f394b33a60c1e64191b0e31f0cf98d6a5f93

This was super hard to debug, but this small change made the whole thing
compile, so I was able to test further.

##### Still broken

But it was still broken: GCC didn't want to build. Some weird errors appeared,
saying something like the `extra_parts` were not coherent between `gcc` and
`libgcc`. Weird.

Reading `gcc/config.gcc` and `libgcc/config.host`, I found how the
`extra_parts` variable was used and saw that it was indeed incoherent between
the two files. But why?

This led me to analyze the whole build system, comparing the RISC-V support
with other targets. That's when I realized the buildsystem is split between
the `gcc` and `libgcc` folders, and it's extremely difficult to know where the
line that separates one from the other is.

Apart from that, the buildsystem was unable to compile the `crt*` files,
because it didn't know how to do it... The recipes were missing.

This made me go for the most aggressive change possible,
[`9c0f736`][aggressive-fix]: just copy everything from `libgcc/config/riscv/`
to `gcc/config/riscv`, add the rules for the `crt*` files and make the
`extra_parts` coherent.

[aggressive-fix]: https://github.com/ekaitz-zarraga/gcc/commit/9c0f7364b89acb38ea3af1cbe1884059671b3c04

Of course, this is not a good change, but it let us check whether the generated
compiler was able to compile anything. *"I'll have time to clean this up
later"*, I thought.


##### The buildsystem is just a pain in the butt

Now I was able to build GCC, so I could try it out.

I built a RISC-V cross compiler and tried to statically compile a small Hello
World program. Errors appeared:

``` unknown
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `_nl_lookup':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../include/../locale/localeinfo.h:315: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `__printf_fp_l':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__letf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fphex.o): in function `__printf_fphex':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__letf2'
collect2: ld returned 1 exit status
```

The most logical thing to do was to build a MIPS cross compiler and check if
the same issue appeared. Of course, it didn't.

Researching a little bit in the old GCC internals documentation, I found a
couple of interesting things:

<https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gccint/Target-Fragment.html#Target-Fragment>

- The `LIB2FUNCS_EXTRA` variable lists the extra source files to be compiled
  and added to `libgcc`.
- **Floating Point Emulation** support is added by generating a couple of files
  with some macros on top: `fp-bit.c` and `dp-bit.c`.

Neither of those was used in the `libgcc` support we backported, because the
GCC buildsystem changed a lot since 4.6.4. In fact, there is a
commit[^commit], much later than the 4.6.4 release, that removes the need to
generate those `fp-bit.c` thingies.

[^commit]: `569dc494616700a3cf078da0cc631c36a4f15821`

The `LIB2FUNCS_EXTRA` variable was not used either, but somewhere in the
makefiles I found that `LIB2ADD` was set from it. It looks like the whole
buildsystem moved from `LIB2FUNCS_EXTRA` to `LIB2ADD`, which used to be an
internal variable. I don't know.

I just moved the `LIB2ADD` contents to `LIB2FUNCS_EXTRA`, set up the floating
point emulation in the `t-riscv` makefile fragment, and hoped my work there was
done.

##### A huge pain in the butt

It still failed, but at least now the `__letf2` symbol was found. The only one
I needed to fix now was `__unordtf2`.

I was disheartened.

The `__unordtf2` name did not appear anywhere in the code, but the `libgcc`
built for MIPS had the symbol in it (I checked it with `nm`!). I had no idea
what was going on.

I asked all my peers about this, and one of them sent me a program that
actually compiled and ran (Janneke is a genius, someone has to say it!):

``` clike
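#include <stdio.h>

int
main ()
{
  printf ("Hello, world!\n");
  return 0;
}

/* Stub to satisfy the linker: glibc's printf code references
   __unordtf2, which our libgcc did not provide yet.  Returning 0 means
   "the two long double operands are ordered" (neither is NaN), which is
   wrong for NaNs but enough to link and run this test.  */
int
__unordtf2 (long double a, long double b)
{
  (void) a;
  (void) b;
  return 0;
}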
```

Hah! Still no solution, but it gave me a little bit of hope.

That was the energy I needed to research further. This `__unordtf2` function
comes from the software floating point support, but the makefile fragments in
the `libgcc` folder seemed to be set up correctly...
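For context on where those two symbols come from: on 64-bit RISC-V Linux, `long double` is a 128-bit IEEE quad (GCC's `TFmode`), and the hardware has no instructions for it. Even comparing two `long double` values therefore becomes a call into the soft-fp part of `libgcc`, and glibc's `printf` code performs this kind of check on `long double` values. The helper names below are illustrative. On such a target, `isunordered` is expected to lower to `__unordtf2` and `<=` to `__letf2`. On x86-64, the same code uses native x87 instructions instead.

```c
#include <math.h>

/* Nonzero if a or b is NaN. On riscv64 this is expected to become a
   call to __unordtf2. */
int ld_unordered(long double a, long double b)
{
    return isunordered(a, b);
}

/* Nonzero if a <= b (false if either is NaN). On riscv64 this is
   expected to become a call to __letf2. */
int ld_less_equal(long double a, long double b)
{
    return a <= b;
}
```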

##### Moxie to the rescue

The MIPS support was too complex for this humble human being to understand, so
I decided to go for Moxie this time.

[Moxie](http://moxielogic.org/blog/pages/architecture.html) is a really
interesting architecture, but we are not going to spend time on it here, only
on its support in GCC 4.6.4. Take a look at the files in both parts of the
Moxie support, `gcc` and `libgcc`:

``` unknown
gcc/config/moxie
├── constraints.md
├── crti.asm
├── crtn.asm
├── moxie.c
├── moxie.h
├── moxie.md
├── moxie-protos.h
├── predicates.md
├── rtems.h
├── sfp-machine.h
├── t-moxie
├── t-moxie-softfp
└── uclinux.h

libgcc/config/moxie
├── crti.asm
├── crtn.asm
├── sfp-machine.h
├── t-moxie
└── t-moxie-softfp
```

As you can see, some files are repeated, and most of them are located in the
`gcc` part, which was not the case in the backported commit. I used this as a
reference for a massive cleanup of the previous aggressive duplication, and I
ended up with this commit: [`703efe3`][cleanup]

[cleanup]: https://github.com/ekaitz-zarraga/gcc/commit/703efe3e86e68fe05380e996943c831e7ad9a541

But that wasn't enough.

I also found that the `soft-fp` support did not come from the `libgcc`
directory but from the `gcc` one, so I needed to fix some makefile fragments.
The reference for that was `gcc/config/soft-fp/t-softfp`. This file describes
all the variables I needed to set to make the whole process find the software
floating point functions to add (see how the function names are built with the
`$(m)` variable? That's why I couldn't find where `__unordtf2` came from...).
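Here is why the name was so hard to track down. The makefile fragments never spell out `__unordtf2`. Instead they list templates such as `unord$(m)2` and let Make substitute each floating point mode for `$(m)`, roughly through a `$(foreach m,...)` over the list of modes. That produces names like `unordtf2`, whose source defines `__unordtf2`. The C function below is only a toy model of that substitution; `softfp_symbol` is a hypothetical name, not part of GCC.

```c
#include <stdio.h>
#include <string.h>

/* Toy model of the t-softfp expansion: replace the first "$(m)" in a
   template with a mode name (e.g. "tf") and add the "__" prefix that
   the generated C function carries. Returns 0 on success, -1 if the
   result does not fit in `out`. */
int softfp_symbol(char *out, size_t size, const char *tmpl, const char *mode)
{
    const char *hole = strstr(tmpl, "$(m)");
    int n;

    if (hole == NULL)
        n = snprintf(out, size, "__%s", tmpl);
    else
        n = snprintf(out, size, "__%.*s%s%s",
                     (int)(hole - tmpl), tmpl, mode, hole + 4);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Grepping the makefile fragments for `__unordtf2` finds nothing, because that string only exists after this kind of expansion.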

Those variables were set in the `libgcc/config/riscv/t-softfp*` files. I
replicated them in `gcc/config/riscv`, as in the Moxie target, and added
references to them in `gcc/config.gcc`, copying the lines I had in
`libgcc/config.host`. The process was still failing, as the variables were not
found by the main makefile. I decided to hardcode them and give it another go:
this time it built, I was able to compile files, and the weird errors did not
appear anymore.

In the end I realized the main makefile wasn't finding the variables because I
was referring to the `t-softfp*` files through the `host_address` variable, as
`libgcc/config.host` does. The problem was that this variable is not available
in `gcc/config.gcc`, so I had to write a beautiful `switch-case` to deduce the
word size.
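The actual fix is a shell `case` in `gcc/config.gcc`. The C function below only sketches the same decision: take the word size from the target triplet and use it to name the soft-fp fragment. The fragment names (`riscv/t-softfp32`, `riscv/t-softfp64`) are an assumption based on the `t-softfp*` files mentioned above, and the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Word size implied by a RISC-V target triplet, or 0 if it is not one
   we handle. Mirrors a shell `case` on $target (hypothetical sketch). */
int riscv_word_size(const char *target)
{
    if (strncmp(target, "riscv64", 7) == 0)
        return 64;
    if (strncmp(target, "riscv32", 7) == 0)
        return 32;
    return 0;
}

/* Writes the soft-fp fragment name for `target` into `out`, e.g.
   "riscv/t-softfp64". Returns 0 on success, -1 otherwise. */
int softfp_fragment(char *out, size_t size, const char *target)
{
    int bits = riscv_word_size(target);
    int n;

    if (bits == 0)
        return -1;
    n = snprintf(out, size, "riscv/t-softfp%d", bits);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```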

With all this knowledge and with the help of the Moxie support, I finally put
together a new commit where I duplicated the files that needed duplicating,
added the correct references to the makefile fragments, and even fixed some of
the variables in the makefiles: [`f42a214`][final-cleanup]

[final-cleanup]: https://github.com/ekaitz-zarraga/gcc/commit/f42a21427361fb2d6d8481d143258af3237fd232

Yeah, all this was hard to figure out, because this buildsystem is really
complex and makefiles are really hard to debug[^debug-makefile]. Also, not
understanding why I need to replicate the `t-softfp*` files in both places
drives me mad, but I have to learn to live with the fact that I can't
understand everything.

[^debug-makefile]: Try running `make --debug` on a project the size of GCC
  and laugh with me.

In these commits you can see I deleted references to `extra_parts` and some
other things, too. The reason is simple: if other architectures don't need to
set those variables, neither do I. In the end, the `crt*` files were generated
anyway.


#### Other changes

I also removed `-latomic` from the calls to the linker because it looks like
`libatomic` didn't exist back then (we'll see how this blows up in my face in
the future), and fixed a couple more things, but they are not really
interesting in my opinion[^interesting].

[^interesting]: The rest of the post is not really interesting either, but I
  need to report what I did. It's just me fighting against myself and a very
  complex buildsystem that could've been simpler and/or better documented.


### Missing things

There are still many things missing, and I won't even try some of them because
they are out of the scope of the project. Remember: **we just need to be able
to compile a more recent GCC**, not the rest of the world.

Some of the things I left out might become mandatory in the near future as we do
proper testing of all this. My goal here was to provide something that runs;
from here I'll collaborate with the different people involved in this
bootstrapping effort to fix anything we need to reach full bootstrapping
support.

There are a few obvious things missing:

- **Big Endian support**: `riscv64be-linux-gnu` support, basically (note the
  `be` in the target name). I won't add this until we are sure we need it. It
  shouldn't be difficult: I already found some commits in mainline GCC where
  this was added, and they were simple.
- **Specific device support**: we haven't added support for any specific
  device yet. That's something we'll need to think about in the future, but we
  probably won't add it because it would mean maintaining more code, and I
  don't think generic RISC-V code is going to have issues on most devices.
- There are also **many commits that came after** the main port that fix some
  relocations and some other things. Many of them are not really relevant:
  most fix bugs that were introduced later, or fix things that won't change
  anything in the only program we need to build (GCC), and so on. In order to
  know which ones are relevant we need...
- **Proper testing!** I haven't done this yet, and I'll probably need help with
  it. Compile your RISC-V software with this and give it a try! Send me the
  errors you get!
- **Libatomic**: I simply removed it from the calls to the linker, as I
  mentioned before, and we have to make sure it really didn't exist back then,
  and so on. Boring things...
- I didn't even bother adding **testsuite support**: our only real test is
  whether we can compile GCC with this, which I haven't really tried yet anyway
  (because it needs some extra things).

### Conclusion

This part of the project came at the worst moment. I wasn't really motivated
and I had some personal things going on. It was difficult for me to do this.

Compared to the previous steps of the project, this part is really
uninteresting because it doesn't give you many chances to learn, which is the
only thing that keeps me alive at this point.

It's also exasperating to feel you'll never understand something, and trying
again and again, almost by *trial and error*, is really boring for someone like
me.

Sometimes, working like this makes you feel really alone. There's almost nobody
to help you, and the project needs a huge amount of context to be understood,
so you can't just ask *anyone* for help, and those who are supposed to know are
really hard to reach. Or, what might be worse: maybe nobody understands this
thing well, because it's old, it changed a lot, and probably just a handful of
people really took part in the development of the <del>fucking</del>
buildsystem.

In conclusion, this is a boring and uninteresting job, but someone has to do
it, and... It was my turn this time.

You go next.