Diffstat (limited to 'content/bootstrapGcc')
-rw-r--r-- | content/bootstrapGcc/04_full_compiler.md | 513 |
1 files changed, 513 insertions, 0 deletions
diff --git a/content/bootstrapGcc/04_full_compiler.md b/content/bootstrapGcc/04_full_compiler.md new file mode 100644 index 0000000..1c04e7a --- /dev/null +++ b/content/bootstrapGcc/04_full_compiler.md @@ -0,0 +1,513 @@ +Title: Milestone — Source to Binary RISC-V support in GCC 4.6.4 +Date: 2022-06-20 +Category: +Tags: Bootstrapping GCC in RISC-V +Slug: bootstrapGcc4 +Lang: en +Summary: + Description of the changes applied from a minimal compiler that runs and + generates assembly to something that is actually able to compile, + interacting with binutils and having a working libgcc. + +In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC, +and we already shared how I backported the RISC-V support from the GCC core to +GCC-4.6.4. Now it's time to finish what we left half-done and actually +introduce a *full* RISC-V compiler. + +### Where we left last time + +The Tuesday, 7th of April, I marked a commit with the `minimal-compiler` tag. +That commit contains all the work we did until that time. In that tag we +describe how we can build a compiler that is only able to assemble files to +RISC-V. + +As we already explained around here, GCC is a driver program that calls other +programs to do its work. The GCC core compiles the code to assembly language +and then calls binutils to do the rest of the work: assembly and linking. + +At that point, we had to call binutils by hand. + + +### The changes + +The changes applied at the time of writing are available in the +[`working-compiler`][working-compiler] tag. As the tag message describes, they +were split in two different branches: the `guix-package` branch and the `riscv` +branch. + +[working-compiler]: https://github.com/ekaitz-zarraga/gcc/releases/tag/working-compiler + +The `guix_package` branch is merged in the `riscv` branch but this split lets +us differentiate which changes are related with the compiler itself and which +are related with the tooling around the compiler. 
That way we'll be able to +choose what to do with the commits easily in the future. We'll probably need to +rearrange some stuff. + + +### The context is everything: Guix package part + +The `guix_package` branch contains all the commits that make the Guix tooling +around the project work. This includes the compilation process definition in a +reproducible way, the environment setup and all that. + +As the `working-compiler` tag message describes, this is the way you can +currently make this compiler work and play with it: + +``` bash +$ guix shell -m manifest.scm +$ source PREPARE_FOR_COMPILATION.sh riscv64-linux-gnu + # This second command will prepare the PATH and other environment + # variables to make GCC find libraries and executables +``` + +> If you use this in the future and it fails, it might be because between the +> time this blog post was written and you read it Guix made some changes in the +> core packages that are used. You can always use the `time-machine` utility to +> make sure you use everything as it was at the moment this post was written: +> `guix time-machine --channels=channels.scm -- shell -m manifest.scm` + + +From this point you can directly run the compiler. It will need the `sysroot` +option to be able to find the `crt*` files, but that's something I'm not +worried about at this point: we'll fix that when we integrate this in the +bootstrapping process. + +Run the compiler like this now: + +``` bash +$ riscv64-linux-gnu-gcc --sysroot=$GUIX_ENVIRONMENT [-static] ... +``` + +#### Notable changes on the Guix side + +The most notable change on the Guix side is the addition of the `manifest.scm` +file and also the `PREPARE_FOR_COMPILATION.sh` file. With the help of my man +Janneke, I realized the problems I had came from the fact that I was calling +the compiler with the wrong environment and it was unable to find the linker +and the assembler. Yes, this kind of thing happens a lot in Guix if you are not +careful (and I am *not* careful at all). 
Adding these tools let me prepare a +working environment where the assembler and the compiler are found and called +properly. + +This change also includes some interesting extras: the GLibC added to the +manifest also contains the static version so we can generate static binaries +that are easier to test in an emulated environment without having to deal with +the dynamic linker. Important stuff. + +Also, now the compilation process relies on a newer Guix version, which removed +the `-unknown` part from the triplets (actually *quadruplets*), like +`riscv64-unknown-linux-gnu`. That was a little bit of a pain, because I just +tried to compile everything one day and failed, and in the end it was just that +small change. I decided to update the Guix version needed to keep it up-to-date +with the current Guix, so I didn't need to run `guix time-machine` each time. +It's better like this. + +If you want to read more about the change and see how fast Guix +people helped me understand what was going on, [see this mailing list +thread][ml-unknown][^guix]. I also have to mention that I needed to add a small +change to my GCC to be able to work in the case the `-unknown` part was not +added to it: adding `riscv` to `config.sub` was enough for that. + +[^guix]: Some people also spent time with me on IRC. Thanks to all that + helped! + +[ml-unknown]: https://lists.gnu.org/archive/html/bug-guix/2022-06/msg00092.html + +I also fixed a couple of extra things but they are not really relevant for +this. Having a working environment preparation is a nice milestone by itself, +but we did some more things on the GCC side! + + +### Road to a working compiler: The GCC part + +The changes in the `riscv` branch contain some commits, most of them are small, +but they are really important. I have to say this is full of details I don't +really understand, so I'll try to focus on those I actually do. The rest of +them are simply things that happened to work in the end. 
You know, this is +pretty old software and the project is too complex to understand it all... + +#### Memory models and fences + +First, before doing anything else, we mentioned in the previous post that the +memory models were something we needed to review. We knew this because the code +related to memory models was used in a couple of parts of the RISC-V code we +copied from the GCC 7.5 codebase, but it was not available in GCC 4.6.4. That +API simply did not exist back then. + +The commit [`71dc25d`][memmodels] removes the memory models from the code +(which were already commented out but not solved), taking into account the most +conservative approach: always add the `.aq` flag and the `fence` instruction. +This is not optimal, but the performance penalty is negligible and it's not +affecting the functionality. + +[memmodels]: https://github.com/ekaitz-zarraga/gcc/commit/71dc25d08354dead26180bd552c0c3e299b012cb + +I did not come up with this myself. As I mentioned in the previous post, I +asked the maintainer of the RISC-V support of GCC (who is also one of the big +names of RISC-V) about this and he gave me this solution. + +I also had to change the optabs a little bit, using `memory_barrier` instead of +one of the more recent optabs. For this I just compared the code from the MIPS +architecture and checked how it changed from 4.6.4 to 7.5, as I did for +many other parts of this work. Easy-peasy. + +#### Wrong arguments in the assembler call + +As I mentioned in the Guix part, we were unable to call the assembler. This +means we didn't discover that the assembler call was broken until we actually put it +in the `PATH` and tried to call it. + +The commit [`7030067`][as-call] shows how I needed to make small changes in the +way the assembler is called by GCC to ensure that it was called correctly. + +[as-call]: https://github.com/ekaitz-zarraga/gcc/commit/7030067e6aa54b44a2f2447d4e706e76bc88f696 + +This issue was easy to fix, but not that easy to catch. 
First I found the +assembler was complaining because it didn't understand the `-k-march` option. I +spent some time realizing the problem was that those were two options that were +merged together due to a missing space. Yes, the space at the end of the line +**is relevant**. + +I directly removed the `-k` option from the `ASM_SPEC` because my assembler was +considering it ambiguous. I don't remember where I copied this from but it +works and I don't want to think about it ever again. + + + +#### Libgcc: the core of this change + +The biggest thing in this set of changes was the addition of `libgcc`, which +is mandatory if you want to link your programs compiled with GCC. `libgcc` is a +library GCC uses for complex operations: instead of generating the assembly +code directly, it generates calls to `libgcc`, where those complex operations +are defined. You can read further about those operations but they are not +really relevant for this post, the relevant part is we need to add `libgcc` in +order to have a working compiler. + +The GCC codebase has different folders for its different blocks, so it's +not surprising to see there's a folder called `gcc` for the core and a folder +called `libgcc` for `libgcc`. Anyone would expect that just cherry picking the +commit that added the `libgcc` support to GCC 7.5 would be enough to have the +backport ready. + +Sadly, life is a little bit harder than that. + +##### Cherry picking the libgcc support + +The first and easiest thing to do is to cherry pick the commit +[`72add2f`][libgcc-commit] and pray. It looked plausible to make it work, +because, if you look at the changes it makes, it's pretty well contained in the +`libgcc/config/riscv/` folder and adds just a couple of lines to the +`libgcc/config.sub` to make it find the `riscv` folder. + +[libgcc-commit]: https://github.com/ekaitz-zarraga/gcc/commit/72add2fa4c354af4bf8db0b8dcb50c5b076b3ae5 + +The contents of the commit are pretty clear: + +1. 
Some assembly files that implement some operations +2. Some header files and C code that implement other things +3. Some weird files called `t-something` + +The first two types of files we can understand as the body of the `libgcc` +support: the juice. The `t-something` files are what are called Makefile +Fragments. + +The Makefile Fragments are the basis of the GCC build system. The files like +`config.host`, also part of the commit, set a variable, `tmake_file`, where +all the `t-something`s are added so the compiler generator framework knows how +to build the things according to the rules described in them. + +That's how the GCC buildsystem works. Now let's talk about the problems. + +##### LIB2ADD iteration is broken + +The first thing I realized when I did the cherry pick of the `libgcc` support was +that the whole thing did not build anymore. There was a crazy issue here. + +We are not going to talk about the `LIB2ADD` variable yet, but we can see this +small change, [`b9c7f39`][lib2add], affects it. The main issue here was that the +whole makefile system (`*.mk` files in `libgcc`) was iterating over the values +of the variable incorrectly, because the `libgcc` support commit was appending values to +`LIB2ADD` instead of setting it. The `LIB2ADD` variable was set empty from the +main makefiles, and appending to it was leaving an empty entry, so the +iteration process was trying to compile an empty value. + +[lib2add]: https://github.com/ekaitz-zarraga/gcc/commit/b9c7f394b33a60c1e64191b0e31f0cf98d6a5f93 + +This was superhard to debug, but this small change just made the whole thing +compile and now I was able to test the whole thing further. + +##### Still broken + +But it was still broken. GCC didn't want to compile. Some weird errors +appeared, mentioning something like the `extra_parts` were not coherent between +`gcc` and `libgcc`. Weird. 
+ +Reading `gcc/config.gcc` and `libgcc/config.host` I realized the use of the +`extra_parts` variable and how it was certainly incoherent between the two +files. But why? + +This led me to analyze the whole build system, comparing the RISC-V support +with others. I realized here that the buildsystem is mixed in `gcc` and +`libgcc` folders and it's extremely difficult to know what's the line that +separates one from another. + +Apart from that, the buildsystem was unable to compile the `crt*` files, +because it didn't know how to do it... The recipes were missing. + +This made me go for the most aggressive change possible, +[`9c0f736`][aggressive-fix]: just copy everything from the +`libgcc/config/riscv/` to the `gcc/config/riscv`, add the rules for the `crt*` +files and make the `extra_parts` coherent. + +[aggressive-fix]: https://github.com/ekaitz-zarraga/gcc/commit/9c0f7364b89acb38ea3af1cbe1884059671b3c04 + +Of course, this is not a good change, but it lets us check whether the generated +compiler is able to compile anything. *"I'll have time to clean this up later"* +I thought. + + +##### The buildsystem is just a pain in the butt + +Now I was able to compile the GCC, so I could try it for some things. + +I built a RISC-V cross compiler and tried to statically compile a small Hello +World program. 
Errors appeared: + +``` unknown +/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `_nl_lookup': +/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../include/../locale/localeinfo.h:315: undefined reference to `__unordtf2' +/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `__printf_fp_l': +/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__unordtf2' +/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__letf2' +/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fphex.o): in function `__printf_fphex': +/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2' +/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2' +/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__letf2' +collect2: ld returned 1 exit status +``` + +The most logical thing to do was to build a MIPS cross compiler and check if +the same issue appeared. Of course, it didn't. 
+ +Researching a little bit in the old GCC internals documentation, I found a +couple of interesting things: + +<https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gccint/Target-Fragment.html#Target-Fragment> + +- The `LIB2FUNCS_EXTRA` variable is the one that contains what should be + compiled and added to `libgcc`. +- **Floating Point Emulation** support is added by generating a couple of files + with some macros on top: `fp-bit.c` and `dp-bit.c`. + +Neither of those was used in the `libgcc` support we backported because the +GCC buildsystem changed a lot since 4.6.4. In fact, there is a commit[^commit], +much later than the 4.6.4 release, that removes the need to generate those +`fp-bit.c` thingies. + +[^commit]: `569dc494616700a3cf078da0cc631c36a4f15821` + +The `LIB2FUNCS_EXTRA` variable was not used either, but somewhere in the +makefiles I found `LIB2ADD` was set from it. It looks like the whole +buildsystem changed from `LIB2FUNCS_EXTRA` to `LIB2ADD`, which was an internal +variable in the past. I don't know. + +I just moved the `LIB2ADD` to `LIB2FUNCS_EXTRA` and set the floating point +emulation in the `t-riscv` makefile fragment and hoped my work was done there. + +##### A huge pain in the butt + +It still failed, but at least now the `__letf2` symbol was found. The only one +I needed to fix now was `__unordtf2`. + +I was disheartened. + +The `__unordtf2` name did not appear anywhere in the code, but building +`libgcc` for MIPS had the symbol inside (I checked it with `nm`!). I had no +idea of what was going on. + +I asked all my peers about this, and I was sent a program that was actually +compilable and runnable (Janneke is a genius, someone has to say it!): + +``` clike +#include <stdio.h> + +int +main () +{ + return printf ("Hello, world!\n"); +} + +int +__unordtf2 () +{ + return 0; +} +``` + +Hah! Still, no solution, but it was a little bit of hope. + +This gave me the energy I needed to research further. 
This `__unordtf2` +function comes from software floating point support but the makefile fragments +in the `libgcc` folder seem to be correctly set... + +##### Moxie to the rescue + +MIPS architecture was too complex to be understandable for this humble human +being so I decided to go for Moxie this time. + +[Moxie](http://moxielogic.org/blog/pages/architecture.html) is a really +interesting thing. We are not going to spend time on it, though, but on its support +in GCC 4.6.4. Take a look at the files in both parts of the Moxie support: the +`libgcc` and `gcc`: + +``` unknown +gcc/config/moxie +├── constraints.md +├── crti.asm +├── crtn.asm +├── moxie.c +├── moxie.h +├── moxie.md +├── moxie-protos.h +├── predicates.md +├── rtems.h +├── sfp-machine.h +├── t-moxie +├── t-moxie-softfp +└── uclinux.h + +libgcc/config/moxie +├── crti.asm +├── crtn.asm +├── sfp-machine.h +├── t-moxie +└── t-moxie-softfp +``` + +As you can see, some things are repeated, and most of the files are located in +the `gcc` part, which was not the case in the backported commit. I used this as +a reference for a massive cleanup of the previous aggressive duplication and I +ended up with this commit: [`703efe3`][cleanup] + +[cleanup]: https://github.com/ekaitz-zarraga/gcc/commit/703efe3e86e68fe05380e996943c831e7ad9a541 + +But that wasn't enough. + +I also found that the `soft-fp` support did not come from the `libgcc` +directory, but from the `gcc` one, so I needed to fix some makefile fragments. +The reference on how to do that was located in `gcc/config/soft-fp/t-softfp`. +This file described all the variables that I needed to set up to make the whole +process find the software floating point functions to add (see how the function +names are built with the `$(m)` variable? That's why I couldn't find where +`__unordtf2` came from...). + +Those variables were set in `libgcc/config/riscv/t-softfp*` files. 
I replicated +them in `gcc/config/riscv` as in the Moxie target and added references to them +to the `gcc/config.gcc` file, copying the lines I had in `libgcc/config.host`. The +process was still failing, as the variables were not found by the main +makefile. I decided to hardcode them and give it another go. This time it built +and I was able to build files and the weird errors did not appear anymore. + +I realized in the end that the reason why the main makefile wasn't finding the +variables was because I was referring to the `t-softfp*` files through the +variable `host_address`, as it was done in the `libgcc/config.host`. The +problem was that variable was not available in the main `gcc/config.gcc` file +so I had to make a beautiful `switch-case` to deduce the wordsize. + +With all this knowledge and with the help from the Moxie support I finally +arranged a new commit, where I duplicated the files that I needed to duplicate, +added the correct references to the makefile fragments and I even fixed some of +the variables in the makefiles: [`f42a214`][final-cleanup] + +[final-cleanup]: https://github.com/ekaitz-zarraga/gcc/commit/f42a21427361fb2d6d8481d143258af3237fd232 + +Yeah, all this was hard to deduce, because this buildsystem is really complex +and makefiles are really hard to debug[^debug-makefile]. Also the fact that I +don't understand why I need to replicate the `t-softfp*` files in both places +drives me mad, but I have to learn to deal with the fact that I can't +understand everything. + +[^debug-makefile]: Try to run `make --debug` in a project of the size of GCC + and laugh with me. + +In these commits you can see I deleted references to `extra_parts` and some +other things, too. The reason is simple: if other architectures don't need +to set those variables, neither do I. In the end, the `crt*` files were generated +anyway. 
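The word-size deduction mentioned above could look roughly like the sketch below. `gcc/config.gcc` is a shell script, so a `case` statement on the target name is the natural tool there. The function name `deduce_wordsize` and the exact patterns are assumptions made for illustration: they are not copied from the commit.

```shell
# Hypothetical helper, not taken from gcc/config.gcc:
# deduce the word size from the target name.
deduce_wordsize () {
  case "$1" in
    riscv32*) echo 32 ;;       # 32-bit RISC-V targets
    riscv64*) echo 64 ;;       # 64-bit RISC-V targets
    *)        echo unknown ;;  # anything else is out of scope here
  esac
}

deduce_wordsize riscv64-linux-gnu   # prints 64
```

The result can then be used to pick the matching makefile fragment, instead of relying on a variable that only exists on the `libgcc` side.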
+ + +#### Other changes + +I also removed `-latomic` from the calls to the linker because it looks like it +didn't exist back then (we'll see how this explodes in my face in the future), +and fixed a couple more things, but that's not really interesting in my +opinion[^interesting]. + +[^interesting]: The rest of the post is not really interesting either, but I + need to report what I did. It's just me fighting against myself and a very + complex buildsystem that could've been simpler and/or better documented. + + +### Missing things + +There are many things missing still, but some of them I won't even try because +they are out of the scope of the project. Remember: **we just need to be able +to compile a more recent GCC**, not the rest of the world. + +Some of the things I left might become mandatory in the near future as we do +proper testing of all this. My goal here was to provide something that can run, +and then I'll collaborate with the different agents in this bootstrapping +effort to fix anything we need to reach the full bootstrapping support. + +There are a few obvious things missing: + +- **Big Endian support**: `riscv64be-linux-gnu` support, basically (note the + `be` in the target name). I won't add this until we are sure we need it. It + shouldn't be difficult, I already found some commits in the main GCC where + this was added and they were simple. +- **Specific device support**: we didn't add support for any specific device + yet, that's something we'll need to think about in the future, but we + probably won't add because it will make us maintain more code, and I don't + think generic RISC-V code is going to have issues in the majority of the + devices. +- There are also **many commits that came after** the main port that fix some + relocations and some other things. 
Many of them are not really relevant, + because most of them are related to bugs that were introduced later, fix + things that won't change anything in the only program we need to build (GCC) + and so on. In order to know which ones are relevant we need... +- **Proper testing!** I didn't do this yet, and I'll probably need help with + it. Compile your RISC-V software with this and give it a try! Send me the + errors you get! +- **Libatomic**: was directly removed from the calls to the linker, as I + mentioned before and we have to make sure it didn't exist back then and so + on. Boring things... +- I didn't even bother to add the **testsuite support**, our only test has to + be if we are able to compile GCC with this, which I didn't really try yet + anyway (because it needs some extra things). + +### Conclusion + +This part of the project came at the worst moment. I wasn't really motivated +and I had some personal things going on. It was difficult for me to do this. + +In contrast with what I did in the previous steps of the project, this part is +really uninteresting because it doesn't give you a lot of chances for learning, +which is the only thing that keeps me alive at this point. + +It's also pretty boring and exasperating to feel you'll never understand +something and trying and trying almost in a *trial and error* way is really +boring for someone like me. + +Sometimes, working like this makes you feel really alone. You have almost no +people to help you, and the project needs a huge amount of context to be +understood so you can't ask for help to *anyone*, and those who are supposed to +know are really hard to reach. Or what might be worse: maybe there's no one +who understands this thing well, because it's old, it changed a lot and +probably just a handful of people really took part in the development of the +<del>fucking</del> buildsystem. + +In conclusion, this is a boring and uninteresting job, but someone has to do +this, and... 
It was my turn this time. + +You go next.