Title: Milestone — Source to Binary RISC-V support in GCC 4.6.4
Date: 2022-06-20
Category:
Tags: Bootstrapping GCC in RISC-V
Slug: bootstrapGcc4
Lang: en
Summary: Description of the changes applied from a minimal compiler that runs and generates assembly to something that is actually able to compile, interacting with binutils and having a working libgcc.

In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC, and
we already shared how I backported the RISC-V support from the GCC core to
GCC-4.6.4. Now it's time to finish what we left half-done and actually introduce
a *full* RISC-V compiler.

### Where we left off last time

On Tuesday, 7th of April, I marked a commit with the `minimal-compiler` tag.
That commit contains all the work we had done up to that point. The tag message
describes how to build a compiler that is only able to compile files to RISC-V
assembly.

As we already explained in this series, GCC is a driver program that calls other
programs to do its work. The GCC core compiles the code to assembly language and
then calls binutils to do the rest of the work: assembling and linking. At that
point, we had to call binutils by hand.

### The changes

The changes applied at the time of writing are available in the
[`working-compiler`][working-compiler] tag. As the tag message describes, they
were split into two different branches: the `guix_package` branch and the
`riscv` branch.

[working-compiler]: https://github.com/ekaitz-zarraga/gcc/releases/tag/working-compiler

The `guix_package` branch is merged into the `riscv` branch, but this split lets
us tell apart the changes related to the compiler itself from those related to
the tooling around the compiler. That way we'll be able to choose what to do
with the commits easily in the future. We'll probably need to rearrange some
stuff.

### The context is everything: Guix package part

The `guix_package` branch contains all the commits that make the Guix tooling
around the project work.
This includes the definition of a reproducible compilation process, the
environment setup and all that.

As the `working-compiler` tag message describes, this is the way you can
currently make this compiler work and play with it:

``` bash
$ guix shell -m manifest.scm
$ source PREPARE_FOR_COMPILATION.sh riscv64-linux-gnu
# This second command will prepare the PATH and other environment
# variables to make GCC find libraries and executables
```

> If you use this in the future and it fails, it might be because between the
> time this blog post was written and you read it Guix made some changes in the
> core packages that are used. You can always use the `time-machine` utility to
> make sure you use everything like in the moment this post was written:
> `guix time-machine --channels=channels.scm -- shell -m manifest.scm`

From this point you can directly run the compiler. It needs the `--sysroot`
option to be able to find the `crt*` files, but that's something I'm not worried
about at this point: we'll fix that when we integrate this in the bootstrapping
process. Run the compiler like this for now:

``` bash
$ riscv64-linux-gnu-gcc --sysroot=$GUIX_ENVIRONMENT [-static] ...
```

#### Notable changes on the Guix side

The most notable change on the Guix side is the addition of the `manifest.scm`
and `PREPARE_FOR_COMPILATION.sh` files. With the help of my man Janneke, I
realized the problems I had came from the fact that I was calling the compiler
with the wrong environment, so it was unable to find the linker and the
assembler. Yes, this kind of thing happens a lot in Guix if you are not careful
(and I am *not* careful at all). Adding these files let me prepare a working
environment where the assembler and the linker are found and called properly.
This change also includes some interesting extras: the GLibC added to the
manifest also contains the static version, so we can generate static binaries
that are easier to test in an emulated environment without having to deal with
the dynamic linker. Important stuff.

Also, the compilation process now relies on a newer Guix version, which removed
the `-unknown` part from the triplets (actually *quadruplets*), so
`riscv64-unknown-linux-gnu` became `riscv64-linux-gnu`. That was a little bit of
a pain, because one day I just tried to compile everything and it failed, and in
the end it was just that small change. I decided to update the required Guix
version to keep up with current Guix, so I didn't need to run
`guix time-machine` each time. It's better like this. If you want to read more
about the change and see how fast Guix people helped me understand what was
going on, [see this mailing list thread][ml-unknown][^guix]. I also have to
mention that I needed a small change in my GCC to make it work when the
`-unknown` part is missing: adding `riscv` to `config.sub` was enough for that.

[^guix]: Some people also spent time with me in the IRC. Thanks to all that
    helped!

[ml-unknown]: https://lists.gnu.org/archive/html/bug-guix/2022-06/msg00092.html

I also fixed a couple of extra things but they are not really relevant for this.

Having a working environment preparation is a nice milestone by itself, but we
did some more things on the GCC side!

### Road to a working compiler: The GCC part

The `riscv` branch contains a handful of commits. Most of them are small, but
they are really important. I have to say this is full of details I don't really
understand, so I'll try to focus on those I actually do. The rest of them are
simply things that happened to work in the end. You know, this is pretty old
software and the project is too complex to understand it all...

#### Memory models and fences

First, before doing anything else, we mentioned in the previous post that the
memory models were something we needed to review. We knew this because the code
related to memory models was used in a couple of parts of the RISC-V code we
copied from the GCC 7.5 codebase, but it was not available in GCC 4.6.4. That
API simply did not exist back then.

The commit [`71dc25d`][memmodels] removes the memory models from the code (they
were already commented out, but the problem was not solved), taking the most
conservative approach: always add the `.aq` flag and the `fence` instruction.
This is not optimal, but the performance penalty is negligible and it does not
affect the functionality.

[memmodels]: https://github.com/ekaitz-zarraga/gcc/commit/71dc25d08354dead26180bd552c0c3e299b012cb

I did not come up with this myself. As I mentioned in the previous post, I asked
the maintainer of the RISC-V support of GCC (who is also one of the big names of
RISC-V) about this and he gave me this solution.

I also had to change the optabs a little bit, using `memory_barrier` instead of
one of the more recent optabs. For this I just compared the code of the MIPS
architecture and checked how it changed from 4.6.4 to 7.5, as I did for many
other parts of this work. Easy-peasy.

#### Wrong arguments in the assembler call

As I mentioned in the Guix part, we were unable to call the assembler. This
means we didn't discover that the assembler call was broken until we actually
put the assembler in the `PATH` and tried to call it.

The commit [`7030067`][as-call] shows the small changes I needed to make in the
way GCC calls the assembler to ensure that it was called correctly.

[as-call]: https://github.com/ekaitz-zarraga/gcc/commit/7030067e6aa54b44a2f2447d4e706e76bc88f696

This issue was easy to fix, but not that easy to catch. First I found the
assembler was complaining because it didn't understand the `-k-march` option.
It took me some time to realize that those were two options merged together due
to a missing space. Yes, the space at the end of the line **is relevant**.

I directly removed the `-k` option from the `ASM_SPEC` because my assembler was
considering it ambiguous. I don't remember where I copied this from but it works
and I don't want to think about it ever again.

#### Libgcc: the core of this change

The biggest thing in this set of changes was the addition of `libgcc`, which is
mandatory if you want to link your programs compiled with GCC.

`libgcc` is a library GCC uses for complex operations: instead of generating the
assembly code directly, it generates calls to `libgcc`, where those complex
operations are defined. You can read further about those operations, but they
are not really relevant for this post. The relevant part is that we need to add
`libgcc` in order to have a working compiler.

The GCC codebase has different folders for its different blocks, so it's not
surprising to see there's a folder called `gcc` for the core and a folder called
`libgcc` for `libgcc`. Anyone would expect that just cherry picking the commit
that added the `libgcc` support to GCC 7.5 would be enough to have the backport
ready. Sadly, life is a little bit harder than that.

##### Cherry picking the libgcc support

The first and easiest thing to do is to cherry pick the commit
[`72add2f`][libgcc-commit] and pray. It looked plausible to make it work,
because, if you look at the changes it makes, it's pretty well contained in the
`libgcc/config/riscv/` folder and adds just a couple of lines to
`libgcc/config.host` to make it find the `riscv` folder.

[libgcc-commit]: https://github.com/ekaitz-zarraga/gcc/commit/72add2fa4c354af4bf8db0b8dcb50c5b076b3ae5

The contents of the commit are pretty clear:

1. Some assembly files that implement some operations
2. Some header files and C code that implement other things
3. Some weird files called `t-something`

The first two types of files we can understand as the body of the `libgcc`
support: the juice.

The `t-something` files are what are called Makefile Fragments. The Makefile
Fragments are the basis of the GCC build system. Files like `config.host`, also
part of the commit, set a variable, `tmake_file`, where all the `t-something`s
are added so the compiler generator framework knows how to build things
according to the rules described in them. That's how the GCC buildsystem works.

Now let's talk about the problems.

##### LIB2ADD iteration is broken

The first thing I realized when I cherry picked the `libgcc` support was that
the whole thing did not build anymore. There was a crazy issue here.

We are not going to talk about the `LIB2ADD` variable yet, but we can see this
small change, [`b9c7f39`][lib2add], affects it. The main issue here was that the
whole makefile system (`*.mk` files in `libgcc`) was iterating over the values
of the variable wrong, because the `libgcc` support commit was appending values
to `LIB2ADD` instead of setting it. The `LIB2ADD` variable was set empty in the
main makefiles, and appending to it left an empty entry, so the iteration
process was trying to compile an empty value.

[lib2add]: https://github.com/ekaitz-zarraga/gcc/commit/b9c7f394b33a60c1e64191b0e31f0cf98d6a5f93

This was superhard to debug, but this small change made the whole thing compile
and I was finally able to test it further.

##### Still broken

But it was still broken. GCC didn't want to compile. Some weird errors appeared,
saying something like the `extra_parts` were not coherent between `gcc` and
`libgcc`. Weird.

Reading `gcc/config.gcc` and `libgcc/config.host` I understood how the
`extra_parts` variable was used and that it was certainly incoherent between the
two files. But why?

This led me to analyze the whole build system, comparing the RISC-V support with
others.
I realized here that the buildsystem is mixed between the `gcc` and `libgcc`
folders and it's extremely difficult to know where the line that separates one
from the other is. Apart from that, the buildsystem was unable to compile the
`crt*` files, because it didn't know how to do it... The recipes were missing.

This made me go for the most aggressive change possible,
[`9c0f736`][aggressive-fix]: just copy everything from `libgcc/config/riscv/` to
`gcc/config/riscv`, add the rules for the `crt*` files and make the
`extra_parts` coherent.

[aggressive-fix]: https://github.com/ekaitz-zarraga/gcc/commit/9c0f7364b89acb38ea3af1cbe1884059671b3c04

Of course, this is not a good change, but it lets us test whether the generated
compiler is able to compile anything. *"I'll have time to clean this up later"*
I thought.

##### The buildsystem is just a pain in the butt

Now I was able to compile GCC, so I could try it for some things. I built a
RISC-V cross compiler and tried to statically compile a small Hello World
program.
Errors appeared:

``` unknown
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `_nl_lookup':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../include/../locale/localeinfo.h:315: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `__printf_fp_l':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__letf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fphex.o): in function `__printf_fphex':
/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__letf2'
collect2: ld returned 1 exit status
```

The most logical thing to do was to build a MIPS cross compiler and check if the
same issue appeared. Of course, it didn't.
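
As a side note, the linker log is long but it only complains about two symbols.
The following is a minimal sketch (not part of the workflow I actually used) of
how a saved linker log could be boiled down to its unique missing symbols.
`missing_symbols` is a hypothetical helper name, and it assumes the log keeps
one message per line, in the usual GNU `ld` wording.

```bash
# Hypothetical helper: read a GNU ld log on stdin and print each
# undefined symbol once, sorted.
missing_symbols() {
    # Keep only the name between the backtick and the closing quote of
    # "undefined reference to `name'" messages; other lines are dropped.
    sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" | sort -u
}

# Small demo with two made-up log lines in the same format.
printf '%s\n' \
    "ld: printf_fp.c:394: undefined reference to \`__unordtf2'" \
    "ld: printf_fp.c:394: undefined reference to \`__letf2'" \
    | missing_symbols
```

With the real log saved to a file, something like `missing_symbols < ld.log`
(the file name is an assumption) should list only `__letf2` and `__unordtf2`.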

Researching a little bit in the old GCC internals documentation, I found a
couple of interesting things:

- The `LIB2FUNCS_EXTRA` variable is the one that contains what should be
  compiled and added to `libgcc`.
- **Floating Point Emulation** support is added by generating a couple of files
  with some macros on top: `fp-bit.c` and `dp-bit.c`.

Neither of those was used in the `libgcc` support we backported, because the GCC
buildsystem has changed a lot since 4.6.4. In fact, there is a commit[^commit],
much later than the 4.6.4 release, that removes the need to generate those
`fp-bit.c` thingies.

[^commit]: `569dc494616700a3cf078da0cc631c36a4f15821`

The `LIB2FUNCS_EXTRA` variable was not used either, but somewhere in the
makefiles I found that `LIB2ADD` was set from it. It looks like the whole
buildsystem changed from `LIB2FUNCS_EXTRA` to `LIB2ADD`, which was an internal
variable in the past. I don't know.

I just moved the `LIB2ADD` values to `LIB2FUNCS_EXTRA`, set up the floating
point emulation in the `t-riscv` makefile fragment and hoped my work was done
there.

##### A huge pain in the butt

It still failed, but at least now the `__letf2` symbol was found. The only one I
needed to fix now was `__unordtf2`.

I was disheartened.

The `__unordtf2` name did not appear anywhere in the code, but the `libgcc`
built for MIPS had the symbol inside (I checked it with `nm`!). I had no idea of
what was going on.

I asked all my peers about this, and I was sent a program that was actually
compilable and runnable (Janneke is a genius, someone has to say it!):

``` clike
#include <stdio.h>

int main () {
    return printf ("Hello, world!\n");
}

/* Stub that only exists to satisfy the linker */
int __unordtf2 () {
    return 0;
}
```

Hah!

Still no solution, but it gave me a little bit of hope, and the energy I needed
to research further.

This `__unordtf2` function comes from the software floating point support, but
the makefile fragments in the `libgcc` folder seemed to be correctly set...
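
The `nm` check mentioned earlier can be sketched as follows. This is only an
illustration under some assumptions: `defined_symbols` and `has_symbol` are
hypothetical helper names, the `NM` variable is my own convention for choosing
the `nm` binary, and the output format is the default three-column one GNU `nm`
prints for defined symbols.

```bash
# Hypothetical helper: read default `nm` output on stdin and print the
# names of the symbols that are defined. Defined symbols show three
# columns (address, type, name); undefined ones only show two.
defined_symbols() {
    awk 'NF == 3 { print $3 }'
}

# Hypothetical helper: succeed if symbol $1 is defined in archive $2.
# NM defaults to plain `nm`; a cross toolchain would need its own nm.
has_symbol() {
    "${NM:-nm}" "$2" 2>/dev/null | defined_symbols | grep -Fqx -- "$1"
}

# Small demo on canned nm-style output, prints: __unordtf2
printf '%s\n' 'unordtf2.o:' '0000000000000000 T __unordtf2' \
    '                 U abort' | defined_symbols
```

With a real toolchain, the archive path could be obtained from the compiler
itself using `riscv64-linux-gnu-gcc -print-libgcc-file-name`, and `NM` set to
the matching cross `nm` (both names depend on how your toolchain was built).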

##### Moxie to the rescue

The MIPS architecture was too complex to be understandable for this humble human
being, so I decided to go for Moxie this time.

[Moxie](http://moxielogic.org/blog/pages/architecture.html) is a really
interesting thing. We are not going to spend time on Moxie itself, though, but
on its support in GCC 4.6.4.

Take a look at the files in both parts of the Moxie support, `gcc` and `libgcc`:

``` unknown
gcc/config/moxie
├── constraints.md
├── crti.asm
├── crtn.asm
├── moxie.c
├── moxie.h
├── moxie.md
├── moxie-protos.h
├── predicates.md
├── rtems.h
├── sfp-machine.h
├── t-moxie
├── t-moxie-softfp
└── uclinux.h
libgcc/config/moxie
├── crti.asm
├── crtn.asm
├── sfp-machine.h
├── t-moxie
└── t-moxie-softfp
```

As you can see, some things are repeated, and most of the files are located in
the `gcc` part, which was not the case in the backported commit.

I used this as a reference for a massive cleanup of the previous aggressive
duplication and I ended up with this commit: [`703efe3`][cleanup]

[cleanup]: https://github.com/ekaitz-zarraga/gcc/commit/703efe3e86e68fe05380e996943c831e7ad9a541

But that wasn't enough. I also found that the `soft-fp` support did not come
from the `libgcc` directory, but from the `gcc` one, so I needed to fix some
makefile fragments. The reference on how to do that was located in
`gcc/config/soft-fp/t-softfp`. This file described all the variables that I
needed to set up to make the whole process find the software floating point
functions to add (see how the function names are built with the `$(m)` variable?
That's why I couldn't find where `__unordtf2` came from...).

Those variables were set in the `libgcc/config/riscv/t-softfp*` files. I
replicated them in `gcc/config/riscv`, as in the Moxie target, and added
references to them to the `gcc/config.gcc` file, copying the lines I had in
`libgcc/config.host`.

The process was still failing, as the variables were not found by the main
makefile.
I decided to hardcode them and give it another go. This time it built, I was
able to compile files and the weird errors did not appear anymore.

I realized in the end that the main makefile wasn't finding the variables
because I was referring to the `t-softfp*` files through the variable
`host_address`, as it was done in `libgcc/config.host`. The problem was that the
variable was not available in the main `gcc/config.gcc` file, so I had to write
a beautiful `switch-case` to deduce the wordsize.

With all this knowledge and with the help of the Moxie support I finally
arranged a new commit, where I duplicated the files that I needed to duplicate,
added the correct references to the makefile fragments and even fixed some of
the variables in the makefiles: [`f42a214`][final-cleanup]

[final-cleanup]: https://github.com/ekaitz-zarraga/gcc/commit/f42a21427361fb2d6d8481d143258af3237fd232

Yeah, all this was hard to deduce, because this buildsystem is really complex
and makefiles are really hard to debug[^debug-makefile]. Also, the fact that I
don't understand why I need to replicate the `t-softfp*` files in both places
drives me mad, but I have to learn to deal with the fact that I can't understand
everything.

[^debug-makefile]: Try to run `make --debug` in a project of the size of GCC and
    laugh with me.

In these commits you can see I deleted references to `extra_parts` and some
other things, too. The reason is simple: if other architectures don't need to
set those variables, neither do I. In the end, the `crt*` files were generated
anyway.

#### Other changes

I also removed `-latomic` from the calls to the linker because it looks like it
didn't exist back then (we'll see how this explodes in my face in the future),
and fixed a couple more things, but that's not really interesting in my
opinion[^interesting].

[^interesting]: The rest of the post is not really interesting either, but I
    need to report what I did.
    It's just me fighting against myself and a very complex buildsystem that
    could've been simpler and/or better documented.

### Missing things

There are many things missing still, but some of them I won't even try because
they are out of the scope of the project. Remember: **we just need to be able to
compile a more recent GCC**, not the rest of the world.

Some of the things I left out might become mandatory in the near future as we do
proper testing of all this. My goal here was to provide something that can run,
and then I'll collaborate with the different agents in this bootstrapping effort
to fix anything we need to reach full bootstrapping support.

There are a few obvious things missing:

- **Big Endian support**: `riscv64be-linux-gnu` support, basically (note the
  `be` in the target name). I won't add this until we are sure we need it. It
  shouldn't be difficult: I already found some commits in the main GCC where
  this was added and they were simple.

- **Specific device support**: we didn't add support for any specific device
  yet. That's something we'll need to think about in the future, but we probably
  won't add it because it would make us maintain more code, and I don't think
  generic RISC-V code is going to have issues in the majority of the devices.

- There are also **many commits that came after** the main port that fix some
  relocations and some other things. Many of them are not really relevant,
  because they are related to bugs that were introduced later, fix things that
  won't change anything in the only program we need to build (GCC) and so on. In
  order to know which ones are relevant we need...

- **Proper testing!** I didn't do this yet, and I'll probably need help with it.
  Compile your RISC-V software with this and give it a try! Send me the errors
  you get!

- **Libatomic**: it was directly removed from the calls to the linker, as I
  mentioned before, and we have to make sure it didn't exist back then and so
  on. Boring things...

- I didn't even bother to add the **testsuite support**. Our only test has to be
  whether we are able to compile GCC with this, which I didn't really try yet
  anyway (because it needs some extra things).

### Conclusion

This part of the project came at the worst moment. I wasn't really motivated and
I had some personal things going on. It was difficult for me to do this.

In contrast with what I did in the previous steps of the project, this part is
really uninteresting because it doesn't give you a lot of chances for learning,
which is the only thing that keeps me alive at this point. It's also pretty
exasperating to feel you'll never understand something, and trying and trying
almost in a *trial and error* way is really boring for someone like me.

Sometimes, working like this makes you feel really alone. You have almost no
people to help you, and the project needs a huge amount of context to be
understood, so you can't just ask *anyone* for help, and those who are supposed
to know are really hard to reach. Or, what might be worse: maybe there's no one
who understands this thing well, because it's old, it changed a lot and probably
just a handful of people really took part in the development of the fucking
buildsystem.

In conclusion, this is a boring and uninteresting job, but someone has to do
it, and... It was my turn this time.

You go next.