author Ekaitz Zarraga <ekaitz@elenq.tech> 2022-07-01 22:45:15 +0200
committer Ekaitz Zarraga <ekaitz@elenq.tech> 2022-07-01 22:45:15 +0200
commit 775281bfd957ced61dbca446d47a36202e2d7944
tree 8ed3a37ac117166ba9062c1ec7f6039d75ef5a7d
parent cc3e680ae7120c3c5e4b521ded64126cfb62bfd5
Add bootstrap gcc 4
-rw-r--r-- content/bootstrapGcc/04_full_compiler.md 513
1 files changed, 513 insertions, 0 deletions
diff --git a/content/bootstrapGcc/04_full_compiler.md b/content/bootstrapGcc/04_full_compiler.md
new file mode 100644
index 0000000..1c04e7a
--- /dev/null
+++ b/content/bootstrapGcc/04_full_compiler.md
@@ -0,0 +1,513 @@
+Title: Milestone — Source to Binary RISC-V support in GCC 4.6.4
+Date: 2022-06-20
+Category:
+Tags: Bootstrapping GCC in RISC-V
+Slug: bootstrapGcc4
+Lang: en
+Summary:
+ Description of the changes applied from a minimal compiler that runs and
+ generates assembly to something that is actually able to compile,
+ interacting with binutils and having a working libgcc.
+
+In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC,
+and we already shared how I backported the RISC-V support from the GCC core to
+GCC-4.6.4. Now it's time to finish what we left half-done and actually
+introduce a *full* RISC-V compiler.
+
+### Where we left last time
+
+On Tuesday, 7th of April, I marked a commit with the `minimal-compiler` tag.
+That commit contains all the work we did until that time. The tag message
+describes how to build a compiler that is only able to compile files to
+RISC-V assembly.
+
+As we already explained around here, GCC is a driver program that calls other
+programs to do its work. The GCC core compiles the code to assembly language
+and then calls binutils to do the rest of the work: assembly and linking.
+
+At that point, we had to call binutils by hand.
+
+
+### The changes
+
+The changes applied at the time of writing are available in the
+[`working-compiler`][working-compiler] tag. As the tag message describes, they
+were split into two different branches: the `guix_package` branch and the
+`riscv` branch.
+
+[working-compiler]: https://github.com/ekaitz-zarraga/gcc/releases/tag/working-compiler
+
+The `guix_package` branch is merged into the `riscv` branch but this split lets
+us differentiate which changes are related to the compiler itself and which
+are related to the tooling around the compiler. That way we'll be able to
+choose what to do with the commits easily in the future. We'll probably need to
+rearrange some stuff.
+
+
+### The context is everything: Guix package part
+
+The `guix_package` branch contains all the commits that make the Guix tooling
+around the project work. This includes the compilation process definition in a
+reproducible way, the environment setup and all that.
+
+As the `working-compiler` tag message describes, this is the way you can
+currently make this compiler work and play with it:
+
+``` bash
+$ guix shell -m manifest.scm
+$ source PREPARE_FOR_COMPILATION.sh riscv64-linux-gnu
+ # This second command will prepare the PATH and other environment
+ # variables to make GCC find libraries and executables
+```
+
+> If you use this in the future and it fails, it might be because between the
+> time this blog post was written and you read it Guix made some changes in the
+> core packages that are used. You can always use the `time-machine` utility to
+> make sure you use everything like in the moment this post was written:
+> `guix time-machine --channels=channels.scm -- shell -m manifest.scm`
+
+
+From this point you can run the compiler directly. It needs the `--sysroot`
+option to be able to find the `crt*` files, but that's something I'm not
+worried about at this point: we'll fix that when we integrate this in the
+bootstrapping process.
+
+Run the compiler like this now:
+
+``` bash
+$ riscv64-linux-gnu-gcc --sysroot=$GUIX_ENVIRONMENT [-static] ...
+```
+
+#### Notable changes in the Guix side
+
+The most notable change in the Guix side is the addition of the `manifest.scm`
+file and also the `PREPARE_FOR_COMPILATION.sh` file. With the help of my man
+Janneke, I realized the problems I had came from the fact that I was calling
+the compiler with the wrong environment and it was unable to find the linker
+and the assembler. Yes, this kind of thing happens a lot in Guix if you are not
+careful (and I am *not* careful at all). Adding these tools let me prepare a
+working environment where the assembler and the compiler are found and called
+properly.
+
+This change also includes some interesting extras: the GLibC added to the
+manifest also contains the static version so we can generate static binaries
+that are easier to test in an emulated environment without having to deal with
+the dynamic linker. Important stuff.
+
+Also, now the compilation process relies on a newer Guix version, which removed
+the `-unknown` part from the triplets (actually *quadruplets*), like
+`riscv64-unknown-linux-gnu`. That was a little bit of a pain, because I just
+tried to compile everything one day and failed, and in the end it was just that
+small change. I decided to update the Guix version needed to keep it up-to-date
+with the current Guix, so I didn't need to run `guix time-machine` each time.
+It's better like this.
+
+If you want to read more about the change and see how fast Guix
+people helped me understand what was going on, [see this mailing list
+thread][ml-unknown][^guix]. I also have to mention that I needed to add a small
+change to my GCC so it could work when the `-unknown` part was not included in
+the triplet: adding `riscv` to `config.sub` was enough for that.
+
+[^guix]: Some people also spent time with me in the IRC. Thanks to all that
+ helped!
+
+[ml-unknown]: https://lists.gnu.org/archive/html/bug-guix/2022-06/msg00092.html
+
+I also fixed a couple of extra things but they are not really relevant for
+this. Having a working environment preparation is a nice milestone by itself,
+but we did some more things on the GCC side!
+
+
+### Road to a working compiler: The GCC part
+
+The changes in the `riscv` branch contain some commits, most of them are small,
+but they are really important. I have to say this is full of details I don't
+really understand, so I'll try to focus on those I actually do. The rest of
+them are simply things that happened to work in the end. You know, this is
+pretty old software and the project is too complex to understand it all...
+
+#### Memory models and fences
+
+First, before doing anything else, we mentioned in the previous post that the
+memory models were something we needed to review. We knew this because the code
+related to memory models was used in a couple of parts of the RISC-V code we
+copied from the GCC 7.5 codebase, but it was not available in GCC 4.6.4. That
+API simply did not exist back then.
+
+The commit [`71dc25d`][memmodels] removes the memory models from the code
+(which were already commented out but not solved) and takes the most
+conservative approach: always add the `.aq` flag and the `fence` instruction.
+This is not optimal, but the performance penalty is negligible and it's not
+affecting the functionality.
+
+[memmodels]: https://github.com/ekaitz-zarraga/gcc/commit/71dc25d08354dead26180bd552c0c3e299b012cb
+
+I did not come up with this myself. As I mentioned in the previous post, I
+asked the maintainer of the RISC-V support of GCC (who is also one of the big
+names of RISC-V) about this and he gave me this solution.
+
+I also had to change the optabs a little bit, using `memory_barrier` instead of
+one of the more recent optabs. For this I just compared the code from the MIPS
+architecture and checked how it changed from the 4.6.4 to the 7.5, as I did for
+many other parts of this work. Easy-peasy.
+
+#### Wrong arguments in the assembler call
+
+As I mentioned in the Guix part, we were unable to call the assembler. This
+means we didn't discover that the assembler call was broken until we actually
+put it in the `PATH` and tried to call it.
+
+The commit [`7030067`][as-call] shows how I needed to make small changes in the
+way the assembler is called by GCC to ensure that it was called correctly.
+
+[as-call]: https://github.com/ekaitz-zarraga/gcc/commit/7030067e6aa54b44a2f2447d4e706e76bc88f696
+
+This issue was easy to fix, but not that easy to catch. First I found the
+assembler was complaining because it didn't understand the `-k-march` option. I
+spent some time realizing the problem was that those were two options that were
+merged together due to the lack of a space. Yes, the space at the end of the
+line **is relevant**.
+
+I directly removed the `-k` option from the `ASM_SPEC` because my assembler was
+considering it ambiguous. I don't remember where I copied this from but it
+works and I don't want to think about it ever again.
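+
+To see why a single missing space matters, here is a minimal sketch in C. It
+assumes the spec is a string literal continued across lines with a backslash,
+and the option strings are made up for illustration: they are not the real
+`ASM_SPEC` of the port.
+
+``` c
+/* Hypothetical spec fragments, NOT the real ASM_SPEC from the port. */
+
+/* A backslash-newline inside a string literal splices both lines together,
+   so with no space before the backslash the two options get glued. */
+static const char broken_spec[] = "-k\
+-march=rv64g";
+
+/* Same fragment, with a trailing space before the backslash. */
+static const char fixed_spec[] = "-k \
+-march=rv64g";
+```
+
+The first string ends up as `-k-march=rv64g`, which reads as one unknown
+option, while the second one is `-k -march=rv64g`.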
+
+
+
+#### Libgcc: the core of this change
+
+The biggest thing in this set of changes was the addition of `libgcc`, which
+is mandatory if you want to link your programs compiled with GCC. `libgcc` is a
+library GCC uses for complex operations: instead of generating the assembly
+code directly, it generates calls to `libgcc`, where those complex operations
+are defined. You can read further about those operations but they are not
+really relevant for this post, the relevant part is we need to add `libgcc` in
+order to have a working compiler.
+
+The GCC codebase has different folders for its different blocks, so it's
+not surprising to see there's a folder called `gcc` for the core and a folder
+called `libgcc` for `libgcc`. Anyone would expect that just cherry picking the
+commit that added the `libgcc` support to GCC 7.5 would be enough to have the
+backport ready.
+
+Sadly, life is a little bit harder than that.
+
+##### Cherry picking the libgcc support
+
+The first and easiest thing to do is to cherry pick the commit
+[`72add2f`][libgcc-commit] and pray. It looked plausible to make it work,
+because, if you look at the changes it makes, it's pretty well contained in the
+`libgcc/config/riscv/` folder and adds just a couple of lines to the
+`libgcc/config.host` to make it find the `riscv` folder.
+
+[libgcc-commit]: https://github.com/ekaitz-zarraga/gcc/commit/72add2fa4c354af4bf8db0b8dcb50c5b076b3ae5
+
+The contents of the commit are pretty clear:
+
+1. Some assembly files that implement some operations
+2. Some header files and C code that implement other things
+3. Some weird files called `t-something`
+
+The first two types of files we can understand as the body of the `libgcc`
+support: the juice. The `t-something` files are what are called Makefile
+Fragments.
+
+The Makefile Fragments are the basis of the GCC build system. Files like
+`config.host`, also part of the commit, set a variable, `tmake_file`, where
+all the `t-something`s are added so the compiler generator framework knows how
+to build the things according to the rules described in them.
+
+That's how the GCC buildsystem works. Now let's talk about the problems.
+
+##### LIB2ADD iteration is broken
+
+The first thing I realized when I did the cherry pick of the `libgcc` support
+was that the whole thing did not build anymore. There was a crazy issue here.
+
+We are not going to talk about `LIB2ADD` variable yet, but we can see this
+small change, [`b9c7f39`][lib2add], affects it. The main issue here was the
+whole makefile system (`*.mk` files in `libgcc`) was iterating over the values
+of the variable wrong, because `libgcc` support commit was appending values to
+`LIB2ADD` instead of setting it. The `LIB2ADD` variable was set empty from the
+main makefiles, and appending to it was leaving an empty entry, so the
+iteration process was trying to compile an empty value.
+
+[lib2add]: https://github.com/ekaitz-zarraga/gcc/commit/b9c7f394b33a60c1e64191b0e31f0cf98d6a5f93
+
+This was superhard to debug, but this small change just made the whole thing
+compile and now I was able to test the whole thing further.
+
+##### Still broken
+
+But it was still broken. GCC didn't want to compile. Some weird errors
+appeared, mentioning something like the `extra_parts` were not coherent between
+`gcc` and `libgcc`. Weird.
+
+Reading `gcc/config.gcc` and `libgcc/config.host` I realized the use of the
+`extra_parts` variable and how it was certainly incoherent between the two
+files. But why?
+
+This led me to analyze the whole build system, comparing the RISC-V support
+with others. I realized here that the buildsystem is mixed in `gcc` and
+`libgcc` folders and it's extremely difficult to know what's the line that
+separates one from another.
+
+Apart from that, the buildsystem was unable to compile the `crt*` files,
+because it didn't know how to do it... The recipes were missing.
+
+This made me go for the most aggressive change possible,
+[`9c0f736`][aggressive-fix]: just copy everything from the
+`libgcc/config/riscv/` to the `gcc/config/riscv`, add the rules for the `crt*`
+files and make the `extra_parts` coherent.
+
+[aggressive-fix]: https://github.com/ekaitz-zarraga/gcc/commit/9c0f7364b89acb38ea3af1cbe1884059671b3c04
+
+Of course, this is not a good change, but it lets us try if the generated
+compiler is able to compile anything. *"I'll have time to clean this up later"*
+I thought.
+
+
+##### The buildsystem is just a pain in the butt
+
+Now I was able to compile the GCC, so I could try it for some things.
+
+I built a RISC-V cross compiler and tried to statically compile a small Hello
+World program. Errors appeared:
+
+``` unknown
+/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `_nl_lookup':
+/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../include/../locale/localeinfo.h:315: undefined reference to `__unordtf2'
+/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fp.o): in function `__printf_fp_l':
+/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__unordtf2'
+/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/printf_fp.c:394: undefined reference to `__letf2'
+/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/lib/libc.a(printf_fphex.o): in function `__printf_fphex':
+/tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
+/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__unordtf2'
+/gnu/store/gbvg2msjz488d4s08p57mb1ajg48nxlj-profile/bin/riscv64-linux-gnu-ld: /tmp/guix-build-glibc-cross-riscv64-linux-gnu-2.33.drv-0/glibc-2.33/stdio-common/../stdio-common/printf_fphex.c:212: undefined reference to `__letf2'
+collect2: ld returned 1 exit status
+```
+
+The most logical thing to do was to build a MIPS cross compiler and check if
+the same issue appeared. Of course, it didn't.
+
+Researching a little bit in the old GCC internals documentation, I found a
+couple of interesting things:
+
+<https://gcc.gnu.org/onlinedocs/gcc-4.6.4/gccint/Target-Fragment.html#Target-Fragment>
+
+- The `LIB2FUNCS_EXTRA` variable is the one that contains what should be
+ compiled and added to `libgcc`.
+- **Floating Point Emulation** support is added by generating a couple of files
+ with some macros on top: `fp-bit.c` and `dp-bit.c`.
+
+Neither of those was used in the `libgcc` support we backported because the
+GCC buildsystem changed a lot since 4.6.4. In fact, there is a commit[^commit],
+much later than the 4.6.4 release, that removes the need to generate those
+`fp-bit.c` thingies.
+
+[^commit]: `569dc494616700a3cf078da0cc631c36a4f15821`
+
+The `LIB2FUNCS_EXTRA` variable was not used either, but somewhere in the
+makefiles I found `LIB2ADD` was set from it. It looks like the whole
+buildsystem changed from `LIB2FUNCS_EXTRA` to `LIB2ADD`, which was an internal
+variable in the past. I don't know.
+
+I just moved the `LIB2ADD` to `LIB2FUNCS_EXTRA` and set the floating point
+emulation in the `t-riscv` makefile fragment and hoped my work was done there.
+
+##### A huge pain in the butt
+
+It still failed, but at least now the `__letf2` symbol was found. The only one
+I needed to fix now was `__unordtf2`.
+
+I was disheartened.
+
+The `__unordtf2` name did not appear anywhere in the code, but building
+`libgcc` for MIPS had the symbol inside (I checked it with `nm`!). I had no
+idea of what was going on.
+
+I asked all my peers about this, and I was sent a program that was actually
+compilable and runnable (Janneke is a genius, someone has to say it!):
+
+``` clike
+#include <stdio.h>
+
+int
+main ()
+{
+ return printf ("Hello, world!\n");
+}
+
+int
+__unordtf2 ()
+{
+ return 0;
+}
+```
+
+Hah! Still, no solution, but it was a little bit of hope.
+
+This gave me the energy I needed to research further. This `__unordtf2`
+function comes from software floating point support but the makefile fragments
+in the `libgcc` folder seem to be correctly set...
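+
+As a rough illustration of where such a symbol can come from, the sketch below
+shows the kind of C code that is usually lowered to `libgcc` calls when the
+target has no hardware support for `long double`. That RISC-V is such a target,
+and the exact symbols named in the comments, are assumptions based on the
+linker errors above; the function names are made up for the example.
+
+``` c
+#include <math.h>
+
+/* Unordered check: true when at least one operand is NaN.
+   On a soft-float `long double` target this is expected to become a
+   call to __unordtf2 instead of inline instructions. */
+int either_is_nan(long double a, long double b)
+{
+    return isunordered(a, b);
+}
+
+/* Ordered comparison, expected to become a call to __letf2. */
+int less_or_equal(long double a, long double b)
+{
+    return a <= b;
+}
+```
+
+Compiling something like this with `-S` and reading the generated assembly is
+a cheap way to see which helper functions the compiler expects to find at link
+time.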
+
+##### Moxie to the rescue
+
+MIPS architecture was too complex to be understandable for this humble human
+being so I decided to go for Moxie this time.
+
+[Moxie](http://moxielogic.org/blog/pages/architecture.html) is a really
+interesting thing, but we are not going to spend time on it, only on its
+support in GCC 4.6.4. Take a look at the files on both sides of the Moxie
+support, `libgcc` and `gcc`:
+
+``` unknown
+gcc/config/moxie
+├── constraints.md
+├── crti.asm
+├── crtn.asm
+├── moxie.c
+├── moxie.h
+├── moxie.md
+├── moxie-protos.h
+├── predicates.md
+├── rtems.h
+├── sfp-machine.h
+├── t-moxie
+├── t-moxie-softfp
+└── uclinux.h
+
+libgcc/config/moxie
+├── crti.asm
+├── crtn.asm
+├── sfp-machine.h
+├── t-moxie
+└── t-moxie-softfp
+```
+
+As you can see, some things are repeated, and most of the files are located in
+the `gcc` part, which was not the case in the backported commit. I used this as
+a reference for a massive cleanup of the previous aggressive duplication and I
+ended up with this commit: [`703efe3`][cleanup]
+
+[cleanup]: https://github.com/ekaitz-zarraga/gcc/commit/703efe3e86e68fe05380e996943c831e7ad9a541
+
+But that wasn't enough.
+
+I also found that the `soft-fp` support did not come from the `libgcc`
+directory, but from the `gcc` one, so I needed to fix some makefile fragments.
+The reference on how to do that was located in `gcc/config/soft-fp/t-softfp`.
+This file described all the variables that I needed to set up to make the whole
+process find the software floating point functions to add (see how the function
+names are built with the `$(m)` variable? That's why I couldn't find where
+`__unordtf2` came from...).
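+
+To make the point about the built names concrete, this is a small sketch in C
+that mimics the idea: the symbol is assembled from a stem, a mode and a numeric
+suffix, so the full name never shows up as a literal string. The helper
+`softfp_name` and the way the pieces are split are assumptions made for the
+example, they are not copied from `t-softfp`.
+
+``` c
+#include <stdio.h>
+
+/* Builds a name such as "__unordtf2" from its pieces:
+   stem ("unord") + mode ("tf") + numeric suffix (2).
+   Returns the value snprintf returns. */
+int softfp_name(char *out, size_t len, const char *stem,
+                const char *mode, int suffix)
+{
+    return snprintf(out, len, "__%s%s%d", stem, mode, suffix);
+}
+```
+
+Calling `softfp_name (buf, sizeof buf, "unord", "tf", 2)` fills `buf` with
+`__unordtf2`, while a search for that string in the sources finds nothing.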
+
+Those variables were set in `libgcc/config/riscv/t-softfp*` files. I replicated
+them in `gcc/config/riscv` as in the Moxie target and added references to them
+to the `gcc/config.gcc` file, copying the lines I had in `libgcc/config.host`.
+The process was still failing, as the variables were not found by the main
+makefile. I decided to hardcode them and give it another go. This time it built,
+I was able to compile files and the weird errors did not appear anymore.
+
+I realized in the end that the reason why the main makefile wasn't finding the
+variables was because I was referring to the `t-softfp*` files through the
+variable `host_address`, as it was done in the `libgcc/config.host`. The
+problem was that variable was not available in the main `gcc/config.gcc` file
+so I had to make a beautiful `switch-case` to deduce the wordsize.
+
+With all this knowledge and with the help from the Moxie support I finally
+arranged a new commit, where I duplicated the files that I needed to duplicate,
+added the correct references to the makefile fragments and I even fixed some of
+the variables in the makefiles: [`f42a214`][final-cleanup]
+
+[final-cleanup]: https://github.com/ekaitz-zarraga/gcc/commit/f42a21427361fb2d6d8481d143258af3237fd232
+
+Yeah, all this was hard to deduce, because this buildsystem is really complex
+and makefiles are really hard to debug[^debug-makefile]. Also the fact that I
+don't understand why I need to replicate the `t-softfp*` files in both places
+drives me mad, but I have to learn to deal with the fact that I can't
+understand everything.
+
+[^debug-makefile]: Try to run `make --debug` in a project of the size of GCC
+ and laugh with me.
+
+In these commits you can see I deleted references to `extra_parts` and some
+other things, too. The reason is simple: if other architectures don't need
+to set those variables, neither do I. In the end, the `crt*` files were generated
+anyway.
+
+
+#### Other changes
+
+I also removed `-latomic` from the calls to the linker because it looks like it
+didn't exist back then (we'll see how this explodes in my face in the future),
+and fixed a couple of things more, but that's not really interesting in my
+opinion[^interesting].
+
+[^interesting]: The rest of the post is not really interesting either, but I
+ need to report what I did. It's just me fighting against myself and a very
+ complex buildsystem that could've been simpler and/or better documented.
+
+
+### Missing things
+
+There are many things missing still, but some of them I won't even try because
+they are out of the scope of the project. Remember: **we just need to be able
+to compile a more recent GCC**, not the rest of the world.
+
+Some of the things I left might become mandatory in the near future as we do
+proper testing of all this. My goal here was to provide something that can run,
+and then I'll collaborate with the different agents in this bootstrapping
+effort to fix anything we need to reach the full bootstrapping support.
+
+There are a few obvious things missing:
+
+- **Big Endian support**: `riscv64be-linux-gnu` support, basically (note the
+ `be` in the target name). I won't add this until we are sure we need it. It
+ shouldn't be difficult, I already found some commits in the main GCC where
+ this was added and they were simple.
+- **Specific device support**: we didn't add support for any specific device
+ yet, that's something we'll need to think about in the future, but we
+ probably won't add because it will make us maintain more code, and I don't
+ think generic RISC-V code is going to have issues in the majority of the
+ devices.
+- There are also **many commits that came after** the main port that fix some
+ relocations and some other things. Many of them are not really relevant,
+ because most of them are related with bugs that were introduced later, fix
+ things that won't change anything in the only program we need to build (GCC)
+ and so on. In order to know which ones are relevant we need...
+- **Proper testing!** I didn't do this yet, and I'll probably need help with
+ it. Compile your RISC-V software with this and give it a try! Send me the
+ errors you get!
+- **Libatomic**: was directly removed from the calls to the linker, as I
+ mentioned before and we have to make sure it didn't exist back then and so
+ on. Boring things...
+- I didn't even bother to add the **testsuite support**, our only test has to
+ be if we are able to compile GCC with this, which I didn't really try yet
+ anyway (because it needs some extra things).
+
+### Conclusion
+
+This part of the project came in the worst moment. I wasn't really motivated
+and I had some personal things going on. It was difficult for me to do this.
+
+In contrast with what I did in the previous steps of the project, this part is
+really uninteresting because it doesn't give you a lot of chances for learning,
+which is the only thing that keeps me alive at this point.
+
+It's also pretty boring and exasperating to feel you'll never understand
+something and trying and trying almost in a *trial and error* way is really
+boring for someone like me.
+
+Sometimes, working like this makes you feel really alone. You have almost no
+people to help you, and the project needs a huge amount of context to be
+understood so you can't ask for help to *anyone*, and those who are supposed to
+know are really hard to reach. Or what might be worse: maybe there's no one
+who understands this thing well, because it's old, it changed a lot and
+probably just a handful of people really took part in the development of the
+<del>fucking</del> buildsystem.
+
+In conclusion, this is a boring and uninteresting job, but someone has to do
+this, and... It was my turn this time.
+
+You go next.