From 80a0d99ac166e54249d1d6a33ec79ae03c7d9c09 Mon Sep 17 00:00:00 2001 From: Ekaitz Zarraga Date: Mon, 30 Oct 2023 15:26:47 +0100 Subject: finish 2e2265b0b79cce46954f7d55fa2d6eb11e1de4a9 with recent info --- content/bootstrapGcc/08_tcc_and_mescc.md | 230 +++++++++++++++++++++---------- 1 file changed, 158 insertions(+), 72 deletions(-) (limited to 'content/bootstrapGcc') diff --git a/content/bootstrapGcc/08_tcc_and_mescc.md b/content/bootstrapGcc/08_tcc_and_mescc.md index e49a2e6..43666fb 100644 --- a/content/bootstrapGcc/08_tcc_and_mescc.md +++ b/content/bootstrapGcc/08_tcc_and_mescc.md @@ -1,5 +1,5 @@ Title: Milestone — MesCC builds TinyCC and fun C errors for everyone -Date: 2023-09-27 +Date: 2023-10-30 Category: Tags: Bootstrapping GCC in RISC-V Slug: bootstrapGcc8 @@ -8,7 +8,6 @@ Summary: We spent the last months making MesCC able to compile TinyCC and making the result of that compilation able to compile TinyCC. Many cool problems appeared, this is the summary of our work. -Status: draft It's been a while since the latest technical update in the project and I am fully aware that you were missing it so it's time to recap with a really cool @@ -68,12 +67,14 @@ project, and even a FOSDEM talk about it, but they all give a very broad explanation, so let's focus on what we are doing right now. Here we have Mes, a Scheme interpreter, that runs MesCC, a C compiler, that is -compiling our simplified for of TinyCC, let's call that Bootstrappable TinyCC. +compiling our simplified fork of TinyCC, let's call that Bootstrappable TinyCC. That Bootstrappable TinyCC compiler then tries to compile its own code. It compiles it's own code because it's goal is to add more flags in each -compilation, so it has more features in each round. We do all this because -TinyCC is way faster than MesCC and it's also more complex, but MesCC is only -able to build a simple TinyCC with few features enabled. +compilation, so it has more features in each round[^rounds]. 
We do all this +because TinyCC is way faster than MesCC and it's also more complex, but MesCC +is only able to build a simple TinyCC with few features enabled. + +[^rounds]: There are many rounds. Like 7 or so. During all this process we use a standard library provided by the Mes project, we'll call it MesLibC, because we can't build glibc at this point, and TinyCC @@ -83,7 +84,7 @@ With all this well understood, this is the achievement: **We made MesCC able to compile the Bootstrappable TinyCC, using MesLibC, to an executable that is able to compile the Bootstrappable TinyCC's codebase to a -binary that works.**[^self-hosted] +binary that works and has all the features we need enabled.**[^self-hosted] [^self-hosted]: So it can compile itself again an again, but who would want to do that? @@ -122,14 +123,19 @@ In summary, TinyCC is a key step in the bootstrapping chain. ### Problems fixed {#problems} This work can be easily followed in the commits in my TCC fork's -[`riscv-mes`][tcc] branch, and in my Mes clone's [`ekaitz`][mes] branch. Most -of the commits are already merged, but we leave that reference for people to be -able to follow the development easier. We are also identifying the contents of -this blogpost in the git history by adding the git tag `self-hosted-tcc-rv64` -to both of my forks. +[`riscv-mes`][tcc] branch, and in my Mes clone's [`riscv-tcc-boot`][mes] +branch. We are also identifying the contents of this blogpost in the git +history by adding the git tag `self-hosted-tcc-rv64` to both of my forks. We +will try to keep both for future reference. + +In Mes the process might be a little bit harder to follow because we sent most +of the patches to Janneke and he merged them so when we were about to release +this post I continued from Janneke's branch to avoid divergences (I had some +problems with that before). In any case, the code is there and searching by +authors (Andrius and myself) would guide you to the changes we did. 
[tcc]: https://github.com/ekaitz-zarraga/tcc/tree/riscv-mes -[mes]: https://github.com/ekaitz-zarraga/mes/tree/ekaitz +[mes]: https://github.com/ekaitz-zarraga/mes/tree/riscv-tcc-boot Many commits have a long message you can go read there, but this post was born to summarize the most interesting changes we did, and write them in a more @@ -166,8 +172,6 @@ the TinyCC assembler worked, and some instructions need to use relocations which I didn't know how to add. The following commit can show how it feels to work on this, and shares how relocations are done: -[1e597f3d239d9119d2ea4bb3ca29b587ea594dcc][lla-commit] - [lla-commit]: https://github.com/ekaitz-zarraga/tcc/commit/1e597f3d239d9119d2ea4bb3ca29b587ea594dcc There you can see we started to understand things in TinyCC, but some other @@ -219,24 +223,23 @@ In our case, our assembly blocks were clobbering some variables that would have been protected by the compiler if the Extended Asm support was implemented. Andrius found all the places in MesLibC where Extended Asm was used and rewrote -the assembly code to keep variables safe in the cases it was needed. See -[b5eb0e34c6fc76a4558940e43ac78cc8a63ebac1][extended-asm] in Mes. - -[extended-asm]: https://github.com/ekaitz-zarraga/mes/commit/b5eb0e34c6fc76a4558940e43ac78cc8a63ebac1 +the assembly code to keep variables safe in the cases it was needed. The other option was to add Extended Asm support for TinyCC, but we would need to add it in the Bootstrappable TinyCC and also upstream. This also means understanding TinyCC codebase very well and making the changes without errors, -so we decided to simplify MesLibC, because that is easier to make right. +so we decided to simplify MesLibC, because that is easier to make right. We are +probably going to need to do this later on anyway, but we'll try to delay this +as much as possible. 
#### MesLibC `main` function arguments are not set properly {#main-args} Following the previous problem with assembly, we later found input arguments of the `main` function, that come from the command line arguments, were not properly set by our MesLibC. Andrius also took care of that in -[267a132ca932dafe628da000dc76714612cce144][main-ext] in Mes. +[4f4a1174][main-ext] in Mes. -[main-ext]: https://github.com/ekaitz-zarraga/mes/commit/267a132ca932dafe628da000dc76714612cce144 +[main-ext]: https://github.com/ekaitz-zarraga/mes/commit/4f4a11745d1c7ed0995e9d31c7994abfb4a60b25 This error was easier to find than others because when we found issues with this we already had a compiled TinyCC. So we just needed to fix simple things @@ -316,7 +319,7 @@ the second function. Knowing that, we could take other architectures as a reference to fix this, and so we did. -See [6fbd17852aa11a2d0bc047183efaca4ff57ab80c][tcc-casting-commit]. +See [6fbd1785][tcc-casting-commit]. [tcc-casting-commit]: https://github.com/ekaitz-zarraga/tcc/commit/6fbd17852aa11a2d0bc047183efaca4ff57ab80c @@ -356,7 +359,7 @@ In upstream TinyCC there were some commits that added `long double` support using, and I quote, a *mega hack*, so I just copied that support to our Bootstrappable TinyCC. -See [a7f3da33456b4354e0cc79bb1e3f4c665937395b][tcc-long-double]. +See [a7f3da33456b][tcc-long-double]. [tcc-long-double]: https://github.com/ekaitz-zarraga/tcc/commit/a7f3da33456b4354e0cc79bb1e3f4c665937395b @@ -393,8 +396,7 @@ That's supposed to initialize *all* fields in the `Thing` `struct` to `0`, according to the C standard[^cppref]. As a first solution we set struct fields manually to `0`, to make sure they -were initialized properly. See -[29ac0f40a7afba6a2d055df23a8ee2ee2098529e][tinycc-struct-0] +were initialized properly. 
See [29ac0f40a7afb][tinycc-struct-0] [tinycc-struct-0]: https://github.com/ekaitz-zarraga/tcc/commit/29ac0f40a7afba6a2d055df23a8ee2ee2098529e @@ -493,10 +495,10 @@ In this case, like in many others, the fix was easier than realizing what was going wrong. I just added support for the signed rotation operation, not only for RISC-V but for all architectures, and I added the correct signedness check to the rotation operation to select the correct instruction. The patch (see -[c0c2556c2b2897814a87b8bdfa6997f79c218eeb][signed-rotation] in Mes) is very -clean and easy to read, because MesCC's codebase is really well ordered. +[88f24ea8][signed-rotation] in Mes) is very clean and easy to read, because +MesCC's codebase is really well ordered. -[signed-rotation]: https://github.com/ekaitz-zarraga/mes/commit/c0c2556c2b2897814a87b8bdfa6997f79c218eeb +[signed-rotation]: https://github.com/ekaitz-zarraga/mes/commit/88f24ea8661dd279c2a919f8fbd5f601bb2509ae #### MesCC switch/case falls-back to default case {#broken-case} @@ -577,10 +579,10 @@ It wasn't that easy to implement, but I first built a simple prototype and Janneke's scheme magic made my approach usable in production. All this is added in Mes's codebase in several commits, as we needed some -iterations to make it right. [f75cf7bfb911868023732bf4274978069b98849a][cases] -has the base of this commit, but there were some iterations more in Mes. +iterations to make it right. [22cbf823582][cases] has the base of this commit, +but there were some iterations more in Mes. -[cases]: https://github.com/ekaitz-zarraga/mes/commit/f75cf7bfb911868023732bf4274978069b98849a +[cases]: https://github.com/ekaitz-zarraga/mes/commit/22cbf823582e3699b6a21ee0cf74c2dbf0a6a4e9 #### Boostrappable TinyCC problems with GOT {#got} @@ -590,7 +592,7 @@ Bootstrappable TinyCC segfaulted because it was generating an empty GOT. Andrius debugged upstream TinyCC alongside ours and realized there was a missing check in an `if` statement. 
He fixed it in -[f636cf3d4839d1ca3f5af9c0ad9aef43a4bfccd9][got-commit]. +[f636cf3d4839d1ca][got-commit]. The problem with this kind of errors is TinyCC's codebase is really hard to read. It's a very small compiler but it's not obvious to see how things are @@ -654,8 +656,8 @@ In my code I forgot to replace the comparison operator so the branch checked `if a0 is less than 0` and it was always false, as the `set` operation writes a `0` or a `1` and none of them is less than `0`. -The commit [5a0ef8d0628f719ebb01c952797a86a14051228c][branch-tcc] explains this -in a more technical way, using actual RISC-V instructions. +The commit [5a0ef8d0628f719][branch-tcc] explains this in a more technical way, +using actual RISC-V instructions. This was also a hard to fix, because TinyCC's variable names (`vtop->c.i`) are really weird and they are used for many different purposes. @@ -686,12 +688,18 @@ TinyCC. ##### Extra: files generated with no permissions -There might be more problems with this though, we need to tackle in the future. -The bootstrappable TinyCC built using MesCC generates files with no -permissions and Andrius found that this problem comes from the argument -handling in the `open` system call in MesLibC. It's not a big deal at the -moment, because the next iteration of TinyCC uses correct permissions. We can -just `chmod` the file manually, but we'll probably fix it anyway. +The bootstrappable TinyCC built using MesCC generated files with no permissions +and Andrius found that this problem came from the variable length argument +support definitions. So he fixed that, too[^stikonas]. + +The macro that defined `va_start` had broken pointer arithmetic. At first +he thought it was related to MesCC's internals, but he tested it in GCC +later and realized the problem was in the macro definition. That's why +currently the commit says "workaround" in the name, but it's more than a +workaround: it's a proper fix. 
We are rewording that, but that would happen +after we release this post. + +[^stikonas]: He is like that. #### MesLibC use `signed char` for `int8_t` {#int8} @@ -791,15 +799,14 @@ instructions in TinyCC's RISC-V assembler, which sounds great but forces us to upstream the changes, and that process may take long and we'd need to patch it in our bootstrapping scripts until it happens. -We'll think about it, that's why the commit is marked as a WIP: -[42cb302c857fecafde6f27a8311531d606d15feb][setjmp]. +We just added the `#ifdef`s because our code is full of them anyway and sent it +to Mes: [0e2c5569][setjmp]. -[setjmp]: https://github.com/ekaitz-zarraga/mes/commit/42cb302c857fecafde6f27a8311531d606d15feb +[setjmp]: https://github.com/ekaitz-zarraga/mes/commit/0e2c55697df285250c8a24442f169bc52d729c31 [^stolen]: Yo, if it's free software it's not stealing! Please steal my code. Make it better. - #### More {#more} Those are mostly the coolest errors we needed to deal with but we stumbled upon @@ -812,15 +819,43 @@ I found a [bug in Guix shell](https://issues.guix.gnu.org/65225) (it's still open) and had to fix some ELF headers in MesCC generated files because objdump and gdb refused to work on them. -Also, while I was writing this lines Andrius fixed the x86 bootstrapping, which -I broke when the backporting process started. +Andrius also found issues with weak symbols in MesLibC that were triggered +because TCC didn't have support for them. Thankfully, upstream TCC had that +issue fixed and we just cherry-picked for the win. + +He even had the energy to test all this on real RISC-V hardware we specifically acquired +for this task. + +There are many more things to tell, but this is already getting too long and if +I continue writing we'll probably end up fixing some more stuff. In the end, a project like this is like hitting your head against a wall until -one of them breaks. 
Sometimes it feels like the head did, but it's all good. + #### Reproducing what we did {#reproducing} -> TODO +All we did means nothing if you can't reproduce it. We provide two ways to +reproduce this process: live-bootstrap and Guix. + +Both provide a similar thing but there are some high-level differences +that are worth mentioning now. + +Compared with `live-bootstrap`, using Guix helps because it reuses the +previous steps if they didn't change. This results in shorter waits once Mes is +sorted out. + +On the other hand, I've had issues with the failed builds in Guix (in +emulated systems). It was hard to jump inside the build container and play +around inside so the development cycle suffered a lot. In `live-bootstrap`, if +you are good with `bwrap` you can jump and tweak things with no issues. + +For those who enjoy digging in the code and trying to follow the process I +recommend following `live-bootstrap`'s scripts. The directory structure is a +little bit confusing but the scripts are very plain and linear. The ones in the +Guix process come from previous bootstrap efforts and they are designed to do +many things automagically, which makes them hard to follow. + ##### Using live-bootstrap {#live-bootstrap} @@ -846,7 +881,7 @@ the exact same thing. > NOTE: `live-bootstrap`'s project description is a little bit outdated. If you > read the comparison with Guix, what you'd read is old information. If you > want to read a more up-to-date information about Guix's bootstrapping process -> I suggest you to read this page of Guix manual: +> I suggest you to read this page of Guix manual: > Being very different projects, in a practical level, the main difference @@ -858,19 +893,67 @@ working on any GNU/Linux distribution[^in-guix]. careful with the options you send to the `rootfs.py` script. 
If you want to reproduce this exact point in time you only need to use my fork -of `live-bootstrap` you can find HERE, jump to the `self-hosted-tcc-rv64` tag -and run it. Andrius made all the magic to set that process to take all the -inputs from Mes and TinyCC from the correct tag. We'll leave that there for -future reference. +of [live-bootstrap](https://github.com/ekaitz-zarraga/live-bootstrap/), branch +`riscv-tcc-boot`. I also made a tag on it, `self-hosted-tcc-rv64`, to make it +easier to remember when this post was released. Andrius made all the magic to +set that process to take all the inputs from Mes and TinyCC from the correct +tag. + +Clone the repository, set up the dependencies and run this (if you are not on a +RISC-V host you need to configure QEMU and binfmt): + +``` bash + ./rootfs.py --bwrap --arch riscv64 --preserve +``` + +That should, after a long time, reach a point where there's a properly compiled +bootstrappable TinyCC. -> TODO #### Using Guix for a reproducible environment {#guix} -Over what I just mentioned, there's another big difference between -live-bootstrap and Guix: I am the one making the Guix package for this. +I made a Guix recipe that can replicate the whole process, too. It took me a long +time to make it work but it finally does. + +From my TCC fork, reproducing this should be easy for people versed in Guix. +There's a `guix` folder with some files (most of them broken, not gonna lie), +but there are two you should pay attention to: + +- `channels.scm` stores the state of my Guix checkout so you can reproduce it + in the future using `guix time-machine`. At the moment it doesn't feel + necessary but if something fails when you try it, please refer to that. + +- `commencement.scm` is an edited copy of the Guix bootstrapping process, + directly obtained from `gnu/packages/commencement.scm` from Guix's codebase. + I patched this to make it work for RISC-V, using some more modern commits in + the dependencies. 
+ +In order to reproduce all our work in Guix you just need to build the `tcc-boot0` +package from the `commencement.scm` file using `riscv64-linux` as your +`--system`. I'm a nice guy so I just added a command there you can use for +this, just run: + +``` bash +./tcc-boot0-from-source.sh +``` + +And that should build the whole thing. It takes hours, you have been warned. + +Also it adds `--no-grafts` (thanks Efraim), because if you keep the grafts it +compiles the world from scratch (curl, x11... not good). + +If you just want to build `mes-boot` as an intermediate step, I also made a +file for that: + +``` bash +./mes-boot-from-source.sh +``` + +Both scripts will load variables from the `commencement.scm` module +provided. The module is not complex if you are used to Guix, but it calls +some complex shell scripts in both Mes and TinyCC to build. Those contain all +the magic. -> TODO ### Conclusions {#conclusions} @@ -930,24 +1013,27 @@ helps. At least it helped me. ### What is next? {#next} -In the short-term, we need to decide what to do with the `setjmp` fix and -include it in MesLibC. After that we need to fix `va_args` in MesCC, for that -error with the permissions in the output files and fix the floating point -numbers in RV64 in TinyCC. +Now that we have a fully featured Bootstrappable TinyCC we need to decide what to do +next. -Once that is done, now in the mid-term, we would be able to compile a fully -featured Bootstrappable TinyCC. With that and some fixes in MesLibC, we would -be able to compile upstream TinyCC. We need to fix any error we find there and -until it is ready for GCC. +In the short term, all this has to be released in the original projects: Mes, +M2, and so on. That's the easy part, as everything has proved to be ready. -Now in the long-term, we are going to have problems with GCC so we'll need to -fix those, too. Once that is done, we would use GCC to compile more recent -versions of GCC until we compile the world. 
+In the mid term, it's not very clear what to do first. We suspect we'll need +upstream TinyCC for the next steps, because we need many different tools to +continue with the bootstrapping chain, and the bootstrappable TinyCC might not +be enough to build them. On the other hand, when we go for a standard library +we'll miss the extended assembly support we already mentioned. There's some +uncertainty in the next step. -That's more or less the description of what we will do in the next months. +The long term is pretty clear though: the goal is GCC. First GCC for C and +then for C++ to make it able to build GCC 7.5 which should enable the rest of the +chain pretty easily (famous last words). I anticipate we are going to have +problems with GCC (I know this because I left them there last time) so we'll +need to fix those, too. Once that is done, we would use GCC to compile more +recent versions of GCC until we compile the world. -Meanwhile, we'll need to test this on real hardware we specifically acquired -for this task. It's slow, but it should be enough for these tests. +That's more or less the description of what we will do in the next months. And this is pretty much it. I hope you learned something new about C, the Bootstrapping process or at least had a good time reading this wall of text. -- cgit v1.2.3