From 2e2265b0b79cce46954f7d55fa2d6eb11e1de4a9 Mon Sep 17 00:00:00 2001
From: Ekaitz Zarraga
Date: Tue, 17 Oct 2023 16:28:39 +0200
Subject: WIP: Milestone self-hosted-tcc-riscv

---
 content/bootstrapGcc/08_tcc_and_mescc.md | 1038 ++++++++++++++++++++++++++++++
 1 file changed, 1038 insertions(+)
 create mode 100644 content/bootstrapGcc/08_tcc_and_mescc.md

diff --git a/content/bootstrapGcc/08_tcc_and_mescc.md b/content/bootstrapGcc/08_tcc_and_mescc.md
new file mode 100644
index 0000000..e49a2e6
--- /dev/null
+++ b/content/bootstrapGcc/08_tcc_and_mescc.md
@@ -0,0 +1,1038 @@
+Title: Milestone — MesCC builds TinyCC and fun C errors for everyone
+Date: 2023-09-27
+Category:
+Tags: Bootstrapping GCC in RISC-V
+Slug: bootstrapGcc8
+Lang: en
+Summary:
+    We spent the last months making MesCC able to compile TinyCC and making the
+    result of that compilation able to compile TinyCC. Many cool problems
+    appeared, this is the summary of our work.
+Status: draft
+
+It's been a while since the latest technical update in the project and I am
+fully aware that you were missing it so it's time to recap with a really cool
+announcement:
+
+
+**We finally made a self-hosted Bootstrappable TinyCC in RISC-V**
+
+
+Most of you probably remember I [already backported](bootstrapGcc6.html) the
+Bootstrappable TinyCC compiler, but I didn't test it in a proper environment.
+Now, we can confidently say it is able to compile itself, a "large" program
+that makes use of more complex C features than I did in the tests.
+
+All this work was done by Andrius Štikonas and myself. Janneke helped us a lot
+with Mes related parts, too. The work this time was pretty hard, honestly. Most
+of the things we did here are not obvious, even for C programmers.
+
+I'm not used to these kinds of quirks of the C language. Most of them are
+really specific and related to the standards, and many others are just things
+that were missing.
+I hope the ones I chose to discuss here help you understand your
+computing better, as they did for me.
+
+This is going to be a very long post, so here is a ToC to help you out:
+
+1. [Context](#context)
+    1. [Why is this important?](#why-important)
+2. [Problems fixed](#problems)
+    1. [TinyCC misses assembly instructions needed for MesLibC](#tinycc-missing-instructions)
+    2. [TinyCC's assembly syntax is weird](#tcc-assembly)
+    3. [TinyCC does not support Extended Asm in RV64](#extended-assembly)
+    4. [MesLibC `main` function arguments are not set properly](#main-args)
+    5. [TinyCC says `__global_pointer$` is not a valid symbol](#dollars)
+    6. [Bootstrappable TinyCC's casting issues](#tcc-casting-issues)
+    7. [Bootstrappable TinyCC's `long double` support was missing](#long-double)
+    8. [MesCC struct initialization issues](#mescc-struct-init)
+    9. [MesCC vs TinyCC size problems](#size-problems)
+    10. [MesCC add support for signed shifts](#mes-signed-rotation)
+    11. [MesCC switch/case falls back to default case](#broken-case)
+    12. [Bootstrappable TinyCC problems with GOT](#got)
+    13. [Bootstrappable TinyCC generates wrong assembly in conditionals](#wrong-conditionals)
+    14. [Support for variable length arguments](#varargs)
+    15. [MesLibC use `signed char` for `int8_t`](#int8)
+    16. [MesLibC Implement `setjmp` and `longjmp`](#jmp)
+    17. [More](#more)
+3. [Reproducing what we did](#reproducing)
+    1. [Using live-bootstrap](#live-bootstrap)
+    2. [Using Guix](#guix)
+4. [Conclusions](#conclusions)
+5. [What is next?](#next)
+
+
+### Context {#context}
+
+There are many blog posts in the series where you can find some context about
+the project, and even a FOSDEM talk about it, but they all give a very broad
+explanation, so let's focus on what we are doing right now.
+
+Here we have Mes, a Scheme interpreter, that runs MesCC, a C compiler, that is
+compiling our simplified fork of TinyCC; let's call that Bootstrappable TinyCC.
+That Bootstrappable TinyCC compiler then tries to compile its own code. It
+compiles its own code because the goal is to enable more flags in each
+compilation, so it has more features in each round. We do all this because
+TinyCC is way faster than MesCC and it's also more complex, but MesCC is only
+able to build a simple TinyCC with few features enabled.
+
+During all this process we use a standard library provided by the Mes project,
+we'll call it MesLibC, because we can't build glibc at this point, and TinyCC
+does not provide its own C standard library.
+
+With all this well understood, this is the achievement:
+
+**We made MesCC able to compile the Bootstrappable TinyCC, using MesLibC, to an
+executable that is able to compile the Bootstrappable TinyCC's codebase to a
+binary that works.**[^self-hosted]
+
+[^self-hosted]: So it can compile itself again and again, but who would want to
+    do that?
+
+The process affected all the pieces in the system. We added changes in MesCC,
+MesLibC and the Bootstrappable TinyCC.
+
+#### Why is this important? {#why-important}
+
+We already talked at length about the bootstrapping issue, the trusting trust
+attack and all that. I won't repeat that here. What I'll do instead is be
+specific. This step is a big thing because it allows us to go way further in
+the chain.
+
+All the steps before Mes were already ported to RISC-V mostly thanks to Andrius
+Štikonas who worked on [Stage0-POSIX][stage0] and the rest of glue projects
+that are needed to reach Mes.
+
+[stage0]: https://github.com/oriansj/stage0-posix
+
+Mes had been ported to RISC-V (64 bit) by W. J. van der Laan, and some patches
+were added on top of it by Andrius Štikonas himself before our current effort
+started.
+
+At that point, Mes was unable to build our bootstrappable TinyCC in
+RISC-V, the next step in the process, and the bootstrappable TinyCC itself was
+unable to build itself either.
+This was a very limiting point, because TinyCC
+is the first "proper" C compiler in the chain.
+
+When I say "proper" I mean fast and fully featured as a C compiler. In x86,
+TinyCC is able to compile old versions of GCC. If we manage to port it to
+RISC-V we will eventually be able to build GCC with it and, with that, the
+world.
+
+In summary, TinyCC is a key step in the bootstrapping chain.
+
+
+### Problems fixed {#problems}
+
+This work can be easily followed in the commits in my TCC fork's
+[`riscv-mes`][tcc] branch, and in my Mes clone's [`ekaitz`][mes] branch. Most
+of the commits are already merged, but we leave that reference for people to be
+able to follow the development more easily. We are also identifying the
+contents of this blogpost in the git history by adding the git tag
+`self-hosted-tcc-rv64` to both of my forks.
+
+[tcc]: https://github.com/ekaitz-zarraga/tcc/tree/riscv-mes
+[mes]: https://github.com/ekaitz-zarraga/mes/tree/ekaitz
+
+Many commits have a long message you can go read there, but this post was born
+to summarize the most interesting changes we did, and write them in a more
+digestible way. Let's see if I manage to do that.
+
+The following list is not ordered in any particular way, but we hope the
+selection of problems we found is interesting for you. We found some more
+errors, but these are the ones we consider most relevant.
+
+
+#### TinyCC misses assembly instructions needed for MesLibC {#tinycc-missing-instructions}
+
+TinyCC is not like GCC: it generates binary code directly, with no assembly
+code in between. TinyCC has a separate assembler that doesn't follow the path
+that C code follows.
+
+It works the same in all architectures, but we can take RISC-V as an example:
+
+TinyCC has `riscv64-gen.c`, which generates the binary files, while
+`riscv64-asm.c` parses assembly code and also generates binary. As you can
+see, binary generation is somehow duplicated.
+
+In the RISC-V case, the C part had support for almost everything since my
+backport, but the assembler did not support many instructions (which, by the
+way, are supported by the C part).
+
+MesLibC's `crt1.c` is written in assembly code. Its goal is to prepare the
+`main` function and call it. For that it needs to use the `jalr` instruction
+and others that were not supported by TinyCC, neither upstream nor our
+bootstrappable fork.
+
+These changes appear in several commits because I didn't really understand how
+the TinyCC assembler worked, and some instructions need to use relocations
+which I didn't know how to add. The following commit can show how it feels to
+work on this, and shares how relocations are done:
+
+[1e597f3d239d9119d2ea4bb3ca29b587ea594dcc][lla-commit]
+
+[lla-commit]: https://github.com/ekaitz-zarraga/tcc/commit/1e597f3d239d9119d2ea4bb3ca29b587ea594dcc
+
+There you can see we started to understand things in TinyCC, but some other
+changes came after this.
+
+A very important note here is that upstream TinyCC does not have support for
+these instructions yet, so we need to patch upstream TinyCC when we use it,
+contribute the changes or find another kind of solution. Each solution has its
+downsides and upsides, so we need to take a decision about this later.
+
+
+#### TinyCC's assembly syntax is weird {#tcc-assembly}
+
+Following on from the previous fix, TinyCC does not support GNU-Assembler's
+syntax in RISC-V. It uses a simplified assembly syntax instead.
+
+Where we would write:
+
+``` asm
+sd s1, 8(a0)
+```
+
+In TinyCC's assembly we have to write:
+
+``` asm
+sd a0, s1, 8
+```
+
+This requires changes in MesLibC, and it makes us create a separate folder for
+TinyCC in MesLibC. See `lib/riscv64-mes-tcc/` and `lib/linux/riscv64-mes-tcc`
+for more details.
+
+#### TinyCC does not support Extended Asm in RV64 {#extended-assembly}
+
+Way later in time we also found TinyCC does not support [Extended Asm][ext-asm]
+in RV64.
+The functions that manage that are simply empty.
+
+[ext-asm]: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
+
+It took us some time to realize what was going on here, for two reasons.
+First, there are few cases of Extended Asm in the code we were compiling.
+Second, it was failing silently.
+
+Extended Asm is important because it lets you tell the compiler you are going
+to touch some registers in the assembly block, so it can protect variables and
+apply optimizations properly.
+
+In our case, our assembly blocks were clobbering some variables that would have
+been protected by the compiler if the Extended Asm support had been
+implemented.
+
+Andrius found all the places in MesLibC where Extended Asm was used and rewrote
+the assembly code to keep variables safe in the cases where it was needed. See
+[b5eb0e34c6fc76a4558940e43ac78cc8a63ebac1][extended-asm] in Mes.
+
+[extended-asm]: https://github.com/ekaitz-zarraga/mes/commit/b5eb0e34c6fc76a4558940e43ac78cc8a63ebac1
+
+The other option was to add Extended Asm support to TinyCC, but we would need
+to add it in the Bootstrappable TinyCC and also upstream. This also means
+understanding TinyCC's codebase very well and making the changes without
+errors, so we decided to simplify MesLibC, because that is easier to get right.
+
+#### MesLibC `main` function arguments are not set properly {#main-args}
+
+Following the previous problem with assembly, we later found that the input
+arguments of the `main` function, which come from the command line arguments,
+were not properly set by our MesLibC. Andrius also took care of that in
+[267a132ca932dafe628da000dc76714612cce144][main-ext] in Mes.
+
+[main-ext]: https://github.com/ekaitz-zarraga/mes/commit/267a132ca932dafe628da000dc76714612cce144
+
+This error was easier to find than others because when we found issues with
+this we already had a compiled TinyCC. So we just needed to fix simple things
+around it.
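
A startup bug like this one can be caught with a tiny check on what `main` receives. The following is only a sketch, not code from Mes: `args_look_sane` is a made-up helper that tests the guarantees the C standard gives about `argc` and `argv`, which is what the startup code is expected to set up.

```c
#include <stddef.h>

/* Hypothetical helper, not part of MesLibC.
   Returns 1 when argc/argv look the way the C standard promises to main:
   argc is not negative, argv[0] .. argv[argc - 1] are non-null strings and
   argv[argc] is a null pointer. Returns 0 otherwise. */
int args_look_sane(int argc, char **argv)
{
    int i;

    if (argc < 0 || argv == NULL)
        return 0;
    for (i = 0; i < argc; i++)
        if (argv[i] == NULL)
            return 0;
    return argv[argc] == NULL;
}
```

Calling it as the first thing in `main (int argc, char *argv[])` and returning a non-zero exit code when it fails would be one way to use it in a reproducer.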
+
+
+#### TinyCC says `__global_pointer$` is not a valid symbol {#dollars}
+
+This is a small issue that was a headache for a while, but it turned out to be
+a very simple one.
+
+In RISC-V there's a symbol, `__global_pointer$`, that is used for dynamic
+linking, defined in the ABI. But TinyCC had trouble parsing code around it and
+it took us some time to realize it was the dollar sign (`$`) that was causing
+the issues at this point.
+
+TinyCC does not process dollars in identifiers unless you specifically set a
+flag (`-fdollars-in-identifiers`) when running it. In the RISC-V case, that
+flag must always be active because if it isn't, `__global_pointer$` can't be
+processed.
+
+We tried to set that flag in the command line but we had other issues in the
+command line argument parsing (we found and fixed them later) so we just
+hardcoded it.
+
+This issue is interesting because it's an extremely simple problem, but its
+effect appears in weird ways and it's not always easy to know where the problem
+is coming from.
+
+
+#### Bootstrappable TinyCC's casting issues {#tcc-casting-issues}
+
+This one was a really hard one to fix.
+
+When running our Bootstrappable TinyCC to build MesLibC we found this error:
+
+``` nothing
+  cannot cast from/to void
+```
+
+We managed to isolate a piece of C code that was able to replicate the
+problem.[^reproducer]
+
+``` clike
+long cast_charp_to_long (char const *i)
+{
+    return (long)i;
+}
+
+long cast_int_to_long (int i)
+{
+    return (long)i;
+}
+
+long cast_voidp_to_long (void const *i)
+{
+    return (long)i;
+}
+
+void main(int argc, char* argv[]){
+    return;
+}
+```
+
+Compiling this file raised the same issue, but then I realized I could remove
+two of the functions on the top and the error didn't happen. Adding one of
+those functions back raised the error again.
+
+I tried to change the order of the functions and the functions I chose to add,
+and I could reproduce it: if there were two functions it failed but it could
+build with only one.
+
+Andrius found that the function type was not properly set in the RISC-V code
+generation and its default value was `void`, so it only failed when it compiled
+the second function.
+
+Knowing that, we could take other architectures as a reference to fix this, and
+so we did.
+
+See [6fbd17852aa11a2d0bc047183efaca4ff57ab80c][tcc-casting-commit].
+
+[tcc-casting-commit]: https://github.com/ekaitz-zarraga/tcc/commit/6fbd17852aa11a2d0bc047183efaca4ff57ab80c
+
+[^reproducer]: This is how we managed to fix most of the problems in our code:
+    make a small reproducer we can test separately so we can inspect the
+    process and the result easily.
+
+
+#### Bootstrappable TinyCC's `long double` support was missing {#long-double}
+
+When I backported the RISC-V support to our Bootstrappable TinyCC I missed the
+`long double` support and I didn't realize that because I never tested large
+programs with it.
+
+The C standard doesn't define a size for `long double` (it just says it has to
+be at least as long as the `double`), but its size is normally set to 16 bytes.
+All this is weird in RV64, because it doesn't have 16-byte registers. It
+needs some extra support.
+
+Before we fixed this, the following code:
+
+``` clike
+long double f(int a){
+    return a;
+}
+```
+
+Failed with:
+
+``` nothing
+  riscv64-gen.c:449 (`assert(size == 4 || size == 8)`)
+```
+
+Because it was only expecting to use `double`s (8 bytes) or `float`s (4 bytes).
+
+In upstream TinyCC there were some commits that added `long double` support
+using, and I quote, a *mega hack*, so I just copied that support to our
+Bootstrappable TinyCC.
+
+See [a7f3da33456b4354e0cc79bb1e3f4c665937395b][tcc-long-double].
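
A slightly larger reproducer for this kind of support could look like the following sketch. It is not taken from the TinyCC test suite and the function names are made up; it only widens an `int` to `long double` and narrows it back, the same kind of conversion that made the code generator hit the assertion.

```c
/* Hypothetical reproducer: same shape as the failing example,
   an int widened to long double. */
long double int_to_long_double(int a)
{
    return a;
}

/* Narrowing back: small integers survive the round trip exactly. */
int long_double_to_int(long double x)
{
    return (int)x;
}
```

If both functions compile and a value goes through them unchanged, the basic `long double` paths of the compiler are at least being exercised.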
+
+[tcc-long-double]: https://github.com/ekaitz-zarraga/tcc/commit/a7f3da33456b4354e0cc79bb1e3f4c665937395b
+
+After this commit, some extra problems appeared with some missing symbols.
+These were link-time errors: TinyCC has the floating point helper functions
+needed for RISC-V defined in `lib/lib-arm64.c`, because it reuses the aarch64
+code for them.
+
+After this, we also compile and link `lib-arm64.c` and we have `long double`
+support.
+
+#### MesCC struct initialization issues {#mescc-struct-init}
+
+This one was a lot of fun. Our Bootstrappable TinyCC exploded with random
+issues: segfaults, weird branch decisions...
+
+After tons of debugging Andrius found some values in `struct`s were not set
+properly. As we don't know TinyCC's codebase really well, that was hard
+to follow and we couldn't really tell where the value was coming from.
+
+Andrius finally realized some `struct`s were not initialized properly. Consider
+this example:
+
+``` clike
+typedef struct {
+    int one;
+    int two;
+} Thing;
+
+Thing a = {0};
+```
+
+That's supposed to initialize *all* fields in the `Thing` `struct` to `0`,
+according to the C standard[^cppref].
+
+As a first solution we set struct fields manually to `0`, to make sure they
+were initialized properly. See
+[29ac0f40a7afba6a2d055df23a8ee2ee2098529e][tinycc-struct-0]
+
+[tinycc-struct-0]: https://github.com/ekaitz-zarraga/tcc/commit/29ac0f40a7afba6a2d055df23a8ee2ee2098529e
+
+After some debugging we found that the fields that were not explicitly set were
+initialized to `22`. So I decided to go to MesCC and see if the struct
+initialization was broken.
+
+This was my first dive in MesCC's code, and I have to say it's really easy to
+follow. It took me some time to read through it because I'm not that used to
+`match`, but I managed to find the struct initialization code.
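
The expected behaviour can be written down as a small check. This is a sketch of such a reproducer, not necessarily the one used during the debugging; the two function names are made up for illustration.

```c
typedef struct {
    int one;
    int two;
} Thing;

/* Only the first field appears in the initializer: C requires the
   remaining fields to be zeroed as well. Returns 1 when that holds. */
int partial_init_is_zeroed(void)
{
    Thing a = {0};
    return a.one == 0 && a.two == 0;
}

/* Same idea with a non-zero first value: `two` must still be 0. */
int second_field_after_partial_init(void)
{
    Thing b = {5};
    return b.two;
}
```

With a correct compiler the first function returns `1` and the second returns `0`; any other value in the second field points at the initialization code.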
+
+What I found in MesCC is there was a `22` hardcoded in the struct
+initialization code, probably coming from some debug code that was never
+removed. As no part of the x86 bootstrapping used that kind of initialization,
+or at least nothing relied on it, the error went unnoticed.
+
+I set that to `0`, as it should be, and we moved on with our lives.
+
+[^cppref]: You can see an explanation in the (1) case at
+    [cppreference.com](https://en.cppreference.com/w/c/language/struct_initialization)
+
+
+#### MesCC vs TinyCC size problems {#size-problems}
+
+The C standard does not fix the size of integers. It only sets minimum ranges
+and relative sizes: `short` cannot be wider than `int`, `int` cannot be wider
+than `long`, and so on. If your platform wants, `int` can be just 16 bits wide,
+and that's ok for the C standard.
+
+TinyCC's RISC-V backend was written under the assumption that `int` is 32 bits
+wide. You can see this happening in `riscv64-gen.c`, for example, here:
+
+``` clike
+    EI(0x13, 0, rr, rr, (int)pi << 20 >> 20); // addi RR, RR, lo(up(fc))
+```
+
+The pair of shifts there is done to drop the upper 20 bits of the `pi`
+variable and keep its lower 12 bits, sign-extended. This code's behavior might
+be different from one platform to another. Taking the example before, on that
+possible platform with 16 bit integers a shift by 20 is not even defined, so
+this code could send anything instead of the lower 12 bits of `pi`.
+
+In our case, we had MesCC using the whole register width, 64 bits, for
+temporary values, so the lowest `44` bits were left and the next assertion,
+which checked that the immediate fit in 12 bits, didn't pass.
+
+This is a huge problem, as most of the code in the RISC-V generation is written
+using this style.
+
+There are other ways to do the same thing (`pi & 0xFFF` maybe?) in a more
+portable way, but we don't know why upstream TinyCC decided to do it this way.
+Probably they did it because GCC (and TinyCC itself) use 32 bit integers, but
+they didn't handle other possible cases, like the one we had here with MesCC.
+
+In any case, this made us rethink MesCC, dig into how its integers are defined,
+how to change this to be compatible with TinyCC and so on, but I finally
+decided to add casts in the middle to make sure all this was compiled as
+expected.
+
+It was a good reason to make us rethink MesCC's integers, but it took a very
+long time to deal with this, time that could have been better used on
+something else. Now we have all become paranoid about integers and we still
+think some extra errors will arise from them in the future. Integers are hard.
+
+
+#### MesCC add support for signed shifts {#mes-signed-rotation}
+
+Integers were on our minds for a long time, as described in the previous
+block, but I didn't talk about signedness in that one.
+
+Following one of the crazy errors we had in TinyCC, I somehow realized (I don't
+remember how!) that we were missing signed shift support in MesCC. I think
+that I found this while inspecting the code MesCC was outputting: I spotted
+some right shifts done using unsigned instructions for signed values and I
+started digging in MesCC to find out why. I finally realized that there was no
+support for that and the shift instruction wasn't selected depending on the
+signedness of the value being shifted.
+
+Let's see this with an example:
+
+``` clike
+signed char a = 0xF0;
+unsigned char b = 0xF0;
+
+// What is this? (Answer: -1, all bits set, 0xFF in the low byte)
+a >> 4;
+
+// And this? (Answer: 0x0F => 15)
+b >> 4;
+```
+
+In the example you can see the shift operation does not work the same way if
+the value is signed or not. If you always use the unsigned version of the `>>`
+operation, you don't get the results you expected. Signs are also hard.
+
+In this case, like in many others, the fix was easier than realizing what was
+going wrong.
+I just added support for the signed shift operation, not only
+for RISC-V but for all architectures, and I added the correct signedness check
+to the shift operation to select the correct instruction. The patch (see
+[c0c2556c2b2897814a87b8bdfa6997f79c218eeb][signed-rotation] in Mes) is very
+clean and easy to read, because MesCC's codebase is really well organized.
+
+[signed-rotation]: https://github.com/ekaitz-zarraga/mes/commit/c0c2556c2b2897814a87b8bdfa6997f79c218eeb
+
+
+#### MesCC switch/case falls back to default case {#broken-case}
+
+In the early bootstrap runs, our Bootstrappable TinyCC did weird things.
+After many debugging sessions we realized the `switch` statements in
+`riscv64-gen.c`, more specifically in `gen_opil`, were broken. The
+fall-throughs in the `switch` were automatically directed to the `default`
+case. Weird!
+
+MesCC has many tests so I read all that were related to the `switch`
+statements and the ones that handled the fall-throughs were all falling
+through to the `default` case, so our weird behavior wasn't tested.
+
+I added the tests for our case and read the disassembly of simple examples
+until I realized what the problem was.
+
+Each of the `case` blocks has two parts: the clause that checks if the value
+of the expression is the one of the case, and the body of the case itself.
+
+The `switch` statement generation was doing some magic to deal with `case`
+blocks, but it was failing to deal with complex fall-through schemes because
+the clause of the target `case` block was always run. That clause was always
+false, because the one that matched was the one we were falling through from,
+so the code ended up in the `default` case.
+
+There were some problems to fix this, as NyaCC (MesCC's C parser) returns
+`case` blocks as nested when they don't have a `break` statement:
+
+``` lisp
+(case testA
+  (case testB
+    (case testC BODY)))
+```
+
+Instead of doing this, I decided to flatten the `case` blocks with empty
+bodies.
+This way we can deal with the structure in a simpler way.
+
+``` lisp
+((case testA (expr-stmt))
+ (case testB (expr-stmt))
+ (case testC BODY))
+```
+
+Once this was done, I expanded each `case` block into three parts: a jump that
+skips over the clause, the clause itself and then its body. Doing this, the
+fall-through doesn't re-evaluate the clause, as it doesn't need to. The
+generated code looks like this in pseudocode:
+
+``` assembly
+    ;; This doesn't have the jump because it's the first
+CASE1:
+    testA
+CASE1_BODY:
+    ...
+
+    goto CASE2_BODY
+CASE2:
+    testB
+CASE2_BODY:
+    ...
+
+    goto CASE3_BODY
+CASE3:
+    testC
+CASE3_BODY:
+    ...
+```
+
+If one of the `case`s has a `break`, it's treated as part of its body, and it
+will end the execution of the `switch` statement normally, with no
+fall-through.
+
+This results in a simpler `case` block control. The previous approach dealt
+with nested `case` blocks and tried to be clever about them, but
+unsuccessfully. The best thing about this commit is that most of the
+cleverness was simply removed with a simple solution (flatten all the things!).
+
+It wasn't that easy to implement, but I first built a simple prototype and
+Janneke's Scheme magic made my approach usable in production.
+
+All this is added in Mes's codebase in several commits, as we needed some
+iterations to get it right. [f75cf7bfb911868023732bf4274978069b98849a][cases]
+has the base of this change, but there were some more iterations in Mes.
+
+[cases]: https://github.com/ekaitz-zarraga/mes/commit/f75cf7bfb911868023732bf4274978069b98849a
+
+
+#### Bootstrappable TinyCC problems with GOT {#got}
+
+The Global Offset Table is a table that helps with relocatable binaries. Our
+Bootstrappable TinyCC segfaulted because it was generating an empty GOT.
+
+Andrius debugged upstream TinyCC alongside ours and realized there was a
+missing check in an `if` statement. He fixed it in
+[f636cf3d4839d1ca3f5af9c0ad9aef43a4bfccd9][got-commit].
+
+The problem with this kind of error is that TinyCC's codebase is really hard
+to read. It's a very small compiler but it's not obvious how things are
+done in it, so we had to spend many hours in debugging sessions that went
+nowhere. If we had a compiler that was easier to read and change, it would be
+way simpler to fix and we would have had a better experience with it.
+
+[got-commit]: https://github.com/ekaitz-zarraga/tcc/commit/f636cf3d4839d1ca3f5af9c0ad9aef43a4bfccd9
+
+#### Bootstrappable TinyCC generates wrong assembly in conditionals {#wrong-conditionals}
+
+We spent a long time debugging a bug I introduced during the backport when I
+tried to undo some optimization upstream TinyCC applied to comparison
+operations.
+
+Consider the following code:
+
+``` clike
+if ( x < 8 )
+    whatever();
+else
+    whatever_else();
+```
+
+Our Bootstrappable TinyCC was unable to compile this code correctly. Instead,
+it emitted code that always took the same branch, regardless of the value
+in `x`.
+
+In TinyCC, a conditional like `if (x < CONSTANT)` has a special treatment, and
+it's converted to something like this pseudoassembly:
+
+``` pseudo
+load x to a0
+load CONSTANT to a1
+set a0 if less than a1
+branch if a0 not equal 0 ; Meaning it's `set`
+```
+
+This behaviour uses the `a0` register as a flag, emulating what other CPUs
+use for comparisons. RISC-V doesn't need that, but it's still done here
+probably for compatibility with other architectures. In RISC-V it could look
+like this:
+
+``` pseudo
+load x to a0
+load CONSTANT to a1
+branch if a0 less than a1
+```
+
+You can easily see the `branch` "instruction" does a different comparison in
+one case versus the other. In the first one it checks if `a0` is set,
+and in the other it checks if `a0` is smaller than `a1`.
+
+TinyCC handles this case in a very clever way (maybe too clever?).
+When they
+emit the `set a0 if less than a1` instruction they replace the current
+comparison operation with `not equal` and they remove the `CONSTANT` and
+replace it with a `0`. That way, when the `branch` instruction is generated,
+they insert the correct clause.
+
+In my code I forgot to replace the comparison operator so the branch checked
+`if a0 is less than 0` and it was always false, as the `set` operation writes
+a `0` or a `1` and neither of them is less than `0`.
+
+The commit [5a0ef8d0628f719ebb01c952797a86a14051228c][branch-tcc] explains this
+in a more technical way, using actual RISC-V instructions.
+
+This was also a hard one to fix, because TinyCC's variable names
+(`vtop->c.i`) are really weird and they are used for many different purposes.
+
+[branch-tcc]: https://github.com/ekaitz-zarraga/tcc/commit/5a0ef8d0628f719ebb01c952797a86a14051228c
+
+
+#### Support for variable length arguments {#varargs}
+
+In C you can define functions with variable argument length. In RISC-V, those
+arguments are sent using registers while in other architectures they are sent
+on the stack. This means the RISC-V case is a little bit more complex to deal
+with, and needs special treatment.
+
+Andrius realized that in our Bootstrappable TinyCC we had issues with variable
+length arguments, especially in the most famous function that uses them:
+`printf`. He also found that the problem came from the arguments not being
+properly set, and tracked it down.
+
+Reading upstream TinyCC we found they use a really weird system for the defines
+that deal with this. They have a header file, `include/tccdefs.h`, which is
+included in the codebase, but also processed by a tool that generates strings
+that are later injected at execution time in TinyCC.
+
+This was too much for us so we just extracted the simplest variable arguments
+definitions for RISC-V and introduced that in MesLibC and our Bootstrappable
+TinyCC.
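
From the C side, the machinery in question is the `va_list` family from `<stdarg.h>`. The following is a minimal sketch of a variadic function that can serve as a reproducer without involving `printf`; `sum_ints` is a made-up name and this is not the code that went into MesLibC.

```c
#include <stdarg.h>

/* Hypothetical reproducer: sums `count` int arguments passed after
   the fixed parameter. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);          /* start right after the last named argument */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int); /* fetch the next argument as an int */
    va_end(ap);
    return total;
}
```

Calling it with only a few arguments and then with more than eight is a cheap way to exercise both the arguments that travel in registers and the ones that should end up on the stack.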
+
+##### Extra: files generated with no permissions
+
+There might be more problems with this, though, which we will need to tackle
+in the future. The bootstrappable TinyCC built using MesCC generates files
+with no permissions and Andrius found that this problem comes from the
+argument handling in the `open` system call in MesLibC. It's not a big deal at
+the moment, because the next iteration of TinyCC uses correct permissions. We
+can just `chmod` the file manually, but we'll probably fix it anyway.
+
+
+#### MesLibC use `signed char` for `int8_t` {#int8}
+
+We already had a running Bootstrappable TinyCC compiled using MesCC when we
+stumbled upon this issue. Somehow, when assembling:
+
+``` asm
+addi a0, a0, 9
+```
+
+The code was trying to read `9` as a register name, and failed to do it (of
+course). It was weird to realize that the following code (in `riscv64-asm.c`)
+was always using the true branch in the `if` statement, even if
+`asm_parse_regvar` returned `-1`:
+
+``` clike
+int8_t reg;
+...
+if ((reg = asm_parse_regvar(tok)) != -1) {
+    ...
+} else ...
+```
+
+I disassembled and saw something like this:
+
+``` pseudoassembly
+call asm_parse_regvar   ;; Returns value in a0
+reg = a0
+a0 = a0 + 1
+branch if a0 equals 0
+```
+
+This looks ok, it does some magic with the `-1` but it makes sense anyway. The
+problem is that it didn't branch because `a0` was `256` even when
+`asm_parse_regvar` returned `-1`.
+
+During some of the `int` related problems someone told me on the Fediverse that
+`char`'s default signedness is not defined in the C standard. I read MesLibC
+and, exactly: `int8_t` was defined as an alias to `char`.
+
+In RISC-V `char` is by default `unsigned` (don't ask me why) but we are used to
+x86 where it's `signed` by default. Saying only `char` is not portable.
+
+Replacing:
+
+``` clike
+typedef char int8_t;
+```
+
+With:
+
+``` clike
+typedef signed char int8_t;
+```
+
+Fixed the issue.
+
+From this you can learn several things:
+
+1. 
Don't assume `char`'s signedness in C
+2. If you design a programming language, be consistent with your decisions. In
+   C `int` is always `signed int`, but `char`s don't act like that. Don't do
+   this.
+
+#### MesLibC Implement `setjmp` and `longjmp` {#jmp}
+
+Those who are not that well versed in C, as I wasn't before we found this
+issue, won't know about `setjmp` and `longjmp` but they are, simplifying a lot,
+like a `goto` you can use in any part of the code. `setjmp` needs a buffer and
+it stores the state of the program in it; `longjmp` sets the state of the
+program to the values in the buffer, so it jumps to the position stored by
+`setjmp`.
+
+Both functions are part of the C standard library and they need specific
+support for each architecture because they need to know which registers are
+considered part of the state of the program. They need to know how to store the
+program counter, the return address, and so on, and how to restore them.
+
+In their simplest form they are a set of stores in the case of `setjmp` and
+a set of loads in the case of `longjmp`.
+
+In RISC-V they only need to store the `s*` registers, as they are the ones that
+are not treated as temporary. It's simple, but it needs to be done, and
+MesLibC had not done it for RISC-V yet (not for GCC either).
+
+Andrius is not convinced by our commit here, and I agree with his
+concerns. We added the full `setjmp` and `longjmp` implementations directly
+stolen from, or rather inspired by, the ones in Musl[^stolen], but they also
+have floating point register support, using instructions that are not
+implemented in TinyCC yet. This is going to be a problem in the future because
+later iterations will try to execute instructions they don't actually
+understand.
+
+There are two (or three) possible solutions here. The first is to remove the
+floating point instructions for now (another flavor of this solution is to
+hide them under an `#ifdef`).
The second is to implement the floating point +instructions in TinyCC's RISC-V assembler, which sounds great but forces us to +upstream the changes. That process may take a long time, and we'd need to patch +it in our bootstrapping scripts until it happens. + +We'll think about it; that's why the commit is marked as a WIP: +[42cb302c857fecafde6f27a8311531d606d15feb][setjmp]. + +[setjmp]: https://github.com/ekaitz-zarraga/mes/commit/42cb302c857fecafde6f27a8311531d606d15feb + +[^stolen]: Yo, if it's free software it's not stealing! Please steal my code. + Make it better. + + +#### More {#more} + +Those are mostly the coolest errors we needed to deal with but we stumbled upon +a lot more errors. + +Before this effort started Andrius added support for 64-bit instructions in Mes +and fixed some issues 64-bit architectures had in M2. + +I found a [bug in Guix shell](https://issues.guix.gnu.org/65225) (it's still +open) and had to fix some ELF headers in MesCC generated files because objdump +and gdb refused to work on them. + +Also, while I was writing these lines Andrius fixed the x86 bootstrapping, which +I broke when the backporting process started. + +In the end, a project like this is like hitting your head against a wall until +one of them breaks. Sometimes it feels like the head did. + +### Reproducing what we did {#reproducing} + +> TODO + +#### Using live-bootstrap {#live-bootstrap} + +Andrius is part of the `live-bootstrap` effort and he's doing all the scripting +there to keep the process reproducible. + +[Live-bootstrap](https://github.com/fosslinux/live-bootstrap) is... + +> An attempt to provide a reproducible, automatic, complete end-to-end +> bootstrap from a minimal number of binary seeds to a supported fully +> functioning operating system. + +That's the official description of the project. From a more practical +perspective, it's a set of scripts that build the whole operating system from +scratch, depending on a few binary seeds. 
+ +That's not very different from what Guix provides from a bootstrapping +perspective. Guix is "just" an environment where you can run "scripts" (the +packages define how they are built) in a reproducible way. Of course, Guix is +way more than that, but if we focus on what we are doing right now it acts like +the exact same thing. + +> NOTE: `live-bootstrap`'s project description is a little bit outdated. If you +> read the comparison with Guix, what you'd read is old information. If you +> want to read more up-to-date information about Guix's bootstrapping process +> I suggest you read this page of the Guix manual: +> + +Despite being very different projects, at a practical level the main difference +between them is that `live-bootstrap` is probably easier for you to test if you +are working on any GNU/Linux distribution[^in-guix]. + +[^in-guix]: If you run it in Guix or in a distribution that doesn't follow FHS + you'd probably need to touch the path of your Qemu installation or be + careful with the options you send to the `rootfs.py` script. + +If you want to reproduce this exact point in time you only need to use my fork +of `live-bootstrap` you can find HERE, jump to the `self-hosted-tcc-rv64` tag +and run it. Andrius made all the magic to set that process to take all the +inputs from Mes and TinyCC from the correct tag. We'll leave that there for +future reference. + +> TODO + +#### Using Guix for a reproducible environment {#guix} + +On top of what I just mentioned, there's another big difference between +live-bootstrap and Guix: I am the one making the Guix package for this. + +> TODO + +### Conclusions {#conclusions} + +Of course, the problems we fixed now look easy and simple to fix. This blog +post doesn't really do justice to the countless debugging hours and all the +nights we, Andrius and I, spent thinking about where the issues could be +coming from. + +The debugging setup wasn't as good as you might imagine. 
The early steps of the +bootstrap don't have all the debug symbols that a "normal" userspace program +would. In many cases, function names were all we had. + +I have to thank my colleague Andrius here because he did a really good debugging +job, and he provided me with small reproducers that I could finally fix. Most +of the time he made the assist and I scored the goal. + +He also did a great job with the testing which I couldn't do because I was +struggling with Guix from the early days, trying to make the compilers find the +header files and libraries. + +On the emotional side it is also a great improvement to have someone to rely +on. Andrius, Janneke and I had good teamwork and we supported each other when +our faith started to crumble. And believe me, it does crumble when a new bug +appears after you fixed one that you needed a week for. There were times this +summer I thought we would never reach this point. + +It's also worth mentioning here that the bootstrapping process is extremely +slow: it takes hours. This kills the responsiveness and makes testing way harder +than it should be. Not to mention that we are working on a foreign architecture, +which has its own problems too. + +If you have to take some lesson from something like this, here you have a +suggestion list: + +- The simplest error can take ages to debug if your code is crazy enough. +- Don't be clever. It sets a very high standard for your future self and people + who will read your code in the future. +- I guess we can summarize the previous two points in one: If we could remove + TinyCC from the chain, we would. It's a source of errors and it's hard to + debug. The codebase is really hard to read for no apparent reason. +- When build times are long, small reproducers help. +- Add tests for each new case you find. +- Don't trust, disassemble and debug. +- Be careful with C and standards and undefined behavior. +- Integers are hard. Signedness makes them harder. 
+- Being surrounded by the right people makes your life easier. + +Also, as a personal note I noticed I'm a better programmer since the previous +post in this series. I feel way more comfortable with complex reasoning and +even writing new programs in other languages, even if I spent almost no time +coding anything from scratch. It's like dealing with this kind of issues about +the internals gives you some level of awareness that is useful in a more general +way than it looks. Crazy stuff. + +If you can, try to play with the internals of things from time to time. It +helps. At least it helped me. + +### What is next? {#next} + +In the short-term, we need to decide what to do with the `setjmp` fix and +include it in MesLibC. After that we need to fix `va_args` in MesCC, to solve +that error with the permissions in the output files, and fix the floating point +numbers in RV64 in TinyCC. + +Once that is done, now in the mid-term, we would be able to compile a fully +featured Bootstrappable TinyCC. With that and some fixes in MesLibC, we would +be able to compile upstream TinyCC. We need to fix any error we find there +until it is ready for GCC. + +Now in the long-term, we are going to have problems with GCC so we'll need to +fix those, too. Once that is done, we would use GCC to compile more recent +versions of GCC until we compile the world. + +That's more or less the description of what we will do in the next months. + +Meanwhile, we'll need to test this on the real hardware we specifically acquired +for this task. It's slow, but it should be enough for these tests. + +And this is pretty much it. I hope you learned something new about C, the +Bootstrapping process or at least had a good time reading this wall of text. + +We'll try to work less for the next one, but we can't promise that. 😉 + +Take care. + + +--- + + -- cgit v1.2.3