Title: Milestone — MesCC builds TinyCC and fun C errors for everyone
Date: 2023-10-30
Category:
Tags: Bootstrapping GCC in RISC-V
Slug: bootstrapGcc8
Lang: en
Summary: We spent the last months making MesCC able to compile TinyCC and making the result of that compilation able to compile TinyCC. Many cool problems appeared, this is the summary of our work.

It's been a while since the last technical update on the project, and I am
fully aware that you were missing it, so it's time to recap with a really cool
announcement:

**We finally made a self-hosted Bootstrappable TinyCC in RISC-V**

Most of you probably remember I [already backported](bootstrapGcc6.html) the
Bootstrappable TinyCC compiler, but I didn't test it in a proper environment.
Now, we can confidently say it is able to compile itself, a "large" program
that makes use of more complex C features than I did in the tests.

All this work was done by Andrius Štikonas and myself. Janneke helped us a lot
with Mes related parts, too.

The work this time was pretty hard, honestly. Most of the things we did here
are not obvious, even for C programmers. I'm not used to these kinds of quirks
of the C language. Many of them are really specific and related to the
standards, and many others are just things that were missing.

I hope the ones I chose to discuss here help you understand your computing
better, as they did for me.

This is going to be a very long post, so take a ToC to help you out:

1. [Context](#context)
    1. [Why is this important?](#why-important)
2. [Problems fixed](#problems)
    1. [TinyCC misses assembly instructions needed for MesLibC](#tinycc-missing-instructions)
    2. [TinyCC's assembly syntax is weird](#tcc-assembly)
    3. [TinyCC does not support Extended Asm in RV64](#extended-assembly)
    4. [MesLibC `main` function arguments are not set properly](#main-args)
    5. [TinyCC says `__global_pointer$` is not a valid symbol](#dollars)
    6. [Bootstrappable TinyCC's casting issues](#tcc-casting-issues)
    7. [Bootstrappable TinyCC's `long double` support was missing](#long-double)
    8. [MesCC struct initialization issues](#mescc-struct-init)
    9. [MesCC vs TinyCC size problems](#size-problems)
    10. [MesCC add support for signed rotation](#mes-signed-rotation)
    11. [MesCC switch/case falls-back to default case](#broken-case)
    12. [Bootstrappable TinyCC problems with GOT](#got)
    13. [Bootstrappable TinyCC generates wrong assembly in conditionals](#wrong-conditionals)
    14. [Support for variable length arguments](#varargs)
    15. [MesLibC use `signed char` for `int8_t`](#int8)
    16. [MesLibC Implement `setjmp` and `longjmp`](#jmp)
    17. [More](#more)
3. [Reproducing what we did](#reproducing)
    1. [Using live-bootstrap](#live-bootstrap)
    2. [Using Guix](#guix)
4. [Conclusions](#conclusions)
5. [What is next?](#next)

### Context {#context}

There are many blog posts in the series where you can find some context about
the project, and even a FOSDEM talk about it, but they all give a very broad
explanation, so let's focus on what we are doing right now.

Here we have Mes, a Scheme interpreter, that runs MesCC, a C compiler, that is
compiling our simplified fork of TinyCC; let's call that Bootstrappable TinyCC.
That Bootstrappable TinyCC compiler then tries to compile its own code. It
compiles its own code because the goal is to add more flags in each
compilation, so it has more features in each round[^rounds]. We do all this
because TinyCC is way faster than MesCC and it's also more complex, but MesCC
is only able to build a simple TinyCC with few features enabled.

[^rounds]: There are many rounds. Like 7 or so.

During all this process we use a standard library provided by the Mes project,
we'll call it MesLibC, because we can't build glibc at this point, and TinyCC
does not provide its own C standard library.

With all this well understood, this is the achievement:

**We made MesCC able to compile the Bootstrappable TinyCC, using MesLibC, to an
executable that is able to compile the Bootstrappable TinyCC's codebase to a
binary that works and has all the features we need enabled.**[^self-hosted]

[^self-hosted]: So it can compile itself again and again, but who would want to
    do that?

The process affected all the pieces in the system. We added changes in MesCC,
MesLibC and the Bootstrappable TinyCC.

#### Why is this important? {#why-important}

We already talked at length about the bootstrapping issue, the trusting trust
attack and all that. I won't repeat that here. What I'll do instead is be
specific. This step is a big thing because it allows us to go way further in
the chain.

All the steps before Mes were already ported to RISC-V, mostly thanks to
Andrius Štikonas, who worked on [Stage0-POSIX][stage0] and the rest of the glue
projects that are needed to reach Mes.

[stage0]: https://github.com/oriansj/stage0-posix

Mes had been ported to RISC-V (64 bit) by W. J. van der Laan, and some patches
were added on top of it by Andrius Štikonas himself before our current effort
started.

At that point, Mes was unable to build our Bootstrappable TinyCC in RISC-V, the
next step in the process, and the Bootstrappable TinyCC was unable to build
itself either. This was a very limiting point, because TinyCC is the first
"proper" C compiler in the chain.

When I say "proper" I mean fast and fully featured as a C compiler. In x86,
TinyCC is able to compile old versions of GCC. If we manage to port it to
RISC-V we will eventually be able to build GCC with it and, with that, the
world.

In summary, TinyCC is a key step in the bootstrapping chain.

### Problems fixed {#problems}

This work can be easily followed in the commits in my TCC fork's
[`riscv-mes`][tcc] branch, and in my Mes clone's [`riscv-tcc-boot`][mes] branch.
We are also identifying the contents of this blogpost in the git history by
adding the git tag `self-hosted-tcc-rv64` to both of my forks. We will try to
keep both for future reference.

In Mes the process might be a little bit harder to follow because we sent most
of the patches to Janneke and he merged them, so when we were about to release
this post I continued from Janneke's branch to avoid divergences (I had some
problems with that before). In any case, the code is there, and searching by
author (Andrius and myself) will guide you to the changes we made.

[tcc]: https://github.com/ekaitz-zarraga/tcc/tree/riscv-mes
[mes]: https://github.com/ekaitz-zarraga/mes/tree/riscv-tcc-boot

Many commits have a long message you can go read there, but this post was born
to summarize the most interesting changes we made, and to describe them in a
more digestible way. Let's see if I manage to do that.

The following list is not ordered in any particular way, but we hope the
selection of problems we found is interesting for you. We found some more
errors, but these are the ones we consider most relevant.

#### TinyCC misses assembly instructions needed for MesLibC {#tinycc-missing-instructions}

TinyCC is not like GCC: TinyCC generates binary code directly, with no assembly
code in between. TinyCC has a separate assembler that doesn't follow the path
that C code follows.

It works the same in all architectures, but we can take RISC-V as an example:

TinyCC has `riscv64-gen.c`, which generates the binary code from C, while
`riscv64-asm.c` parses assembly code and also generates binary. As you can see,
binary generation is somewhat duplicated.

In the RISC-V case, the C part had support for mostly everything since my
backport, but the assembler did not support many instructions (which, by the
way, are supported by the C part).

MesLibC's `crt1.c` is written in assembly code. Its goal is to prepare the
`main` function and call it.
For that it needs the `jalr` instruction and others that were not supported by
the TinyCC assembler, neither upstream nor in our bootstrappable fork.

These changes appear in several commits because I didn't really understand how
the TinyCC assembler worked, and some instructions need to use relocations,
which I didn't know how to add. The commit [1e597f3d][lla-commit] shows how it
feels to work on this, and how relocations are done.

[lla-commit]: https://github.com/ekaitz-zarraga/tcc/commit/1e597f3d239d9119d2ea4bb3ca29b587ea594dcc

There you can see we started to understand things in TinyCC, but some other
changes came after this.

A very important note here is that upstream TinyCC does not have support for
these instructions yet, so we need to patch upstream TinyCC when we use it,
contribute the changes or find some other kind of solution. Each solution has
its downsides and upsides, so we need to make a decision about this later.

#### TinyCC's assembly syntax is weird {#tcc-assembly}

Continuing from the previous fix, TinyCC does not support GNU-Assembler's
syntax in RISC-V. It uses a simplified assembly syntax instead.

Where we would write:

``` asm
sd s1, 8(a0)
```

In TinyCC's assembly we have to write:

``` asm
sd a0, s1, 8
```

This requires changes in MesLibC, and it makes us create a separate folder for
TinyCC in MesLibC. See `lib/riscv64-mes-tcc/` and `lib/linux/riscv64-mes-tcc`
for more details.

#### TinyCC does not support Extended Asm in RV64 {#extended-assembly}

Way later in time we also found TinyCC does not support [Extended
Asm][ext-asm] in RV64. The functions that manage it are simply empty.

[ext-asm]: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

It took us some time to realize what was going on here, for two reasons. First,
there are few cases of Extended Asm in the code we were compiling. Second, it
was failing silently.
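
For readers who haven't met Extended Asm before, here is a minimal sketch of what it looks like in GNU C. The function `add_one` is made up for this illustration (it doesn't come from MesLibC), the assembly branch is only compiled when targeting RISC-V, and the plain C fallback is an assumption added just so the example builds on other hosts.

```c
/* Hypothetical helper, only for illustration: returns in + 1. */
long add_one(long in)
{
    long out;
#if defined(__riscv)
    /* Layout: template : outputs : inputs : clobbers.
     * "=r"(out) asks for any register to receive the result,
     * "r"(in)   asks for any register holding the input,
     * "t0"      tells the compiler the block overwrites t0, so it
     *           must not keep a live value there across the block. */
    __asm__ volatile ("li t0, 1\n\t"
                      "add %0, %1, t0"
                      : "=r"(out)
                      : "r"(in)
                      : "t0");
#else
    /* Fallback for non RISC-V hosts (assumption: we only want it to build). */
    out = in + 1;
#endif
    return out;
}
```

Calling `add_one(41)` gives `42` on either path. The part after the last colon is the clobber list, the piece of information the compiler needs to keep its own variables out of harm's way.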

Extended Asm is important because it lets you tell the compiler you are going
to touch some registers in the assembly block, so it can protect variables and
apply optimizations properly. In our case, our assembly blocks were clobbering
some variables that would have been protected by the compiler if Extended Asm
support had been implemented.

Andrius found all the places in MesLibC where Extended Asm was used and rewrote
the assembly code to keep variables safe in the cases where it was needed.

The other option was to add Extended Asm support to TinyCC, but we would need
to add it in the Bootstrappable TinyCC and also upstream. This also means
understanding the TinyCC codebase very well and making the changes without
errors, so we decided to simplify MesLibC, because that is easier to get right.

We are probably going to need to do this later on anyway, but we'll try to
delay it as much as possible.

#### MesLibC `main` function arguments are not set properly {#main-args}

Following the previous problem with assembly, we later found that the input
arguments of the `main` function, which come from the command line arguments,
were not properly set by our MesLibC. Andrius also took care of that in
[4f4a1174][main-ext] in Mes.

[main-ext]: https://github.com/ekaitz-zarraga/mes/commit/4f4a11745d1c7ed0995e9d31c7994abfb4a60b25

This error was easier to find than others because when we ran into it we
already had a compiled TinyCC. So we just needed to fix simple things around
it.

#### TinyCC says `__global_pointer$` is not a valid symbol {#dollars}

This was a headache for a while, but it turned out to be a very simple issue.

In RISC-V there's a symbol, `__global_pointer$`, defined in the ABI, that the
linker uses to set up the global pointer register (`gp`). TinyCC had trouble
parsing code around it, and it took us some time to realize it was the dollar
sign (`$`) that was causing the issues.

TinyCC does not process dollars in identifiers unless you specifically set a
flag (`-fdollars-in-identifiers`) when running it. In the RISC-V case, that
flag must always be active, because if it isn't, `__global_pointer$` can't be
processed.

We tried to set that flag in the command line but we had other issues in the
command line argument parsing (we found and fixed them later) so we just
hardcoded it.

This issue is interesting because it's an extremely simple problem, but its
effect appears in weird ways and it's not always easy to know where the problem
is coming from.

#### Bootstrappable TinyCC's casting issues {#tcc-casting-issues}

This one was really hard to fix.

When running our Bootstrappable TinyCC to build MesLibC we found this error:

``` nothing
cannot cast from/to void
```

We managed to isolate a piece of C code that was able to replicate the
problem.[^reproducer]

``` clike
long cast_charp_to_long (char const *i)
{
  return (long)i;
}

long cast_int_to_long (int i)
{
  return (long)i;
}

long cast_voidp_to_long (void const *i)
{
  return (long)i;
}

void main(int argc, char* argv[]){
    return;
}
```

Compiling this file raised the same issue, but then I realized I could remove
two of the functions at the top and the error didn't happen. Adding one of
those functions back raised the error again.

I tried changing the order of the functions and which functions I chose to add,
and I could reproduce it every time: if there were two functions it failed, but
it could build with only one.

Andrius found that the function type was not properly set in the RISC-V code
generation and its default value was `void`, so it only failed when it compiled
the second function.

Knowing that, we could take other architectures as a reference to fix this, and
so we did. See [6fbd1785][tcc-casting-commit].

[tcc-casting-commit]: https://github.com/ekaitz-zarraga/tcc/commit/6fbd17852aa11a2d0bc047183efaca4ff57ab80c

[^reproducer]: This is how we managed to fix most of the problems in our code:
    make a small reproducer we can test separately so we can inspect the
    process and the result easily.

#### Bootstrappable TinyCC's `long double` support was missing {#long-double}

When I backported the RISC-V support to our Bootstrappable TinyCC I missed the
`long double` support, and I didn't realize that because I never tested large
programs with it.

The C standard doesn't define a size for `long double` (it just says it has to
be at least as long as `double`), but its size is normally set to 16 bytes. All
this is weird in RV64, because it doesn't have 16 byte registers. It needs some
extra support.

Before we fixed this, the following code:

``` clike
long double f(int a){
    return a;
}
```

Failed with:

``` nothing
riscv64-gen.c:449 (`assert(size == 4 || size == 8)`)
```

Because it was only expecting to use `double`s (8 bytes) or `float`s (4 bytes).

In upstream TinyCC there were some commits that added `long double` support
using, and I quote, a *mega hack*, so I just copied that support to our
Bootstrappable TinyCC. See [a7f3da33456b][tcc-long-double].

[tcc-long-double]: https://github.com/ekaitz-zarraga/tcc/commit/a7f3da33456b4354e0cc79bb1e3f4c665937395b

After this commit, some extra problems appeared with missing symbols. These
were link-time problems: TinyCC has the floating point helper functions needed
for RISC-V defined in `lib/lib-arm64.c`, because it reuses the aarch64 code for
them. Now we also compile and link `lib-arm64.c`, and we have `long double`
support.

#### MesCC struct initialization issues {#mescc-struct-init}

This one was a lot of fun.

Our Bootstrappable TinyCC exploded with random issues: segfaults, weird branch
decisions...

After tons of debugging Andrius found some values in `struct`s were not set
properly.
As we don't know TinyCC's codebase really well, that was hard to follow and we
couldn't tell where the value was coming from. Andrius finally realized some
`struct`s were not initialized properly. Consider this example:

``` clike
typedef struct {
    int one;
    int two;
} Thing;

Thing a = {0};
```

That's supposed to initialize *all* fields in the `Thing` `struct` to `0`,
according to the C standard[^cppref].

As a first solution we set struct fields manually to `0`, to make sure they
were initialized properly. See [29ac0f40a7afb][tinycc-struct-0].

[tinycc-struct-0]: https://github.com/ekaitz-zarraga/tcc/commit/29ac0f40a7afba6a2d055df23a8ee2ee2098529e

After some debugging we found that the fields that were not explicitly set were
initialized to `22`. So I decided to go to MesCC and see if the struct
initialization was broken.

This was my first dive into MesCC's code, and I have to say it's really easy to
follow. It took me some time to read through it because I'm not that used to
`match`, but I managed to find the struct initialization code.

What I found in MesCC is that there was a `22` hardcoded in the struct
initialization code, probably coming from some debug code that was never
removed. As no part of the x86 bootstrapping used that kind of initialization,
or nothing relied on it, the error went unnoticed.

I set that to `0`, as it should be, and continued with our life.

[^cppref]: You can see an explanation in the (1) case at
    [cppreference.com](https://en.cppreference.com/w/c/language/struct_initialization)

#### MesCC vs TinyCC size problems {#size-problems}

The C standard does not set an exact size for integers. It mostly sets relative
sizes: `short` has to be shorter than or equal to `int`, `int` has to be
shorter than or equal to `long`, and so on. As far as the relative sizes go,
all the integers, including the `char`s, could be the same width (the classic
thought experiment is a platform where everything has 8 bits, although the
standard's minimum ranges actually ask for at least 16 bits in an `int`).

TinyCC's RISC-V backend was written under the assumption that `int` is 32 bits
wide.
You can see this happening in `riscv64-gen.c`, for example, here:

``` clike
EI(0x13, 0, rr, rr, (int)pi << 20 >> 20);   // addi RR, RR, lo(up(fc))
```

The pair of shifts there is meant to discard the upper 20 bits of a 32 bit
value, keeping only the lower 12 bits of the `pi` variable (sign-extended).

This code's behavior might be different from one platform to another. Taking
the thought experiment from before, on a platform with a narrower `int` this
code would not produce the lower 12 bits of `pi`.

In our case, we had MesCC using the whole register width, 64 bits, for
temporary values, so the lowest 44 bits were kept, and the next assertion,
which checked that the immediate fit in 12 bits, didn't pass.

This is a huge problem, as most of the code in the RISC-V generation is written
in this style.

There are other ways to do the same thing (`pi & 0xFFF` maybe?) in a more
portable way, but we don't know why upstream TinyCC decided to do it this way.
Probably they did because GCC (and TinyCC itself) use 32 bit integers, but they
didn't handle other possible cases, like the one we had here with MesCC.

In any case, this made us rethink MesCC, dig into how its integers are defined,
how to change this to be compatible with TinyCC and so on, but I finally
decided to add casts in the middle to make sure all this was compiled as
expected.

It was a good reason to make us rethink MesCC's integers, but dealing with this
took a very long time that could have been better used on something else.

Now we have all become paranoid about integers and we still think some extra
errors will arise from them in the future. Integers are hard.

#### MesCC add support for signed rotation {#mes-signed-rotation}

Integers were on our minds for a long time, as described in the previous block,
but I didn't talk about signedness in that one.

Following one of the crazy errors we had in TinyCC, I somehow realized (I don't
remember how!) that we were missing support for signed shifts (arithmetic right
shifts) in MesCC.
I think I found this while researching the code MesCC was outputting, when I
spotted some shifts done using unsigned instructions for signed values and I
started digging in MesCC to find out why.

I finally realized that there was no support for that, and the shift operation
wasn't selected depending on the signedness of the value being shifted. Let's
see this with an example:

``` clike
signed char   a = 0xF0;
unsigned char b = 0xF0;

// What is this? (Answer: -1, that is 0xFF when seen as an 8 bit value)
a >> 4;

// And this? (Answer: 0x0F => 15)
b >> 4;
```

In the example you can see the shift operation does not work the same way if
the value is signed or not. If you always use the unsigned version of the `>>`
operation, you don't get the results you expect.

Signs are also hard.

In this case, like in many others, the fix was easier than realizing what was
going wrong. I just added support for the signed shift operation, not only for
RISC-V but for all architectures, and I added the correct signedness check to
the shift operation to select the correct instruction. The patch (see
[88f24ea8][signed-rotation] in Mes) is very clean and easy to read, because
MesCC's codebase is really well ordered.

[signed-rotation]: https://github.com/ekaitz-zarraga/mes/commit/88f24ea8661dd279c2a919f8fbd5f601bb2509ae

#### MesCC switch/case falls-back to default case {#broken-case}

In the early bootstrap runs, our Bootstrappable TinyCC did weird things. After
many debugging sessions we realized the `switch` statements in `riscv64-gen.c`,
more specifically in `gen_opil`, were broken. The fall-throughs in the `switch`
were automatically directed to the `default` case. Weird!

MesCC has many tests, so I read all the ones related to `switch` statements.
The ones that exercised fall-through all fell through to the `default` case, so
our weird behavior wasn't tested. I added tests for our case and read the
disassembly of simple examples, and then I realized the problem.

Each of the `case` blocks has two parts: the clause that checks if the value of
the expression is the one of the case, and the body of the case itself.

The `switch` statement generation was doing some magic to deal with `case`
blocks, but it was failing to deal with complex fall-through schemes. The
clause of the target `case` block was always run, and as that clause was always
false (the one that matched was the one the code fell through from), the code
ended up in the `default` case.

There were some problems to fix this, as NyaCC (MesCC's C parser) returns
`case` blocks as nested when they don't have a `break` statement:

``` lisp
(case testA
  (case testB
    (case testC BODY)))
```

Instead of dealing with this, I decided to flatten the `case` blocks, giving
them empty bodies. This way we can deal with the structure in a simpler way.

``` lisp
((case testA (expr-stmt))
 (case testB (expr-stmt))
 (case testC BODY))
```

Once this is done, I expanded each `case` block to a jump that jumps over the
clause, then the clause, and then its body. Doing this, the fall-through
doesn't re-evaluate the clause, as it doesn't need to. The generated code looks
like this in pseudocode:

``` assembly
    ;; This doesn't have the jump because it's the first
CASE1:
    testA
CASE1_BODY:
    ...

    goto CASE2_BODY
CASE2:
    testB
CASE2_BODY:
    ...

    goto CASE3_BODY
CASE3:
    testC
CASE3_BODY:
    ...
```

If one of the `case`s has a `break`, it's treated as part of its body, and it
will end the execution of the `switch` statement normally, with no
fall-through.

This results in simpler `case` block control. The previous approach dealt with
nested `case` blocks and tried to be clever about them, but unsuccessfully.

The best thing about this commit is that most of the cleverness was removed and
replaced with a simple solution (flatten all the things!). It wasn't that easy
to implement, but I first built a simple prototype and Janneke's Scheme magic
made my approach usable in production.
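
As a small C sketch of the kind of fall-through involved: the function `classify` and its values are made up for this example (they are not taken from `gen_opil` or from MesCC's test suite). With a correct compiler, once a `case` matches, the bodies below it run without their clauses being tested again.

```c
/* Hypothetical example: the result encodes which case bodies ran. */
int classify(int x)
{
    int r = 0;
    switch (x) {
    case 1:          /* empty body: stacked on top of case 2 */
    case 2:
        r += 10;     /* no break: falls through into case 3's body */
    case 3:
        r += 100;
        break;       /* ends the switch, default is skipped */
    default:
        r = -1;
    }
    return r;
}
```

Here `classify(1)` and `classify(2)` should both return `110`, `classify(3)` returns `100` and anything else returns `-1`. A compiler that re-tests the clauses while falling through would presumably send `classify(1)` to the `default` branch instead.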

All this was added to Mes's codebase in several commits, as we needed some
iterations to get it right. [22cbf823582][cases] has the base of this change,
but there were some more iterations in Mes.

[cases]: https://github.com/ekaitz-zarraga/mes/commit/22cbf823582e3699b6a21ee0cf74c2dbf0a6a4e9

#### Bootstrappable TinyCC problems with GOT {#got}

The Global Offset Table is a table that helps with relocatable binaries. Our
Bootstrappable TinyCC segfaulted because it was generating an empty GOT.

Andrius debugged upstream TinyCC alongside ours and realized there was a
missing check in an `if` statement. He fixed it in
[f636cf3d4839d1ca][got-commit].

The problem with this kind of error is that TinyCC's codebase is really hard to
read. It's a very small compiler but it's not obvious how things are done in
it, so we had to spend many hours in debugging sessions that went nowhere. If
we had a compiler that was easier to read and change, this would have been way
simpler to fix and we would have had a better experience with it.

[got-commit]: https://github.com/ekaitz-zarraga/tcc/commit/f636cf3d4839d1ca3f5af9c0ad9aef43a4bfccd9

#### Bootstrappable TinyCC generates wrong assembly in conditionals {#wrong-conditionals}

We spent a long time debugging a bug I introduced during the backport, when I
tried to undo some optimization upstream TinyCC applied to comparison
operations.

Consider the following code:

``` clike
if ( x < 8 )
    whatever();
else
    whatever_else();
```

Our Bootstrappable TinyCC was unable to compile this code correctly. Instead,
it output code that always took the same branch, regardless of the value of
`x`.

In TinyCC, a conditional like `if (x < CONSTANT)` gets special treatment, and
it's converted to something like this pseudoassembly:

``` pseudo
load x to a0
load CONSTANT to a1
set a0 if less than a1
branch if a0 not equal 0    ; Meaning it's `set`
```

This behaviour uses the `a0` register as a flag, emulating what other CPUs use
for comparisons.
RISC-V doesn't need that, but it's still done here, probably for compatibility
with other architectures. In RISC-V it could look like this:

``` pseudo
load x to a0
load CONSTANT to a1
branch if a0 less than a1
```

You can easily see the `branch` "instruction" does a different comparison in
one case versus the other. In the first one it checks if `a0` is set, and in
the second it checks if `a0` is smaller than `a1`.

TinyCC handles this case in a very clever way (maybe too clever?). When they
emit the `set a0 if less than a1` instruction they replace the current
comparison operation with `not equal` and they replace the `CONSTANT` with a
`0`. That way, when the `branch` instruction is generated, they insert the
correct clause.

In my code I forgot to replace the comparison operator, so the branch checked
`if a0 is less than 0` and it was always false, as the `set` operation writes a
`0` or a `1` and neither of them is less than `0`.

The commit [5a0ef8d0628f719][branch-tcc] explains this in a more technical way,
using actual RISC-V instructions.

This was also hard to fix, because TinyCC's variable names (`vtop->c.i`) are
really weird and they are used for many different purposes.

[branch-tcc]: https://github.com/ekaitz-zarraga/tcc/commit/5a0ef8d0628f719ebb01c952797a86a14051228c

#### Support for variable length arguments {#varargs}

In C you can define functions with a variable number of arguments. In RISC-V,
those arguments are passed in registers, while in other architectures they are
passed on the stack. This means the RISC-V case is a little bit more complex to
deal with, and needs special treatment.

Andrius realized that in our Bootstrappable TinyCC we had issues with variable
length arguments, especially in the most famous function that uses them:
`printf`. He also found that the problem came from the arguments not being
properly set.

Reading upstream TinyCC we found they use a really weird system for the defines
that deal with this.
They have a header file, `include/tccdefs.h`, which is included in the
codebase, but also processed by a tool that generates strings that are later
injected at execution time in TinyCC.

This was too much for us, so we just extracted the simplest variable argument
definitions for RISC-V and introduced them in MesLibC and our Bootstrappable
TinyCC.

##### Extra: files generated with no permissions

The Bootstrappable TinyCC built using MesCC generated files with no
permissions, and Andrius found that this problem came from the variable length
argument support definitions. So he fixed that, too[^stikonas].

The macro that defined `va_start` had broken pointer arithmetic. At the
beginning he thought it was related to MesCC's internals, but he tested it in
GCC later and realized the problem was in the macro definition. That's why the
commit currently says "workaround" in its name, but it's more than a
workaround: it's a proper fix. We are rewording that, but that will happen
after we release this post.

[^stikonas]: He is like that.

#### MesLibC use `signed char` for `int8_t` {#int8}

We already had a running Bootstrappable TinyCC compiled using MesCC when we
stumbled upon this issue. Somehow, when assembling:

``` asm
addi a0, a0, 9
```

The code was trying to read `9` as a register name, and failed to do it (of
course).

It was weird to realize that the following code (in `riscv64-asm.c`) was always
taking the true branch of the `if` statement, even when `asm_parse_regvar`
returned `-1`:

``` clike
int8_t reg;
...
if ((reg = asm_parse_regvar(tok)) != -1) {
    ...
} else ...
```

I disassembled it and saw something like this:

``` pseudoassembly
call asm_parse_regvar   ;; Returns value in a0
reg = a0
a0 = a0 + 1
branch if a0 equals 0
```

This looks OK: it does some magic with the `-1` but it makes sense anyway. The
problem is that it didn't branch, because `a0` was `256` even when
`asm_parse_regvar` returned `-1`.
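
Where can a `256` like that come from? A minimal sketch, assuming the 8 bit variable ends up being treated as unsigned. The function names are made up for this illustration, and `unsigned char` / `signed char` are spelled out so the result doesn't depend on the platform.

```c
/* Hypothetical stand-in for asm_parse_regvar failing: always returns -1. */
static int parse_fails(void) { return -1; }

/* 8 bit unsigned storage: -1 is stored as 255, which never equals -1. */
int found_unsigned(void)
{
    unsigned char reg;
    return (reg = parse_fails()) != -1;   /* compares 255 with -1 */
}

/* 8 bit signed storage: -1 is still -1 after promotion to int. */
int found_signed(void)
{
    signed char reg;
    return (reg = parse_fails()) != -1;   /* compares -1 with -1 */
}

/* The "add 1 and compare with 0" trick seen in the disassembly. */
int plus_one_unsigned(void)
{
    unsigned char reg = parse_fails();
    return reg + 1;                        /* 255 + 1 */
}
```

Here `found_unsigned()` returns `1` (the "true" branch), `found_signed()` returns `0`, and `plus_one_unsigned()` returns `256`, the value seen in `a0`.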

During some of the `int` related problems someone told me in the Fediverse that
`char`'s default signedness is not defined in the C standard. I read MesLibC
and, exactly: `int8_t` was defined as an alias to `char`.

In RISC-V `char` is `unsigned` by default (don't ask me why) but we are used to
x86, where it's `signed` by default. Just saying `char` is not portable.

Replacing:

``` clike
typedef char int8_t;
```

With:

``` clike
typedef signed char int8_t;
```

Fixed the issue.

From this you can learn several things:

1. Don't assume `char`'s signedness in C
2. If you design a programming language, be consistent with your decisions. In
   C `int` is always `signed int`, but `char`s don't act like that. Don't do
   this.

#### MesLibC Implement `setjmp` and `longjmp` {#jmp}

Those who are not that versed in C, as I was before we found this issue, won't
know about `setjmp` and `longjmp`, but they are, simplifying a lot, like a
`goto` you can use from any part of the code. `setjmp` needs a buffer and
stores the state of the program in it; `longjmp` sets the state of the program
to the values in the buffer, so it jumps to the position stored by `setjmp`.

Both functions are part of the C standard library and they need specific
support for each architecture because they need to know which registers are
considered part of the state of the program. They need to know how to store the
program counter, the return address, and so on, and how to restore them.

In their simplest form they are a set of stores in the case of `setjmp` and a
set of loads in the case of `longjmp`. In RISC-V they basically need to store
the `s*` registers (alongside `sp` and `ra`), as they are the ones that are not
treated as temporary. It's simple, but it needs to be done, which it wasn't,
neither for GCC nor for RISC-V, in MesLibC.

Andrius is not convinced by our commit here, and I agree with his concerns.
We added the full `setjmp` and `longjmp` implementations directly stolen from
(I mean, inspired by) the ones in Musl[^stolen], but they also have floating
point register support, using instructions that are not implemented in TinyCC
yet. This is going to be a problem in the future because later iterations will
try to execute instructions they don't actually understand.

There are two (or three) possible solutions here. The first is to remove the
floating point instructions for now (another flavor of this solution is to hide
them under an `#ifdef`). The second is to implement the floating point
instructions in TinyCC's RISC-V assembler, which sounds great but forces us to
upstream the changes, and that process may take long and we'd need to patch it
in our bootstrapping scripts until it happens.

We just added the `#ifdef`s because our code is full of them anyway and sent it
to Mes: [0e2c5569][setjmp].

[setjmp]: https://github.com/ekaitz-zarraga/mes/commit/0e2c55697df285250c8a24442f169bc52d729c31

[^stolen]: Yo, if it's free software it's not stealing! Please steal my code.
    Make it better.

#### More {#more}

Those are mostly the coolest errors we needed to deal with, but we stumbled
upon a lot more.

Before this effort started Andrius added support for 64 bit instructions in Mes
and fixed some issues 64 bit architectures had in M2. I found a [bug in Guix
shell](https://issues.guix.gnu.org/65225) (it's still open) and had to fix some
ELF headers in MesCC generated files because objdump and gdb refused to work on
them. Andrius also found issues with weak symbols in MesLibC that were
triggered because TCC didn't have support for them; thankfully upstream TCC had
that issue fixed and we just cherry-picked for the win. He even had the energy
to test all this on real RISC-V hardware we specifically acquired for this
task.

There are many more things to tell, but this is already getting too long and if
I continue writing we'll probably end up fixing some more stuff.
In the end, a project like this is like hitting your head against a wall until one of them breaks. Sometimes it feels like the head did, but it's all good.

### Reproducing what we did {#reproducing}

All we did means nothing if you can't reproduce it. We provide two ways to reproduce this process: live-bootstrap and Guix. Both provide a similar thing, but there are some high-level differences that are worth mentioning now.

Compared with `live-bootstrap`, using Guix helps because it reuses the previous steps if they didn't change. This results in shorter waits once Mes is sorted out. On the other hand, I've had issues with failed builds in Guix (in emulated systems). It was hard to jump inside the build container and play around in there, so the development cycle suffered a lot. In `live-bootstrap`, if you are good with `bwrap` you can jump in and tweak things with no issues.

For those who enjoy digging in the code and trying to follow the process, I recommend following `live-bootstrap`'s scripts. The directory structure is a little bit confusing but the scripts are very plain and linear. The ones in the Guix process come from previous bootstrap efforts and they are designed to do many things automagically, which makes them hard to follow.

#### Using live-bootstrap {#live-bootstrap}

Andrius is part of the `live-bootstrap` effort and he's doing all the scripting there to keep the process reproducible.

[Live-bootstrap](https://github.com/fosslinux/live-bootstrap) is...

> An attempt to provide a reproducible, automatic, complete end-to-end
> bootstrap from a minimal number of binary seeds to a supported fully
> functioning operating system.

That's the official description of the project. From a more practical perspective, it's a set of scripts that build the whole operating system from scratch, depending on a few binary seeds.

That's not very different from what Guix provides from a bootstrapping perspective.
Guix is "just" an environment where you can run "scripts" (the packages define how they are built) in a reproducible way. Of course, Guix is way more than that, but if we focus on what we are doing right now it acts like the exact same thing.

> NOTE: `live-bootstrap`'s project description is a little bit outdated. If you
> read the comparison with Guix, what you'd read is old information. If you
> want to read more up-to-date information about Guix's bootstrapping process
> I suggest you read this page of the Guix manual:
>

Although they are very different projects, on a practical level the main difference between them is that `live-bootstrap` is probably easier for you to test if you are working on any GNU/Linux distribution[^in-guix].

[^in-guix]: If you run it in Guix or in a distribution that doesn't follow FHS you'd probably need to touch the path of your Qemu installation or be careful with the options you send to the `rootfs.py` script.

If you want to reproduce this exact point in time you only need to use my fork of [live-bootstrap](https://github.com/ekaitz-zarraga/live-bootstrap/), branch `riscv-tcc-boot`. I also made a tag on it, `self-hosted-tcc-rv64`, to make it easier to remember when this post was released. Andrius made all the magic to set that process to take all the inputs from Mes and TinyCC from the correct tag.

Clone the repository, set up the dependencies and run this (if you are not on a RISC-V host you need to configure Qemu and binfmt):

``` bash
./rootfs.py --bwrap --arch riscv64 --preserve
```

That should, after a long time, reach a point where there's a properly compiled bootstrappable TinyCC.

#### Using Guix for a reproducible environment {#guix}

I made a Guix recipe that can replicate the whole process, too. It took me a long time to make it work but it finally does.

Starting from my TCC fork, reproducing this should be easy for people versed in Guix.
There's a `guix` folder with some files (most of them broken, not gonna lie) but there are two you should pay attention to:

- `channels.scm` stores the state of my Guix checkout so you can reproduce it in the future using `guix time-machine`. At the moment it doesn't feel necessary, but if something fails when you try it, please refer to that.
- `commencement.scm` is an edited copy of the Guix bootstrapping process, directly obtained from `gnu/packages/commencement.scm` in Guix's codebase. I patched this to make it work for RISC-V, using some more modern commits in the dependencies.

In order to reproduce all our work in Guix you just need to build the `tcc-boot0` package from the `commencement.scm` file using `riscv64-linux` as your `--system`. I'm a nice guy so I just added a command there you can use for this, just run:

``` bash
./tcc-boot0-from-source.sh
```

And that should build the whole thing. It takes hours, you have been warned. It also adds `--no-grafts` (thanks Efraim), because if you keep the grafts it compiles the world from scratch (curl, x11... not good).

If you just want to build `mes-boot` as an intermediate step, I also made a file for that:

``` bash
./mes-boot-from-source.sh
```

Both scripts will load variables from the `commencement.scm` module provided. The module is not complex if you are used to Guix, but it calls some complex shell scripts in both Mes and TinyCC to build. Those contain all the magic.

### Conclusions {#conclusions}

Of course, the problems we fixed now look easy and simple to fix. This blog post doesn't really do justice to the countless debugging hours and all the nights we, Andrius and I, spent thinking about where the issues could be coming from.

The debugging setup wasn't as good as you might imagine. The early steps of the bootstrap don't have all the debug symbols that a "normal" userspace program would. In many cases, function names were all we had.
I have to thank my colleague Andrius here because he did a really good debugging job, and he provided me with small reproducers that I could finally fix. Most of the time he made the assist and I scored the goal. He also did a great job with the testing, which I couldn't do because I was struggling with Guix from the early days, trying to make the compilers find the header files and libraries.

On the emotional side it is also a great improvement to have someone to rely on. Andrius, Janneke and I had good teamwork and we supported each other when our faith started to crumble. And believe me, it does crumble when a new bug appears after you fixed one that you needed a week for. There were times this summer I thought we would never reach this point.

It's also worth mentioning here that the bootstrapping process is extremely slow: it takes hours. This kills the responsiveness and makes testing way harder than it should be. Not to mention that we are working on a foreign architecture, which has its own problems too.

If you have to take some lesson from something like this, here you have a suggestion list:

- The simplest error can take ages to debug if your code is crazy enough.
- Don't be clever. It sets a very high standard for your future self and people who will read your code in the future.
- I guess we can summarize the previous two points in one: If we could remove TinyCC from the chain, we would. It's a source of errors and it's hard to debug. The codebase is really hard to read for no apparent reason.
- When build times are long, small reproducers help.
- Add tests for each new case you find.
- Don't trust, disassemble and debug.
- Be careful with C and standards and undefined behavior.
- Integers are hard. Signedness makes them harder.
- Being surrounded by the correct people makes your life easier.

Also, as a personal note, I noticed I'm a better programmer since the previous post in this series.
I feel way more comfortable with complex reasoning and even writing new programs in other languages, even if I spent almost no time coding anything from scratch. It's as if dealing with this kind of issue in the internals gives you some level of awareness that is useful in a more general way than it looks. Crazy stuff.

If you can, try to play with the internals of things from time to time. It helps. At least it helped me.

### What is next? {#next}

Now that we have a fully featured Bootstrappable TinyCC we need to decide what to do next.

In the short term, all this has to be released in the original projects: Mes, M2, and so on. That's the easy part, as everything has proved to be ready.

In the mid term, it's not very clear what to do first. We suspect we'll need upstream TinyCC for the next steps, because we need many different tools to continue with the bootstrapping chain, and the bootstrappable TinyCC might not be enough to build them. On the other hand, when we go for a standard library we'll miss the extended assembly support we already mentioned. There's some uncertainty in the next step.

The long term is pretty clear though: the goal is GCC. First GCC for C and then for C++, to make it able to build GCC 7.5, which should enable the rest of the chain pretty easily (famous last words). I anticipate we are going to have problems with GCC (I know this because I left them there last time) so we'll need to fix those, too. Once that is done, we would use GCC to compile more recent versions of GCC until we compile the world.

That's more or less the description of what we will do in the next months.

And this is pretty much it. I hope you learned something new about C, the bootstrapping process, or at least had a good time reading this wall of text. We'll try to work less for the next one, but we can't promise that. 😉

Take care.

---