author Ekaitz Zarraga <ekaitz@elenq.tech> 2023-10-17 16:28:39 +0200
committer Ekaitz Zarraga <ekaitz@elenq.tech> 2023-10-17 16:28:39 +0200
commit 2e2265b0b79cce46954f7d55fa2d6eb11e1de4a9 (patch)
tree 2b87aac302c0cbf655febe71e053d1d9772d10be
parent 8a7a83596487fca91b1eb9a7c3b4552766c39d82 (diff)
WIP: Milestone self-hosted-tcc-riscv
-rw-r--r-- content/bootstrapGcc/08_tcc_and_mescc.md 1038
1 files changed, 1038 insertions, 0 deletions
diff --git a/content/bootstrapGcc/08_tcc_and_mescc.md b/content/bootstrapGcc/08_tcc_and_mescc.md
new file mode 100644
index 0000000..e49a2e6
--- /dev/null
+++ b/content/bootstrapGcc/08_tcc_and_mescc.md
@@ -0,0 +1,1038 @@
+Title: Milestone — MesCC builds TinyCC and fun C errors for everyone
+Date: 2023-09-27
+Category:
+Tags: Bootstrapping GCC in RISC-V
+Slug: bootstrapGcc8
+Lang: en
+Summary:
+ We spent the last months making MesCC able to compile TinyCC and making the
+ result of that compilation able to compile TinyCC. Many cool problems
+ appeared; this is the summary of our work.
+Status: draft
+
+It's been a while since the latest technical update in the project and I am
+fully aware that you were missing it so it's time to recap with a really cool
+announcement:
+
+<span style="font-size: larger">
+**We finally made a self-hosted Bootstrappable TinyCC in RISC-V**
+</span>
+
+Most of you probably remember I [already backported](bootstrapGcc6.html) the
+Bootstrappable TinyCC compiler, but I didn't test it in a proper environment.
+Now we can confidently say it is able to compile itself, a "large" program
+that makes use of more complex C features than my tests did.
+
+All this work was done by Andrius Štikonas and myself. Janneke helped us a lot
+with the Mes-related parts, too. The work this time was pretty hard, honestly.
+Most of the things we did here are not obvious, even for C programmers.
+
+I'm not used to these kinds of quirks of the C language. Most of them are
+really specific and related to the standards, and many others are just things
+that were missing. I hope the ones I chose to discuss here help you understand
+your computing better, as they did for me.
+
+This is going to be a very long post, so here's a ToC to help you out:
+
+1. [Context](#context)
+ 1. [Why is this important?](#why-important)
+2. [Problems fixed](#problems)
+ 1. [TinyCC misses assembly instructions needed for MesLibC](#tinycc-missing-instructions)
+ 2. [TinyCC's assembly syntax is weird](#tcc-assembly)
+ 3. [TinyCC does not support Extended Asm in RV64](#extended-assembly)
+ 4. [MesLibC `main` function arguments are not set properly](#main-args)
+ 5. [TinyCC says `__global_pointer$` is not a valid symbol](#dollars)
+ 6. [Bootstrappable TinyCC's casting issues](#tcc-casting-issues)
+ 7. [Bootstrappable TinyCC's `long double` support was missing](#long-double)
+ 8. [MesCC struct initialization issues](#mescc-struct-init)
+ 9. [MesCC vs TinyCC size problems](#size-problems)
+ 10. [MesCC add support for signed rotation](#mes-signed-rotation)
+ 11. [MesCC switch/case falls-back to default case](#broken-case)
+ 12. [Bootstrappable TinyCC problems with GOT](#got)
+ 13. [Bootstrappable TinyCC generates wrong assembly in conditionals](#wrong-conditionals)
+ 14. [Support for variable length arguments](#varargs)
+ 15. [MesLibC use `signed char` for `int8_t`](#int8)
+ 16. [MesLibC Implement `setjmp` and `longjmp`](#jmp)
+ 17. [More](#more)
+3. [Reproducing what we did](#reproducing)
+ 1. [Using live-bootstrap](#live-bootstrap)
+ 1. [Using Guix](#guix)
+4. [Conclusions](#conclusions)
+5. [What is next?](#next)
+
+
+### Context {#context}
+
+You have many blogposts in the series where you can find some context about
+the project, and even a FOSDEM talk about it, but they all give a very broad
+explanation, so let's focus on what we are doing right now.
+
+Here we have Mes, a Scheme interpreter, that runs MesCC, a C compiler, that is
+compiling our simplified fork of TinyCC; let's call that Bootstrappable TinyCC.
+That Bootstrappable TinyCC compiler then tries to compile its own code. It
+compiles its own code because its goal is to add more flags in each
+compilation, so it has more features in each round. We do all this because
+TinyCC is way faster than MesCC and it's also more complex, but MesCC is only
+able to build a simple TinyCC with few features enabled.
+
+During all this process we use a standard library provided by the Mes project,
+which we'll call MesLibC, because we can't build glibc at this point and TinyCC
+does not provide its own C standard library.
+
+With all this well understood, this is the achievement:
+
+**We made MesCC able to compile the Bootstrappable TinyCC, using MesLibC, to an
+executable that is able to compile the Bootstrappable TinyCC's codebase to a
+binary that works.**[^self-hosted]
+
+[^self-hosted]: So it can compile itself again and again, but who would want to
+ do that?
+
+The process affected all the pieces in the system. We made changes in MesCC,
+MesLibC and the Bootstrappable TinyCC.
+
+#### Why is this important? {#why-important}
+
+We already talked long about the bootstrapping issue, the trusting trust attack
+and all that. I won't repeat that here. What I'll do instead is to be specific.
+This step is a big thing because this allows us to go way further in the chain.
+
+All the steps before Mes were already ported to RISC-V, mostly thanks to
+Andrius Štikonas, who worked on [Stage0-POSIX][stage0] and the rest of the glue
+projects that are needed to reach Mes.
+
+[stage0]: https://github.com/oriansj/stage0-posix
+
+Mes had been ported to RISC-V (64 bit) by W. J. van der Laan, and some patches
+were added on top of it by Andrius Štikonas himself before our current effort
+started.
+
+At that point, Mes was unable to build our Bootstrappable TinyCC in RISC-V,
+the next step in the process, and the Bootstrappable TinyCC was unable to
+build itself either. This was a very limiting point, because TinyCC is the
+first "proper" C compiler in the chain.
+
+When I say "proper" I mean fast and fully featured as a C compiler. In x86,
+TinyCC is able to compile old versions of GCC. If we manage to port it to
+RISC-V we will eventually be able to build GCC with it and with that the world.
+
+In summary, TinyCC is a key step in the bootstrapping chain.
+
+
+### Problems fixed {#problems}
+
+This work can be easily followed in the commits in my TCC fork's
+[`riscv-mes`][tcc] branch, and in my Mes clone's [`ekaitz`][mes] branch. Most
+of the commits are already merged, but we leave that reference so people can
+follow the development more easily. We are also identifying the contents of
+this blogpost in the git history by adding the git tag `self-hosted-tcc-rv64`
+to both of my forks.
+
+[tcc]: https://github.com/ekaitz-zarraga/tcc/tree/riscv-mes
+[mes]: https://github.com/ekaitz-zarraga/mes/tree/ekaitz
+
+Many commits have a long message you can go read there, but this post was born
+to summarize the most interesting changes we made, and to describe them in a
+more digestible way. Let's see if I manage to do that.
+
+The following list is not ordered in any particular way, but we hope the
+selection of problems we found is interesting for you. We found some more
+errors, but these are the ones we consider most relevant.
+
+
+#### TinyCC misses assembly instructions needed for MesLibC {#tinycc-missing-instructions}
+
+TinyCC is not like GCC: it generates binary code directly, with no assembly
+code in between. TinyCC also has a separate assembler that doesn't follow the
+path that C code follows.
+
+It works the same in all architectures, but we can take RISC-V as an example:
+
+TinyCC has `riscv64-gen.c`, which generates the binary code when compiling C,
+while `riscv64-asm.c` parses assembly code and also generates binary. As you
+can see, binary generation is somewhat duplicated.
+
+In the RISC-V case, the C part had support for almost everything since my
+backport, but the assembler did not support many instructions (which, by the
+way, are supported by the C part).
+
+MesLibC's `crt1.c` is written in assembly code. Its goal is to prepare the
+`main` function and call it. For that it needs the `jalr` instruction and
+others that were not supported by TinyCC, neither upstream nor in our
+bootstrappable fork.
+
+These changes appear in several commits because I didn't really understand how
+the TinyCC assembler worked, and some instructions need to use relocations,
+which I didn't know how to add. The following commit shows how it feels to
+work on this, and how relocations are done:
+
+[1e597f3d239d9119d2ea4bb3ca29b587ea594dcc][lla-commit]
+
+[lla-commit]: https://github.com/ekaitz-zarraga/tcc/commit/1e597f3d239d9119d2ea4bb3ca29b587ea594dcc
+
+There you can see we started to understand things in TinyCC, but some other
+changes came after this.
+
+A very important note here is that upstream TinyCC does not have support for
+these instructions yet, so we need to patch upstream TinyCC when we use it,
+contribute the changes or find some other kind of solution. Each solution has
+its downsides and upsides, so we need to make a decision about this later.
+
+
+#### TinyCC's assembly syntax is weird {#tcc-assembly}
+
+Following on from the previous fix, TinyCC does not support GNU Assembler's
+syntax in RISC-V. It uses a simplified assembly syntax instead.
+
+Where in GNU Assembler we would write:
+
+``` asm
+sd s1, 8(a0)
+```
+
+In TinyCC's assembly we have to do:
+
+``` asm
+sd a0, s1, 8
+```
+
+This requires changes in MesLibC, and it makes us create a separate folder for
+TinyCC in MesLibC. See `lib/riscv64-mes-tcc/` and `lib/linux/riscv64-mes-tcc`
+for more details.
+
+#### TinyCC does not support Extended Asm in RV64 {#extended-assembly}
+
+Much later we also found TinyCC does not support [Extended Asm][ext-asm] in
+RV64. The functions that manage that are simply empty.
+
+[ext-asm]: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
+
+It took us some time to realize what was going on here, for two reasons.
+First, there are few cases of Extended Asm in the code we were compiling.
+Second, it was failing silently.
+
+Extended Asm is important because it lets you tell the compiler you are going
+to touch some registers in the assembly block, so it can protect variables and
+apply optimizations properly.
+
+In our case, our assembly blocks were clobbering some variables that would have
+been protected by the compiler if the Extended Asm support was implemented.
+
+Andrius found all the places in MesLibC where Extended Asm was used and rewrote
+the assembly code to keep variables safe in the cases it was needed. See
+[b5eb0e34c6fc76a4558940e43ac78cc8a63ebac1][extended-asm] in Mes.
+
+[extended-asm]: https://github.com/ekaitz-zarraga/mes/commit/b5eb0e34c6fc76a4558940e43ac78cc8a63ebac1
+
+The other option was to add Extended Asm support to TinyCC, but we would need
+to add it in the Bootstrappable TinyCC and also upstream. This also means
+understanding TinyCC's codebase very well and making the changes without
+errors, so we decided to simplify MesLibC, because that is easier to get right.
+
+#### MesLibC `main` function arguments are not set properly {#main-args}
+
+Following the previous problem with assembly, we later found input arguments of
+the `main` function, that come from the command line arguments, were not
+properly set by our MesLibC. Andrius also took care of that in
+[267a132ca932dafe628da000dc76714612cce144][main-ext] in Mes.
+
+[main-ext]: https://github.com/ekaitz-zarraga/mes/commit/267a132ca932dafe628da000dc76714612cce144
+
+This error was easier to find than others because when we found issues with
+this we already had a compiled TinyCC. So we just needed to fix simple things
+around it.
+
+
+#### TinyCC says `__global_pointer$` is not a valid symbol {#dollars}
+
+This is a small issue that was a headache for a while, but it turned out to be
+a very simple one.
+
+In RISC-V there's a symbol defined in the ABI, `__global_pointer$`, whose
+address is loaded in the `gp` register so nearby data can be addressed relative
+to it. TinyCC had trouble parsing code around it and it took us some time to
+realize it was the dollar sign (`$`) that was causing the issues.
+
+TinyCC does not process dollars in identifiers unless you specifically set a
+flag (`-fdollars-in-identifiers`) when running it. In the RISC-V case, that
+flag must always be active, because if it isn't, `__global_pointer$` can't be
+processed.
+
+We tried to set that flag in the command line but we had other issues in the
+command line argument parsing (we found and fixed them later) so we just
+hardcoded it.
+
+This issue is interesting because it's an extremely simple problem, but its
+effect appears in weird ways and it's not always easy to know where the problem
+is coming from.
+
+
+#### Bootstrappable TinyCC's casting issues {#tcc-casting-issues}
+
+This one was a really hard one to fix.
+
+When running our Bootstrappable TinyCC to build MesLibC we found this error:
+
+``` nothing
+ cannot cast from/to void
+```
+
+We managed to isolate a piece of C code that was able to replicate the
+problem.[^reproducer]
+
+``` clike
+long cast_charp_to_long (char const *i)
+{
+ return (long)i;
+}
+
+long cast_int_to_long (int i)
+{
+ return (long)i;
+}
+
+long cast_voidp_to_long (void const *i)
+{
+ return (long)i;
+}
+
+void main(int argc, char* argv[]){
+ return;
+}
+```
+
+Compiling this file raised the same issue, but then I realized I could remove
+two of the functions on the top and the error didn't happen. Adding one of
+those functions back raised the error again.
+
+I tried to change the order of the functions and the functions I chose to add,
+and I could reproduce it: if there were two functions it failed but it could
+build with only one.
+
+Andrius found that the function type was not properly set in the RISC-V code
+generation and its default value was `void`, so it only failed when it compiled
+the second function.
+
+Knowing that, we could take other architectures as a reference to fix this, and
+so we did.
+
+See [6fbd17852aa11a2d0bc047183efaca4ff57ab80c][tcc-casting-commit].
+
+[tcc-casting-commit]: https://github.com/ekaitz-zarraga/tcc/commit/6fbd17852aa11a2d0bc047183efaca4ff57ab80c
+
+[^reproducer]: This is how we managed to fix most of the problems in our code:
+ make a small reproducer we can test separately so we can inspect the
+ process and the result easily.
+
+
+#### Bootstrappable TinyCC's `long double` support was missing {#long-double}
+
+When I backported the RISC-V support to our Bootstrappable TinyCC I missed the
+`long double` support, and I didn't realize it because I never tested large
+programs with it.
+
+The C standard doesn't define a size for `long double` (it just says it has to
+be at least as long as `double`), but its size is normally set to 16 bytes.
+All this is weird in RV64, because it doesn't have 16-byte registers. It
+needs some extra support.
+
+Before we fixed this, the following code:
+
+``` clike
+long double f(int a){
+ return a;
+}
+```
+
+Failed with:
+
+``` nothing
+ riscv64-gen.c:449 (`assert(size == 4 || size == 8)`)
+```
+
+Because it was only expecting to use `double`s (8 bytes) or `float`s (4 bytes).
+
+In upstream TinyCC there were some commits that added `long double` support
+using, and I quote, a *mega hack*, so I just copied that support to our
+Bootstrappable TinyCC.
+
+See [a7f3da33456b4354e0cc79bb1e3f4c665937395b][tcc-long-double].
+
+[tcc-long-double]: https://github.com/ekaitz-zarraga/tcc/commit/a7f3da33456b4354e0cc79bb1e3f4c665937395b
+
+After this commit, some extra problems appeared with missing symbols. These
+were link-time errors: TinyCC has the floating point helper functions needed
+for RISC-V defined in `lib/lib-arm64.c`, because it reuses the aarch64 code
+for them.
+
+Now we also compile and link `lib-arm64.c`, and we have `long double`
+support.
+
+#### MesCC struct initialization issues {#mescc-struct-init}
+
+This one was a lot of fun. Our Bootstrappable TinyCC exploded with random
+issues: segfaults, weird branch decisions...
+
+After tons of debugging Andrius found some values in `struct`s were not set
+properly. As we don't know TinyCC's codebase really well, that was hard to
+follow and we couldn't really know where the value was coming from.
+
+Andrius finally realized some `struct`s were not initialized properly. Consider
+this example:
+
+``` clike
+typedef struct {
+ int one;
+ int two;
+} Thing;
+
+Thing a = {0};
+```
+
+That's supposed to initialize *all* fields in the `Thing` `struct` to `0`,
+according to the C standard[^cppref].
+
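+To make the rule concrete, here's a minimal sketch written for this post (the
+`zeroed_thing` helper is made up, it doesn't come from TinyCC or Mes). With a
+conforming compiler both fields come back as `0`:
+
+``` clike
+typedef struct {
+    int one;
+    int two;
+} Thing;
+
+/* Made-up helper: the initializer only mentions the first field, so the
+   standard requires the remaining fields to be initialized to zero too. */
+Thing zeroed_thing(void)
+{
+    Thing t = {0};
+    return t;
+}
+```
+
+Calling `zeroed_thing()` and checking that `two` is `0` is the kind of small
+reproducer that makes this sort of bug visible.
+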
+As a first solution we set struct fields manually to `0`, to make sure they
+were initialized properly. See
+[29ac0f40a7afba6a2d055df23a8ee2ee2098529e][tinycc-struct-0].
+
+[tinycc-struct-0]: https://github.com/ekaitz-zarraga/tcc/commit/29ac0f40a7afba6a2d055df23a8ee2ee2098529e
+
+After some debugging we found that the fields that were not explicitly set were
+initialized to `22`. So I decided to go to MesCC and see if the struct
+initialization was broken.
+
+This was my first dive in MesCC's code, and I have to say it's really easy to
+follow. It took me some time to read through it because I'm not that used to
+`match`, but I managed to find the struct initialization code.
+
+What I found in MesCC is there was a `22` hardcoded in the struct
+initialization code, probably coming from some debug code that was never
+removed. As no part of the x86 bootstrapping used that kind of initialization,
+or nothing relied on it, the error went unnoticed.
+
+I set that to `0`, as it should be, and we continued with our lives.
+
+[^cppref]: You can see an explanation in the (1) case at
+ [cppreference.com](https://en.cppreference.com/w/c/language/struct_initialization)
+
+
+#### MesCC vs TinyCC size problems {#size-problems}
+
+The C standard does not set exact sizes for integers. It only sets minimum
+widths and relative sizes: `short` can't be wider than `int`, `int` can't be
+wider than `long`, and so on. If your platform wants, `int` can be only 16 bits
+wide, or as wide as `long`, and that's ok for the C standard.
+
+TinyCC's RISC-V backend was written under the assumption that `int` is 32 bits
+wide. You can see this happening in `riscv64-gen.c`, for example, here:
+
+``` clike
+ EI(0x13, 0, rr, rr, (int)pi << 20 >> 20); // addi RR, RR, lo(up(fc))
+```
+
+The pair of shifts there is meant to drop the upper 20 bits of the `pi`
+variable and keep only its lower 12 bits, sign-extended. This code's behavior
+can differ from one platform to another: it only does that if `int` is exactly
+32 bits wide. On a platform with 16 bit `int`s, for example, shifting by 20 is
+not even defined.
+
+In our case, MesCC was using the whole register width, 64 bits, for temporary
+values, so the lowest `44` bits were left and the next assertion, which checks
+that the immediate fits in 12 bits, didn't pass.
+
+This is a huge problem, as most of the code in the RISC-V generation is written
+using this style.
+
+There are other ways to do the same thing (`pi & 0xFFF` maybe?) in a more
+portable way, but we don't know why upstream TinyCC decided to do it this way.
+Probably they did because GCC (and TinyCC itself) use 32 bit integers, but they
+didn't handle other possible cases, like the one we had here with MesCC.
+
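+One thing to keep in mind is that `pi & 0xFFF` alone keeps the lower 12 bits
+but doesn't sign-extend them, which is what the two shifts do when `int` has
+32 bits. A minimal sketch of a version that doesn't depend on the width of
+`int` could look like this. `sign_extend12` is a name made up for this post,
+it's not what TinyCC or our patches use:
+
+``` clike
+#include <stdint.h>
+
+/* Made-up helper, not TinyCC code: sign-extend the low 12 bits of `value`
+   using fixed-width types only. */
+int32_t sign_extend12(uint64_t value)
+{
+    uint32_t low = (uint32_t)(value & 0xFFF);   /* keep bits 0..11 */
+
+    /* Flip bit 11 and subtract its weight: inputs with bit 11 set
+       come out negative, the rest are unchanged. */
+    return (int32_t)(low ^ 0x800) - 0x800;
+}
+```
+
+With this, `0x7FF` stays `2047` and `0xFFF` becomes `-1`, which is what a 12
+bit immediate in an `addi` means.
+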
+In any case, this made us rethink MesCC, dig into how its integers are defined,
+how to change this to be compatible with TinyCC and so on, but I finally
+decided to add casts in the middle to make sure all this was compiled as
+expected.
+
+It was a good reason to make us rethink MesCC's integers, but it took a very
+long time to deal with this, time that could have been better used on something
+else. Now we have all become paranoid about integers and we still think some
+extra errors will arise from them in the future. Integers are hard.
+
+
+#### MesCC add support for signed rotation {#mes-signed-rotation}
+
+Integers were on our minds for a long time, as described in the previous
+block, but I didn't talk about signedness in that one.
+
+Following one of the crazy errors we had in TinyCC, I somehow realized (I don't
+remember how!) that we were missing signed "rotation" (right shift, really)
+support in MesCC. I think I found this while inspecting the code MesCC was
+outputting: I spotted some shifts done using unsigned instructions for signed
+values and I started digging in MesCC to find out why. I finally realized that
+there was no support for that and the shift instruction wasn't selected
+depending on the signedness of the value being shifted.
+
+Let's see this with an example:
+
+``` clike
+signed char a = 0xF0;
+unsigned char b = 0xF0;
+
+// What is this? (Answer: -1, that is, 0xFF if you look at the low 8 bits)
+a >> 4;
+
+// And this? (Answer: 0x0F => 15)
+b >> 4;
+```
+
+In the example you can see the right shift does not work the same way if the
+value is signed or not. If you always use the unsigned version of the `>>`
+operation, you don't get the results you expect. Signs are also hard.
+
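+The difference between both flavors can be written down explicitly. This is
+only a sketch for this post, with made-up function names. It spells out what
+the signed version has to do instead of relying on the compiler, because `>>`
+on negative values is implementation-defined in C:
+
+``` clike
+#include <stdint.h>
+
+/* Logical shift: the vacated high bits are filled with zeros. */
+uint8_t shift_right_unsigned(uint8_t value, unsigned amount)
+{
+    return (uint8_t)(value >> amount);
+}
+
+/* Arithmetic shift: the vacated high bits are filled with the sign bit. */
+int8_t shift_right_signed(int8_t value, unsigned amount)
+{
+    if (value >= 0)
+        return (int8_t)(value >> amount);
+    /* Negative input: invert, shift the (now non-negative) value,
+       invert back. The result is filled with ones from the top. */
+    return (int8_t)~(~value >> amount);
+}
+```
+
+For `0xF0` shifted by `4` the first one gives `0x0F` and the second one,
+taking the same bits as `-16`, gives `-1`.
+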
+In this case, like in many others, the fix was easier than realizing what was
+going wrong. I added support for the signed shift operation, not only for
+RISC-V but for all architectures, and I added the signedness check needed to
+select the correct instruction. The patch (see
+[c0c2556c2b2897814a87b8bdfa6997f79c218eeb][signed-rotation] in Mes) is very
+clean and easy to read, because MesCC's codebase is really well ordered.
+
+[signed-rotation]: https://github.com/ekaitz-zarraga/mes/commit/c0c2556c2b2897814a87b8bdfa6997f79c218eeb
+
+
+#### MesCC switch/case falls-back to default case {#broken-case}
+
+In the early bootstrap runs, our Bootstrappable TinyCC did weird things.
+After many debugging sessions we realized the `switch` statements in
+`riscv64-gen.c`, more specifically in `gen_opil`, were broken. The
+fall-throughs in the `switch` were automatically directed to the `default`
+case. Weird!
+
+MesCC has many tests, so I read all those related to `switch` statements. The
+ones that handled fall-through all fell through to the `default` case, so our
+weird behavior wasn't tested.
+
+I added tests for our case and read the disassembly of simple examples until
+I understood the problem.
+
+Each of the `case` blocks has two parts: the clause that checks if the value
+of the expression is the one of the case, and the body of the case itself.
+
+The `switch` statement generation was doing some magic to deal with `case`
+blocks, but it failed to deal with complex fall-through schemes because the
+clause of the target `case` block was always run. That clause was always false,
+because the one that matched was the one we were falling through from, so the
+code ended up in the `default` case.
+
+There were some problems to fix this, as NyaCC (MesCC's C parser) returns
+`case` blocks as nested when they don't have a `break` statement:
+
+``` lisp
+(case testA
+ (case testB
+ (case testC BODY)))
+```
+
+Instead of doing this, I decided to flatten the `case` blocks with empty
+bodies. This way we can deal with the structure in a simpler way.
+
+``` lisp
+((case testA (expr-stmt))
+ (case testB (expr-stmt))
+ (case testC BODY))
+```
+
+Once this was done, I expanded each `case` block to a jump that skips the
+clause, followed by the clause and then its body. Doing this, the fall-through
+doesn't re-evaluate the clause, as it doesn't need to. The generated code looks
+like this in pseudocode:
+
+``` assembly
+ ;; This doesn't have the jump because it's the first
+CASE1:
+ testA
+CASE1_BODY:
+ ...
+
+ goto CASE2_BODY
+CASE2:
+ testB
+CASE2_BODY:
+ ...
+
+ goto CASE3_BODY
+CASE3:
+ testC
+CASE3_BODY:
+ ...
+```
+
+If one of the `case`s has a `break`, it's treated as part of its body, and it
+will end the execution of the `switch` statement normally, with no
+fall-through.
+
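+In C terms, this is the behavior the generated code has to respect. The
+function below is a made-up example for this post, not code from TinyCC or
+from the MesCC tests:
+
+``` clike
+/* Made-up example: count how many case bodies run when the switch is
+   entered with `entry`. */
+int bodies_run(int entry)
+{
+    int count = 0;
+
+    switch (entry) {
+    case 1:
+        count++;        /* falls through into case 2 */
+    case 2:
+        count++;        /* falls through into case 3 */
+    case 3:
+        count++;
+        break;          /* leaves the switch, default is not reached */
+    default:
+        count = -1;
+    }
+    return count;
+}
+```
+
+Entering at `1` has to run the three bodies without evaluating the clauses of
+`2` and `3` again. If the fall-through lands in `default` instead, the function
+returns `-1`.
+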
+This results in a simpler `case` block control. The previous approach dealt
+with nested `case` blocks and tried to be clever about them, but
+unsuccessfully. The best thing about this commit is that most of the
+cleverness was simply replaced with a simple solution (flatten all the
+things!).
+
+It wasn't that easy to implement, but I first built a simple prototype and
+Janneke's Scheme magic made my approach usable in production.
+
+All this was added to Mes's codebase in several commits, as we needed some
+iterations to get it right. [f75cf7bfb911868023732bf4274978069b98849a][cases]
+has the base of this change, but there were some more iterations in Mes.
+
+[cases]: https://github.com/ekaitz-zarraga/mes/commit/f75cf7bfb911868023732bf4274978069b98849a
+
+
+#### Bootstrappable TinyCC problems with GOT {#got}
+
+The Global Offset Table is a table that helps with relocatable binaries. Our
+Bootstrappable TinyCC segfaulted because it was generating an empty GOT.
+
+Andrius debugged upstream TinyCC alongside ours and realized there was a
+missing check in an `if` statement. He fixed it in
+[f636cf3d4839d1ca3f5af9c0ad9aef43a4bfccd9][got-commit].
+
+The problem with these kinds of errors is that TinyCC's codebase is really
+hard to read. It's a very small compiler but it's not obvious how things are
+done in it, so we had to spend many hours in debugging sessions that went
+nowhere. If we had a compiler that is easier to read and change, it would be
+way simpler to fix and we would have had a better experience with it.
+
+[got-commit]: https://github.com/ekaitz-zarraga/tcc/commit/f636cf3d4839d1ca3f5af9c0ad9aef43a4bfccd9
+
+#### Bootstrappable TinyCC generates wrong assembly in conditionals {#wrong-conditionals}
+
+We spent a long time debugging a bug I introduced during the backport when I
+tried to undo some optimization upstream TinyCC applied to comparison
+operations.
+
+Consider the following code:
+
+``` clike
+if ( x < 8 )
+ whatever();
+else
+ whatever_else();
+```
+
+Our Bootstrappable TinyCC was unable to compile this code correctly. Instead,
+it emitted code that always took the same branch, regardless of the value
+in `x`.
+
+In TinyCC, a conditional like `if (x < CONSTANT)` has a special treatment, and
+it's converted to something like this pseudoassembly:
+
+``` pseudo
+load x to a0
+load CONSTANT to a1
+set a0 if less than a1
+branch if a0 not equal 0 ; Meaning it's `set`
+```
+
+This behaviour uses the `a0` register as a flag, emulating what other CPUs
+use for comparisons. RISC-V doesn't need that, but it's still done here
+probably for compatibility with other architectures. In RISC-V it could look
+like this:
+
+``` pseudo
+load x to a0
+load CONSTANT to a1
+branch if a0 less than a1
+```
+
+You can easily see the `branch` "instruction" does a different comparison in
+one case versus the other. In the one in the top it checks if `a0` is set,
+and in the other checks if `a0` is smaller than `a1`.
+
+TinyCC handles this case in a very clever way (maybe too clever?). When they
+emit the `set a0 if less than a1` instruction they replace the current
+comparison operation with `not equal` and they remove the `CONSTANT` and
+replace it with a `0`. That way, when the `branch` instruction is generated,
+they insert the correct clause.
+
+In my code I forgot to replace the comparison operator so the branch checked
+`if a0 is less than 0` and it was always false, as the `set` operation writes
+a `0` or a `1` and none of them is less than `0`.
+
+The commit [5a0ef8d0628f719ebb01c952797a86a14051228c][branch-tcc] explains this
+in a more technical way, using actual RISC-V instructions.
+
+This was also a hard one to fix, because TinyCC's variable names (`vtop->c.i`)
+are really weird and they are used for many different purposes.
+
+[branch-tcc]: https://github.com/ekaitz-zarraga/tcc/commit/5a0ef8d0628f719ebb01c952797a86a14051228c
+
+
+#### Support for variable length arguments {#varargs}
+
+In C you can define functions that take a variable number of arguments. In
+RISC-V, those arguments are passed in registers, while in some other
+architectures they are passed on the stack. This means the RISC-V case is a
+little bit more complex to deal with, and needs special treatment.
+
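+If you have never written one, this is what such a function looks like from
+the C side. It's a minimal sketch for this post (`sum_ints` is a made-up
+name). The `va_*` macros it uses are the definitions the compiler and the C
+library have to provide for each architecture:
+
+``` clike
+#include <stdarg.h>
+
+/* Made-up example: add up `count` int arguments passed after `count`. */
+int sum_ints(int count, ...)
+{
+    va_list args;
+    int total = 0;
+    int i;
+
+    va_start(args, count);              /* start after the last named argument */
+    for (i = 0; i < count; i++)
+        total += va_arg(args, int);     /* fetch the next argument as an int */
+    va_end(args);
+
+    return total;
+}
+```
+
+A call like `sum_ints(3, 1, 2, 3)` only works if the caller and `va_arg` agree
+on where each argument lives.
+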
+Andrius realized our Bootstrappable TinyCC had issues with variable length
+arguments, especially in the most famous function that uses them: `printf`. He
+also found that the problem came from the arguments not being properly set.
+
+Reading upstream TinyCC we found they use a really weird system for the defines
+that deal with this. They have a header file, `include/tccdefs.h`, which is
+included in the codebase, but also processed by a tool that generates strings
+that are later injected at execution time in TinyCC.
+
+This was too much for us so we just extracted the simplest variable arguments
+definitions for RISC-V and introduced that in MesLibC and our Bootstrappable
+TinyCC.
+
+##### Extra: files generated with no permissions
+
+There might be more problems with this, though, which we need to tackle in the
+future. The Bootstrappable TinyCC built using MesCC generates files with no
+permissions, and Andrius found that this problem comes from the argument
+handling in the `open` system call in MesLibC. It's not a big deal at the
+moment, because the next iteration of TinyCC uses correct permissions. We can
+just `chmod` the file manually, but we'll probably fix it anyway.
+
+
+#### MesLibC use `signed char` for `int8_t` {#int8}
+
+We already had a running Bootstrappable TinyCC compiled using MesCC when we
+stumbled upon this issue. Somehow, when assembling:
+
+``` asm
+addi a0, a0, 9
+```
+
+The code was trying to read `9` as a register name, and failed to do it (of
+course). It was weird to realize that the following code (in `riscv64-asm.c`)
+was always using the true branch in the `if` statement, even if
+`asm_parse_regvar` returned `-1`:
+
+``` clike
+int8_t reg;
+...
+if ((reg = asm_parse_regvar(tok)) != -1) {
+ ...
+} else ...
+```
+
+I disassembled and saw something like this:
+
+``` pseudoassembly
+call asm_parse_regvar ;; Returns value in a0
+reg = a0
+a0 = a0 + 1
+branch if a0 equals 0
+```
+
+This looks ok, it does some magic with the `-1` but it makes sense anyway. The
+problem is that it didn't branch because `a0` was `256` even when
+`asm_parse_regvar` returned `-1`.
+
+During some of the `int` related problems someone told me in the Fediverse that
+`char`'s default signedness is not defined in the C standard. I read MesLibC
+and, exactly: `int8_t` was defined as an alias to `char`.
+
+In RISC-V `char` is by default `unsigned` (don't ask me why) but we are used to
+x86 where it's `signed` by default. Only saying `char` is not portable.
+
+Replacing:
+
+``` clike
+typedef char int8_t;
+```
+
+With:
+
+``` clike
+typedef signed char int8_t;
+```
+
+Fixed the issue.
+
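+The effect can be reproduced on any machine by spelling out the signedness.
+This is a sketch written for this post: `parse_fails` is a made-up stand-in
+for `asm_parse_regvar`, and the other two functions only differ in the type of
+`reg`:
+
+``` clike
+/* Made-up stand-in for asm_parse_regvar(): always reports failure. */
+static int parse_fails(void)
+{
+    return -1;
+}
+
+/* Returns 1 if the -1 is detected, 0 if it's missed. */
+int detected_with_signed_char(void)
+{
+    signed char reg;
+    return (reg = (signed char)parse_fails()) == -1;
+}
+
+int detected_with_unsigned_char(void)
+{
+    unsigned char reg;  /* what plain `char` means in RISC-V */
+    /* reg ends up holding 255; promoted to int it never equals -1 */
+    return (reg = (unsigned char)parse_fails()) == -1;
+}
+```
+
+The first function returns `1` and the second one returns `0`, so with an
+unsigned `int8_t` the error branch can never be taken.
+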
+From this you can learn several things:
+
+1. Don't assume `char`'s signedness in C
+2. If you design a programming language, be consistent with your decisions. In
+ C `int` is always `signed int`, but `char` doesn't act like that. Don't do
+ this.
+
+#### MesLibC Implement `setjmp` and `longjmp` {#jmp}
+
+Those that are not that versed in C, as I was before we found this issue, won't
+know about `setjmp` and `longjmp` but they are, simplifying a lot, like a
+`goto` you can use in any part of the code. `setjmp` needs a buffer and it
+stores the state of the program on it, `longjmp` sets the status of the program
+to the values on the buffer, so it jumps to the position stored in `setjmp`.
+
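+A minimal usage sketch, written for this post with made-up function names,
+shows the "goto to anywhere" effect. It only uses the standard `<setjmp.h>`
+interface:
+
+``` clike
+#include <setjmp.h>
+
+static jmp_buf checkpoint;
+
+/* Made-up helper: jumps back to the setjmp() call, delivering 42. */
+static void fail_deep_inside(void)
+{
+    longjmp(checkpoint, 42);
+}
+
+/* Returns the value delivered by longjmp(). */
+int run_with_checkpoint(void)
+{
+    switch (setjmp(checkpoint)) {
+    case 0:                     /* direct call: state saved, go on */
+        fail_deep_inside();     /* never returns normally */
+        return -1;              /* not reached */
+    case 42:                    /* second "return", caused by longjmp */
+        return 42;
+    default:
+        return -2;
+    }
+}
+```
+
+`setjmp` returns twice here: first `0`, when it saves the state, and then
+`42`, when `longjmp` restores it.
+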
+Both functions are part of the C standard library and they need specific
+support for each architecture because they need to know which registers are
+considered part of the state of the program. They need to know how to store the
+program counter, the return address, and so on, and how to restore them.
+
+In their simplest form they are a set of stores in the case of the `setjmp` and
+a set of loads in the case of `longjmp`.
+
+In RISC-V they only need to store the registers that are not treated as
+temporary: the `s*` registers, plus the stack pointer and the return address.
+It's simple, but it needs to be done, and in MesLibC it wasn't, neither for GCC
+nor for RISC-V.
+
+Andrius is not convinced by our commit here, and I agree with his concerns. We
+added the full `setjmp` and `longjmp` implementations directly
+<del>stolen from</del> inspired by the ones in Musl[^stolen], but they also
+include floating point register support, using instructions that are not
+implemented in TinyCC yet. This is going to be a problem in the future because
+later iterations will run into instructions they don't actually understand.
+
+There are two (or three) possible solutions here. The first is to remove the
+floating point instructions for now (another flavor of this solution is to
+hide them under an `#ifdef`). The second is to implement the floating point
+instructions in TinyCC's RISC-V assembler, which sounds great but forces us to
+upstream the changes, and that process may take a long time, so we'd need to
+patch it in our bootstrapping scripts until it happens.
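+
+To make the `#ifdef` flavor a bit more concrete, here is a sketch of the state
+such an implementation has to keep in RV64. Everything in it is an assumption
+made for illustration: the `SKETCH_HAVE_FLOAT` switch, the struct and its field
+order are invented, and they are not the layout used by Musl or MesLibC. It
+also assumes 64-bit floating point registers.
+
+``` clike
+#include <stdint.h>
+#include <stdio.h>
+
+/* Hypothetical switch; nothing in MesLibC or Musl has this name. */
+#ifndef SKETCH_HAVE_FLOAT
+#define SKETCH_HAVE_FLOAT 0
+#endif
+
+/* Made-up layout of the state an RV64 setjmp would have to save. */
+struct rv64_saved_state {
+    uint64_t ra;      /* return address: where longjmp resumes */
+    uint64_t sp;      /* stack pointer */
+    uint64_t s[12];   /* s0..s11, the callee-saved integer registers */
+#if SKETCH_HAVE_FLOAT
+    uint64_t fs[12];  /* fs0..fs11, the callee-saved floating point ones */
+#endif
+};
+
+/* Number of 64-bit slots the buffer needs with the current setting. */
+static size_t saved_slots(void)
+{
+    return sizeof(struct rv64_saved_state) / sizeof(uint64_t);
+}
+
+int main(void)
+{
+    printf("slots to save: %zu\n", saved_slots());
+    return 0;
+}
+```
+
+With the switch off, only the integer state is kept, so the assembler never
+has to see a floating point load or store.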
+
+We'll think about it, that's why the commit is marked as a WIP:
+[42cb302c857fecafde6f27a8311531d606d15feb][setjmp].
+
+[setjmp]: https://github.com/ekaitz-zarraga/mes/commit/42cb302c857fecafde6f27a8311531d606d15feb
+
+[^stolen]: Yo, if it's free software it's not stealing! Please steal my code.
+ Make it better.
+
+
+#### More {#more}
+
+Those are mostly the coolest errors we needed to deal with but we stumbled upon
+a lot more errors.
+
+Before this effort started Andrius added support for 64-bit instructions in Mes
+and fixed some issues 64-bit architectures had in M2.
+
+I found a [bug in Guix shell](https://issues.guix.gnu.org/65225) (it's still
+open) and had to fix some ELF headers in MesCC generated files because objdump
+and gdb refused to work on them.
+
+Also, while I was writing these lines Andrius fixed the x86 bootstrapping,
+which I broke when the backporting process started.
+
+In the end, a project like this is like hitting your head against a wall until
+one of them breaks. Sometimes it feels like the head did.
+
+#### Reproducing what we did {#reproducing}
+
+> TODO
+
+##### Using live-bootstrap {#live-bootstrap}
+
+Andrius is part of the `live-bootstrap` effort and he's doing all the scripting
+there to keep the process reproducible.
+
+[Live-bootstrap](https://github.com/fosslinux/live-bootstrap) is...
+
+> An attempt to provide a reproducible, automatic, complete end-to-end
+> bootstrap from a minimal number of binary seeds to a supported fully
+> functioning operating system.
+
+That's the official description of the project. From a more practical
+perspective, it's a set of scripts that build the whole operating system from
+scratch, depending on few binary seeds.
+
+That's not very different from what Guix provides from a bootstrapping
+perspective. Guix is "just" an environment where you can run "scripts" (the
+packages define how they are built) in a reproducible way. Of course, Guix is
+way more than that, but if we focus on what we are doing right now it acts like
+the exact same thing.
+
+> NOTE: `live-bootstrap`'s project description is a little bit outdated. If you
+> read the comparison with Guix, what you'd read is old information. If you
+> want to read more up-to-date information about Guix's bootstrapping process
+> I suggest you read this page of the Guix manual:
+> <https://guix.gnu.org/manual/devel/en/html_node/Full_002dSource-Bootstrap.html>
+
+Although they are very different projects, on a practical level the main
+difference between them is that `live-bootstrap` is probably easier for you to
+test if you are working on any GNU/Linux distribution[^in-guix].
+
+[^in-guix]: If you run it in Guix or in a distribution that doesn't follow FHS
+ you'd probably need to touch the path of your Qemu installation or be
+ careful with the options you send to the `rootfs.py` script.
+
+If you want to reproduce this exact point in time you only need to use my fork
+of `live-bootstrap`, which you can find HERE, jump to the
+`self-hosted-tcc-rv64` tag and run it. Andrius did all the magic to make that
+process take all the inputs from Mes and TinyCC from the correct tag. We'll
+leave that there for future reference.
+
+> TODO
+
+##### Using Guix for a reproducible environment {#guix}
+
+On top of what I just mentioned, there's another big difference between
+live-bootstrap and Guix: I am the one making the Guix package for this.
+
+> TODO
+
+### Conclusions {#conclusions}
+
+Of course, the problems we fixed now look easy and simple to fix. This blog
+post doesn't really do justice to the countless debugging hours and all the
+nights we, Andrius and I, spent thinking about where the issues could be
+coming from.
+
+The debugging setup wasn't as good as you might imagine. The early steps of the
+bootstrap don't have all the debug symbols as a "normal" userspace program
+would. In many cases, function names were all we had.
+
+I have to thank my colleague Andrius here because he did a really good
+debugging job, and he provided me with small reproducers that I could finally
+fix. Most of the time he made the assist and I scored the goal.
+
+He also did a great job with the testing which I couldn't do because I was
+struggling with Guix from the early days, trying to make the compilers find the
+header files and libraries.
+
+On the emotional side it is also a great improvement to have someone to rely
+on. Andrius, Janneke and I worked well as a team and we supported each other
+when our faith started to crumble. And believe me, it does crumble when a new
+bug appears after you fixed one that you needed a week for. There were times
+this summer I thought we would never reach this point.
+
+It's also worth mentioning here that the bootstrapping process is extremely
+slow: it takes hours. This kills the responsiveness and makes testing way
+harder than it should be. Not to mention that we are working on a foreign
+architecture, which has its own problems too.
+
+If you want to take some lessons from something like this, here is a list of
+suggestions:
+
+- The simplest error can take ages to debug if your code is crazy enough.
+- Don't be clever. It sets a very high standard for your future self and people
+ who will read your code in the future.
+- I guess we can summarize the previous two points in one: If we could remove
+ TinyCC from the chain, we would. It's a source of errors and it's hard to
+ debug. The codebase is really hard to read for no apparent reason.
+- When build times are long, small reproducers help.
+- Add tests for each new case you find.
+- Don't trust, disassemble and debug.
+- Be careful with C and standards and undefined behavior.
+- Integers are hard. Signedness makes them harder.
+- Being surrounded by the correct people makes your life easier.
+
+Also, as a personal note I noticed I'm a better programmer since the previous
+post in this series. I feel way more comfortable with complex reasoning and
+even writing new programs in other languages, even if I spent almost no time
+coding anything from scratch. It's like dealing with this kind of issue about
+the internals gives you some level of awareness that is useful in a more
+general way than it looks. Crazy stuff.
+
+If you can, try to play with the internals of things from time to time. It
+helps. At least it helped me.
+
+### What is next? {#next}
+
+In the short term, we need to decide what to do with the `setjmp` fix and
+include it in MesLibC. After that we need to fix `va_args` in MesCC, fix that
+error with the permissions in the output files and fix the floating point
+numbers in RV64 in TinyCC.
+
+Once that is done, now in the mid term, we would be able to compile a fully
+featured Bootstrappable TinyCC. With that and some fixes in MesLibC, we would
+be able to compile upstream TinyCC. We need to fix any errors we find there
+until it is ready for GCC.
+
+Now in the long-term, we are going to have problems with GCC so we'll need to
+fix those, too. Once that is done, we would use GCC to compile more recent
+versions of GCC until we compile the world.
+
+That's more or less the description of what we will do in the next months.
+
+Meanwhile, we'll need to test this on real hardware we specifically acquired
+for this task. It's slow, but it should be enough for these tests.
+
+And this is pretty much it. I hope you learned something new about C, the
+Bootstrapping process or at least had a good time reading this wall of text.
+
+We'll try to work less for the next one, but we can't promise that. 😉
+
+Take care.
+
+
+---
+
+<!--
+
+MANY OF THIS ARE REALLY HARD TO REASON ABOUT!!!!
+WITH THIS WE START PASSING MANY MORE TESTS IN MESCC AND ALSO ADDED SOME EXTRA
+TESTS THAT CHECK COMPLEX BEHAVIOR HERE AND THERE
+
+- `int`s are 64 bit in MesCC and TinyCC is written like they are 32 bit.
+
+- TinyCC's assembly for RISC-V is not complete and we need some of that in
+ meslibc. We implemented the missing instructions (jal, jalr, lla and some
+ pseudoinstructions).
+
+- TinyCC's assembler for RISC-V uses a simplified syntax, so we need to rewrite
+ our meslibc according to that.
+
+- RISC-V uses a `__global_pointer$` symbol, but TinyCC does not allow dollars
+ in identifiers by default. The `-fdollars-in-identifiers` flag exploded when
+ used so we hardcoded the flag to true.
+
+- We backported the `long double` support from TinyCC's `mob` branch.
+ - And large constant generation.
+
+- Fixed some weird casting issues in TinyCC (see Fix casting issues (missing
+ func_vt in riscvgen.c)
+
+- MesCC produced binaries that were impossible to debug with GDB and OBJDUMP
+ complained about them. We fixed those too (some archs are missing)
+
+- MesCC's struct initialization to zeroes like `Whatever a = {0};` initialized
+ everything to `22` and is now working as expected.
+
+- `switch/case` statements in MesCC fallback always to default because they
+ check the fallback clause and then jump to default.
+
+- Mes had some incompatibilities with Guile that prevented us from running the
+ code fast. Fixed those.
+
+- Added support for RISC-V instruction formats in MesCC
+ (https://git.savannah.gnu.org/cgit/mes.git/commit/?h=wip-riscv&id=e42cf58d14520a5360d7d527d1c2c18c0a498c28)
+
+- Added support for signed rotation in MesCC. (all arches affected)
+
+- And also fixed some M2 things that allow all this 64 bit support happen in
+ MesCC, which didn't have 64 bit support before. Stikonas?
+
+- Stikonas also fixed problems in M2:
+ https://github.com/oriansj/M2-Planet/commit/85dd953b70c5f607769016bbf2a0aa3de7e41b6c
+
+- Fix Bootstrappable TinyCC's GOT (global offset table). It was just a broken
+ condition in an if (stikonas dealt with that)
+
+- Meslibc again! Tinycc does not support [extended
+ asm](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html) in RV64 but
+ stikonas fixes it replacing the extended asm by abi-compatible handwired asm.
+ The good fix would be to implement it, but upstream doesn't have it either...
+
+- `int size = 0; if (size < 8) size = 8;` does not work because TCC generated
+ wrong assembly and it jumps over the true branch even if it checks the
+ condition is ok. (reproducer in `C_TESTS/if.c`)
+
+- Variable length arguments were broken in Bootstrappable TCC. Upstream TCC
+ does some string magic to support them (c2str) where the same header file is
+ used twice: one in the binary and one in runtime. That functionality was lost
+ in the ~translation~ backport. We had to push some defines to Meslibc that
+ support that.
+
+- Meslibc had `typedef char int8_t` in `stdint.h` but that's not reliable,
+ because the C standard doesn't define the signedness of the `char`. In RISC-V
+ the signedness of the char is `unsigned` by default, so we have to be
+ explicit and say `signed char`, to avoid issues.
+
+- Remove some 0bXXXX literals I introduced in the assembler to simplify
+ things... They happen not to be standard C but a GCC extension.
+
+- Add a setjmp and longjmp implementation to meslibc that also support tinycc
+ assembler syntax. (copy from musl but with our syntax)
+-->