From 59a47e7d8d14415a3a1c0118e184e7b8de1093c8 Mon Sep 17 00:00:00 2001
From: Ekaitz Zarraga
Date: Fri, 30 Sep 2022 23:23:05 +0200
Subject: Final? post about gcc bootstrap: RISCV support for bootstrappable TinyCC

---
 content/bootstrapGcc/06_tcc_mes.md | 371 +++++++++++++++++++++++++++++++++++++
 1 file changed, 371 insertions(+)
 create mode 100644 content/bootstrapGcc/06_tcc_mes.md

diff --git a/content/bootstrapGcc/06_tcc_mes.md b/content/bootstrapGcc/06_tcc_mes.md
new file mode 100644
index 0000000..e2a4a0e
--- /dev/null
+++ b/content/bootstrapGcc/06_tcc_mes.md
@@ -0,0 +1,371 @@
+Title: Milestone – RISC-V support in Mes's bootstrappable TinyCC
+Date: 2022-09-22
+Category:
+Tags: Bootstrapping GCC in RISC-V
+Slug: bootstrapGcc6
+Lang: en
+Summary:
+    Bringing RISC-V support to the bootstrappable TinyCC Mes forked. Some
+    problems and a look into the future.
+
+In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC,
+TinyCC, Mes and Mes's TinyCC fork that is designed to be bootstrappable. In
+this post we are going to deal with the latter, explain how we made it work for
+RISC-V and the challenges we encountered.
+
+### The non-bootstrappable nature of TinyCC
+
+As we saw in the previous post, TinyCC is not compilable with very simple
+compilers like Mes's `mescc`. So the Mes project decided to make a [fork that
+`mescc` was able to compile](https://gitlab.com/janneke/tinycc). Mes calls it a
+*bootstrappable tinycc*.
+
+> There's an uninteresting philosophical debate about what *bootstrappable*
+> means, which leads to many errors and
+> misunderstandings[^misunderstandings]. Many compilers call themselves
+> bootstrappable if they can be compiled with themselves. When **we** talk
+> about this, we are looking for *full-source bootstrappability*, that is,
+> that the compilers can be compiled from *source*, or from a *full-source
+> bootstrappable* compiler.
+
+TinyCC is supposed to be compilable by itself, but who compiles the version
+that compiles TinyCC? Another TinyCC? And who compiles that?
+
+It's the yogurt problem we always get: how do you make yogurt? Take yogurt, mix
+it with milk and in a few hours you'll get yogurt. See the problem?
+
+If you are a culinary maniac, as I am, you can stretch this metaphor further.
+If you know what you are doing, you can obtain yogurt from raw milk[^kefir].
+
+That's what our project is doing: make yogurt from raw milk at some point.
+
+So compilers normally only care about the latest yogurt, but we, the
+saviors of the ancient milk, those who can acidify the raw pureness, can make
+yogurt starter with raw milk.
+
+That's the kind of magic nobody cares about, neither in the compiler world nor
+in real life.
+
+The yogurt starter does not make the best yogurt, by the way: it needs
+generations and generations of yogurts to make the best one. That's what our
+project does: start simple (stage-0 and Mes) and keep enriching the product
+(TinyCC) until reaching a mature yogurt (GCC).
+
+TinyCC does not really care about this bootstrappability concept. They only
+want it to be compilable with itself. Nothing else.
+
+That's why [Jan](http://joyofsource.com/), the inventor of this metaphor I just
+stretched to infinity, had to fork the project. He had another choice:
+simplify TinyCC's code upstream so it could be compiled from a simpler step,
+but his ideas were rejected and some weird animosity I don't understand
+started. More on that later.
+
+[^misunderstandings]: I've run into many misunderstandings about my project too.
+Some people have told me all this work is worthless because you can always
+bootstrap from an x86_64 machine and then continue the bootstrapping effort on
+your RISC-V machine. And so on. That's why this blog doesn't have a comment
+section. People insist on believing that other people's work is worthless or
+that they could do it in a simpler way with no effort. I won't claim that my
+explanations are the best, but I can claim to be the laziest person I know, and
+I'd never spend time on something that isn't worth the effort.
+
+[^kefir]: With kefir you are fucked. We don't know where it comes from. Luckily
+    we harvested a lot and it's easy to grow.
+
+### The RISC-V support
+
+When the previous blogpost was written, TinyCC had an RV64 backend, but the
+TinyCC fork did not have RISC-V support.
+
+My job here was to take the backend from the official TinyCC and bring it to
+the bootstrappable one, Jan's fork. I can say that is done. Good for me.
+
+#### The process
+
+I followed the cross-compiler trick again, in order to make this process easier
+on my computer and because Mes doesn't support RISC-V output yet. Making a
+TinyCC for my x86_64 machine that had RISC-V output sounded more than
+reasonable to me. Later I could always move to a full RISC-V machine, once I
+was sure that the backend was working.
+
+So first I made a guix package for the upstream [TinyCC cross-compiler (for
+RISC-V)](https://github.com/ekaitz-zarraga/tcc/blob/guix_package/guix.scm#L85)
+built with GCC. This wasn't really obvious, because there were some variables
+to set correctly. I tested that everything compiled and worked as expected.
+Apart from a couple of issues later corrected upstream, it did.
+
+Next, I made a guix package for [the forked TinyCC with
+GCC](https://github.com/ekaitz-zarraga/tcc/blob/riscv-mes/guix.scm#L83). This
+also needed some changes, as the fork is based on a quite old version of TinyCC.
+The build needs a `libtcc1.a` here, which can be empty if the build is done
+with GCC (`libgcc` provides that functionality), but the build
+process doesn't mention anything about this, and figuring that out by
+yourself is hard.
+
+Now the project was compilable, it was time to code. You can see this part in
+the `riscv-mes` branch:
+
+
+
+I took the backend from the upstream and inserted it in the fork. Of course, it
+didn't compile. Many internal structures and APIs had changed, so after trying
+to stitch it all together myself, I headed to the mailing list. At the beginning
+I wanted to think the answers I was getting were because I wasn't explaining my
+doubts properly or something, but what was happening was that the animosity
+towards our fork (a decision I didn't take) appeared and someone tried to
+ridicule me on the mailing list for no reason at all.
+
+The funny thing is I'd never have needed to contact the mailing list if the
+project were as well written as they claim it to be. It's full of functions and
+variables with one-character names, the code is mixed together in a very
+aggressive way... It's supersmall, tiny even, but really hard to read. Also, the
+commits are not very descriptive for anyone that is not the main maintainer, who,
+surprise! Is the same person that gives aggressive answers on the mailing
+list... I hope it's only my perception and they are nice with their friends and
+family, but the interaction made me feel uncomfortable and I don't want to
+touch this code again.
+
+It was a sad moment, I must admit. But I decided I was going to do this with
+help or without it. And I think I did it. I removed references here and there
+and finally it looks like I got somewhere.
+
+There are some differences to point out. One of the commits that made me ask on
+the mailing list was a huge change in the way conditionals are handled in
+TinyCC. Our fork didn't have that, so I needed to split the code into several
+pieces and the benefits of that commit (some instruction optimization) are
+lost in the backport. The branching and jumping is still correct, but less
+optimal. Not bad.
+
+With the code added and compiling, it was time for testing. I made a little
+script (I didn't share it, but it's not really relevant either) and a small test
+suite of simple C files, and compiled (not linked) them with the upstream version
+of the compiler and the forked one. Then I disassembled them and compared the differences.
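That script is not published, so what follows is only a minimal sketch of how such a comparison could be automated, not the script that was actually used. It is written in Python for brevity. The function names (`instructions`, `differences`, `disassemble`), the list of jump mnemonics and the object file naming are assumptions made for illustration. Jump and branch targets are ignored on purpose, because they shift as soon as one listing has an extra instruction.

```python
import difflib
import re
import subprocess

# One instruction line of `objdump --disassemble` output:
# address, raw encoding, mnemonic, operands.
INSTRUCTION = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]+)\s+(\S+)\s*(.*)$")

# Instructions whose last operand is a code address. Assumption: only the
# ones that show up in small test programs are listed; extend as needed.
JUMPS = {"j", "bnez", "beqz"}


def instructions(listing):
    """Turn an objdump listing into a list of "mnemonic operands" strings."""
    result = []
    for line in listing.splitlines():
        match = INSTRUCTION.match(line)
        if match is None:
            continue  # file headers, symbol lines, blank lines
        mnemonic, operands = match.group(3), match.group(4)
        operands = operands.split("<")[0].strip()  # drop "<symbol+offset>"
        if mnemonic in JUMPS:
            # Targets move when instructions are added, so ignore them.
            operands = operands.rpartition(",")[0]
        result.append(f"{mnemonic} {operands}".strip())
    return result


def differences(upstream_listing, fork_listing):
    """List the runs of instructions that do not match between listings."""
    upstream = instructions(upstream_listing)
    fork = instructions(fork_listing)
    matcher = difflib.SequenceMatcher(a=upstream, b=fork, autojunk=False)
    return [
        (tag, upstream[i1:i2], fork[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def disassemble(compiler, source, objdump="objdump"):
    """Compile `source` without linking (-c) and return its disassembly."""
    obj = source + ".o"  # hypothetical naming, overwritten on every call
    subprocess.run([compiler, "-c", source, "-o", obj], check=True)
    done = subprocess.run(
        [objdump, "--disassemble", obj],
        check=True, capture_output=True, text=True,
    )
    return done.stdout
```

Calling `differences(disassemble(upstream_tcc, "test.c"), disassemble(forked_tcc, "test.c"))`, where both compiler paths are placeholders for wherever the two builds live, returns an empty list when both compilers emit the same instruction sequence. On an x86_64 host the `objdump` argument has to name a build of objdump that understands RISC-V objects.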
+
+You can try it by building the upstream TinyCC and the fork and making them
+compile (`-c`) some files. Use `objdump --disassemble` and see the results. It's
+not really hard to test. Here you have an example of a program you can build:
+
+``` clike
+// Example file to build
+int main (int argc, char *argv[]){
+    int a = 19, b = 90;
+    if (a && b){
+        return 1;
+    } else {
+        return 45 + 90 << 8;
+    }
+}
+```
+
+And the result it should give in both versions, optimized (upstream) and
+unoptimized (our fork):
+
+``` text
+OPTIMIZED VERSION                              || UNOPTIMIZED VERSION
+===============================================||==================================================
+0000000000000000 <main>:                       || 0000000000000000 <main>:
+   0:  fd010113  addi   sp,sp,-48              ||    0:  fd010113  addi   sp,sp,-48
+   4:  02113423  sd     ra,40(sp)              ||    4:  02113423  sd     ra,40(sp)
+   8:  02813023  sd     s0,32(sp)              ||    8:  02813023  sd     s0,32(sp)
+   c:  03010413  addi   s0,sp,48               ||    c:  03010413  addi   s0,sp,48
+  10:  00000013  nop                           ||   10:  00000013  nop
+  14:  fea43423  sd     a0,-24(s0)             ||   14:  fea43423  sd     a0,-24(s0)
+  18:  feb43023  sd     a1,-32(s0)             ||   18:  feb43023  sd     a1,-32(s0)
+  1c:  0130051b  addiw  a0,zero,19             ||   1c:  0130051b  addiw  a0,zero,19
+  20:  fca42e23  sw     a0,-36(s0)             ||   20:  fca42e23  sw     a0,-36(s0)
+  24:  05a0051b  addiw  a0,zero,90             ||   24:  05a0051b  addiw  a0,zero,90
+  28:  fca42c23  sw     a0,-40(s0)             ||   28:  fca42c23  sw     a0,-40(s0)
+  2c:  fdc42503  lw     a0,-36(s0)             ||   2c:  fdc42503  lw     a0,-36(s0)
+  30:  00051463  bnez   a0,38                  ||   30:  00051463  bnez   a0,38
+  34:  0180006f  j      4c                     ||   34:  01c0006f  j      50
+  38:  fd842503  lw     a0,-40(s0)             ||   38:  fd842503  lw     a0,-40(s0)
+  3c:  00051463  bnez   a0,44                  ||   3c:  00051463  bnez   a0,44
+  40:  00c0006f  j      4c                     ||   40:  0100006f  j      50
+  44:  0010051b  addiw  a0,zero,1              ||   44:  0010051b  addiw  a0,zero,1
+  48:  0100006f  j      58                     ||   48:  0140006f  j      5c
+  4c:  00008537  lui    a0,0x8                 ||   4c:  0100006f  j      5c
+  50:  7005051b  addiw  a0,a0,1792             ||   50:  00008537  lui    a0,0x8
+  54:  00000033  add    zero,zero,zero         ||   54:  7005051b  addiw  a0,a0,1792
+  58:  02813083  ld     ra,40(sp)              ||   58:  00000033  add    zero,zero,zero
+  5c:  02013403  ld     s0,32(sp)              ||   5c:  02813083  ld     ra,40(sp)
+  60:  03010113  addi   sp,sp,48               ||   60:  02013403  ld     s0,32(sp)
+  64:  00008067  ret                           ||   64:  03010113  addi   sp,sp,48
+                                               ||   68:  00008067  ret
+
+```
+
+On the right you can see that there is a duplicated `j` instruction (at `4c`),
+but it's not supposed to be a problem: the rest of the addresses are calculated
+properly, and the extra jump is never going to be reached.
+
+
+#### Last step
+
+So the code is added to the fork and it seems to work. That's what I promised
+to do, but I wanted to go a little bit further and test if Mes was able to
+handle the code I added to the TinyCC fork.
+
+In order to do that I made another branch in the project where I changed the
+package and some configuration in order to compile the forked TinyCC using Mes.
+
+You can see what I did here:
+
+
+
+Turns out that I managed to build the thing, using Mes for my x86_64 machine
+and choosing RISC-V as the backend, but it doesn't work at all.
+
+The resulting compiler generates empty files that have no permissions and fails
+instantly.
+
+At least we tested that `mescc` is ok with the C constructs we used in the
+backport of the RISC-V support. But there are still many things to test and
+this isn't easy at all.
+
+Let me give you some examples of how tricky this process is.
+
+This line in the `guix.scm` file[^line]:
+
+``` clike
+    "--extra-cflags=-Dinline= -DONE_SOURCE=1"
+```
+
+It does two crazy preprocessor tricks, inserted as C flags. It's equivalent to
+adding these macros at the top level of the sources:
+
+``` clike
+#define inline
+#define ONE_SOURCE 1
+```
+
+The first one removes the word `inline` from the source code, because `mescc`
+does not support that. The second one defines `ONE_SOURCE` with a value, because
+if it's only defined, without a value, as the Makefile does by default, it is
+not matched properly by the `#ifdef`s. Finding this is not obvious.
+
+[^line]:
+
+That's of course not the only thing; we found many others. I spent a couple
+of weeks making the build process work for `mescc` and when I thought it was
+working the result was a broken binary. Pretty fun.
+
+And why all this trouble, you might ask?
+
+Jan's fork is not compiled using the `configure` and the `Makefile` the project
+comes with; he wrote some shell scripts to build everything. I wanted to try to
+build the project directly as it came for several reasons: the scripts are
+prepared for native compilers and not for the cross compiler I was building,
+they use Mes from source but I just needed to use the upstream one and I
+thought integrating all this in the normal build process would be an extra
+win.
+
+I lost this time though.
+
+The compilation process might be missing some libraries, or some stubs might be
+in use instead of the real code... Maybe the problem is I'm using the x86_64
+version of Mes, which is not thoroughly tested... But using the i386 version is
+not possible because I'm building for 64-bit RISC-V and the i386 version doesn't
+know how to deal with 64-bit words... Honestly, I don't know what to do.
+
+### Something cool to say
+
+Mes does not compile following the classic process. Mes is integrated with some
+tools from the stage-0 project so it uses the M1 macro system, hex0 and all
+that kind of thing to build the programs.
+
+During the process I found that some of the M1 instructions Mes was generating
+were not available in M1, so I had to add a few extra instructions to the M1
+macro definitions for Mes. Here's the diff (a little bit simplified) I had to
+make:
+
+
+``` diff
+diff --git a/lib/x86_64-mes/x86_64.M1 b/lib/x86_64-mes/x86_64.M1
+index 9ffbbf15..64997c55 100644
+--- a/lib/x86_64-mes/x86_64.M1
++++ b/lib/x86_64-mes/x86_64.M1
+@@ -147,6 +148,10 @@ DEFINE mov____0x8(%rbp),%rsp 488b65
+ DEFINE mov____0x8(%rdi),%rax 488b47
+ DEFINE mov____0x8(%rdi),%rbp 488b6f
+ DEFINE mov____0x8(%rdi),%rsp 488b67
++DEFINE mov____(%rax),%si 668b30
++DEFINE mov____(%rax),%sil 408a30
++DEFINE mov____%si,(%rdi) 668937
++DEFINE mov____%sil,(%rdi) 448837
+ DEFINE movl___%eax,0x32 890425
+ DEFINE movl___%edi,0x32 893c25
+ DEFINE movl___%esi,(%rdi) 8937
+
+base-commit: aa5f1533e1736a89e60d2c34c2a0ab3b01f8d037
+```
+
+Now, with those instructions added, my package got a little bit more complex:
+I had to extend the Mes package with my patch until that change is accepted
+upstream. But this is great! Using software and improving it while you use it
+is the best feeling in life![^choco]
+
+[^choco]: Chocolate and hot coffee too.
+
+Let me use this point to show you a little bit of how this macro system works.
+You can see this `x86_64.M1` file has three columns: `DEFINE`, some text, and a
+number in hex. This is kind of an assembler description. The M1 program
+receives a file written with instructions that look like the text in the
+second column of the `.M1` file and converts them one by one to the numbers in
+the third. In short, the `.M1` file is a reference that tells the M1 program
+how to do the conversion.
+
+M1 is just a text replacement tool that makes the conversion based on the
+definitions it gets from the `.M1` file. It helps us write instructions in a way
+that looks like they have a meaning (that's what an assembler is after all).
+
+Later, those numbers are converted to binary, using Hex0 or another, slightly
+more sophisticated tool.
+
+All these tools are written in a way that can be audited (Hex0 is written in
+Hex0...) and they are executed from source from the very beginning.
+
+This is how we make yogurt directly from milk. Cool huh?
+Props to
+
+### Conclusions
+
+Back to the project, considering the fact that I didn't manage to build a fully
+working TinyCC with a RISC-V backend using Mes, is this a failure?
+
+I wouldn't say so.
+
+The new RISC-V backend is added and tested in the forked TinyCC, using GCC as a
+compiler. That's a big chunk of the work.
+
+On the other hand, I can compile the forked TinyCC with `mescc`. Even if the
+result didn't work, I can say the code I added was processed, so it was
+technically acceptable for `mescc`. Not bad, but we'll still need to see how
+true this is.
+
+In the end, this kind of small step makes progress, and having everything
+documented here and in the commits in the git repositories helps others continue
+with what I just did.
+
+Now, I'm going to leave this as finished, as the code is supposed to work. All
+the dots are more or less drawn. Now it's time for another project, one that
+connects all the dots of the RISC-V full source bootstrap: from `mescc`
+(already has some RISC-V support) to the forked TinyCC (I added the RISC-V
+support), next to the mainline TinyCC (has RISC-V support) and/or GCC 4.6.4 (I
+added RISC-V support) and from one of those to GCC 7.5 (the first one with
+RISC-V support) and then to the world.
+
+My work in this project left all the breadcrumbs in the forest, ready for
+anyone to follow[^breadcrumbs].
+
+[^breadcrumbs]: I hope someone follows them before the birds eat them.
+
+That person can be me, anyone else or even a group of people. All I can say is
+I won't forget this project, I'll always be reachable for advice and I'll try to
+help as much as I can. As I always do.
+
+These days I'll continue to give a couple of tries to this and I may reach
+something else, but I won't be as busy on it as I've been. I think I gave
+everything I could in this project. There's still a lot to do, but what's
+left is not something I can do alone.
+
+Until next time.
-- 
cgit v1.2.3