author: Ekaitz Zarraga <ekaitz@elenq.tech>, 2022-09-30 23:23:05 +0200
committer: Ekaitz Zarraga <ekaitz@elenq.tech>, 2022-09-30 23:23:05 +0200
commit: 59a47e7d8d14415a3a1c0118e184e7b8de1093c8
tree: 1c7b63c90c9adc600a2c4231a58a2f6159496b33
parent: 6e551039cd9afea33d0acde6cfd248ece8e63e04
Final? post about gcc bootstrap: RISCV support for bootstrappable TinyCC
-rw-r--r-- content/bootstrapGcc/06_tcc_mes.md 371
1 files changed, 371 insertions, 0 deletions
diff --git a/content/bootstrapGcc/06_tcc_mes.md b/content/bootstrapGcc/06_tcc_mes.md
new file mode 100644
index 0000000..e2a4a0e
--- /dev/null
+++ b/content/bootstrapGcc/06_tcc_mes.md
@@ -0,0 +1,371 @@
+Title: Milestone – RISC-V support in Mes's bootstrappable TinyCC
+Date: 2022-09-22
+Category:
+Tags: Bootstrapping GCC in RISC-V
+Slug: bootstrapGcc6
+Lang: en
+Summary:
+ Bringing RISC-V support to the bootstrappable TinyCC fork made by Mes. Some
+ problems and a look into the future.
+
+In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC,
+TinyCC, Mes and Mes's TinyCC fork that is designed to be bootstrappable. In
+this post we are going to deal with the latter, explain how we made it work for
+RISC-V and the challenges we encountered.
+
+### The non-bootstrappable nature of TinyCC
+
+As we introduced in the previous post, TinyCC is not compilable from very simple
+compilers like Mes's `mescc`. So the Mes project decided to make a [fork that
+`mescc` was able to compile](https://gitlab.com/janneke/tinycc). Mes calls it a
+*bootstrappable tinycc*.
+
+> There's an uninteresting philosophical debate about what *bootstrappable*
+> means, which leads to many errors and
+> misunderstandings[^misunderstandings]. Many compilers call themselves
+> bootstrappable if they can be compiled with themselves. When **we** talk
+> about this, we are looking for *full-source bootstrappability*, that is,
+> that the compilers can be compiled from *source*, or from a *full-source
+> bootstrappable* compiler.
+
+TinyCC is supposed to be compilable by itself, but who compiles the version
+that compiles TinyCC? Another TinyCC? And who compiles that?
+
+It's the yogurt problem we always get: how do you make yogurt? Take yogurt,
+mix it with milk and in a few hours you'll get yogurt. See the problem?
+
+If you are a culinary maniac, as I am, you can stretch this metaphor further.
+If you know what you are doing, you can obtain yogurt from raw milk[^kefir].
+
+That's what our project is doing: make yogurt from raw milk at some point.
+
+So compilers normally only care about the latest yogurt, but we, the saviors
+of the ancient milk, those who can acidify the raw pureness, can make yogurt
+starter with raw milk.
+
+That's the kind of magic nobody cares about, neither in the compiler world nor
+in real life.
+
+The yogurt starter does not make the best yogurt, by the way; it takes
+generations and generations of yogurts to make the best. That's what our
+project does: start simple (stage-0 and Mes) and keep enriching the product
+(TinyCC) until reaching a mature yogurt (GCC).
+
+TinyCC does not really care about this bootstrappability concept. They only
+want to be compilable with themselves. Nothing else.
+
+That's why [Jan](http://joyofsource.com/), the inventor of this metaphor I just
+stretched to infinity, had to fork the project. He had another choice:
+simplifying TinyCC's code upstream so it could be compiled from a simpler
+step, but his ideas were rejected and some weird animosity I don't understand
+started. More on that later.
+
+[^misunderstandings]: I've run into many misunderstandings about my project
+too. Some people have told me all this work is worthless because you can
+always bootstrap from an x86_64 machine and then continue the bootstrapping
+effort on your RISC-V. And so on. That's why this blog doesn't have a comment
+section. People insist on believing that other people's work is worthless or
+that they could do it more simply with no effort. I won't claim that my
+explanations are the best, but I can claim to be the laziest person I know,
+and I'd never spend time on something that isn't worth the effort.
+
+[^kefir]: With kefir you are fucked. We don't know where it comes from. Luckily
+ we harvested a lot and it's easy to grow.
+
+### The RISC-V support
+
+When the previous blogpost was written, TinyCC had an RV64 backend, but the
+TinyCC fork did not have RISC-V support.
+
+My job here was to take the backend from the official TinyCC and bring it to
+the bootstrappable one, Jan's fork. I can say that is done. Good for me.
+
+#### The process
+
+I followed the cross-compiler trick again, in order to make this process easier
+on my computer and because Mes doesn't support RISC-V output yet. Making a
+TinyCC for my x86_64 machine that had RISC-V output sounded more than
+reasonable to me. Later I could always move to a full RISC-V machine making
+sure that the backend was working.
+
+So first I made a guix package for upstream [TinyCC cross-compiler (for
+RISC-V)](https://github.com/ekaitz-zarraga/tcc/blob/guix_package/guix.scm#L85)
+with GCC. This wasn't really obvious, because there were some variables to set
+correctly. Then I tested that everything compiled and worked as expected. Apart
+from a couple of issues later corrected upstream, it did.
+
+Next, I made a guix package for [the forked TinyCC with
+GCC](https://github.com/ekaitz-zarraga/tcc/blob/riscv-mes/guix.scm#L83). This
+also needed some changes, as the forked one is a quite old version of TinyCC.
+The build needs a `libtcc1.a` here, which can be empty if the build is done
+with GCC (`libgcc` provides that functionality), but the build process doesn't
+mention anything about this, and coming up with that by yourself is hard.
+
+Now that the project was compilable, it was time to code. You can see this
+part in the `riscv-mes` branch:
+
+<https://github.com/ekaitz-zarraga/tcc/commits/riscv-mes>
+
+I took the backend from upstream and inserted it in the fork. Of course, it
+didn't compile. Many internal structures and APIs had changed, so after trying
+to stitch it all together myself, I headed to the mailing list. At the
+beginning I wanted to think the answers I was getting were because I wasn't
+explaining my doubts properly or something, but what was actually happening
+was that the animosity towards our fork (a decision I didn't take) appeared
+and someone tried to ridicule me on the mailing list for no reason at all.
+
+The funny thing is I'd never have needed to contact the mailing list if the
+project were as well written as they claim it to be. It's full of functions
+and variables with one-character names, the code is mixed together in a very
+aggressive way... It's supersmall, tiny even, but really hard to read. Also,
+the commits are not very descriptive for anyone who is not the main
+maintainer, who, surprise! Is the same person that gives aggressive answers on
+the mailing list... I hope it's only my perception and they are nice to their
+friends and family, but the interaction made me feel uncomfortable and I don't
+want to touch this code again.
+
+It was a sad moment, I must admit. But I decided I was going to do this with
+help or without it. And I think I did it. I removed references here and there
+and finally it looks like I got somewhere.
+
+There are some differences to point out. One of the commits that made me ask
+on the mailing list was a huge change in the way conditionals are handled in
+TinyCC. Our fork didn't have it, so I needed to split the code into several
+pieces, and the benefits of that commit (some instruction optimization) are
+lost in the backport. The branching and jumping is still correct, but less
+optimal. Not bad.
+
+Code added and compiled, it was time for testing. I made a little script (I
+didn't share it, but it's not really relevant either) and a small set of
+simple C test files, and compiled (not linked) them with the upstream version
+of the compiler and the forked one. Then I disassembled the results and
+compared the differences.
+
+You can try it by building the upstream TinyCC and the fork and making them
+compile (`-c`) some files. Use `objdump --disassemble` and see the results.
+It's not really hard to test. Here you have an example of a program you can
+build:
+
+``` clike
+// Example file to build
+int main (int argc, char *argv[]){
+    int a = 19, b = 90;
+    if (a && b){
+        return 1;
+    } else {
+        // `+` binds tighter than `<<`, so this is (45 + 90) << 8 = 34560
+        return 45 + 90 << 8;
+    }
+}
+```
+
+And the result it should give in both versions, optimized (upstream) and
+unoptimized (our fork):
+
+``` text
+OPTIMIZED VERSION || UNOPTIMIZED VERSION
+===============================================||==================================================
+0000000000000000 <main>: || 0000000000000000 <main>:
+ 0: fd010113 addi sp,sp,-48 || 0: fd010113 addi sp,sp,-48
+ 4: 02113423 sd ra,40(sp) || 4: 02113423 sd ra,40(sp)
+ 8: 02813023 sd s0,32(sp) || 8: 02813023 sd s0,32(sp)
+ c: 03010413 addi s0,sp,48 || c: 03010413 addi s0,sp,48
+ 10: 00000013 nop || 10: 00000013 nop
+ 14: fea43423 sd a0,-24(s0) || 14: fea43423 sd a0,-24(s0)
+ 18: feb43023 sd a1,-32(s0) || 18: feb43023 sd a1,-32(s0)
+ 1c: 0130051b addiw a0,zero,19 || 1c: 0130051b addiw a0,zero,19
+ 20: fca42e23 sw a0,-36(s0) || 20: fca42e23 sw a0,-36(s0)
+ 24: 05a0051b addiw a0,zero,90 || 24: 05a0051b addiw a0,zero,90
+ 28: fca42c23 sw a0,-40(s0) || 28: fca42c23 sw a0,-40(s0)
+ 2c: fdc42503 lw a0,-36(s0) || 2c: fdc42503 lw a0,-36(s0)
+ 30: 00051463 bnez a0,38 <main+0x38> || 30: 00051463 bnez a0,38 <main+0x38>
+ 34: 0180006f j 4c <main+0x4c> || 34: 01c0006f j 50 <main+0x50>
+ 38: fd842503 lw a0,-40(s0) || 38: fd842503 lw a0,-40(s0)
+ 3c: 00051463 bnez a0,44 <main+0x44> || 3c: 00051463 bnez a0,44 <main+0x44>
+ 40: 00c0006f j 4c <main+0x4c> || 40: 0100006f j 50 <main+0x50>
+ 44: 0010051b addiw a0,zero,1 || 44: 0010051b addiw a0,zero,1
+ 48: 0100006f j 58 <main+0x58> || 48: 0140006f j 5c <main+0x5c>
+ 4c: 00008537 lui a0,0x8 || 4c: 0100006f j 5c <main+0x5c>
+ 50: 7005051b addiw a0,a0,1792 || 50: 00008537 lui a0,0x8
+ 54: 00000033 add zero,zero,zero || 54: 7005051b addiw a0,a0,1792
+ 58: 02813083 ld ra,40(sp) || 58: 00000033 add zero,zero,zero
+ 5c: 02013403 ld s0,32(sp) || 5c: 02813083 ld ra,40(sp)
+ 60: 03010113 addi sp,sp,48 || 60: 02013403 ld s0,32(sp)
+ 64: 00008067 ret || 64: 03010113 addi sp,sp,48
+ || 68: 00008067 ret
+
+```
+
+On the right you can see that some `j` instructions are duplicated, but that's
+not supposed to be a problem, as the rest of the addresses are calculated
+properly and the duplicates are never going to be reached.
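+
+The `lui`/`addiw` pair near the end of both listings loads the constant
+returned by the `else` branch. As a worked check, here is a minimal sketch of
+how a 32-bit constant can be split into those two immediates. The helper name
+`split_imm` is made up for this example and is not taken from TinyCC, and the
+sketch assumes a small non-negative constant.
+
+``` clike
+#include <stdio.h>
+#include <stdint.h>
+
+/* Hypothetical helper: split a constant into the upper 20 bits used by
+ * `lui` and the signed lower 12 bits used by `addiw`.
+ * Assumes a small non-negative value, so the shift is well defined. */
+static void split_imm(int32_t value, int32_t *hi20, int32_t *lo12)
+{
+    /* Adding 0x800 rounds the upper part so lo12 fits in [-2048, 2047]. */
+    *hi20 = (value + 0x800) >> 12;
+    *lo12 = value - (*hi20 << 12);
+}
+
+int main(void)
+{
+    /* Same expression as the example program: (45 + 90) << 8 */
+    int32_t value = 45 + 90 << 8;
+    int32_t hi, lo;
+
+    split_imm(value, &hi, &lo);
+    printf("value=%d hi=0x%x lo=%d\n", value, (unsigned)hi, lo);
+    return 0;
+}
+```
+
+For `45 + 90 << 8` this prints `value=34560 hi=0x8 lo=1792`, which matches the
+`lui a0,0x8` and `addiw a0,a0,1792` instructions in both columns.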
+
+
+#### Last step
+
+So the code is added to the fork and it seems to work. That's what I promised
+to do, but I wanted to go a little bit further and test if Mes was able to
+handle the code I added to the TinyCC fork.
+
+In order to do that I made another branch in the project where I changed the
+package and some configuration in order to compile the forked TinyCC using Mes.
+
+You can see what I did here:
+
+<https://github.com/ekaitz-zarraga/tcc/commits/mes-package>
+
+Turns out that I managed to build the thing, using Mes on my x86_64 machine
+and choosing RISC-V as the backend, but it doesn't work at all.
+
+The resulting compiler generates empty files that have no permissions and fails
+instantly.
+
+At least we tested that `mescc` is ok with the C constructs we used in the
+backport of the RISC-V support. But there are still many things to test and
+this isn't easy at all.
+
+Let me give you some examples of how tricky this process is.
+
+This line in the `guix.scm` file[^line]:
+
+``` clike
+ "--extra-cflags=-Dinline= -DONE_SOURCE=1"
+```
+
+does two crazy preprocessor tricks, inserted as C flags. It's equivalent to
+adding these macros at the top level of the sources:
+
+``` clike
+#define inline
+#define ONE_SOURCE 1
+```
+
+The first one removes the word `inline` from the source code, because `mescc`
+does not support it. The second defines `ONE_SOURCE` to a value, because if
+it's only defined without a value, like the makefile does by default, it is
+not matched properly by the `#ifdef`s. Finding this is not obvious.
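+
+The effect of those two definitions can be seen in a small stand-alone file.
+This is only a sketch: the functions `twice` and `one_source_enabled` are
+invented for the example, and it says nothing about how `mescc` itself
+evaluates the conditionals.
+
+``` clike
+#include <stdio.h>
+
+/* Same effect as -Dinline= and -DONE_SOURCE=1 on the command line. */
+#define inline
+#define ONE_SOURCE 1
+
+/* After preprocessing this is just: static int twice(int x) */
+static inline int twice(int x)
+{
+    return 2 * x;
+}
+
+/* Returns 1 when the ONE_SOURCE branch is selected, 0 otherwise. */
+static int one_source_enabled(void)
+{
+#if ONE_SOURCE
+    return 1;
+#else
+    return 0;
+#endif
+}
+
+int main(void)
+{
+    printf("one_source=%d twice=%d\n", one_source_enabled(), twice(21));
+    return 0;
+}
+```
+
+With a standard C compiler this prints `one_source=1 twice=42`. Note that the
+`#if ONE_SOURCE` form used here needs the macro to expand to an expression, so
+it only works when the macro has a value.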
+
+[^line]: <https://github.com/ekaitz-zarraga/tcc/blob/mes-package/guix.scm#L196>
+
+That's of course not the only thing; we found many others. I spent a couple
+of weeks making the build process work for `mescc`, and when I thought it was
+working the result was a broken binary. Pretty fun.
+
+And why all this trouble, you might think?
+
+Jan's fork is not compiled using the `configure` and the `Makefile` the project
+comes with; he wrote some shell scripts to build everything. I wanted to try to
+build the project directly as it came for several reasons: the scripts are
+prepared for native compilers and not for the cross-compiler I was building,
+they use Mes from source but I just needed to use the upstream one, and I
+thought integrating all this in the normal build process would be an extra
+win.
+
+I lost this time though.
+
+The compilation process might be missing some libraries, or some stubs might be
+in use instead of the real code... Maybe the problem is I'm using the x86_64
+version of Mes, which is not thoroughly tested... But using the i386 version is
+not possible because I'm building for 64-bit RISC-V and the i386 version
+doesn't know how to deal with 64-bit words... Honestly, I don't know what to
+do.
+
+### Something cool to say
+
+Mes does not compile following the classic process. Mes is integrated with some
+tools from the stage-0 project, so it uses the M1 macro system, hex0 and all
+those kinds of things to build the programs.
+
+During the process I found that some of the M1 instructions Mes was generating
+were not defined for M1, so I had to add a few extra instructions to the M1
+macro definitions for Mes. Here's the diff (a little bit simplified) I had to
+make:
+
+
+``` diff
+diff --git a/lib/x86_64-mes/x86_64.M1 b/lib/x86_64-mes/x86_64.M1
+index 9ffbbf15..64997c55 100644
+--- a/lib/x86_64-mes/x86_64.M1
++++ b/lib/x86_64-mes/x86_64.M1
+@@ -147,6 +148,10 @@ DEFINE mov____0x8(%rbp),%rsp 488b65
+ DEFINE mov____0x8(%rdi),%rax 488b47
+ DEFINE mov____0x8(%rdi),%rbp 488b6f
+ DEFINE mov____0x8(%rdi),%rsp 488b67
++DEFINE mov____(%rax),%si 668b30
++DEFINE mov____(%rax),%sil 408a30
++DEFINE mov____%si,(%rdi) 668937
++DEFINE mov____%sil,(%rdi) 448837
+ DEFINE movl___%eax,0x32 890425
+ DEFINE movl___%edi,0x32 893c25
+ DEFINE movl___%esi,(%rdi) 8937
+
+base-commit: aa5f1533e1736a89e60d2c34c2a0ab3b01f8d037
+```
+
+Now, with those instructions added, my package got a little bit more complex:
+I had to extend the Mes package with my patch until that change is accepted
+upstream. But this is great! Using software and improving it while you use it
+is the best feeling in life![^choco]
+
+[^choco]: Chocolate and hot coffee too.
+
+Let me use this point to show you a little bit how this macro system works. You
+can see this `x86_64.M1` file has three columns: `DEFINE`, some text, and some
+number in hex. This is kind of an assembler description. There's the M1 program
+that receives a file written with instructions that look like the text in the
+second column in the `.M1` file and converts them one by one to the numbers in
+the third. In short, the `.M1` file is a reference that tells the M1 program
+how to do the conversion.
+
+M1 is just a text replacement tool that does the conversion based on the
+definitions it gets from the `.M1` file. It helps us write instructions in a
+way that looks like they have a meaning (that's what an assembler is after
+all).
+
+Later, those numbers are converted to binary, using Hex0 or another, slightly
+more sophisticated tool.
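+
+The lookup M1 performs can be sketched in a few lines of C. This is a toy
+illustration, not the real M1 program: the `lookup` function and the
+two-instruction `program` are assumptions made for the example, and the table
+holds three of the entries from the diff above.
+
+``` clike
+#include <stdio.h>
+#include <string.h>
+
+/* One DEFINE line: the readable name and the hex text that replaces it. */
+struct define {
+    const char *name;
+    const char *hex;
+};
+
+/* Entries copied from the x86_64.M1 diff shown earlier. */
+static const struct define table[] = {
+    { "mov____(%rax),%si",  "668b30" },
+    { "mov____(%rax),%sil", "408a30" },
+    { "mov____%si,(%rdi)",  "668937" },
+};
+
+/* Return the hex text for a token, or NULL if it has no DEFINE. */
+static const char *lookup(const char *token)
+{
+    size_t i;
+    for (i = 0; i < sizeof table / sizeof table[0]; i++)
+        if (strcmp(table[i].name, token) == 0)
+            return table[i].hex;
+    return NULL;
+}
+
+int main(void)
+{
+    /* Hypothetical input: load 16 bits from (%rax), store them to (%rdi). */
+    const char *program[] = { "mov____(%rax),%si", "mov____%si,(%rdi)" };
+    size_t i;
+
+    for (i = 0; i < sizeof program / sizeof program[0]; i++) {
+        const char *hex = lookup(program[i]);
+        printf("%s\n", hex ? hex : "??");
+    }
+    return 0;
+}
+```
+
+Each token is replaced as a whole, so the names only need to be unique
+strings. The sketch prints `668b30` and `668937`, the text a hex tool would
+then turn into bytes.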
+
+All these tools are written in a way that can be audited (Hex0 is written in
+Hex0...) and they are executed from source at their very beginning.
+
+This is how we make yogurt directly from milk. Cool huh?
+Props to <http://bootstrappable.org/>
+
+### Conclusions
+
+Back to the project, considering the fact that I didn't manage to build a fully
+working TinyCC with a RISC-V backend using Mes, is this a failure?
+
+I wouldn't say so.
+
+The new RISC-V backend is added and tested in the forked TinyCC, using GCC as a
+compiler. That's a big chunk of the work.
+
+On the other hand, I can compile the forked TinyCC with `mescc`. Even if the
+result didn't work, I can say the code I added was processed, so it was
+technically acceptable for `mescc`. Not bad, but we'll still need to see how
+true this is.
+
+In the end, these kinds of small steps make progress, and having everything
+documented here and in the commits on the git repositories helps others
+continue with what I just did.
+
+Now, I'm going to leave this as finished, as the code is supposed to work. All
+the dots are more or less drawn. Now it's time for another project, one that
+connects all the dots of the RISC-V full source bootstrap: from `mescc`
+(already has some RISC-V support) to the forked TinyCC (I added the RISC-V
+support), next to the mainline TinyCC (has RISC-V support) and/or GCC 4.6.4 (I
+added RISC-V support) and from one of those to GCC 7.5 (the first one with
+RISC-V support) and then to the world.
+
+My work in this project left all the breadcrumbs in the forest, ready for
+anyone to follow[^breadcrumbs].
+
+[^breadcrumbs]: I hope someone follows them before the birds eat them.
+
+That person can be me, anyone else or even a group of people. All I can say is
+I won't forget this project, I'll always be reachable for advice and I'll try
+to help as much as I can. As I always do.
+
+These days I'll continue to give a couple of tries to this and I may reach
+something else, but I won't be as busy on it as I've been. I think I gave
+everything I could in this project. There's still a lot to do, but what's
+left is not something I can do alone.
+
+Until next time.