From 6e551039cd9afea33d0acde6cfd248ece8e63e04 Mon Sep 17 00:00:00 2001
From: Ekaitz Zarraga
Date: Tue, 2 Aug 2022 14:14:40 +0200
Subject: Add post about tinycc

---
 content/bootstrapGcc/05_tcc_changes.md | 278 +++++++++++++++++++++++++++++++++
 1 file changed, 278 insertions(+)
 create mode 100644 content/bootstrapGcc/05_tcc_changes.md

diff --git a/content/bootstrapGcc/05_tcc_changes.md b/content/bootstrapGcc/05_tcc_changes.md
new file mode 100644
index 0000000..0eb42c6
--- /dev/null
+++ b/content/bootstrapGcc/05_tcc_changes.md
@@ -0,0 +1,278 @@
+Title: Adding TinyCC to the mix
+Date: 2022-08-01
+Category:
+Tags: Bootstrapping GCC in RISC-V
+Slug: bootstrapGcc5
+Lang: en
+Summary:
+    Discussing which changes need to be made to make GCC compilable from a
+    simpler C compiler, TinyCC.
+
+In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC,
+made it able to compile C programs and so on, but we didn't solve how to build
+that GCC with a simpler compiler. In this post I'll try to explain which
+changes must be applied to the whole ecosystem to be able to do this.
+
+### The current status
+
+I already talked about this in the past, but it's always a good moment to
+recall the bootstrapping process we are immersed in. There are steps before
+these, but I'm going to start at GNU Mes, which is the core of all this.
+
+For the part that interests us, GNU Mes has a C compiler, called MesCC. This C
+compiler is the one we use to compile TinyCC and we use that TinyCC to compile
+a really old version of GCC, the 2.95, and from that we compile more recent
+versions until we reach the current one. From the current one we compile the
+world.
+
+That's the theory, and it's what we currently have on the most widely supported
+architectures (`i386` and maybe some ARM flavour). Problems arise when you deal
+with some new architecture, like the one we have to deal with: RISC-V.
+
+RISC-V was invented recently, and the compilers did not add support for it
+until a few years ago. GCC added support for RISC-V in its 7 series (7.5 is the
+version we have been using through this series), which needs a C++ compiler in
+order to be built. That's a problem we almost solved in the previous steps,
+backporting the RISC-V support to a GCC that only needs a C compiler to be
+built.
+
+Now, extra problems appear. Which C compiler are we going to use to build that
+GCC 4.6.4 that has the RISC-V support we backported?
+
+According to the process we described, we should use GCC 2.95, but it doesn't
+support RISC-V so we would need to backport the RISC-V support to that one too.
+That's not cool.
+
+Another option would be to remove GCC 2.95 from the equation and compile
+GCC 4.6.4 directly from TinyCC, if that's possible, making the whole
+process faster by removing some dependencies. But this means TinyCC has to be
+able to compile GCC 4.6.4. We are going to try this option, but that requires
+some work we will describe today.
+
+On the other hand, in order to be able to build all this for RISC-V, TinyCC and
+MesCC have to be able to target RISC-V...
+
+Too many conditions have to be true for all this to work. But hey! Let's go step
+by step.
+
+### RISC-V support in TinyCC
+
+First, we have to make sure that TinyCC has RISC-V support, and it does. TinyCC
+has recently become able to compile, assemble and link for RISC-V, although
+only for 64 bits.
+
+I tested this support using a TinyCC cross-compiler and it works. If you want
+to try it, I have a simple [Guix package][tcc-package] for the cross compiler,
+and I also fixed the official Guix package for the native TinyCC, which had
+been broken for a long time.
+
+I haven't tested the RISC-V support natively yet, but if the cross-compiler
+works, chances are the native one will also work, so I'm not really worried
+about this point.
+
+[tcc-package]: https://github.com/ekaitz-zarraga/tcc/blob/guix_package/guix.scm
+
+
+### GNU Mes compiling TinyCC
+
+GNU Mes supports an old C standard that is simpler than the one TinyCC uses, so
+it uses a fork of TinyCC with some C features removed. This fork was made way
+before the RISC-V support was added to TinyCC and many things have changed
+since then.
+
+[We need to backport the TinyCC RISC-V support to Mes's own TinyCC fork,
+then.](https://www.youtube.com/watch?v=-1qju6V1jLM) Or at least do something
+about it.
+
+When I first took a look into this issue, I thought it would be an easy fix: I
+had already backported GCC, which is orders of magnitude larger than TinyCC...
+But it's not that easy. TinyCC's internal API has changed quite a bit since the
+fork was made, and I need to review all of it in order to make it work. Also,
+this process includes the need to convert all the modern C that is not supported
+by MesCC to the older C constructs that are available in it.
+
+It's a lot of work, but it's doable to a certain degree, and this might be
+a big step for the full source bootstrap process. Like what I did with GCC, it's
+not going to solve everything, but it's a huge step in the right direction.
+
+
+### GNU Mes supporting RISC-V
+
+On the lower level part of the story, if we want to make all this process work
+for RISC-V, GNU Mes itself should be runnable on it, and able to generate
+binaries for it.
+
+[There have been efforts][mes-riscv-effort] to make all this possible, and I
+don't expect this support to take long to finally appear in GNU Mes. It's just
+a matter of time and funding. I am aware that Jan is also interested in
+spending time on this, so I think we are covered in this area.
+
+[mes-riscv-effort]: https://lists.gnu.org/archive/html/bug-mes/2021-04/msg00031.html
+
+
+### GCC compilation with TinyCC
+
+The only point we are missing then is to be able to build the backported GCC
+from TinyCC, without the intermediate GCC 2.95. This is a tough one to test and
+achieve, because the GCC compilation process is extremely complex, and we need
+to make quite complex packages for this process to work.
+
+On the other hand, the work I already did, packaging my backported GCC for Guix,
+is not enough for several reasons: it was designed to work with a modern GCC
+toolchain, and not with TinyCC; and a cross-compiler is not the same thing as a
+native one.
+
+GCC is normally compiled in stages, which are called *bootstrap* by the GCC
+build system. I described a little bit of that process [in a footnote in
+the past][staged]. That process is not activated in a cross-compilation
+environment, which is what I used when the backend I backported was
+tested. If the *bootstrap* process doesn't work, it means the
+compilation process fails, so this introduces possible errors in the build
+system which we were avoiding thanks to the cross-compilation trick.
+
+[staged]: https://ekaitz.elenq.tech/bootstrapGcc3.html#fn:staged
+
+I did this on purpose, of course. I just wanted a simple working environment
+that let me test the backported RISC-V backend of the compiler, but
+now we need to make a proper package for GCC 4.6.4, and make it work with
+TinyCC.
+
+I wouldn't mention this if I hadn't tried and failed to make this package. It's
+not especially difficult to make a package, or it doesn't look like it, until
+you get errors like:
+
+``` weird-error-lol
+configure: error: C compiler cannot create executables
+
+`¯\_(ツ)_/¯`
+```
+
+That being said, this is not only a packaging issue. As we already mentioned,
+we are removing GCC 2.95 from the pipeline, so TinyCC has to be able to deal
+with the GCC 4.6.4 codebase directly, including the backport I did.
+
+The easiest way to test this is to compile GCC 4.6.4 for x86_64 on my machine,
+with no emulation in between, so we can find the things TinyCC can't deal with.
+Later we would be able to test this further in an emulated environment or
+directly on a RISC-V machine to make sure TinyCC can deal with the RISC-V
+backend, but for a first review of the GCC core, using x86_64 can be enough.
+It requires no weird setup, other than a working package... Ouch!
+
+I'm not really good at this part and I'm not sure if anyone else is, but I
+don't feel like spending time trying to make this package cascade. I feel
+like my time is better spent on fixing stuff, or, once the package cascade is
+done, fixing the compatibility.
+
+During the whole project, making Guix packages and figuring out build systems
+has been the part where most time was spent, and it's the one with the lowest
+success rate. It feels like I wasted hours trying to make the build process
+work for nothing.
+
+The funny part of this is that Guix is partially to blame here: not
+conforming to the FHS and having this weird way to handle inputs is what makes
+the whole process really complex. Code has to be patched to find the libraries,
+scripts must be patched too, binaries are hard to find... On the good side,
+it's Guix that makes this work worth the effort, and also what makes this
+process reproducible, once it's done, to let everyone enjoy it.
+
+
+#### Wait, but didn't Mes use a TinyCC fork?
+
+Oh yeah, of course. What I forgot to mention is that the step we just
+described, making TinyCC able to compile the backported GCC 4.6.4, is not as
+simple as I made it sound. If we use upstream TinyCC to compile GCC, who is
+going to compile that TinyCC? We already said MesCC can't do that directly.
+
+We could build that TinyCC with the TinyCC fork Mes has or make the TinyCC fork
+go directly for GCC 4.6.4, but in any case there's an obvious task to
+tackle: the RISC-V support must reach the TinyCC fork before we can do
+anything else. And that's where I want to focus.
+
+### This is not only about RISC-V
+
+I have to be clear with you: I mixed two problems together and I did that on
+purpose.
+
+On the one hand we have the RISC-V support related changes. And on the other
+hand we have the changes to the compilation pipeline: the removal of GCC 2.95.
+
+The second part is just a consequence of the first, but it's not only related
+to the RISC-V world. Once we have our compilers ready, we are going to apply
+the change for the whole thing. Removing a step is a really important task for
+many reasons, but one is obvious at this point: having a really old compiler
+like GCC 2.95 forces us to stay with the architectures it was able to target,
+or makes us add them and maintain them ourselves. It's a huge flexibility
+issue for the little gain it gives: GCC 4.6.4 is already compilable from a C90
+compiler.
+
+So, this is an important milestone, not only for my part of the job but also
+for the whole GNU Mes and bootstrapping effort. Skipping GCC 2.95 has to be
+done on every architecture, and the packaging effort of that is unavoidable.
+
+### What I already did
+
+While I was reviewing what needed to be done, I started doing things here
+and there, preparing the work and making sure I understood the context
+better.
+
+First, I realized I had introduced some non-C90 constructs in the backport of
+GCC, because I directly copied some code from 7.5, and I removed them. This is
+important, because we need to be able to compile all this with TinyCC, and I
+don't expect TinyCC to support modern constructs.
+
+
+I packaged a TinyCC RISC-V cross compiler [for the upstream
+project][tcc-package], and also for [the Mes fork][mes-tcc-package] even
+though the latter is not available yet for compilation: we need to backport
+the backend in order to make it work. Still, it's important work, because it
+lets me start the backport easily. I'll need to apply more changes on top of
+it, for sure, but at the moment I have all I need to start coding the new
+backend.
+
+[mes-tcc-package]: https://github.com/ekaitz-zarraga/tcc/blob/riscv-mes/guix.scm
+
+I spent countless hours trying to make a proper GCC package and trying to use
+TinyCC as the C compiler for it with no success. This is why I decided to move
+on and work on a more interesting and usable part: adding the RISC-V backend to
+the Mes fork of TinyCC.
+
+Of course, I already started working on the RISC-V support of the TinyCC fork
+from Mes, and started encountering API mismatches here and there. Most of them
+are related to some optimizations introduced after the fork, which I need to
+review in more detail in the upcoming weeks. I also spent some time trying to
+understand how TinyCC works, and it's a very interesting approach, I have to
+say[^maybe].
+
+[^maybe]: Maybe I'll have the time to explain it in a future blog post, maybe
+    not.
+
+
+### Conclusions
+
+I'd love to tackle all these problems together and fix the whole system, but
+I'm just one guy coding from his couch. It's not realistic to think I can fix
+everything, and trying to do so is detrimental to my mental health.
+
+So I decided to go for the RISC-V support for the TinyCC fork we have at Mes.
+This would leave all the ingredients ready for someone more experienced than me
+to make the final recipe.
+
+The same thing happened with the GCC backport. I didn't really finish the job:
+there's no C++ compiler working yet, but that's not what matters. Anyone can
+take what I did, package it properly, which happened to be an impossible
+task for me, and make it ready. We already made a huge step.
+
+Fighting against a wall is bad for everyone; it's better to pick a task where
+you can provide something. You feel better, and the overall state of the
+project is improved. Achieving things is the best gasoline you can get for
+achieving new things.
+
+Regarding the task I chose, I've already spent some hours working on it. It's
+not an easy task. The internal TinyCC API has changed a lot since the
+fork was made, and there are many commits related to RISC-V since then. One
+of the most recent ones fixes the RISC-V assembler after I reported it wasn't
+working, a few weeks ago. All these changes must be reviewed carefully, undoing
+the API changes and also, most importantly, keeping the code compatible with
+GNU Mes's C compiler.
+
+Not an easy task.
--
cgit v1.2.3