Title: Adding TinyCC to the mix
Date: 2022-08-01
Category:
Tags: Bootstrapping GCC in RISC-V
Slug: bootstrapGcc5
Lang: en
Summary:
    Discussing what changes need to be made to make GCC compilable from a
    simpler C compiler, TinyCC.

In the [series]({tag}Bootstrapping GCC in RISC-V) we already introduced GCC,
made it able to compile C programs and so on, but we didn't solve how to build
that GCC with a simpler compiler. In this post I'll try to explain which
changes must be applied to all the ecosystem to be able to do this.

### The current status

I already talked about this in the past, but it's always a good moment to
recall the bootstrapping process we are immersed in. There are steps before
these, but I'm going to start at GNU Mes, which is the core of all this.

From the part that interests us, GNU Mes has a C compiler, called MesCC. This C
compiler is the one we use to compile TinyCC and we use that TinyCC to compile
a really old version of GCC, the 2.95, and from that we compile more recent
versions until we reach the current one. From the current one we compile the
world.

That's the theory, and it's what we currently have in the most widely supported
architectures (`i386` and maybe some ARM flavour). Problems arise when you deal
with some new architecture, like the one we have to deal with: RISC-V.

RISC-V was invented recently, and compilers only added support for it a few
years ago. GCC added support for RISC-V in its 7 release series (7.5 is the
version we have been discussing through this series), which needed a C++
compiler in order to be built. That's a problem we almost solved in the
previous steps, backporting the RISC-V support to a GCC that only needed a C
compiler to be built.

Now, extra problems appear. Which C compiler are we going to use to build that
GCC 4.6.4 that has the RISC-V support we backported?

According to the process we described, we should use GCC 2.95, but it doesn't
support RISC-V so we would need to backport the RISC-V support to that one too.
That's not cool.

Another option would be to remove GCC 2.95 from the equation and compile
GCC 4.6.4 directly with TinyCC, if that's possible. That would make the whole
process faster by removing some dependencies, but it means TinyCC has to be
able to compile GCC 4.6.4. We are going to try this option, but it requires
some work that we will describe today.

On the other hand, in order to be able to build all this for RISC-V, TinyCC and
MesCC have to be able to target RISC-V...

Too many conditions have to be true for all this to work. But hey! Let's go
step by step.

### RISC-V support in TinyCC

First, we have to make sure that TinyCC has RISC-V support, and it does.
TinyCC has recently become able to compile, assemble and link for RISC-V,
although only for 64 bits.

I tested this support using a TinyCC cross-compiler and it works. If you want
to try it, I have a simple [Guix package][tcc-package] for the cross-compiler,
and I also fixed the official Guix package for the native TinyCC, which had
been broken for a long time.

I haven't tested the RISC-V support natively yet, but if the cross-compiler
works, chances are the native one will also work, so I'm not really worried
about this point.

[tcc-package]: https://github.com/ekaitz-zarraga/tcc/blob/guix_package/guix.scm


### GNU Mes compiling TinyCC

GNU Mes supports an old C standard that is simpler than the one TinyCC uses, so
it uses a fork of TinyCC with some C features removed. This fork was done way
before the RISC-V support was added to TinyCC and many things have changed
since then.

[We need to backport the TinyCC RISC-V support to Mes's own TinyCC fork,
then.](https://www.youtube.com/watch?v=-1qju6V1jLM) Or at least do something
about it.

When I first took a look into this issue, I thought it would be an easy fix:
I had already backported GCC, which is orders of magnitude larger than
TinyCC... But it's not that easy. TinyCC's internal API has changed quite a bit
since the fork was made, and I need to review all of it in order to make it
work. This process also involves converting all the modern C that MesCC doesn't
support to the older C constructs that are available in it.

It's a lot of work, but it's doable to a certain degree, and this might be a
big step for the full source bootstrap process. Like what I did in GCC, it's
not going to solve everything, but it's a huge step in the right direction.


### GNU Mes supporting RISC-V

On the lower-level part of the story, if we want to make this whole process
work for RISC-V, GNU Mes itself should be runnable on it, and able to generate
binaries for it.

[There have been efforts][mes-riscv-effort] to make all this possible, and I
don't expect this support to take long to finally appear in GNU Mes. It's just
a matter of time and funding. I am aware that Jan is also interested in
spending time on this, so I think we are covered in this area.

[mes-riscv-effort]: https://lists.gnu.org/archive/html/bug-mes/2021-04/msg00031.html


### GCC compilation with TinyCC

The only point we are missing then is being able to build the backported GCC
with TinyCC, without the intermediate GCC 2.95. This is a tough one to test and
achieve, because the GCC compilation process is extremely complex, and we need
to make quite complex packages for this process to work.

On the other hand, the work I already did packaging my backported GCC for Guix
is not enough, for several reasons: it was designed to work with a modern GCC
toolchain, and not with TinyCC; and a cross-compiler is not the same thing as a
native one.

GCC is normally compiled in stages, which are called *bootstrap* by the GCC
build system. I described a little bit of that process [in a footnote in a
past post][staged]. That process is not activated in a cross-compilation
environment, which is what I used when the backend I backported was
<del>back</del>tested. If the *bootstrap* process doesn't work, the whole
compilation fails, so it introduces possible errors in the build system that
we were avoiding thanks to the cross-compilation trick.

[staged]: https://ekaitz.elenq.tech/bootstrapGcc3.html#fn:staged

I did this on purpose, of course. I just wanted a simple working environment
that let me test the backported RISC-V backend of the compiler, but now we
need to make a proper package for GCC 4.6.4, and make it work with TinyCC.

I wouldn't mention this if I hadn't tried, and failed, to make this package.
It's not especially difficult to make a package, or it doesn't look like it,
until you get errors like:

``` weird-error-lol
configure: error: C compiler cannot create executables

`¯\_(ツ)_/¯`
```

That being said, this is not only a packaging issue. As we already mentioned,
we are removing GCC 2.95 from the pipeline, so TinyCC has to be able to deal
with the GCC 4.6.4 codebase directly, including the backport I did.

The easiest way to test this is to compile GCC 4.6.4 for x86_64 on my machine,
with no emulation in between, so we can find the things TinyCC can't deal with.
Later we would be able to test this further in an emulated environment or
directly on a RISC-V machine to make sure TinyCC can deal with the RISC-V
backend, but for a first review of the GCC core, using x86_64 can be enough.
It requires no weird setup, other than a working package... Ouch!

I'm not really good at this part and I'm not sure if anyone else is, but I
don't feel like spending time trying to make this package cascade. I feel
like my time is better spent on fixing stuff, or, once the package cascade is
done, fixing the compatibility.

During the whole project, making Guix packages and figuring out build systems
has been the part where the most time was spent, and it's the one with the
lowest success rate. It feels like I wasted hours trying to make the build
process work for nothing.

The funny part of this is that Guix is partially to blame here: not conforming
to the FHS and having this weird way of handling inputs is what makes the
whole process really complex. Code has to be patched to find the libraries,
scripts must be patched too, binaries are hard to find... On the good side,
it's Guix that makes this work worth the effort, and also what makes this
process reproducible, once it's done, to let everyone enjoy it.


#### Wait, but didn't Mes use a TinyCC fork?

Oh yeah, of course. What I forgot to mention is that the step we just
described, making TinyCC able to compile the backported GCC 4.6.4, is not as
simple as I made it sound. If we use upstream TinyCC to compile GCC, who is
going to compile that TinyCC? We already said MesCC is not able to do that
directly.

We could build that TinyCC with the TinyCC fork Mes has or make the TinyCC fork
go directly for GCC 4.6.4, but in any case there's an obvious task to tackle:
the RISC-V support must reach the TinyCC fork before we can do anything else.
And that's where I want to focus.

### This is not only about RISC-V

I have to be clear with you: I mixed two problems together and I did that on
purpose.

On the one hand we have the changes related to RISC-V support. On the other
hand we have the changes to the compilation pipeline: the removal of GCC 2.95.

The second part is just a consequence of the first, but it's not only related
to the RISC-V world. Once we have our compilers ready, we are going to apply
the change to the whole thing. Removing a step is really important for many
reasons, but one is obvious at this point: having a really old compiler like
GCC 2.95 forces us to stay with the architectures it was able to target, or
makes us add them and maintain them ourselves. It's a huge flexibility issue
for the little gain it gives: GCC 4.6.4 is already compilable with a C90
compiler.
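
A toy model may make the argument clearer. In the sketch below, each compiler
in the chain is reduced to the set of architectures it can target, and the
chain as a whole can only reach what every step has in common. Everything in
it is an assumption made for illustration: the flag names, the `chain_targets`
helper, and especially the target sets, where the RISC-V bits on MesCC and the
TinyCC fork describe the goal and not the current state.

```c
/* Target architectures as bit flags (hypothetical names). */
enum {
    ARCH_I386    = 1 << 0,
    ARCH_RISCV64 = 1 << 1
};

/* A bootstrap chain can only reach an architecture that every
 * compiler in the chain is able to target, so we intersect the sets. */
unsigned chain_targets(const unsigned *steps, int n)
{
    unsigned common = ~0u;
    int i;

    for (i = 0; i < n; i++)
        common &= steps[i];
    return common;
}

/* Assumed, simplified target sets. The RISC-V bits on MesCC and the
 * TinyCC fork are the goal of this work, not what exists today. */
static const unsigned with_gcc_2_95[] = {
    ARCH_I386 | ARCH_RISCV64,  /* MesCC (goal) */
    ARCH_I386 | ARCH_RISCV64,  /* TinyCC fork (goal) */
    ARCH_I386,                 /* GCC 2.95: no RISC-V */
    ARCH_I386 | ARCH_RISCV64   /* backported GCC 4.6.4 */
};

static const unsigned without_gcc_2_95[] = {
    ARCH_I386 | ARCH_RISCV64,  /* MesCC (goal) */
    ARCH_I386 | ARCH_RISCV64,  /* TinyCC fork (goal) */
    ARCH_I386 | ARCH_RISCV64   /* backported GCC 4.6.4 */
};
```

With these assumed sets, `chain_targets(with_gcc_2_95, 4)` has no RISC-V bit,
while `chain_targets(without_gcc_2_95, 3)` keeps it: one old step in the middle
is enough to block a new architecture for everything that comes after it.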

So, this is an important milestone, not only for my part of the job but also
for the whole GNU Mes and bootstrapping effort. Skipping GCC 2.95 has to be
done in every architecture, and the packaging effort of that is unavoidable.

### What I already did

While I was reviewing what needed to be done, I started doing things here
and there, preparing the work and making sure I understood the context better.

First, I realized I had introduced some non-C90 constructs in the backport of
GCC, because I directly copied some code from 7.5, so I removed them. This is
important, because we need to be able to compile all this with TinyCC, and I
don't expect TinyCC to support modern constructs.
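
As a rough idea of what that cleanup looks like, here is a made-up function
(not taken from the backport) written first in the newer style, as a comment,
and then rewritten so that a strict C90 compiler would accept it. The two
constructs shown, a declaration inside the `for` header and a declaration
after a statement, are just common examples of things C99 allows and C90
doesn't; they are not a list of what I actually had to remove.

```c
/* Newer (C99) style, shown only for comparison:
 *
 *     int sum_positive(const int *v, int n) {
 *         int total = 0;
 *         for (int i = 0; i < n; i++) {   // declaration in the for header
 *             if (v[i] <= 0) continue;
 *             int x = v[i];               // declaration after a statement
 *             total += x;
 *         }
 *         return total;
 *     }
 */

/* C90 rewrite: every declaration sits at the start of its block,
 * and only block comments are used. */
int sum_positive(const int *v, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++) {
        int x;  /* declared before any statement of the inner block */

        if (v[i] <= 0)
            continue;
        x = v[i];
        total += x;
    }
    return total;
}
```

The behaviour is the same in both versions; only the places where things are
declared change, which is why this kind of fix is tedious more than difficult.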


I packaged a TinyCC RISC-V cross-compiler [for the upstream
project][tcc-package], and also for [the Mes fork][mes-tcc-package], even
though the latter can't be compiled yet: we need to backport the backend in
order to make it work. Still, it's important work, because it lets me start
the backport easily. I'll need to apply more changes on top of it, for sure,
but at the moment I have all I need to start coding the new backend.

[mes-tcc-package]: https://github.com/ekaitz-zarraga/tcc/blob/riscv-mes/guix.scm

I spent countless hours trying to make a proper GCC package and trying to use
TinyCC as the C compiler for it, with no success. This is why I decided to move
on and work on a more interesting and usable part: adding the RISC-V backend to
the Mes fork of TinyCC.

Of course, I already started working on the RISC-V support of the TinyCC fork
from Mes, and started encountering API mismatches here and there. Most of them
are related to some optimizations introduced after the fork, which I need to
review in more detail in the upcoming weeks. I also spent some time trying to
understand how TinyCC works, and it's a very interesting approach, I have to
say[^maybe].

[^maybe]: Maybe I'll have the time to explain it in a future blog post, maybe
  not.


### Conclusions

I'd love to tackle all these problems together and fix the whole system, but
I'm just one guy coding from his couch. It's not realistic to think I can fix
everything, and trying to do so is detrimental to my mental health.

So I decided to go for the RISC-V support for the TinyCC fork we have at Mes.
This would leave all the ingredients ready for someone more experienced than me
to make the final recipe.

The same thing happened with the GCC backport. I didn't really finish the job:
there's no C++ compiler working yet, but that's not what matters. Anyone can
take what I did, package it properly, which happened to be an impossible task
for me, and make it ready. We already made a huge step.

Fighting against a wall is bad for everyone; it's better to pick a task where
you can provide something. You feel better, and the overall state of the
project is improved. Achieving things is the best gasoline you can get for
achieving new things.

Regarding the task I chose, I've already spent some hours working on it. It's
not an easy task. The internal TinyCC API has changed a lot since the fork was
made, and there are many commits related to RISC-V since then. One of the most
recent ones fixes the RISC-V assembler after I reported it wasn't working, a
few weeks ago. All these changes must be reviewed carefully, undoing the API
changes and also, most importantly, keeping the code compatible with GNU Mes's
C compiler.

Not an easy task.