Title: TinyCC to GCC gap is slowly closing
Date: 2024-05-02
Category:
Tags: Bootstrapping GCC in RISC-V
Slug: bootstrapGcc13
Lang: en
Summary: The sidetrack we took in the past started to give us some good news.
    Here are some.

In [previous episodes we talked about getting
sidetracked](/bootstrapGcc11.html) and we mentioned we needed to build Musl
because we had limitations in our standard library. We didn't explain those
limitations in detail, and I think now is the moment to do so, as many of the
changes we proposed there have been tested and upstreamed. I'd also like to
explain the ramifications that process had.

#### Symptoms

TinyCC and our MeslibC are powerful enough to build Binutils, but not enough to
make some of the resulting programs, like GNU As, work.

MeslibC is super simple, meaning it doesn't really implement some of the things
you might consider obvious. One of the best examples is `fopen`. Instead of
returning a fresh `FILE` structure, MeslibC's `fopen` simply returns the
underlying file descriptor, as returned by the kernel's `open` call. This is
not a big problem in itself, as the `fread` and `fclose` provided by MeslibC
are compatible with this behaviour, but there's a very specific case where it
breaks. When GNU As is given no input file, it tries to read from standard
input, and it fails, saying there was no valid file descriptor. Why? Let's read
the code GNU As uses to open its input files (`gas/input-file.c`):

``` clike
/* Open the specified file, "" means stdin.  Filename must not be null.  */

void
input_file_open (const char *filename,
		 int pre)
{
  int c;
  char buf[80];

  preprocess = pre;

  gas_assert (filename != 0);	/* Filename may not be NULL.  */
  if (filename[0])
    {
      f_in = fopen (filename, FOPEN_RT);
      file_name = filename;
    }
  else
    {
      /* Use stdin for the input file.  */
      f_in = stdin;
      /* For error messages.  */
      file_name = _("{standard input}");
    }

  if (f_in == NULL)
    {
      as_bad (_("can't open %s for reading: %s"),
	      file_name, xstrerror (errno));
      return;
    }

  c = getc (f_in);
  /* ... Continues ...*/
```

Knowing that MeslibC uses file descriptor integers as `FILE` pointers, the
problem in this code is not hard to spot. When the selected filename is empty
(no file to read from), `filename[0]` is the `\0` character, which is false, so
`f_in` is set to `stdin`. Normally that is a pointer to some `FILE` structure
holding the file descriptor `0`, the one corresponding to standard input. As
that pointer is not `NULL`, the error below doesn't trigger. But, as I just
explained, MeslibC uses kernel file descriptors instead of `FILE` structures,
so `stdin` in MeslibC is just `0`, which compares equal to `NULL`. The error is
triggered and the execution stops.

MeslibC's clever shortcut of using file descriptors as `FILE` pointers fails
here because C has no error types: the standard library signals errors by
returning `NULL`, and descriptor `0` is indistinguishable from it.
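To make the failure concrete, here is a small C model of both approaches. The
names (`fd_as_stream`, `struct stream`, `looks_like_open_failure`) are
hypothetical and not part of MeslibC or Binutils; the sketch only reproduces
the `f_in == NULL` check from the GNU As snippet.

``` clike
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the MeslibC shortcut: the stream pointer handed
   out is just the kernel file descriptor reinterpreted as a pointer.
   On the usual flat-address targets, descriptor 0 becomes NULL. */
static void *fd_as_stream(int fd) {
    return (void *)(intptr_t)fd;
}

/* Hypothetical model of a conventional libc: a stream is a structure that
   stores the descriptor, so every valid stream is a non-NULL pointer. */
struct stream { int fd; };
static struct stream conventional_stdin = { 0 };

/* The same test GNU As applies to f_in after choosing its input. */
static int looks_like_open_failure(const void *f_in) {
    return f_in == NULL;
}
```

With a conventional structure, `&conventional_stdin` passes the check; with the
descriptor-as-pointer shortcut, `fd_as_stream(0)` looks exactly like a failed
`fopen`, while any other descriptor, such as 3, passes.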

This is just a simple case to show how MeslibC affects our bootstrapping
chain, but there are others. For example, MeslibC can't `ungetc` more than one
character, because that was all the original bootstrapping chain, designed for
x86, needed. When we moved to a more recent Binutils version (the first one
supporting RISC-V), that became an obstacle, and it's preventing us from
running GNU As.
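ISO C only guarantees one character of pushback per stream, so a program that
needs more relies on what many C libraries provide beyond the standard. A
minimal sketch of multi-character pushback, assuming a fixed-size stack of
pushed characters in front of the data source (a string here, to keep it
self-contained). `pb_stream`, `pb_getc`, `pb_ungetc` and the depth of 8 are
made up for illustration and are not MeslibC code:

``` clike
#include <stdio.h>   /* EOF */
#include <stddef.h>

#define PUSHBACK_MAX 8  /* hypothetical depth; ISO C only guarantees 1 */

struct pb_stream {
    const char *src;                      /* underlying data (a string here) */
    size_t pos;                           /* next unread byte in src */
    unsigned char pushback[PUSHBACK_MAX]; /* stack of pushed-back characters */
    int npushed;                          /* how many are currently pushed */
};

static int pb_getc(struct pb_stream *s) {
    if (s->npushed > 0)
        return s->pushback[--s->npushed]; /* LIFO: last pushed, first read */
    if (s->src[s->pos] == '\0')
        return EOF;
    return (unsigned char)s->src[s->pos++];
}

static int pb_ungetc(int c, struct pb_stream *s) {
    if (c == EOF || s->npushed == PUSHBACK_MAX)
        return EOF;                       /* same failure signal as ungetc */
    s->pushback[s->npushed++] = (unsigned char)c;
    return (unsigned char)c;
}
```

Pushing back `'b'` and then `'a'` after reading both makes the next two reads
return `'a'` and `'b'` again, which is exactly what a one-slot implementation
cannot do.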

Of course, all of these problems could be fixed in MeslibC, but in the end the
goal of MeslibC is not to be a proper C standard library implementation, but a
helper for bootstrapping more powerful standard libraries that already exist.
These problems, and some others we also found, are just drawing the line of
*when* we need to jump to a more mature C standard library in our chain. It
looks like Binutils is where that line is drawn.

#### Musl

The bootstrapping chain as conceived in Guix uses GLibC, as Guix is a GNU
project, but we found Musl to be a more suitable C standard library for these
initial steps, as it is simple and easy to build while keeping all the
functionality you might expect from a proper C standard library.

We ran into some issues though, as upstream TinyCC's RISC-V backend wasn't
ready to build it.

First of all, TinyCC's RISC-V backend had no support for Extended Asm, so I
implemented it and sent it upstream.

Once I did that, we built Musl and realized some functions had issues. The
problem was that the Extended Asm implementation wasn't interpreting the
constraints properly: operands marked as both read and written were not
handled correctly. I talked with Michael, the author of that piece of code,
because I didn't fully understand the behaviour. He guided me a little, and I
proceeded to fix it for all architectures.
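The read-write case is the `+` constraint modifier. In the sketch below (GCC
style Extended Asm; the function name is hypothetical and not taken from Musl),
`"+r" (x)` tells the compiler that the asm reads the old value of `x` and
writes a new one, so `x` must be loaded into the register before the asm runs.
If the constraint is treated as a plain output (`"=r"`), that initial load can
be skipped and the result is garbage. The RISC-V branch is the relevant one;
the x86-64 branch and the plain C fallback only exist so the sketch also builds
elsewhere.

``` clike
/* Adds inc to x through an Extended Asm statement in which x is a
   read-write ("+r") operand: the asm consumes its old value and
   leaves the new one in the same register. */
static long add_in_place(long x, long inc) {
#if defined(__riscv)
    __asm__ ("add %0, %0, %1"
             : "+r" (x)          /* output that is also an input */
             : "r" (inc));
#elif defined(__x86_64__)
    __asm__ ("add %1, %0"        /* AT&T order: %0 += %1 */
             : "+r" (x)
             : "r" (inc)
             : "cc");            /* add modifies the flags */
#else
    x += inc;                    /* portable fallback */
#endif
    return x;
}
```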

Still, we couldn't build Musl, because it was using some atomic instructions
that were not implemented in TinyCC's RISC-V assembler. At first we decided to
avoid them by patching around them in Musl, but they turned out to be
important for memory allocation (LOL), so I decided to implement them in
TinyCC's assembler and push the changes upstream. I implemented `lr` (load
reserved) and `sc` (store conditional), and extended `fence`'s behaviour to
match what the GNU Assembler (the reference RISC-V assembler) does.
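`lr` and `sc` are the building blocks of RISC-V lock-free code: `lr.w` loads a
word and places a reservation on its address, and `sc.w` stores only if the
reservation still holds, writing zero to its destination register on success.
A compare-and-swap loop built from them could look like this sketch. The
function name is hypothetical, the ordering suffixes (`.aq`/`.rl`) are left
out for brevity, and the non-RISC-V branch uses the GCC `__atomic` builtins
only so the example runs on other machines.

``` clike
#include <stdint.h>

/* Atomically: if *p == expected, store desired. Returns the old value. */
static int32_t cas32(int32_t *p, int32_t expected, int32_t desired) {
    int32_t old;
#if defined(__riscv) && defined(__riscv_atomic)
    int32_t fail;
    __asm__ volatile (
        "1: lr.w %0, (%2)\n"      /* load-reserved the current value       */
        "   bne  %0, %3, 2f\n"    /* not the expected value: give up       */
        "   sc.w %1, %4, (%2)\n"  /* store-conditional, %1 == 0 on success */
        "   bnez %1, 1b\n"        /* reservation lost: retry               */
        "2:"
        : "=&r" (old), "=&r" (fail)
        : "r" (p), "r" (expected), "r" (desired)
        : "memory");
#else
    old = expected;  /* on failure the builtin overwrites old with *p */
    __atomic_compare_exchange_n(p, &old, desired, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
    return old;
}
```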

Still, this wasn't enough to build Musl properly, because TinyCC's RISC-V
assembler wasn't a proper implementation of the assembly language, but rather a
way to write individual machine instructions as human-readable text. RISC-V is
a RISC architecture and makes heavy use of pseudoinstructions to ease writing
assembly programs. Before all this work, TinyCC only implemented simple
instructions and almost no pseudoinstruction expansion.
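As an example of pseudoinstruction expansion, `li rd, imm` with a 32-bit
constant typically becomes `lui` plus `addi`. The subtle part is that `addi`
sign-extends its 12-bit immediate, so when bit 11 of the constant is set the
upper 20 bits must be rounded up by one. The sketch below (hypothetical helper
name, not TinyCC's code) computes both operands. It only covers the RV32-style
two-instruction case; on RV64, constants near the top of the 32-bit range and
full 64-bit constants need other sequences, which it ignores.

``` clike
#include <stdint.h>

/* Splits a 32-bit constant into the operands of
     lui  rd, hi20
     addi rd, rd, lo12
   addi sign-extends lo12, so hi20 is rounded up when bit 11 is set. */
static void split_li32(int32_t imm, uint32_t *hi20, int32_t *lo12) {
    uint32_t u = (uint32_t)imm;
    int32_t lo = (int32_t)(u & 0xFFFu);
    if (lo >= 0x800)
        lo -= 0x1000;                  /* sign-extend the low 12 bits */
    *lo12 = lo;
    /* Subtracting lo (mod 2^32) compensates for its sign extension. */
    *hi20 = ((u - (uint32_t)lo) >> 12) & 0xFFFFFu;
}
```

For instance, `0x12345fff` becomes `lui rd, 0x12346` followed by
`addi rd, rd, -1`, and `-1` becomes `lui rd, 0` followed by `addi rd, rd, -1`.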

Also, its architecture couples argument parsing with relocation generation,
which doesn't really help when implementing pseudoinstructions with a variable
argument count or default values. I added enough code to work around the
problems this design decision caused and pushed everything upstream. The list
includes support for many pseudoinstructions, proper relocation use for several
instruction families like `jal` and branches, and some other things. In the
end, we do not have a fully featured assembler yet, but we do have enough to
build the simple code we find in a C standard library like Musl, and using the
syntax any RISC-V assembler would expect, as I explained in more detail
[here](/bootstrapGcc11.html).
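Relocations for `jal` illustrate why parsing and relocation generation are
hard to keep apart. `jal rd, offset` (and `j offset`, which is
`jal x0, offset`) stores its even, 21-bit byte offset scrambled across the
J-type immediate field, and an `R_RISCV_JAL` relocation applies the same bit
shuffle when the target is only known later. A minimal sketch of that encoding,
with a hypothetical helper name and assuming the caller already checked that
the offset is even and within ±1 MiB:

``` clike
#include <stdint.h>

/* Packs a byte offset into the J-type immediate of `jal rd, offset`.
   Layout: imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd | opcode. */
static uint32_t encode_jal(unsigned rd, int32_t offset) {
    uint32_t imm = (uint32_t)offset;   /* two's complement bit pattern */
    return ((imm >> 20) & 0x1u)   << 31
         | ((imm >> 1)  & 0x3FFu) << 21
         | ((imm >> 11) & 0x1u)   << 20
         | ((imm >> 12) & 0xFFu)  << 12
         | (rd & 0x1Fu)           << 7
         | 0x6Fu;                      /* JAL opcode */
}
```

For instance, `encode_jal(1, 8)` yields `0x008000ef` (`jal ra, 8`) and
`encode_jal(0, -4)` yields `0xffdff06f` (`j -4`).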

#### Meslibc

With all those changes finally applied to TinyCC, we could remove the odd split
we had to make in MeslibC to match TinyCC's assembly syntax, so I did that.
Less code, less problems.

My colleague Andrius also added a `realpath` stub, so we can build upstream
TinyCC without having to patch the places where it uses `realpath`. `realpath`
is not a simple function to implement, and it's not critical in TinyCC. Again,
MeslibC doesn't need to be perfect; it only needs to let us start building
everything.
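For illustration only, a stub in that spirit could look like the following.
This is not Andrius's code, which isn't shown here; `stub_realpath` is a
hypothetical name, and it deliberately skips everything that makes `realpath`
hard (symlinks, `.`, `..`, relative paths). It only copies its input, while
keeping the POSIX calling convention of an optional caller buffer of
`PATH_MAX` bytes.

``` clike
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* assumption for systems that don't define it */
#endif

/* Hypothetical minimal stand-in for realpath(3): it does NOT resolve
   anything, it only copies path into the result buffer. */
static char *stub_realpath(const char *path, char *resolved) {
    size_t len = strlen(path);
    if (len >= PATH_MAX) {
        errno = ENAMETOOLONG;
        return NULL;
    }
    if (resolved == NULL) {            /* POSIX.1-2008: allocate for caller */
        resolved = malloc(len + 1);
        if (resolved == NULL)
            return NULL;
    }
    memcpy(resolved, path, len + 1);   /* includes the terminating '\0' */
    return resolved;
}
```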

#### TinyCC

With all those changes coming to MeslibC and the ones we upstreamed, we no
longer need to patch upstream TinyCC, so all our small changes on top of it
have been dropped. Less code, less problems.

We could have kept these changes to ourselves, but sharing them is not only
easier, it's also better for everyone. The following is the complete list of
changes I upstreamed to TinyCC, a project we are not really part of; but this
is what we do, and what we believe in.

* `0aca8611` fixup! riscv: Implement large addend for global address
* `8baadb3b` riscv: asm: implement `j offset`
* `15977630` riscv: asm: Add branch to label
* `671d03f9` riscv: Add full `fence` instruction support
* `c9940681` riscv: asm: Add load-reserved and store-conditional
* `0703df1a` Fix Extended Asm ignored constraints
* `6b3cfdd0` riscv: Add extended assembly support
* `e02eec6b` riscv: fix jal: fix reloc and parsing
* `02391334` fixup! riscv: Add .option assembly directive (unimp)
* `cbe70fa6` riscv: Add .option assembly directive (unimp)
* `618c1734` riscv: libtcc1.c support some builtins for \_\_riscv
* `3782da8d` riscv: Support $ in identifiers in extended asm.
* `e2d8eb3d` riscv: jal: Add pseudo instruction support
* `409007c9` riscv: jalr: implement pseudo and parse like GAS
* `8bfef6ab` riscv: Add pseudoinstructions
* `8cbbd2b8` riscv: Use GAS syntax for loads/stores:
* `019d10fc` riscv: Move operand parsing to a separate function
* `7bc0cb5b` riscv: Implement large addend for global address

#### Bootstrappable TinyCC

During the bootstrapping process we found new issues, and one of them was so
deep that it took quite a long time to detect and solve.

Most of the programs we were building with our Bootstrappable TinyCC worked:
GZip, Make... But we reached a point where we needed to rebuild upstream TinyCC
with Musl, in order to start using Musl to build the next programs. It didn't
work.

We had a really hard time finding the cause, because the failure showed up so
far down the chain. The process goes like this.

We use Mes to build our very first Bootstrappable TinyCC, which compiles itself
several times (six) until it reaches its final state. That one then builds
upstream TinyCC, and with that we build TinyCC again, this time using Musl as
its standard library. We found this last one was unable to build simple files,
so we started digging.

We realized TinyCC was sign-extending `unsigned` values, which broke the next
TinyCC in the chain, making it unable to build programs correctly. Researching
this deeply, we found the problem was in TinyCC's `load` function, but a TinyCC
built with GCC didn't have this problem. The only remaining explanation was
that the Bootstrappable TinyCC had a bug that was later affecting the
compilers compiled with it.
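To see why a single wrong extension is so damaging, compare the two ways of
widening an 8-bit value into a 64-bit register. The exact types affected
aren't specified here; this sketch uses `unsigned char` with hypothetical helper
names, modelling RISC-V's `lbu` (zero-extending) and `lb` (sign-extending)
loads in plain C:

``` clike
#include <stdint.h>

/* Correct behaviour for an unsigned char load into a 64-bit register
   (RISC-V `lbu`): the upper bits are filled with zeros. */
static int64_t load_u8_zero_extended(const uint8_t *p) {
    return (int64_t)(uint64_t)*p;
}

/* The bug class described above: a sign-extending load (RISC-V `lb`)
   used for an unsigned type, so bit 7 is copied into the upper bits.
   Assumes two's complement, as on RISC-V. */
static int64_t load_u8_sign_extended(const uint8_t *p) {
    return (int64_t)(int8_t)*p;
}
```

For `0xff`, the correct load gives 255 and the buggy one gives -1, while values
below `0x80` come out identical. That is why such a bug can hide for a long
time and only surface when the miscompiled compiler builds the next one.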

Digging a little further, I found that the casts in Bootstrappable TinyCC had
some missing cases I hadn't backported properly. As I wasn't able to
understand them very well, I decided to backport the full `gen_cast` function
from upstream to the Bootstrappable TinyCC. With that, the errors in TinyCC
were gone.
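The missing cast cases aren't listed, but RISC-V has a well-known trap in this
area: the RV64 psABI keeps 32-bit values sign-extended in 64-bit registers even
when their C type is unsigned, so a `uint32_t` to `uint64_t` cast is not a
no-op and must clear the upper 32 bits. A sketch of that rule, with
hypothetical helper names:

``` clike
#include <stdint.h>

/* Models an RV64 register holding a 32-bit value: the psABI keeps it
   sign-extended to 64 bits, even for unsigned 32-bit types. */
static uint64_t reg_from_u32(uint32_t v) {
    return (uint64_t)(int64_t)(int32_t)v;
}

/* A correct u32 -> u64 cast clears bits 63..32 of that register,
   e.g. with `slli rd, rs, 32` followed by `srli rd, rd, 32`. */
static uint64_t cast_u32_to_u64(uint64_t reg) {
    return (reg << 32) >> 32;
}
```

A cast that forgets this step would turn `0x80000000u` into
`0xffffffff80000000` instead of `0x80000000`.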

It feels like an accidental trusting trust attack, yes. These are the kinds of
things we have to deal with, and they are pretty tiring and frustrating to
find.

#### The new Bootstrapping chain

So, all of this brings us to the new bootstrapping chain. We need to do things
very differently from the way Guix does them right now, because we are
skipping many steps (GCC 2.95, now we need Musl for Binutils...), so I started
[a project](https://github.com/ekaitz-zarraga/commencement.scm) to track how we
move forward in the bootstrapping chain (it's just a work in progress for our
tests, so take that into account).

We had good and bad news in that regard. At the time of writing we have
managed to build up to the GCC 4.6.4 I added RISC-V support to, but that
compiler is faulty and it's unable to build itself again with C++ support.

I'm using non-bootstrapped versions of `flex` and `bison`, but those
shouldn't be hard to bootstrap either. I just didn't have the time to build
them from scratch. And I'm using `bash` instead of `gash`, because we found a
blocking error in `gash` that prevents us from going past Binutils.

In any case, this means we are close to the next milestone: building GCC 4.6.4
with TinyCC. And, as we described in the previous post, we already built
GCC 7.5 from GCC 4.6.4, so the step after that is already solved.

After that, we would need to clean up this new bootstrapping chain and talk
with Guix about including it. I hope we can finish all this before hitting the
deadline that is silently approaching...