Title: Milestone — MesCC builds TinyCC and fun C errors for everyone
Date: 2023-10-30
Category:
Tags: Bootstrapping GCC in RISC-V
Slug: bootstrapGcc8
Lang: en
Summary:
We spent the last months making MesCC able to compile TinyCC and making the
result of that compilation able to compile TinyCC. Many cool problems
appeared, this is the summary of our work.
It's been a while since the latest technical update in the project and I am
fully aware that you were missing it so it's time to recap with a really cool
announcement:
<span style="font-size: larger">
**We finally made a self-hosted Bootstrappable TinyCC in RISC-V**
</span>
Most of you probably remember I [already backported](bootstrapGcc6.html) the
Bootstrappable TinyCC compiler, but I didn't test it in a proper environment.
Now, we can confidently say it is able to compile itself, a "large" program
that makes use of more complex C features than I did in the tests.
All this work was done by Andrius Štikonas and myself. Janneke helped us a lot
with Mes related parts, too. The work this time was pretty hard, honestly. Most
of the things we did here are not obvious, even for C programmers.
I'm not used to these kinds of quirks of the C language. Most of them are really
specific and related to the standards, and many others are just things that were
missing. I hope the ones I chose to discuss here help you understand your
computing better, as they did for me.
This is going to be a very long post, so take a ToC to help you out:
1. [Context](#context)
1. [Why is this important?](#why-important)
2. [Problems fixed](#problems)
1. [TinyCC misses assembly instructions needed for MesLibC](#tinycc-missing-instructions)
2. [TinyCC's assembly syntax is weird](#tcc-assembly)
3. [TinyCC does not support Extended Asm in RV64](#extended-assembly)
4. [MesLibC `main` function arguments are not set properly](#main-args)
5. [TinyCC says `__global_pointer$` is not a valid symbol](#dollars)
6. [Bootstrappable TinyCC's casting issues](#tcc-casting-issues)
7. [Bootstrappable TinyCC's `long double` support was missing](#long-double)
8. [MesCC struct initialization issues](#mescc-struct-init)
9. [MesCC vs TinyCC size problems](#size-problems)
10. [MesCC add support for signed shift operation](#mes-signed-shift)
11. [MesCC switch/case falls-back to default case](#broken-case)
12. [Bootstrappable TinyCC problems with GOT](#got)
13. [Bootstrappable TinyCC generates wrong assembly in conditionals](#wrong-conditionals)
14. [Support for variable length arguments](#varargs)
15. [MesLibC use `signed char` for `int8_t`](#int8)
16. [MesLibC Implement `setjmp` and `longjmp`](#jmp)
17. [More](#more)
3. [Reproducing what we did](#reproducing)
1. [Using live-bootstrap](#live-bootstrap)
1. [Using Guix](#guix)
4. [Conclusions](#conclusions)
5. [What is next?](#next)
### Context {#context}
You have many blogposts in the series to find some context about the
project, and even a FOSDEM talk about it, but they all give a very broad
explanation, so let's focus on what we are doing right now.
Here we have Mes, a Scheme interpreter, that runs MesCC, a C compiler, that is
compiling our simplified fork of TinyCC, let's call that Bootstrappable TinyCC.
That Bootstrappable TinyCC compiler then tries to compile its own code. It
compiles its own code because its goal is to add more flags in each
compilation, so it has more features in each round[^rounds]. We do all this
because TinyCC is way faster than MesCC and it's also more complex, but MesCC
is only able to build a simple TinyCC with few features enabled.
[^rounds]: There are many rounds. Like 7 or so.
During all this process we use a standard library provided by the Mes project,
we'll call it MesLibC, because we can't build glibc at this point, and TinyCC
does not provide its own C standard library.
With all this well understood, this is the achievement:
**We made MesCC able to compile the Bootstrappable TinyCC, using MesLibC, to an
executable that is able to compile the Bootstrappable TinyCC's codebase to a
binary that works and has all the features we need enabled.**[^self-hosted]
[^self-hosted]: So it can compile itself again and again, but who would want to
do that?
The process affected all the pieces in the system. We added changes in MesCC,
MesLibC and the Bootstrappable TinyCC.
#### Why is this important? {#why-important}
We already talked long about the bootstrapping issue, the trusting trust attack
and all that. I won't repeat that here. What I'll do instead is to be specific.
This step is a big thing because this allows us to go way further in the chain.
All the steps before Mes were already ported to RISC-V mostly thanks to Andrius
Štikonas who worked on [Stage0-POSIX][stage0] and the rest of the glue projects
that are needed to reach Mes.
[stage0]: https://github.com/oriansj/stage0-posix
Mes had been ported to RISC-V (64 bit) by W. J. van der Laan, and some patches
were added on top of it by Andrius Štikonas himself before our current effort
started.
At this moment in time, Mes was unable to build our bootstrappable TinyCC in
RISC-V, the next step in the process, and the bootstrappable TinyCC itself was
unable to build itself either. This was a very limiting point, because TinyCC
is the first "proper" C compiler in the chain.
When I say "proper" I mean fast and fully featured as a C compiler. In x86,
TinyCC is able to compile old versions of GCC. If we manage to port it to
RISC-V we will eventually be able to build GCC with it and with that the world.
In summary, TinyCC is a key step in the bootstrapping chain.
### Problems fixed {#problems}
This work can be easily followed in the commits in my TCC fork's
[`riscv-mes`][tcc] branch, and in my Mes clone's [`riscv-tcc-boot`][mes]
branch. We are also identifying the contents of this blogpost in the git
history by adding the git tag `self-hosted-tcc-rv64` to both of my forks. We
will try to keep both for future reference.
In Mes the process might be a little bit harder to follow because we sent most
of the patches to Janneke and he merged them so when we were about to release
this post I continued from Janneke's branch to avoid divergences (I had some
problems with that before). In any case, the code is there and searching by
authors (Andrius and myself) would guide you to the changes we did.
[tcc]: https://github.com/ekaitz-zarraga/tcc/tree/riscv-mes
[mes]: https://github.com/ekaitz-zarraga/mes/tree/riscv-tcc-boot
Many commits have a long message you can go read there, but this post was born
to summarize the most interesting changes we did, and write them in a more
digestible way. Let's see if I manage to do that.
The following list is not ordered in any particular way, but we hope the
selection of problems we found is interesting for you. We found some more
errors, but these are the ones we consider most relevant.
#### TinyCC misses assembly instructions needed for MesLibC {#tinycc-missing-instructions}
TinyCC is not like GCC, TinyCC generates binary code directly, no assembly code
in between. TinyCC has a separate assembler that doesn't follow the path that C
code follows.
It works the same in all architectures, but we can take RISC-V as an example:
TinyCC has `riscv64-gen.c` which generates the binary files, but
`riscv64-asm.c` file parses assembly code and also generates binary. As you can
see, binary generation is somehow duplicated.
In the RISC-V case, the C part had support for mostly everything since my
backport, but the assembler did not support many instructions (which, by the
way are supported by the C part).
MesLibC's `crt1.c` is written in assembly code. Its goal is to prepare the
`main` function and call it. For that it needs to call `jalr` instruction and
others that were not supported by TinyCC, neither upstream nor our
bootstrappable fork.
These changes appear in several commits because I didn't really understand how
the TinyCC assembler worked, and some instructions need to use relocations
which I didn't know how to add. The following commit can show how it feels to
work on this, and shares how relocations are done:
[lla-commit]: https://github.com/ekaitz-zarraga/tcc/commit/1e597f3d239d9119d2ea4bb3ca29b587ea594dcc
There you can see we started to understand things in TinyCC, but some other
changes came after this.
A very important note here is that upstream TinyCC does not have support for these
instructions yet, so we need to patch upstream TinyCC when we use it, contribute
the changes or find some other kind of solution. Each solution has its downsides
and upsides, so we need to make a decision about this later.
#### TinyCC's assembly syntax is weird {#tcc-assembly}
Following with the previous fix, TinyCC does not support GNU-Assembler's syntax
in RISC-V. It uses a simplified assembly syntax instead.
When we would do:
``` asm
sd s1, 8(a0)
```
In TinyCC's assembly we have to do:
``` asm
sd a0, s1, 8
```
This requires changes in MesLibC, and it makes us create a separate folder for
TinyCC in MesLibC. See `lib/riscv64-mes-tcc/` and `lib/linux/riscv64-mes-tcc`
for more details.
#### TinyCC does not support Extended Asm in RV64 {#extended-assembly}
Way later in time we also found TinyCC does not support [Extended Asm][ext-asm]
in RV64. The functions that manage that are simply empty.
[ext-asm]: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
We spent some time until we realized what was going on in here for two reasons.
First, there are few cases of Extended Asm in the code we were compiling.
Second, it was failing silently.
Extended Asm is important because it lets you tell the compiler you are going
to touch some registers in the assembly block, so it can protect variables and
apply optimizations properly.
In our case, our assembly blocks were clobbering some variables that would have
been protected by the compiler if the Extended Asm support was implemented.
Andrius found all the places in MesLibC where Extended Asm was used and rewrote
the assembly code to keep variables safe in the cases it was needed.
The other option was to add Extended Asm support for TinyCC, but we would need
to add it in the Bootstrappable TinyCC and also upstream. This also means
understanding TinyCC codebase very well and making the changes without errors,
so we decided to simplify MesLibC, because that is easier to make right. We are
probably going to need to do this later on anyway, but we'll try to delay this
as much as possible.
#### MesLibC `main` function arguments are not set properly {#main-args}
Following the previous problem with assembly, we later found input arguments of
the `main` function, that come from the command line arguments, were not
properly set by our MesLibC. Andrius also took care of that in
[4f4a1174][main-ext] in Mes.
[main-ext]: https://github.com/ekaitz-zarraga/mes/commit/4f4a11745d1c7ed0995e9d31c7994abfb4a60b25
This error was easier to find than others because when we found issues with
this we already had a compiled TinyCC. So we just needed to fix simple things
around it.
#### TinyCC says `__global_pointer$` is not a valid symbol {#dollars}
This is a small issue that was a headache for a while, but it happened to be a
very simple issue.
In RISC-V there's a symbol, `__global_pointer$`, that is used for dynamic
linking, defined in the ABI. But TinyCC had issues parsing code around it and
it took us some time to realize it was the dollar sign (`$`) that was causing
the issues.
TinyCC does not process dollars in identifiers unless you specifically set a
flag (`-fdollars-in-identifiers`) when running it. In the RISC-V case, that
flag must be always active because if it isn't the `__global_pointer$` can't be
processed.
We tried to set that flag in the command line but we had other issues in the
command line argument parsing (we found and fixed them later) so we just
hardcoded it.
This issue is interesting because it's an extremely simple problem, but its
effect appears in weird ways and it's not always easy to know where the problem
is coming from.
#### Bootstrappable TinyCC's casting issues {#tcc-casting-issues}
This one was a really hard one to fix.
When running our Bootstrappable TinyCC to build MesLibC we found this error:
``` nothing
cannot cast from/to void
```
We managed to isolate a piece of C code that was able to replicate the
problem.[^reproducer]
``` clike
long cast_charp_to_long (char const *i)
{
return (long)i;
}
long cast_int_to_long (int i)
{
return (long)i;
}
long cast_voidp_to_long (void const *i)
{
return (long)i;
}
void main(int argc, char* argv[]){
return;
}
```
Compiling this file raised the same issue, but then I realized I could remove
two of the functions on the top and the error didn't happen. Adding one of
those functions back raised the error again.
I tried to change the order of the functions and the functions I chose to add,
and I could reproduce it: if there were two functions it failed but it could
build with only one.
Andrius found that the function type was not properly set in the RISC-V code
generation and its default value was `void`, so it only failed when it compiled
the second function.
Knowing that, we could take other architectures as a reference to fix this, and
so we did.
See [6fbd1785][tcc-casting-commit].
[tcc-casting-commit]: https://github.com/ekaitz-zarraga/tcc/commit/6fbd17852aa11a2d0bc047183efaca4ff57ab80c
[^reproducer]: This is how we managed to fix most of the problems in our code:
make a small reproducer we can test separately so we can inspect the
process and the result easily.
#### Bootstrappable TinyCC's `long double` support was missing {#long-double}
When I backported the RISC-V support to our Bootstrappable TinyCC I missed the
`long double` support and I didn't realize that because I never tested large
programs with it.
The C standard doesn't define a size for `long double` (it just says it has to
be at least as long as the `double`), but its size is normally set to 16 bytes.
All this is weird in RV64, because it doesn't have 16 byte size registers. It
needs some extra support.
Before we fixed this, the following code:
``` clike
long double f(int a){
return a;
}
```
Failed with:
``` nothing
riscv64-gen.c:449 (`assert(size == 4 || size == 8)`)
```
Because it was only expecting to use `double`s (8 bytes) or `float`s (4 bytes).
In upstream TinyCC there were some commits that added `long double` support
using, and I quote, a *mega hack*, so I just copied that support to our
Bootstrappable TinyCC.
See [a7f3da33456b][tcc-long-double].
[tcc-long-double]: https://github.com/ekaitz-zarraga/tcc/commit/a7f3da33456b4354e0cc79bb1e3f4c665937395b
After this commit, some extra problems appeared with some missing symbols. But
these errors were link-time problems, because TinyCC had the floating point
helper functions needed for RISC-V defined in `lib/lib-arm64.c`, because they
were reusing aarch64 code for them.
After this, we also compile and link `lib-arm64.c` and we have `long double`
support.
#### MesCC struct initialization issues {#mescc-struct-init}
This one was a lot of fun. Our Bootstrappable TinyCC exploded with random
issues: segfaults, weird branch decisions...
After tons of debugging Andrius found some values in `struct`s were not set
properly. As we don't know TinyCC's codebase really well, that was hard
to follow and we couldn't really know where the value was coming from.
Andrius finally realized some `struct`s were not initialized properly. Consider
this example:
``` clike
typedef struct {
int one;
int two;
} Thing;
Thing a = {0};
```
That's supposed to initialize *all* fields in the `Thing` `struct` to `0`,
according to the C standard[^cppref].
As a first solution we set struct fields manually to `0`, to make sure they
were initialized properly. See [29ac0f40a7afb][tinycc-struct-0]
[tinycc-struct-0]: https://github.com/ekaitz-zarraga/tcc/commit/29ac0f40a7afba6a2d055df23a8ee2ee2098529e
After some debugging we found that the fields that were not explicitly set were
initialized to `22`. So I decided to go to MesCC and see if the struct
initialization was broken.
This was my first dive in MesCC's code, and I have to say it's really easy to
follow. It took me some time to read through it because I'm not that used to
`match`, but I managed to find the struct initialization code.
What I found in MesCC is there was a `22` hardcoded in the struct
initialization code, probably coming from some debug code that never was
removed. As no part of the x86 bootstrapping used that kind of initializations,
or nothing relied on them, the error went unnoticed.
I set that to `0`, as it should be, and we continued with our lives.
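To make the expected behaviour concrete, here is a minimal sketch. The third field and the function names are made up for illustration, they are not taken from TinyCC or MesCC. With a standard-conforming compiler, every field that is not named in the initializer has to end up as zero:

```c
/* Hypothetical example, not taken from TinyCC or MesCC. */
typedef struct {
    int one;
    int two;
    int three;
} Thing;

/* `{0}` sets `one` explicitly; `two` and `three` must be zeroed too. */
int sum_after_zero_init(void)
{
    Thing a = {0};
    return a.one + a.two + a.three;
}

/* Only `one` is given a value; the remaining fields must still be 0. */
int encode_after_partial_init(void)
{
    Thing b = {7};
    return b.one * 100 + b.two * 10 + b.three;
}
```

With a value like the hardcoded `22` in place of the zeros, a check such as `sum_after_zero_init() == 0` would be expected to fail, so a function like this can serve as a small regression test.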
[^cppref]: You can see an explanation in the (1) case at
[cppreference.com](https://en.cppreference.com/w/c/language/struct_initialization)
#### MesCC vs TinyCC size problems {#size-problems}
The C standard does not set a size for integers. It only sets relative sizes:
`short` has to be shorter or equal to `int`, `int` has to be shorter or equal
to a `long`, and so on. If your platform wants, all the integers, including the
`char`s can have 8 bits, and that's ok for the C standard.
TinyCC's RISC-V backend was written under the assumption that `int` is 32 bits
wide. You can see this happening in `riscv64-gen.c`, for example, here:
``` clike
EI(0x13, 0, rr, rr, (int)pi << 20 >> 20); // addi RR, RR, lo(up(fc))
```
The bit shifting there is done to keep only the lower 12 bits of the `pi` variable (sign-extended), discarding the upper 20 bits.
This code's behavior might be different from one platform to another. Taking
the example before, of that possible platform that only has 8 bit integers,
this code would send a `0` instead of the lower 12 bits of `pi`.
In our case, we had MesCC using the whole register width, 64 bits, for temporary
values, so the lowest `44` bits were left and the next assertion, which checked
that the immediate fit in 12 bits, didn't pass.
This is a huge problem, as most of the code in the RISC-V generation is written
using this style.
There are other ways to do the same thing (`pi & 0xFFF` maybe?) in a more
portable way, but we don't know why upstream TinyCC decided to do it this way.
Probably they did because GCC (and TinyCC itself) use 32 bit integers, but they
didn't handle other possible cases, like the one we had here with MesCC.
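As a sketch of the difference, the functions below compare the shift trick at two widths with a masked variant that does not depend on the width of `int`. The function names are ours, not TinyCC's, and the use of `<stdint.h>` is an assumption that may not hold this early in the bootstrap. Right-shifting a negative value is implementation-defined in C; the sketch assumes the usual arithmetic shift.

```c
#include <stdint.h>

/* The shift trick, with the 32-bit width made explicit. The left shift is
   done on an unsigned value to avoid signed overflow. */
int32_t low12_shift32(int64_t value)
{
    return (int32_t)((uint32_t)value << 20) >> 20;
}

/* The same shifts on a 64-bit wide temporary: 44 bits survive, not 12. */
int64_t low44_shift64(int64_t value)
{
    return (int64_t)((uint64_t)value << 20) >> 20;
}

/* Width-independent variant: mask 12 bits, then sign-extend bit 11. */
int32_t low12_masked(int64_t value)
{
    int32_t low = (int32_t)(value & 0xFFF);
    return (low ^ 0x800) - 0x800;
}
```

For `0x12345`, the first and third functions return `0x345`, while the 64-bit version returns the whole `0x12345`, which no longer fits in a 12-bit immediate.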
In any case, this made us rethink MesCC, dig into how its integers are defined,
how to change this to be compatible with TinyCC and so on, but I finally
decided to add casts in the middle to make sure all this was compiled as
expected.
It was a good reason to make us re-think MesCC's integers, but it took a very
long time to deal with this, time that could have been better used on something
else. Now, we have all become paranoid about integers and we still think some
extra errors will arise from them in the future. Integers are hard.
#### MesCC add support for signed shifting {#mes-signed-shift}
Integers were in our minds for long, as described in the previous block, but I
didn't talk about signedness in that one.
Following one of the crazy errors we had in TinyCC, I somehow realized (I don't
remember how!) that we were missing signed shifting support in MesCC. I think
that I found this while doing some research of the code MesCC was outputting
when I spotted some bit shifts done using unsigned instructions for signed
values and I started digging in MesCC to find out why. I finally realized that
there was no support for that and the shift operation wasn't selected
depending on the signedness of the value being shifted.
Let's see this with an example:
``` clike
signed char a = 0xF0;
unsigned char b = 0xF0;
// What is this? (Answer: 0xFF as a byte => -1, the sign bit is copied in)
a >> 4;
// And this? (Answer: 0x0F => 15)
b >> 4;
```
In the example you can see the shifting operation does not work the same way if
the value is signed or not. If you always use the unsigned version of the `>>`
operation, you don't get the results you expected. Signs are also hard.
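The same effect can be shown with 32-bit values, where integer promotion does not get in the way. This is a minimal sketch with hypothetical function names. The result of right-shifting a negative value is implementation-defined in C, and the sketch assumes the common arithmetic shift:

```c
#include <stdint.h>

/* Signed operand: an arithmetic shift copies the sign bit into the
   positions vacated on the left. */
int32_t shift_right_signed(int32_t value, int amount)
{
    return value >> amount;
}

/* Unsigned operand: a logical shift fills the vacated positions with 0. */
uint32_t shift_right_unsigned(uint32_t value, int amount)
{
    return value >> amount;
}
```

Both `-16` and `0xFFFFFFF0` have the same bit pattern in 32 bits, but shifting them by 4 gives `-1` in the signed case and `0x0FFFFFFF` in the unsigned one. A compiler that always picks the logical shift returns the second result for both.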
In this case, like in many others, the fix was easier than realizing what was
going wrong. I just added support for the signed shifting operation, not only
for RISC-V but for all architectures, and I added the correct signedness check
to the shifting operation to select the correct instruction. The patch (see
[88f24ea8][signed-rotation] in Mes) is very clean and easy to read, because
MesCC's codebase is really well ordered.
> EDIT: Some person in the web noted I called the *bit-shift* operations
> *rotation* operations. I normally use both words interchangeably but it is
> true they don't mean the exact same thing. A shift is when the values are
> lost, and a rotation when they come from the other side of the register. I
> edited the article to use the correct word.
[signed-rotation]: https://github.com/ekaitz-zarraga/mes/commit/88f24ea8661dd279c2a919f8fbd5f601bb2509ae
#### MesCC switch/case falls-back to default case {#broken-case}
In the early bootstrap runs, our Bootstrappable TinyCC did weird things.
After many debugging sessions we realized the `switch` statements in
`riscv64-gen.c`, more specifically in `gen_opil`, were broken. The fall-backs
in the `switch` were automatically directed to the `default` case. Weird!
MesCC has many tests so I read all that were related with the `switch`
statements and the ones that handled the fall-backs were all falling-back to
the `default` case, so our weird behavior wasn't tested.
I added the tests for our case and read the disassembly of simple examples,
which is when I realized what the problem was.
Each of the `case` blocks has two parts: the clause that checks if the value
of the expression is the one of the case, and the body of the case itself.
The `switch` statement generation was doing some magic to deal with `case`
blocks, but it was failing to deal with complex fall-through schemes because
the clause of the target `case` block was always run, making the code fall to
the `default` case, as the clause was always false because the one that matched
was the one that made the fall-back.
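A small C function shows the pattern that triggered this. It is only a sketch with made-up names (`classify` and its constants are not from `gen_opil`); each `case` adds a different amount so the return value tells which bodies ran:

```c
int classify(int op)
{
    int steps = 0;

    switch (op) {
    case 1:
        steps += 1;    /* falls through */
    case 2:
        steps += 10;   /* falls through */
    case 3:
        steps += 100;
        break;
    default:
        steps = -1;
    }
    return steps;
}
```

A correct compiler returns `111` for `classify(1)`: the three bodies run in order without checking `op` again. If the clause of `case 2` is re-evaluated after the body of `case 1`, it fails and the execution ends in `default` instead.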
There were some problems to fix this, as NyaCC (MesCC's C parser) returns
`case` blocks as nested when they don't have a `break` statement:
``` lisp
(case testA
(case testB
(case testC BODY)))
```
Instead of doing this, I decided to flatten the `case` blocks with empty
bodies. This way we can deal with the structure in a simpler way.
``` lisp
((case testA (expr-stmt))
(case testB (expr-stmt))
(case testC BODY))
```
Once this is done, I expanded each `case` block to a jump that jumps over the
clause, the clause and then its body. Doing this, the fall-back doesn't
re-evaluate the clause, as it doesn't need to. The generated code looks like
this in pseudocode:
``` assembly
;; This doesn't have the jump because it's the first
CASE1:
testA
CASE1_BODY:
...
goto CASE2_BODY
CASE2:
testB
CASE2_BODY:
...
goto CASE3_BODY
CASE3:
testC
CASE3_BODY:
...
```
If one of the `case`s has a `break`, it's treated as part of its body, and it
will end the execution of the `switch` statement normally, no fall-back.
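This layout can be imitated in plain C with `goto`, which makes it easy to check by hand. It is only an illustration of the idea, with hypothetical names and three cases, not the code MesCC emits. The `default` handling at the end is our assumption, since the pseudocode doesn't show it:

```c
int classify_lowered(int op)
{
    int steps = 0;

    /* CASE1: the clause; a failed test jumps to the next clause. */
    if (op != 1) goto case2_test;
    steps += 1;                 /* CASE1_BODY */
    goto case2_body;            /* fall through, skipping the next clause */

case2_test:
    if (op != 2) goto case3_test;
case2_body:
    steps += 10;
    goto case3_body;

case3_test:
    if (op != 3) goto default_body;
case3_body:
    steps += 100;
    goto done;                  /* this case ends with a `break` */

default_body:
    steps = -1;
done:
    return steps;
}
```

Entering with `op == 1` runs the three bodies one after the other and never evaluates the clauses of the second and third cases, which is the behaviour a fall-through needs.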
This results in a simpler `case` block control. The previous approach dealt
with nested `case` blocks and tried to be clever about them, but
unsuccessfully. The best thing about this commit is most of the cleverness was
simply removed with a simple solution (flatten all the things!).
It wasn't that easy to implement, but I first built a simple prototype and
Janneke's scheme magic made my approach usable in production.
All this is added in Mes's codebase in several commits, as we needed some
iterations to make it right. [22cbf823582][cases] has the base of this commit,
but there were some iterations more in Mes.
[cases]: https://github.com/ekaitz-zarraga/mes/commit/22cbf823582e3699b6a21ee0cf74c2dbf0a6a4e9
#### Bootstrappable TinyCC problems with GOT {#got}
The Global Offset Table is a table that helps with relocatable binaries. Our
Bootstrappable TinyCC segfaulted because it was generating an empty GOT.
Andrius debugged upstream TinyCC alongside ours and realized there was a
missing check in an `if` statement. He fixed it in
[f636cf3d4839d1ca][got-commit].
The problem with this kind of error is that TinyCC's codebase is really hard to
read. It's a very small compiler but it's not obvious how things are
done in it, so we had to spend many hours in debugging sessions that went
nowhere. If we had a compiler that is easier to read and change, it would be
way simpler to fix and we would have had a better experience with it.
[got-commit]: https://github.com/ekaitz-zarraga/tcc/commit/f636cf3d4839d1ca3f5af9c0ad9aef43a4bfccd9
#### Bootstrappable TinyCC generates wrong assembly in conditionals {#wrong-conditionals}
We spent a long time debugging a bug I introduced during the backport when I
tried to undo some optimization upstream TinyCC applied to comparison
operations.
Consider the following code:
``` clike
if ( x < 8 )
whatever();
else
whatever_else();
```
Our Bootstrappable TinyCC was unable to compile this code correctly. Instead,
it output code that always took the same branch, regardless of the value
in `x`.
In TinyCC, a conditional like `if (x < CONSTANT)` has a special treatment, and
it's converted to something like this pseudoassembly:
``` pseudo
load x to a0
load CONSTANT to a1
set a0 if less than a1
branch if a0 not equal 0 ; Meaning it's `set`
```
This behaviour uses the `a0` register as a flag, emulating what other CPUs
use for comparisons. RISC-V doesn't need that, but it's still done here
probably for compatibility with other architectures. In RISC-V it could look
like this:
``` pseudo
load x to a0
load CONSTANT to a1
branch if a0 less than a1
```
You can easily see the `branch` "instruction" does a different comparison in
one case versus the other. In the first one it checks if `a0` is set,
and in the second it checks if `a0` is smaller than `a1`.
TinyCC handles this case in a very clever way (maybe too clever?). When they
emit the `set a0 if less than a1` instruction they replace the current
comparison operation with `not equal` and they remove the `CONSTANT` and
replace it with a `0`. That way, when the `branch` instruction is generated,
they insert the correct clause.
In my code I forgot to replace the comparison operator so the branch checked
`if a0 is less than 0` and it was always false, as the `set` operation writes
a `0` or a `1` and none of them is less than `0`.
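The bug can be modelled in a few lines of C. This is a sketch of the logic only, with hypothetical function names, and it does not reproduce TinyCC's internals: the `set` step produces a flag, and the branch step has to test that flag against `0` with the right operator.

```c
/* Models "set a0 if less than a1": writes 1 or 0 into the flag. */
static long set_if_less_than(long a, long b)
{
    return a < b ? 1 : 0;
}

/* Intended sequence: after the `set`, branch on "flag not equal 0". */
int branch_taken_fixed(long x, long constant)
{
    long flag = set_if_less_than(x, constant);
    return flag != 0;
}

/* Buggy sequence: the constant became 0 but the operator stayed
   "less than", so the test is "flag < 0", which is never true. */
int branch_taken_buggy(long x, long constant)
{
    long flag = set_if_less_than(x, constant);
    return flag < 0;
}
```

For `x = 3` and a constant of `8`, the fixed version takes the branch and the buggy one does not. The buggy one also skips it for `x = 9`, so both inputs end up on the same path.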
The commit [5a0ef8d0628f719][branch-tcc] explains this in a more technical way,
using actual RISC-V instructions.
This was also a hard one to fix, because TinyCC's variable names (`vtop->c.i`) are
really weird and they are used for many different purposes.
[branch-tcc]: https://github.com/ekaitz-zarraga/tcc/commit/5a0ef8d0628f719ebb01c952797a86a14051228c
#### Support for variable length arguments {#varargs}
In C you can define functions with variable argument length. In RISC-V, those
arguments are sent using registers while in other architectures they are sent
using the stack. This means the RISC-V case is a little bit more complex to deal
with, and needs special treatment.
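For reference, this is what such a function looks like from the C side. It is a generic sketch using the standard `<stdarg.h>` macros, with a hypothetical helper name. How `va_start` and `va_arg` find the arguments (registers, stack, or both) is what the compiler and the C library have to get right underneath:

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);          /* start after the last named argument */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int); /* fetch the next argument as an int */
    va_end(args);

    return total;
}
```

A call like `sum_ints(3, 1, 2, 3)` should return `6`. If the variable arguments are not where `va_arg` expects them, the function returns garbage, which is roughly what a broken `printf` looks like from the outside.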
Andrius realized that our Bootstrappable TinyCC had issues with variable length
arguments, especially in the most famous function that uses them: `printf`. He
also found that the problem came from the arguments not being properly set.
Reading upstream TinyCC we found they use a really weird system for the defines
that deal with this. They have a header file, `include/tccdefs.h`, which is
included in the codebase, but also processed by a tool that generates strings
that are later injected at execution time in TinyCC.
This was too much for us so we just extracted the simplest variable arguments
definitions for RISC-V and introduced that in MesLibC and our Bootstrappable
TinyCC.
##### Extra: files generated with no permissions
The bootstrappable TinyCC built using MesCC generated files with no permissions
and Andrius found that this problem came from the variable length argument
support definitions. So he fixed that, too[^stikonas].
The macro that defined `va_start` had broken pointer arithmetic. At the
beginning he thought it was related to MesCC's internals but he tested in GCC
later and realized the problem was in the macro definition. That's why
currently the commit says "workaround" in the name, but it's more than a
workaround: it's a proper fix. We are rewording that, but that will happen
after we release this post.
[^stikonas]: He is like that.
#### MesLibC use `signed char` for `int8_t` {#int8}
We already had a running Bootstrappable TinyCC compiled using MesCC when we
stumbled upon this issue. Somehow, when assembling:
``` asm
addi a0, a0, 9
```
The code was trying to read `9` as a register name, and failed to do it (of
course). It was weird to realize that the following code (in `riscv64-asm.c`)
was always using the true branch in the `if` statement, even if
`asm_parse_regvar` returned `-1`:
``` clike
int8_t reg;
...
if ((reg = asm_parse_regvar(tok)) != -1) {
...
} else ...
```
I disassembled and saw something like this:
``` pseudoassembly
call asm_parse_regvar ;; Returns value in a0
reg = a0
a0 = a0 + 1
branch if a0 equals 0
```
This looks ok, it does some magic with the `-1` but it makes sense anyway. The
problem is that it didn't branch because `a0` was `256` even when
`asm_parse_regvar` returned `-1`.
During some of the `int` related problems someone told me in the Fediverse that
`char`'s default signedness is not defined in the C standard. I read MesLibC
and, exactly: `int8_t` was defined as an alias to `char`.
In RISC-V `char` is by default `unsigned` (don't ask me why) but we are used to
x86 where it's `signed` by default. Only saying `char` is not portable.
Replacing:
``` clike
typedef char int8_t;
```
With:
``` clike
typedef signed char int8_t;
```
Fixed the issue.
From this you can learn several things:
1. Don't assume `char`'s signedness in C
2. If you design a programming language, be consistent with your decisions. In
C `int` is always `signed int`, but `char`'s don't act like that. Don't do
this.
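The effect is easy to reproduce on any machine by spelling out the signedness by hand. The sketch below assumes 8 bit chars; `parse_register` is a hypothetical stand-in for `asm_parse_regvar` failing, not the real function.

```c
/* Hypothetical stand-in for a parser that fails and returns -1. */
static int parse_register(void) {
    return -1;
}

/* What `typedef char int8_t;` means where plain `char` is unsigned. */
int found_with_unsigned_char(void) {
    unsigned char reg = parse_register(); /* -1 wraps around to 255 */
    int widened = reg;                    /* integer promotion keeps 255 */
    return widened != -1;                 /* 255 != -1: looks like success */
}

/* What `typedef signed char int8_t;` means on every platform. */
int found_with_signed_char(void) {
    signed char reg = parse_register();   /* -1 stays -1 */
    int widened = reg;                    /* sign extension keeps -1 */
    return widened != -1;                 /* -1 != -1 is false: failure is seen */
}
```

The first function reports a register was found even though the parser failed, which matches the `256` seen in the disassembly: `255 + 1`. In GCC and Clang you can also flip the default for plain `char` with `-funsigned-char` and `-fsigned-char` to test code against both behaviors.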
#### MesLibC Implement `setjmp` and `longjmp` {#jmp}
Those who are not that versed in C, as I was before we found this issue, won't
know about `setjmp` and `longjmp` but they are, simplifying a lot, like a
`goto` you can use in any part of the code. `setjmp` needs a buffer and it
stores the state of the program in it, `longjmp` sets the state of the program
to the values in the buffer, so it jumps to the position stored by `setjmp`.
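A minimal usage sketch, using only the standard `<setjmp.h>` interface. The function and variable names are invented for the example.

```c
#include <setjmp.h>

static jmp_buf checkpoint;       /* the buffer where setjmp saves the state */
static volatile int last_error;  /* set just before jumping back */

/* Pretend this is many calls deep: it gives up and jumps to the checkpoint. */
static void fail_deep_inside(int code) {
    last_error = code;
    longjmp(checkpoint, 1);      /* makes the setjmp below "return" 1 */
}

int run_with_recovery(int code) {
    if (setjmp(checkpoint) == 0) {
        /* First pass: setjmp saved the state and returned 0. */
        fail_deep_inside(code);
        return -1;               /* never reached */
    }
    /* Second pass: we arrived here through longjmp. */
    return last_error;
}
```

`run_with_recovery(7)` ends up returning `7`, without `fail_deep_inside` ever returning normally. The error code travels in a separate variable because the C standard is quite strict about where a `setjmp` call may appear, and a plain comparison against a constant is one of the safe places.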
Both functions are part of the C standard library and they need specific
support for each architecture because they need to know which registers are
considered part of the state of the program. They need to know how to store the
program counter, the return address, and so on, and how to restore them.
In their simplest form they are a set of stores in the case of the `setjmp` and
a set of loads in the case of `longjmp`.
In RISC-V they only need to store the `s*` registers, as they are the ones that
are not treated as temporary. It's simple, but it needs to be done, and it
hadn't been, neither for GCC nor for RISC-V in MesLibC.
Andrius is not convinced by our commit here, and I agree with his
concerns. We added the full `setjmp` and `longjmp` implementations directly
<del>stolen from</del> inspired by the ones in Musl[^stolen], but they also
include floating point register support, using instructions that are not
implemented in TinyCC yet. This is going to be a problem in the future because
later iterations will try to execute instructions they don't actually understand.
There are two (or three) possible solutions here. The first is to remove the
floating point instructions for now (another flavor for this solution is to
hide them under an `#ifdef`). The second is to implement the floating point
instructions in TinyCC's RISC-V assembler, which sounds great but forces us to
upstream the changes, and that process may take a long time and we'd need to
patch it in our bootstrapping scripts until it happens.
We just added the `#ifdef`s because our code is full of them anyway and sent it
to Mes: [0e2c5569][setjmp].
[setjmp]: https://github.com/ekaitz-zarraga/mes/commit/0e2c55697df285250c8a24442f169bc52d729c31
[^stolen]: Yo, if it's free software it's not stealing! Please steal my code.
Make it better.
#### More {#more}
Those are mostly the coolest errors we needed to deal with but we stumbled upon
a lot more errors.
Before this effort started Andrius added support for 64 bit instructions in Mes
and fixed some issues 64bit architectures had in M2.
I found a [bug in Guix shell](https://issues.guix.gnu.org/65225) (it's still
open) and had to fix some ELF headers in MesCC generated files because objdump
and gdb refused to work on them.
Andrius also found issues with weak symbols in MesLibC that were triggered
because TCC didn't have support for them. Thankfully, upstream TCC had that
issue fixed and we just cherry-picked for the win.
He even had the energy to test all this on real RISC-V hardware we specifically
acquired for this task.
There are many more things to tell, but this is already getting too long and if
I continue writing we'll probably end up fixing some more stuff.
In the end, a project like this is like hitting your head against a wall until
one of them breaks. Sometimes it feels like the head did, but it's all good.
#### Reproducing what we did {#reproducing}
All we did means nothing if you can't reproduce it. We provide two ways to
reproduce this process: live-bootstrap and Guix.
Both provide a similar thing but there are some high-level differences that are
worth mentioning now.
Comparing with `live-bootstrap`, using Guix helps because it reuses the
previous steps if they didn't change. This results in shorter waits once Mes is
sorted out.
On the other hand, I've had issues with the failed builds in Guix (in
emulated systems). It was hard to jump inside the build container and play
around inside so the development cycle suffered a lot. In `live-bootstrap`, if
you are good with `bwrap` you can jump and tweak things with no issues.
For those who enjoy digging in the code and trying to follow the process I
recommend following `live-bootstrap`'s scripts. The directory structure is a
little bit confusing but the scripts are very plain and linear. The ones in the
Guix process come from previous bootstrap efforts and they are designed to do
many things automagically, which makes them hard to follow.
##### Using live-bootstrap {#live-bootstrap}
Andrius is part of the `live-bootstrap` effort and he's doing all the scripting
there to keep the process reproducible.
[Live-bootstrap](https://github.com/fosslinux/live-bootstrap) is...
> An attempt to provide a reproducible, automatic, complete end-to-end
> bootstrap from a minimal number of binary seeds to a supported fully
> functioning operating system.
That's the official description of the project. From a more practical
perspective, it's a set of scripts that build the whole operating system from
scratch, depending on few binary seeds.
That's not very different to what Guix provides from a bootstrapping
perspective. Guix is "just" an environment where you can run "scripts" (the
packages define how they are built) in a reproducible way. Of course, Guix is
way more than that, but if we focus on what we are doing right now it acts like
the exact same thing.
> NOTE: `live-bootstrap`'s project description is a little bit outdated. If you
> read the comparison with Guix, what you'd read is old information. If you
> want more up-to-date information about Guix's bootstrapping process,
> I suggest you read this page of the Guix manual:
> <https://guix.gnu.org/manual/devel/en/html_node/Full_002dSource-Bootstrap.html>
Even though they are very different projects, on a practical level the main
difference between them is that `live-bootstrap` is probably easier for you to
test if you are working on any GNU/Linux distribution[^in-guix].
[^in-guix]: If you run it in Guix or in a distribution that doesn't follow FHS
you'd probably need to touch the path of your Qemu installation or be
careful with the options you send to the `rootfs.py` script.
If you want to reproduce this exact point in time you only need to use my fork
of [live-bootstrap](https://github.com/ekaitz-zarraga/live-bootstrap/), branch
`riscv-tcc-boot`. I also made a tag on it, `self-hosted-tcc-rv64`, to make it
easier to remember when this post was released. Andrius made all the magic to
set that process to take all the inputs from Mes and TinyCC from the correct
tag.
Clone the repository, set up the dependencies and run this (if you are not in a
RISC-V host you need to configure Qemu and binfmt):
``` bash
./rootfs.py --bwrap --arch riscv64 --preserve
```
That should, after a long time, reach a point where there's a properly compiled
bootstrappable TinyCC.
##### Using Guix for a reproducible environment {#guix}
I made a Guix recipe that can replicate the whole process, too. It took me a
long time to make it work but it finally does.
From my TCC fork reproducing this should be easy for the people versed in Guix.
There's a `guix` folder with some files (most of them broken, not gonna lie),
but there are two you should pay attention to:
- `channels.scm` stores the state of my Guix checkout so you can reproduce it
in the future using `guix time-machine`. At the moment it doesn't feel
necessary but if something fails when you try it, please refer to that.
- `commencement.scm` is an edited copy of the Guix bootstrapping process,
directly obtained from `gnu/packages/commencement.scm` from Guix's codebase.
I patched this to make it work for RISC-V, using some more modern commits in
the dependencies.
In order to reproduce all our work in Guix you just need to build the `tcc-boot0`
package from the `commencement.scm` file using `riscv64-linux` as your
`--system`. I'm a nice guy so I just added a command there you can use for
this, just run:
``` bash
./tcc-boot0-from-source.sh
```
And that should build the whole thing. It takes hours, you have been warned.
Also it adds `--no-grafts` (thanks Efraim), because if you keep the grafts it
compiles the world from scratch (curl, x11... not good).
If you just want to build `mes-boot` as an intermediate step, I also made a
file for that:
``` bash
./mes-boot-from-source.sh
```
Both scripts will load variables from the `commencement.scm` module
provided. The module is not complex if you are used to Guix, but it calls
some complex shell scripts in both Mes and TinyCC to build. Those contain all
the magic.
### Conclusions {#conclusions}
Of course, the problems we fixed now look easy and simple to fix. This blog
post doesn't really do justice to the countless debugging hours and all the
nights we, Andrius and I, spent thinking about where the issues could be
coming from.
The debugging setup wasn't as good as you might imagine. The early steps of the
bootstrap don't have all the debug symbols as a "normal" userspace program
would. In many cases, function names were all we had.
I have to thank my colleague Andrius here because he did a really good debugging
job, and he provided me with small reproducers that I could finally fix. Most
of the time he made the assist and I scored the goal.
He also did a great job with the testing which I couldn't do because I was
struggling with Guix from the early days, trying to make the compilers find the
header files and libraries.
On the emotional side it is also a great improvement to have someone to rely
on. Andrius, Janneke and I had good teamwork and we supported each other when
our faith started to crumble. And believe me, it does crumble when a new bug
appears after you fixed one that you needed a week for. There were times this
summer I thought we would never reach this point.
It's also worth mentioning here that the bootstrapping process is extremely slow:
it takes hours. This kills the responsiveness and makes testing way harder than
it should be. Not to mention that we are working on a foreign architecture,
which has its own problems too.
If you have to take some lessons from something like this, here is a list of
suggestions:
- The simplest error can take ages to debug if your code is crazy enough.
- Don't be clever. It sets a very high standard for your future self and people
who will read your code in the future.
- I guess we can summarize the previous two points in one: If we could remove
TinyCC from the chain, we would. It's a source of errors and it's hard to
debug. The codebase is really hard to read for no apparent reason.
- When build times are long, small reproducers help.
- Add tests for each new case you find.
- Don't trust, disassemble and debug.
- Be careful with C and standards and undefined behavior.
- Integers are hard. Signedness makes them harder.
- Being surrounded by the correct people makes your life easier.
Also, as a personal note, I noticed I'm a better programmer since the previous
post in this series. I feel way more comfortable with complex reasoning and
even writing new programs in other languages, even if I spent almost no time
coding anything from scratch. It's like dealing with this kind of issue in
the internals gives you some level of awareness that is useful in a more general
way than it looks. Crazy stuff.
If you can, try to play with the internals of things from time to time. It
helps. At least it helped me.
### What is next? {#next}
Now that we have a fully featured Bootstrappable TinyCC we need to decide what
to do next.
In the short term, all this has to be released in the original projects: Mes,
M2, and so on. That's the easy part, as everything has proved to be ready.
In the mid term, it's not very clear what to do first. We suspect we'll need
upstream TinyCC for the next steps, because we need many different tools to
continue with the bootstrapping chain, and the bootstrappable TinyCC might not
be enough to build them. On the other hand, when we go for a standard library
we'll miss the extended assembly support we already mentioned. There's some
uncertainty in the next step.
The long term is pretty much clear though: the goal is GCC. First GCC for C and
then for C++ to make it able to build GCC 7.5, which should enable the rest of the
chain pretty easily (famous last words). I anticipate we are going to have
problems with GCC (I know this because I left them there last time) so we'll
need to fix those, too. Once that is done, we would use GCC to compile more
recent versions of GCC until we compile the world.
That's more or less the description of what we will do in the next months.
And this is pretty much it. I hope you learned something new about C, the
Bootstrapping process or at least had a good time reading this wall of text.
We'll try to work less for the next one, but we can't promise that. 😉
Take care.
---
<!--
MANY OF THIS ARE REALLY HARD TO REASON ABOUT!!!!
WITH THIS WE START PASSING MANY MORE TESTS IN MESCC AND ALSO ADDED SOME EXTRA
TESTS THAT CHECK COMPLEX BEHAVIOR HERE AND THERE
- `int`s are 64 bit in MesCC and TinyCC is written like they are 32 bit.
- TinyCC's assembly for RISC-V is not complete and we need some of that in
meslibc. We implemented the missing instructions (jal, jalr, lla and some
pseudoinstructions).
- TinyCC's assembler for RISC-V uses a simplified syntax, so we need to rewrite
our meslibc according to that.
- RISC-V uses a `__global_pointer$` symbol, but TinyCC does not allow dollars
in identifiers by default. The `-fdollars-in-identifiers` flag exploded when
used so we hardcoded the flag to true.
- We backported the `long double` support from TinyCC's `mob` branch.
- And large constant generation.
- Fixed some weird casting issues in TinyCC (see Fix casting issues (missing
func_vt in riscvgen.c)
- MesCC produced binaries that were impossible to debug with GDB and OBJDUMP
complained about them. We fixed those too (some archs are missing)
- MesCC's struct initialization to zeroes like `Whatever a = {0};` initialized
everything to `22` and is now working as expected.
- `switch/case` statements in MesCC fallback always to default because they
check the fallback clause and then jump to default.
- Mes had some incompatibilities with Guile that prevented us from running the
code fast. Fixed those.
- Added support for RISC-V instruction formats in MesCC
(https://git.savannah.gnu.org/cgit/mes.git/commit/?h=wip-riscv&id=e42cf58d14520a5360d7d527d1c2c18c0a498c28)
- Added support for signed rotation in MesCC. (all arches affected)
- And also fixed some M2 things that allow all this 64 bit support happen in
MesCC, which didn't have 64 bit support before. Stikonas?
- Stikonas also fixed problems in M2:
https://github.com/oriansj/M2-Planet/commit/85dd953b70c5f607769016bbf2a0aa3de7e41b6c
- Fix Bootstrappable TinyCC's GOT (global offset table). It was just a broken
condition in an if (stikonas dealt with that)
- Meslibc again! Tinycc does not support [extended
asm](https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html) in RV64 but
stikonas fixes it replacing the extended asm by abi-compatible handwired asm.
The good fix would be to implement it, but upstream doesn't have it either...
- `int size = 0; if (size < 8) size = 8;` does not work because TCC generated
wrong assembly and it jumps over the true branch even if it checks the
condition is ok. (reproducer in `C_TESTS/if.c`)
- Variable length arguments were broken in Bootstrappable TCC. Upstream TCC
does some string magic to support them (c2str) where the same header file is
used twice: one in the binary and one in runtime. That functionality was lost
in the ~translation~ backport. We had to push some defines to Meslibc that
support that.
- Meslibc had `typedef char int8_t` in `stdint.h` but that's not reliable,
because the C standard doesn't define the signedness of the `char`. In RISC-V
the signedness of the char is `unsigned` by default, so we have to be
explicit and say `signed char`, to avoid issues.
- Remove some 0bXXXX literals I introduced in the assembler to simplify
things... They happen not to be standard C but a GCC extension.
- Add a setjmp and longjmp implementation to meslibc that also support tinycc
assembler syntax. (copy from musl but with our syntax)
-->