Title: GNU Mes and garbage collection Date: 2025-09-17 Category: Tags: GNU Mes interpreter speedup Slug: fasterMes2 Lang: en Summary: How does the GNU Mes garbage collector work and how to improve it. In our previous blogpost in the series we described how GNU Mes' scheme interpreter handles evaluation and how fast it was. There were some details that weren't really obvious, though: why does it really do all that magic to emulate function calls? Is it just to make it look like scheme? In this blog post we'll explain how does the garbage collector work in GNU Mes, try to take a new look to the `eval_apply` function from the perspective of the garbage collector, and propose an improvement that would enable a different style of programming. ### GNU Mes' garbage collection GNU Mes' scheme interpreter has a very simple approach to memory management. It allocates a big chunk of memory when the interpreter is run and that is used for every scheme value. All scheme values are represented with the same `struct`, called `struct scm`, and every `make_`whatever function returns a pointer to an `struct scm` bump-allocated in that big chunk of memory which we'll call *arena*. The bump-allocation is very easy to do, and doesn't require much computation. It's just in increase in the pointer to the first free element in the *arena*. If we just had this, our scheme interpreter would just eat memory and never release or reuse the space we allocated for the scheme values we are not using anymore. This is a possible approach, really, but GNU Mes does provide garbage collection so the memory is reused and we don't run out of our precious resource. Mes uses a stop-the-world copying garbage collector. Similar to the one described in the "Structure and Interpretation of Computer Programs" (aka SICP) book or the one described in [This Wikipedia entry about Cheney's garbage collection algorithm based on a paper from 1970][cheney-wiki]. It starts in few scheme values and copies them from the current arena to some other piece of memory, one after the other. It also check their internal values like elements in a list, a vector or so on, and copies those too, adjusting the pointers that refer to them. When a scheme value is copied, its memory representation is tagged with a *broken heart* (this appears in SICP), something that identifies that value as already processed, and the pointer to the new location replaces its value. Now, it can check if an object that is being copied points to a value that has been already copied, and in that case, it only needs to update the pointer of its reference. [cheney-wiki]: https://en.wikipedia.org/wiki/Cheney%27s_algorithm Once all the objects and its references have been processed, what we have is a new memory space that contains only the objects in use, one after the other, with no empty spaces in between. This is why this process is called also a compacting garbage collector. In Mes, once the new piece of memory is set up, it is moved back to the original position, element by element (remember all elements are the same size). In a process that in Mes is called `flip`. The goal of the flip is to keep the memory footprint low, only requiring a maximum memory of the twice of the current memory usage (and oftentimes way less) which, potentially, is smaller than having two fixed semi-spaces as this kind of garbage collectors often have. But now two things are left to explain: when is this whole process run and how it is kickstarted. The garbage collector is run during the interpreter execution. When the interpreter's programmer decides to (by adding a call to the garbage collection function, `gc_check`). This function will check if the memory usage is large enough so the garbage collecting is worth the effort and, in affirmative case, then run the garbage collector process. The first thing the garbage collection process does is copy a group of objects that will kickstart the copying of the rest of the *live* objects, meaning those objects that the running scheme program still has access to. Those objects are called *roots*, and they are fundamental to understand how this whole thing can be improved. The roots, more than being a great hip hop band, are easy to deduce what they can be in the interpreter: the current *environment*, the *stack*, the known symbols, and such. Currently in Mes the roots are fixed, meaning they are hardcoded in the code, like so: ``` clike struct scm *new_cell_nil = g_free; struct scm *s; for (s = cell_nil; s < g_symbol_max; s = s + M2_CELL_SIZE) gc_copy (s); g_symbols = gc_copy (g_symbols); g_macros = gc_copy (g_macros); g_ports = gc_copy (g_ports); scm_hash_table_type = gc_copy (scm_hash_table_type); scm_variable_type = gc_copy (scm_variable_type); M0 = gc_copy (M0); M1 = gc_copy (M1); long i; for (i = g_stack; i < STACK_SIZE; i = i + 1) copy_stack (i, gc_copy (g_stack_array[i])); gc_loop (new_cell_nil); ``` ### The issues This kind of copying garbage collectors have a really two very obvious problems that affect the way we write our C code. Consider this code: ``` clike struct scm *val = make_number (10); // Makes a scheme value = 10 gc_check (); // May garbage collect val->value; // What do we have there? ``` If we make a number like in the example and then garbage collect, the number will be allocated in the *arena*, but during garbage collection it won't have anything pointing to it so it would be cleaned from the memory. Interestingly enough, it won't probably produce a segfault or any kind of error like that, because it will try to access to the *arena*, and that's valid memory, but it would still explode in very dirty ways afterwards, maybe after some other garbage collection loops. Nasty stuff. Also, even if you keep the variable from being collected, it would be moved by the garbage collector, which will give you a very low chance of the value falling in the same spot, so your pointer would be outdated, and you wouldn't have any way to know where the variable is now. Do you want an easy solution? First, register that `val` as another root in the garbage collector, so it's not lost in each loop. The current garbage collector in Mes needs to hardcode the roots, so if this example were just a local variable for a function, you'd need to make it global for the garbage collector to find it. Of course, you shouldn't do this for every single variable in your code, but if you make sure you only need a couple of temporaries you could call them `R1` and `R2` and use those all the time for everything[^oh-yes]. [^oh-yes]: Oh! Yes! I know what you are thinking. Trust me. That would leave you with the moving problem. After a garbage collector run, your variables `R1` and `R2` could potentially point to a memory location where they are not anymore, because of the copying. At this point if you were really inspired, you could implement a stack of those lets call them "*registers*" (`R1`, `R2`...), register it as a root, and there you would have emulated a function call system that respects your garbage collector. Oh! *But isn't that what we described in the previous blog post?* Implementing a stack that knows where to push and where to pop (those global registers) would make every single piece of your code keep your pointers to the variables coherent in the C world. You would just need to refer to `R1` or whatever, and it would always point to the proper memory location after the pop is run. But of course this is a little bit of a Henry Ford situation here: *"everybody will have a car of the color they like as long as it is black"*. So you can have all the variables you want as long as they are `R0`, `R1`... ### One solution Consider this code: ``` clike struct scm * macro_expand_program (struct scm *program) { struct scm *cursor = program; gc_preserve (&program); for (cursor = program; cursor != cell_nil; cursor = cursor->cdr) { gc_preserve (&cursor); cursor->car = macro_expand_expr (cursor->car); // May GC gc_release (1); // cursor } gc_release (1); // program return program; } ``` In this piece of code we can see how `macro_expand_expr` could try to garbage collect and make us lose references to our local variables. By calling `gc_preserve` we register them as roots, and we can release them later, when we don't need them anymore. We send the address to the pointer to the garbage collector so it's able to fix the pointers when the copying is done. This looks boring and boilerplatey, but we would only need it for those variables that would cross the function barrier to functions that we know that may garbage collect. The rest of the cases we can just use an ordinary C variable without preservation. The improvements I see here is we can call the variables whatever we want, use as many as we want and actually use C function calls without all the `goto` and `push_cc` ceremony we described in the previous article, which is also extremely confusing to me. Also, we could get rid of all the other global roots pushing them first and never releasing them. #### How is this thing implemented I have to admit it's not the greatest solution in the world but it works well enough if you are careful. First, I added a new scheme value type for roots, that can store a reference to an scheme value and the address of the pointer to update. It just covers the simplest case. The `gc_preserve` function adds a root to a global list of preserved roots. The `gc_release` function removes the top `n` elements from the list of preserved roots. With that ready, the garbage collector only needs to handle that global list as any other hardcoded root we mentioned before, but after the `gc_flip` the pointers stored in the roots themselves are updated. This means the pointers in our C code would be updated when we exit the garbage collection operation, keeping our C pointers consistent with the memory. So, that leaves us with this copy step: ``` diff @@ -647,6 +695,8 @@ gc_ () gc_copy (s); g_symbols = gc_copy (g_symbols); + if (g_preserved != NULL) + g_preserved = gc_copy (g_preserved); g_macros = gc_copy (g_macros); g_ports = gc_copy (g_ports); scm_hash_table_type = gc_copy (scm_hash_table_type); ``` And this flip step: ``` diff @@ -464,6 +503,15 @@ gc_flip () long dist = g_news - g_cells; g_free = g_free - dist; g_symbols = g_symbols - dist; + if (g_preserved != NULL) + { + g_preserved = g_preserved - dist; + /* Update the preserved pointers */ + struct scm *s; + for (s = g_preserved; s != cell_nil; s = s->cdr) + *(s->car->ptr) = s->car->car; + } + g_macros = g_macros - dist; g_ports = g_ports - dist; scm_hash_table_type = scm_hash_table_type - dist; ``` As my roots are copied and flipped the values they point at are also copied and flipped. That way I only need to update the pointers using the new address of the pointed objects. And that's it. The only thing missing is to use the `gc_preserve` and `gc_release` properly. It's easy to misuse them and overrelease or overpreserve. I'm writing a macro expander using this and happens to be somewhat comfortable, at least way more comfortable than the goto and `push_cc` style. I was having a very hard time with that one. We'll see how it goes. Cheers!