Title: GNU Mes and garbage collection
Date: 2025-09-17
Category:
Tags: GNU Mes interpreter speedup
Slug: fasterMes2
Lang: en
Summary: How does the GNU Mes garbage collector work and how to improve it.

In our previous blogpost in the series we described how GNU Mes' scheme
interpreter handles evaluation and how fast it was. There were some details
that weren't really obvious, though: why does it really do all that magic to
emulate function calls? Is it just to make it look like scheme?

In this blog post we'll explain how does the garbage collector work in GNU Mes,
try to take a new look to the `eval_apply` function from the perspective of the
garbage collector, and propose an improvement that would enable a different
style of programming.

### GNU Mes' garbage collection

GNU Mes' scheme interpreter has a very simple approach to memory management. It
allocates a big chunk of memory when the interpreter is run and that is used
for every scheme value. All scheme values are represented with the same
`struct`, called `struct scm`, and every `make_`whatever function returns a
pointer to an `struct scm` bump-allocated in that big chunk of memory which
we'll call *arena*.

The bump-allocation is very easy to do, and doesn't require much computation.
It's just in increase in the pointer to the first free element in the *arena*.

If we just had this, our scheme interpreter would just eat memory and never
release or reuse the space we allocated for the scheme values we are not using
anymore. This is a possible approach, really, but GNU Mes does provide garbage
collection so the memory is reused and we don't run out of our precious
resource.

Mes uses a stop-the-world copying garbage collector. Similar to the one
described in the "Structure and Interpretation of Computer Programs" (aka SICP)
book or the one described in [This Wikipedia entry about Cheney's garbage
collection algorithm based on a paper from 1970][cheney-wiki]. It starts in few
scheme values and copies them from the current arena to some other piece of
memory, one after the other. It also check their internal values like elements
in a list, a vector or so on, and copies those too, adjusting the pointers that
refer to them. When a scheme value is copied, its memory representation is
tagged with a *broken heart* (this appears in SICP), something that identifies
that value as already processed, and the pointer to the new location replaces
its value. Now, it can check if an object that is being copied points to a
value that has been already copied, and in that case, it only needs to update
the pointer of its reference.

[cheney-wiki]: https://en.wikipedia.org/wiki/Cheney%27s_algorithm

Once all the objects and its references have been processed, what we have is a
new memory space that contains only the objects in use, one after the other,
with no empty spaces in between. This is why this process is called also a
compacting garbage collector.

In Mes, once the new piece of memory is set up, it is moved back to the
original position, element by element (remember all elements are the same
size). In a process that in Mes is called `flip`. The goal of the flip is to
keep the memory footprint low, only requiring a maximum memory of the twice of
the current memory usage (and oftentimes way less) which, potentially, is
smaller than having two fixed semi-spaces as this kind of garbage collectors
often have.

But now two things are left to explain: when is this whole process run and how
it is kickstarted.

The garbage collector is run during the interpreter execution. When the
interpreter's programmer decides to (by adding a call to the garbage collection
function, `gc_check`). This function will check if the memory usage is large
enough so the garbage collecting is worth the effort and, in affirmative case,
then run the garbage collector process.

The first thing the garbage collection process does is copy a group of objects
that will kickstart the copying of the rest of the *live* objects, meaning
those objects that the running scheme program still has access to. Those
objects are called *roots*, and they are fundamental to understand how this
whole thing can be improved. The roots, more than being a great hip hop band,
are easy to deduce what they can be in the interpreter: the current
*environment*, the *stack*, the known symbols, and such.

Currently in Mes the roots are fixed, meaning they are hardcoded in the code,
like so:

``` clike
  struct scm *new_cell_nil = g_free;
  struct scm *s;
  for (s = cell_nil; s < g_symbol_max; s = s + M2_CELL_SIZE)
    gc_copy (s);

  g_symbols = gc_copy (g_symbols);
  g_macros = gc_copy (g_macros);
  g_ports = gc_copy (g_ports);
  scm_hash_table_type = gc_copy (scm_hash_table_type);
  scm_variable_type = gc_copy (scm_variable_type);
  M0 = gc_copy (M0);
  M1 = gc_copy (M1);

  long i;
  for (i = g_stack; i < STACK_SIZE; i = i + 1)
    copy_stack (i, gc_copy (g_stack_array[i]));

  gc_loop (new_cell_nil);
```

### The issues

This kind of copying garbage collectors have a really two very obvious
problems that affect the way we write our C code.

Consider this code:

``` clike
struct scm *val = make_number (10); // Makes a scheme value = 10
gc_check ();                        // May garbage collect
val->value;                         // What do we have there?
```

If we make a number like in the example and then garbage collect, the number
will be allocated in the *arena*, but during garbage collection it won't have
anything pointing to it so it would be cleaned from the memory. Interestingly
enough, it won't probably produce a segfault or any kind of error like that,
because it will try to access to the *arena*, and that's valid memory, but
it would still explode in very dirty ways afterwards, maybe after some other
garbage collection loops. Nasty stuff.

Also, even if you keep the variable from being collected, it would be moved by
the garbage collector, which will give you a very low chance of the value
falling in the same spot, so your pointer would be outdated, and you wouldn't
have any way to know where the variable is now.

Do you want an easy solution?

First, register that `val` as another root in the garbage collector, so it's
not lost in each loop. The current garbage collector in Mes needs to hardcode
the roots, so if this example were just a local variable for a function, you'd
need to make it global for the garbage collector to find it. Of course, you
shouldn't do this for every single variable in your code, but if you make sure
you only need a couple of temporaries you could call them `R1` and `R2` and use
those all the time for everything[^oh-yes].

[^oh-yes]: Oh! Yes! I know what you are thinking. Trust me.

That would leave you with the moving problem. After a garbage collector run,
your variables `R1` and `R2` could potentially point to a memory location where
they are not anymore, because of the copying.

At this point if you were really inspired, you could implement a stack of those
lets call them "*registers*" (`R1`, `R2`...), register it as a root, and there
you would have emulated a function call system that respects your garbage
collector. Oh! *But isn't that what we described in the previous blog post?*

Implementing a stack that knows where to push and where to pop (those global
registers) would make every single piece of your code keep your pointers to the
variables coherent in the C world. You would just need to refer to `R1` or
whatever, and it would always point to the proper memory location after the pop
is run.

But of course this is a little bit of a Henry Ford situation here: *"everybody
will have a car of the color they like as long as it is black"*. So you can
have all the variables you want as long as they are `R0`, `R1`...


### One solution

Consider this code:

``` clike
struct scm *
macro_expand_program (struct scm *program)
{
  struct scm *cursor = program;
  gc_preserve (&program);
  for (cursor = program; cursor != cell_nil; cursor = cursor->cdr)
    {
      gc_preserve (&cursor);
      cursor->car = macro_expand_expr (cursor->car); // May GC
      gc_release (1); // cursor
    }
  gc_release (1); // program
  return program;
}
```

In this piece of code we can see how `macro_expand_expr` could try to garbage
collect and make us lose references to our local variables. By calling
`gc_preserve` we register them as roots, and we can release them later, when we
don't need them anymore. We send the address to the pointer to the garbage
collector so it's able to fix the pointers when the copying is done.

This looks boring and boilerplatey, but we would only need it for those
variables that would cross the function barrier to functions that we know that
may garbage collect. The rest of the cases we can just use an ordinary C
variable without preservation.

The improvements I see here is we can call the variables whatever we want, use
as many as we want and actually use C function calls without all the `goto` and
`push_cc` ceremony we described in the previous article, which is also
extremely confusing to me.

Also, we could get rid of all the other global roots pushing them first and
never releasing them.

#### How is this thing implemented

I have to admit it's not the greatest solution in the world but it works well
enough if you are careful.

First, I added a new scheme value type for roots, that can store a reference to
an scheme value and the address of the pointer to update. It just covers the
simplest case.

The `gc_preserve` function adds a root to a global list of preserved roots. The
`gc_release` function removes the top `n` elements from the list of preserved
roots.

With that ready, the garbage collector only needs to handle that global list as
any other hardcoded root we mentioned before, but after the `gc_flip` the
pointers stored in the roots themselves are updated. This means the pointers in
our C code would be updated when we exit the garbage collection operation,
keeping our C pointers consistent with the memory.

So, that leaves us with this copy step:

``` diff
@@ -647,6 +695,8 @@ gc_ ()
     gc_copy (s);

   g_symbols = gc_copy (g_symbols);
+  if (g_preserved != NULL)
+    g_preserved = gc_copy (g_preserved);
   g_macros = gc_copy (g_macros);
   g_ports = gc_copy (g_ports);
   scm_hash_table_type = gc_copy (scm_hash_table_type);
```

And this flip step:

``` diff
@@ -464,6 +503,15 @@ gc_flip ()
   long dist = g_news - g_cells;
   g_free = g_free - dist;
   g_symbols = g_symbols - dist;
+  if (g_preserved != NULL)
+    {
+      g_preserved = g_preserved - dist;
+      /* Update the preserved pointers */
+      struct scm *s;
+      for (s = g_preserved; s != cell_nil; s = s->cdr)
+        *(s->car->ptr) = s->car->car;
+    }
+
   g_macros = g_macros - dist;
   g_ports = g_ports - dist;
   scm_hash_table_type = scm_hash_table_type - dist;
```

As my roots are copied and flipped the values they point at are also copied and
flipped. That way I only need to update the pointers using the new address of
the pointed objects.

And that's it.

The only thing missing is to use the `gc_preserve` and `gc_release` properly.
It's easy to misuse them and overrelease or overpreserve.

I'm writing a macro expander using this and happens to be somewhat comfortable,
at least way more comfortable than the goto and `push_cc` style. I was having a
very hard time with that one.

We'll see how it goes.

Cheers!