Garbage collection in Julia

Introduction

Julia implements a non-relocatable, with support for partial simultaneous execution, parallel, generationally valid and mostly accurate collector with support for tagging and cleaning (users who want to call Julia from C are provided with an interface for conservative stack scanning).

Selection

There are two types of allocation tools in Julia, and the specific type used is determined by the size request for allocation. Objects up to 2 KB in size are allocated using the free memory list pool allocation tool for each thread, and objects larger than 2 KB are allocated via libc malloc.

The Julia pool allocation tool divides objects into classes of different sizes, so the memory page managed by the pool allocation tool (which covers 4 operating system pages on 64-bit platforms) contains only objects of the same size class. Each memory page from the allocation pool is paired with certain page metadata stored in lock-free lists for each thread. The page metadata contains information such as the presence of active objects on the page, the number of free slots, and offsets for the first and last objects in the free memory list contained on this page. This metadata is used to optimize the collection stage: for example, a page with no active object on it can be returned to the operating system without having to check it.

While a page with no objects on it may be returned to the operating system, the associated metadata is allocated on an ongoing basis and may continue to exist longer than a given page. As mentioned above, the metadata for the allocated pages is stored in lock-free lists for each thread. However, metadata for free memory pages can be stored in three separate lock-free lists, depending on whether the page was mapped but not accessed (page_pool_clean), or the page was cleaned up in deferred mode and is waiting for a madvise call from a background garbage collection thread (page_pool_lazily_freed), or the page received a madvise (`page_pool_freed') call.

The Julia pool allocation tool uses a "multi-level" allocation principle. When requesting a memory page for the allocation tool, Julia will perform the following actions:

Will try to get a page from page_pool_lazily_freed containing pages that were empty at the last stop stage, but have not yet received a madvise call from the parallel GC cleanup thread.
If it was not possible to get a page from page_pool_lazily_freed, an attempt will be made to get a page from the page_pool_clean containing pages that received an mmap request in a previous page allocation request, but were never accessed.
If it was not possible to get a page from pool_page_clean and from page_pool_lazily_freed, an attempt will be made to get a page from page_pool_freed containing pages that have already received a madvise call from the parallel GC cleanup thread and whose base virtual address can be recycled.
If all the above attempts failed, an mmap call to the page package will be executed, one page is requested for itself, and all the others are inserted into 'page_pool_clean'.

Generation-based tagging and collection

The tagging stage in Julia is implemented through a parallel iterative depth-first search through the object graph. The collector in Julia is immobile, so information about the age of an object cannot be determined only through the memory area in which the object is located, but must be somehow encoded in the object header or side table. The two lowest bits of the object header are used to store, respectively, the tagging bit, which is set when scanning the object at the tagging stage, and the age bit for generation-based collection.

Generation-based collection is implemented using pinning bits: objects are placed on the tagging stack, which means they are tracked only if their tagging bits are not specified. When objects enter the oldest generation, their tagging bits are not reset during the so-called "quick cleanup", which leads to the fact that these objects are not tracked at the subsequent tagging stage. However, with a "complete cleanup", the tagging bits of all objects are reset, which leads to tracing all objects at a subsequent tagging stage. Objects are promoted to the next generation during each completed stage of cleaning. On the modifier side, the recording of fields is intercepted using a recording barrier, which places the address of the object in a memorized set for each stream if the object is in the last generation, and if the object in the field being recorded is not in it. Objects from this memorized set are then tracked at the tagging stage.

Cleaning

The cleanup of object pools for Julia can be divided into two categories: if this page, managed by the pool allocation tool, contains at least one active object, then it is necessary to list free memory through its inactive objects; if this page does not contain active objects, then its basic physical memory can be returned to the operating system, for example using madvise system calls in Linux.

The first category of cleaning is parallelized by intercepting work. For the second category of cleanup: If parallel page cleanup is enabled using the --gcthreads=X,1 flag, madvise system calls are executed in the background cleanup thread in parallel with the modifier threads. During the stage of stopping the collector, pages selected in the pool that do not contain active objects are initially sent to pool_page_lazily_freed'. Then the background cleanup thread is awakened, which is responsible for removing pages from `pool_page_lazily_freed, making a call to madvise for them and inserting them into pool_page_freed'. As described above, `pool_page_lazily_freed' is also used in conjunction with modifier streams. This means that in multi-threaded workloads with a high allocation level, modifier threads often avoid a page error when allocating (as a result of accessing the page with a call to mmap or accessing the page with a call to madvise) by directly performing allocation from the page to `pool_page_lazily_freed, while the background cleanup thread must perform a call to madvise to reducing the number of pages, since some of them have already been requested by modifiers.

Heuristics

The garbage collection heuristic corrects garbage collection by changing the allocation interval between collections.

The garbage collection heuristic measures how large the heap size is after collection and sets the next collection according to the algorithm described on the page at https://dl.acm.org/doi/10.1145/3563323 . As a result, it says that the resulting heap volume should have a square root-based ratio to the active heap, and that it should also scale to take into account how quickly the garbage collector frees objects and how quickly the modifiers perform allocation. Heuristics measure heap size by counting the number of pages used and objects using malloc. Previously, we measured the heap size by counting active objects, but this did not take into account fragmentation, which could lead to incorrect decisions. This also meant that we used local information about the flow (allocations) to make decisions about the process as a whole (when to perform garbage collection), the page dimension means that the decision is global.

The GC will do a full build when the heap size reaches 80% of the maximum allowed.