Static analyzer annotations for proper garbage collection in C code
Conducting an analysis
The analyzer plugin is supplied with Julia. Its source code can be found in src/clangsa'. To run the plugin, you need to build the clang dependency. Set the `BUILD_LLVM_CLANG
variable in the Make.user file to build the corresponding clang version. You may also need to use pre-assembled binaries using the USE_BINARYBUILDER_LLVM
parameters.
Alternatively (or if that’s not enough), try using
make -C src install-analysis-deps
from the Julia top-level catalog.
After that, it is enough to run the make -C src analyzegc
command to analyze the source code tree.
General overview
Since Julia’s garbage collection is accurate, it is necessary to have correct root information for any value that may be referenced at the time of garbage collection. Such places are called safepoints
, and in the local context of a function, this concept applies to any function call that can recursively terminate at a safe point.
In the generated code, this happens automatically due to the placement of garbage collection roots (see the chapter on garbage collection roots in the LLVM code generation documentation for developers). However, in the C code, the runtime needs to manually report the garbage collection roots. This is done using the following macros.
// The value assigned to any slot passed as an argument to these // is rooted for the duration of this GC frame. JL_GC_PUSH{1,...,6}(args...) // The values assigned into the size `n` array `rts` are rooted // for the duration of this GC frame. JL_GC_PUSHARGS(rts, n) // Pop a GC frame JL_GC_POP
If these macros are not used where they should be, or are applied incorrectly, memory corruption occurs without notification. For this reason, it is very important to add them correctly in all relevant places of the code.
To ensure that these macros are used correctly, we use static analysis (in particular, the clang static analyzer). The remainder of this document provides an overview of static analysis and describes the support actions necessary for the Julia code base to work.
Garbage collection invariants
The correctness of invariants is determined by two conditions.
-
All GC_PUSH calls must be followed by the corresponding GC_POP call (in practice, this is implemented at the function level).
-
If the value has not previously been made the root at any safe point, it cannot be referenced in the future.
As usual, the problem is in the details. In particular, in order for the second of the above conditions to be satisfied, it is necessary to know the following:
-
which calls are safe points and which are not;
-
which values are root values at a particular safe point, and which are not;
-
when the value is accessed.
Especially with regard to the second point: it is necessary to know which memory areas will be considered root at runtime (that is, the values assigned to these areas will be root). This includes areas explicitly designated as such by passing them to one of the GC_PUSH macros, areas and values that are root at the global level, as well as areas that are recursively accessible from one of these areas.
Static analysis algorithm
The idea itself is very simple, although the implementation is quite complicated (primarily due to the many special cases and subtleties of C and C++In fact, we keep track of all root areas, all values that can be root, and all expressions (assignments, memory allocations, etc.) that affect whether such values are root. Then, at any safe point, we perform symbolic garbage collection and poison all values that are not rooted at that location. If these values are subsequently accessed, an error is returned.
The clang static analyzer’s job is to build a state graph and examine it for error sources. Several nodes of this graph are created by the analyzer itself (for example, to ensure the order of execution), but the above definitions complement this graph with our own states.
The static analyzer is interprocedural and can analyze the execution order with function boundaries crossing. However, the static analyzer is not completely recursive and makes heuristic decisions about which calls should be investigated (in addition, some calls require analysis between program units and are invisible to the analyzer). In our case, determining correctness requires complete information. In this regard, it is necessary to annotate the prototypes of any function calls using all the information required for analysis, even if this information could be obtained as a result of interprocedural static analysis.
Fortunately, we can use this interprocedural analysis to ensure that annotations added to a particular function are correct, taking into account its implementation.
Analyzer Annotations
These annotations are located in the src/support/analyzer_annotations.h file. They are active only when the analyzer is used, and they expand either to nothing (for prototype annotations) or to idle operations (for annotations like functions).
JL_NOTSAFEPOINT
This is probably the most common annotation. It should be added to any function that is known to fail to reach a safe garbage collection point. As a rule, safe operations for such functions are only arithmetic calculations, memory accesses, and function calls that either have the annotation JL_NOTSAFEPOINT
or, according to other available data, are not safe points (for example, it may be a function in the C standard library that is hard-coded as such in the analyzer).
In calls to any function annotated with this attribute, the values may remain non-root.
Usage example:
void jl_get_one() JL_NOTSAFEPOINT {
return 1;
}
jl_value_t *example() {
jl_value_t *val = jl_alloc_whatever();
// This is valid, even though `val` is unrooted, because
// jl_get_one is not a safepoint
jl_get_one();
return val;
}
JL_MAYBE_UNROOTED
/JL_ROOTS_TEMPORARILY
When 'JL_MAYBE_UNROOTED` is annotated as a function argument, it means that this argument can be passed even if it is not the root. In the normal course of events, the julia ABI ensures that callers make the values root before passing them to the called objects. However, some functions do not follow this ABI rule and allow values to be passed in, even if they are not root values. Keep in mind that it does not automatically follow that such an argument will persist. The ROOTS_TEMPORARILY
annotation provides a more reliable guarantee that the value can not only cease to be the root during transmission, but will also be preserved by the called party between internal safe points.
Note that JL_NOTSAFEPOINT
actually implies JL_MAYBE_UNROOTED
/JL_ROOTS_TEMPORARILY
, because whether the argument is root does not matter if the function does not contain safe points.
It is also important to note that these annotations are applied on both the caller and the called side. On the calling side, they remove the restriction on the root character, which usually applies to julia ABI functions. On the called side, they have the opposite effect: such arguments are not considered root implicitly.
If any of these annotations applies to a function as a whole, it applies to all of its arguments. In general, this is only necessary for functions with a variable number of arguments.
Usage example:
JL_DLLEXPORT void JL_NORETURN jl_throw(jl_value_t *e JL_MAYBE_UNROOTED);
jl_value_t *jl_alloc_error();
void example() {
// The return value of the allocation is unrooted. This would normally
// be an error, but is allowed because of the above annotation.
jl_throw(jl_alloc_error());
}
JL_PROPAGATES_ROOT
This annotation is often found in access functions that return one object, which may be the root stored in another. When applied to a function argument, this annotation tells the analyzer that the root of this argument is also applied to the value returned by the function.
Usage example:
jl_value_t *jl_svecref(jl_svec_t *t JL_PROPAGATES_ROOT, size_t i) JL_NOTSAFEPOINT;
size_t example(jl_svec_t *svec) {
jl_value_t *val = jl_svecref(svec, 1)
// This is valid, because, as annotated by the PROPAGATES_ROOT annotation,
// jl_svecref propagates the rooted-ness from `svec` to `val`
jl_gc_safepoint();
return jl_unbox_long(val);
}
JL_ROOTING_ARGUMENT
/JL_ROOTED_ARGUMENT
This is essentially an analog of JL_PROPAGATES_ROOT
for assignment. When assigning a value to a field of another value that is already the root, the assigned value will inherit the root of the value to which it is assigned.
Usage example:
void jl_svecset(void *t JL_ROOTING_ARGUMENT, size_t i, void *x JL_ROOTED_ARGUMENT) JL_NOTSAFEPOINT
size_t example(jl_svec_t *svec) {
jl_value_t *val = jl_box_long(10000);
jl_svecset(svec, val);
// This is valid, because the annotations imply that the
// jl_svecset propagates the rooted-ness from `svec` to `val`
jl_gc_safepoint();
return jl_unbox_long(val);
}
JL_GC_DISABLED
This annotation implies that this function is called only when garbage collection is disabled at runtime. Functions of this kind are most often used during startup and in the garbage collector code itself. Note that this annotation is checked for compliance with calls to enable and disable the runtime, so clang will determine its correctness. This is not the best way to disable processing of a certain function if garbage collection is not actually disabled (if necessary, use ifdef__clang_analyzer__
).
Usage example:
void jl_do_magic() JL_GC_DISABLED {
// Wildly allocate here with no regard for roots
}
void example() {
int en = jl_gc_enable(0);
jl_do_magic();
jl_gc_enable(en);
}
JL_REQUIRE_ROOTED_SLOT
This annotation requires the caller to pass values in the slot that is the root (that is, the values assigned to this slot will be the root).
Usage example:
void jl_do_processing(jl_value_t **slot JL_REQUIRE_ROOTED_SLOT) {
*slot = jl_box_long(1);
// Ok, only, because the slot was annotated as rooting
jl_gc_safepoint();
}
void example() {
jl_value_t *slot = NULL;
JL_GC_PUSH1(&slot;);
jl_do_processing(&slot;);
JL_GC_POP();
}
JL_GLOBALLY_ROOTED
This annotation implies that this value is always the root value at the global level. It can be applied to declarations of global variables (in this case, it applies to the values of these variables or to the values of an array, if an array is declared) or to functions (in this case, it applies to their return values; for example, these can be functions that return some private value that is the root at the global level).
Usage example:
extern JL_DLLEXPORT jl_datatype_t *jl_any_type JL_GLOBALLY_ROOTED; jl_ast_context_t *jl_ast_ctx(fl_context_t *fl) JL_GLOBALLY_ROOTED;
JL_ALWAYS_LEAFTYPE
This annotation is essentially equivalent to JL_GLOBALLY_ROOTED' except that it should be used only if the values are root at the global level due to the fact that they belong to the final type. Defining finite types as root types has its own characteristics. They are usually made root by means of the `cache
field of the corresponding TypeName
object, which itself is made root by means of the containing module (that is, they will be root as long as this is true for the containing module). In the general case, we can assume that finite types are root types where they are used, but in the future this property may be clarified, so a separate annotation helps to justify the reason for the root character at the global level.
The analyzer also automatically detects checks for the final nature of types and will not warn about the absence of garbage collection roots along these paths.
JL_DLLEXPORT jl_value_t *jl_apply_array_type(jl_value_t *type, size_t dim) JL_ALWAYS_LEAFTYPE;
JL_GC_PROMISE_ROOTED
This is an annotation like a function. Any value passed to this annotation will be considered the root value in the scope of the current function. It is intended for cases of incorrect analyzer operation or difficult situations. However, it should not be abused - it is better to optimize the analyzer itself.
void example() { jl_value_t *val = jl_alloc_something(); if (some_condition) { // We happen to know for complicated external reasons // that val is rooted under these conditions JL_GC_PROMISE_ROOTED(val); } }
Completeness of the analysis
The analyzer searches only for local information. In particular, in the case of the PROPAGATES_ROOT
above, it assumes that it sees all the relevant memory changes, that is, that memory does not change in called functions (if they are not taken into account during analysis) and in parallel threads. For this reason, some problematic cases may not be taken into account, although in practice such parallel changes are quite rare. Optimizing the analyzer to cover such situations may be an interesting line of work in the future.