Working with LLVM
Overview of the interface between Julia and LLVM
By default, Julia links dynamically against LLVM. To link statically, build with `USE_LLVM_SHLIB=0`.
The code for lowering the Julia AST to LLVM intermediate representation (IR), or for interpreting it directly, is in the directory `src/`.
| File | Description |
|---|---|
| | Compiler C-interface entry point and object file emission |
| | Builtin functions |
| | Lowering |
| | Lowering utilities, notably for array and tuple accesses |
| | Top level of code generation, pass list, lowering of builtins |
| | Tracks debug information for JIT code |
| | Handles native object files and JIT code disassembly |
| | Generic functions |
| | Lowering of intrinsics |
| | JIT-specific code, ORC compilation layers and utilities |
| | Julia-specific escape analysis |
| | Custom LLVM pass to demote heap allocations to the stack |
| | Custom LLVM pass to lower CPU-based functions (e.g. haveFMA) |
| | Custom LLVM pass to lower 16-bit floating point operations to 32-bit floating point operations |
| | Custom LLVM pass to lower GC calls to their final form |
| | Custom LLVM pass to verify Julia GC invariants |
| | Custom LLVM pass to hoist or sink Julia-specific intrinsics |
| | Custom LLVM pass to root GC-tracked values |
| | Custom LLVM pass to lower try-catch blocks |
| | Custom LLVM pass for fast-match FMA |
| | Custom LLVM pass to generate system image code for multiple architectures |
| | Custom LLVM pass to canonicalize address spaces |
| | Custom LLVM pass to lower TLS operations |
| | Custom LLVM pass to remove Julia address spaces |
| | Custom LLVM pass to remove Julia non-integral address spaces |
| | Custom LLVM pass for |
| | New pass manager pipeline, pass pipeline parsing |
| | I/O and operating system utility functions |
Some of the `.cpp` files form a group that is compiled into a single object.
The difference between an intrinsic and a builtin is that a builtin is a first-class function that can be used like any other Julia function. An intrinsic can operate only on unboxed data, so its arguments must be statically typed.
Alias Analysis
Julia currently uses LLVM's https://llvm.org/docs/LangRef.html#tbaa-metadata [type-based alias analysis]. To find the comments that document the inclusion relationships, look for `static MDNode*` in `src/codegen.cpp`.
The `-O` option enables LLVM's https://llvm.org/docs/AliasAnalysis.html#the-basic-aa-pass [basic alias analysis].
Building Julia with a different version of LLVM
The default version of LLVM is specified in `deps/llvm.version`. You can override it by creating a file called `Make.user` in the top-level directory and adding a line such as the following to it:
LLVM_VER = 13.0.0
Besides LLVM release numbers, you can also set `DEPS_GIT = llvm` in combination with `USE_BINARYBUILDER_LLVM = 0` to build against the latest development version of LLVM.
You can also build a debug version of LLVM by setting either `LLVM_DEBUG = 1` or `LLVM_DEBUG = Release` in your `Make.user` file. The former produces a fully unoptimized build of LLVM, and the latter produces an optimized build. Depending on your needs, the latter may be sufficient, and it is considerably faster. If you use `LLVM_DEBUG = Release`, you will probably also want to set `LLVM_ASSERTIONS = 1` to enable diagnostics for the different passes. Only `LLVM_DEBUG = 1` implies that option by default.
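As a minimal sketch of the settings described above, the following writes a `Make.user` that requests an optimized LLVM build with assertions enabled. The `scratch` directory is an assumption made only so that the sketch can run without touching a real checkout; in practice the file belongs in the top-level directory of the Julia source tree.

```shell
# Write the file into a scratch directory (hypothetical location);
# copy it to the top level of your Julia checkout to use it.
scratch="$(mktemp -d)"

cat > "$scratch/Make.user" <<'EOF'
# Optimized LLVM build with pass diagnostics enabled
LLVM_DEBUG = Release
LLVM_ASSERTIONS = 1
EOF

cat "$scratch/Make.user"
```

Choosing `LLVM_DEBUG = 1` instead would make the `LLVM_ASSERTIONS` line unnecessary, at the cost of a much slower LLVM.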
Passing parameters to LLVM
You can pass options to LLVM through the environment variable `JULIA_LLVM_ARGS`. Here are some example settings using `bash` syntax:
- `export JULIA_LLVM_ARGS=-print-after-all` dumps the IR after each pass.
- `export JULIA_LLVM_ARGS=-debug-only=loop-vectorize` dumps LLVM `DEBUG(...)` diagnostics for the loop vectorizer. If you get warnings about an unknown command-line argument, rebuild LLVM with `LLVM_ASSERTIONS = 1`.
- `export JULIA_LLVM_ARGS=-help` shows a list of available options. `export JULIA_LLVM_ARGS=-help-hidden` shows even more.
- `export JULIA_LLVM_ARGS="-fatal-warnings -print-options"` is an example of how to use multiple options.
Useful parameters of JULIA_LLVM_ARGS
- `-print-after=PASS`: prints the IR after any execution of `PASS`; useful for checking the changes made by a pass.
- `-print-before=PASS`: prints the IR before any execution of `PASS`; useful for checking the input to a pass.
- `-print-changed`: prints the IR whenever a pass changes it; useful for narrowing down which passes cause problems.
- `-print-(before|after)=MARKER-PASS`: the Julia pipeline contains a number of marker passes that can be used to identify where problems or optimizations occur. A marker pass is a pass that appears once in the pipeline and performs no transformations on the IR. It is only useful as a target for `-print-before` and `-print-after`. Currently, the following marker passes exist in the pipeline:
  - `BeforeOptimization`
  - `BeforeEarlySimplification`
  - `AfterEarlySimplification`
  - `BeforeEarlyOptimization`
  - `AfterEarlyOptimization`
  - `BeforeLoopOptimization`
  - `BeforeLICM`
  - `AfterLICM`
  - `BeforeLoopSimplification`
  - `AfterLoopSimplification`
  - `AfterLoopOptimization`
  - `BeforeScalarOptimization`
  - `AfterScalarOptimization`
  - `BeforeVectorization`
  - `AfterVectorization`
  - `BeforeIntrinsicLowering`
  - `AfterIntrinsicLowering`
  - `BeforeCleanup`
  - `AfterCleanup`
  - `AfterOptimization`
- `-time-passes`: prints the time spent in each pass; useful for identifying passes that take a long time.
- `-print-module-scope`: used in combination with `-print-(before|after)`; prints the entire module rather than the IR unit that the pass received.
- `-debug`: prints a large amount of LLVM debugging information.
- `-debug-only=NAME`: prints debugging messages from files in which `DEBUG_TYPE` is defined as `NAME`; useful for getting additional context about a problem.
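The options above can be combined in a single `JULIA_LLVM_ARGS` value. The following is a minimal sketch that brackets the vectorization stage with two of the marker passes listed above and asks for whole-module output. The particular choice of flags is only an illustration, and `script.jl` in the comment is a hypothetical file name.

```shell
# Build the option string piece by piece so that each flag stays readable.
JULIA_LLVM_ARGS="-print-before=BeforeVectorization"
JULIA_LLVM_ARGS="$JULIA_LLVM_ARGS -print-after=AfterVectorization"
JULIA_LLVM_ARGS="$JULIA_LLVM_ARGS -print-module-scope"
export JULIA_LLVM_ARGS

echo "JULIA_LLVM_ARGS=$JULIA_LLVM_ARGS"

# A later run such as `julia script.jl` (hypothetical script) would then
# pick up these options from the environment.
```

Comparing the two dumps shows what the vectorization stage changed without having to read the output of every other pass.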
Isolated debugging of LLVM transformations
Sometimes it may be useful to debug LLVM transformations separately from the rest of the Julia system, for example, because reproducing the problem in julia would take too much time, or because you need to use LLVM tools (for example, bugpoint).
To begin with, you can install the developer tools for working with LLVM as follows.
make -C deps install-llvm-tools
To get unoptimized IR for the entire system image, pass the `--output-unopt-bc unopt.bc` option to the system image build process. This writes the unoptimized IR to the file `unopt.bc`. The file can then be passed to LLVM tools as usual. `libjulia` can function as an LLVM pass plugin and can be loaded into LLVM tools to make Julia-specific passes available in that environment. In addition, it exposes the `-julia` meta-pass, which runs the entire Julia pass pipeline over the IR. For example, to generate a system image with the old pass manager, you can do the following.
llc -o sys.o opt.bc
cc -shared -o sys.so sys.o
To generate a system image with the new pass manager, you can do the following.
opt -load-pass-plugin=libjulia-codegen.so --passes='julia' -o opt.bc unopt.bc
llc -o sys.o opt.bc
cc -shared -o sys.so sys.o
This system image can then be loaded by julia in the usual way.
In addition, you can output a dump of the LLVM IR module for only one Julia function as follows:
fun, T = +, Tuple{Int,Int} # Substitute your function of interest here
optimize = false
open("plus.ll", "w") do file
    println(file, InteractiveUtils._dump_function(fun, T, false, false, false, true, :att, optimize, :default, false))
end
These files can be processed in exactly the same way as the above unoptimized representation of the IR system image.
Running the LLVM Test Suite
To run llvm tests locally, you first need to install the tools and build Julia:
make -C deps install-llvm-tools
make -j julia-src-release
make -C test/llvmpasses
If you want to run individual test files directly, using the commands at the top of each test file, note that the first step above will have installed the tools into `./usr/tools/opt`. You then need to manually replace `%s` with the name of the test file.
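The manual `%s` substitution can be scripted. The following is a rough sketch, assuming that the commands appear on lines starting with `; RUN: `. The test file and its `RUN` line are made up here so that the sketch is self-contained; real test files will contain different commands.

```shell
# Hypothetical test file, created only for illustration.
test_file="$(mktemp)"
cat > "$test_file" <<'EOF'
; RUN: opt -S %s | FileCheck %s
define void @f() {
  ret void
}
EOF

# Extract the RUN commands and replace each standalone %s with the file path.
# Longer %-names are deliberately left untouched and need handling by hand.
run_commands="$(sed -n 's/^; RUN: //p' "$test_file" \
  | sed -e "s|%s\([^[:alnum:]_]\)|$test_file\1|g" -e "s|%s\$|$test_file|")"

echo "$run_commands"
```

The printed command line can then be run with the tools installed by the first build step on the `PATH`.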
Improving LLVM optimizations for Julia
Improving LLVM code generation usually involves either changing Julia's lowering to be friendlier to LLVM's passes, or improving a pass.
If you plan to improve a pass, be sure to read the https://llvm.org/docs/DeveloperPolicy.html [LLVM developer policy]. The best strategy is to create a code example in a form that lets you use LLVM's `opt` tool to study it and the pass of interest in isolation.
- Create an example of the Julia code of interest.
- Use `JULIA_LLVM_ARGS=-print-after-all` to dump the IR.
- Pick out the IR at the point just before the pass of interest runs.
- Strip the debug metadata and fix up the TBAA metadata by hand.
The last step is labor-intensive. Suggestions for a better way would be appreciated.
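Picking the IR out of a saved `-print-after-all` dump can be partly automated. The following is a rough sketch that assumes each dump section starts with a header line containing `IR Dump After <pass name>`; the exact header format may differ between LLVM versions. `PassA` and `PassB` are hypothetical pass names, and the sample dump is made up so that the sketch is self-contained. The IR seen by `PassB` is the dump printed after the pass that precedes it, here `PassA`.

```shell
# Hypothetical dump file standing in for saved -print-after-all output.
dump="$(mktemp)"
cat > "$dump" <<'EOF'
*** IR Dump After PassA on f ***
define void @f() {
  ret void
}
*** IR Dump After PassB on f ***
define void @f() {
  unreachable
}
EOF

# Keep the lines that follow the header for the wanted pass,
# and stop keeping them at the next header.
wanted="PassA"
ir="$(awk -v pass="$wanted" '
  /IR Dump After/ { keep = (index($0, "IR Dump After " pass " ") > 0); next }
  keep { print }
' "$dump")"

printf '%s\n' "$ir"
```

The extracted text still contains whatever debug and TBAA metadata the dump had, so the manual cleanup described in the last step is still needed.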
The jlcall calling convention
Julia has a generic calling convention for unoptimized code, which looks roughly as follows:
jl_value_t *any_unoptimized_call(jl_value_t *, jl_value_t **, int);
Here, the first argument is the boxed function object, the second is an on-stack array of arguments, and the third is the number of arguments. We could perform a straightforward lowering and emit an `alloca` for the argument array. However, this would obscure the SSA nature of the uses at the call site and make optimizations (including GC root placement) significantly harder. Instead, we emit the call as follows:
call %jl_value_t *@julia.call(jl_value_t *(*)(...) @any_unoptimized_call, %jl_value_t *%arg1, %jl_value_t *%arg2)
This allows us to retain the SSA nature of the uses throughout the optimizer. GC root placement later lowers this call to the original C ABI.
Placing garbage collection roots
Garbage collection (GC) root placement is done by an LLVM pass late in the pass pipeline. Doing GC root placement this late lets LLVM optimize more aggressively around code that requires GC roots, and lets us reduce the number of required GC roots and GC root store operations. Since LLVM does not understand our garbage collector, it would not otherwise know what it is and is not allowed to do with values stored to the GC frame, so it would conservatively do very little. As an example, consider an error path:
if some_condition()
    #= Possibly using some variables here =#
    error("An error occurred")
end
During constant folding, LLVM may discover that the condition is always false and remove the basic block. However, if GC root lowering is done early, the GC root slots used in the deleted block, as well as any values kept alive in those slots only because they were used in the error path, would be kept alive by LLVM. By doing GC root lowering late, we give LLVM license to perform its usual optimizations (constant folding, dead code elimination, etc.) without having to worry (too much) about which values may or may not be tracked by the GC.
However, to be able to do late GC root placement, we need to be able to identify a) which pointers are GC-tracked and b) all uses of such pointers. The goal of the GC root placement pass is thus simple:
Minimize the number of needed GC roots and stores to them, subject to the constraint that at every safepoint, any live GC-tracked pointer (that is, one for which there is a path after this point that contains a use of the pointer) is in some GC slot.
Representation
The main difficulty is thus choosing an IR representation that lets us identify GC-tracked pointers and their uses, even after the program has been run through the optimizer. Our design uses three LLVM features to achieve this:
- custom address spaces;
- operand bundles;
- non-integral pointers.
Custom address spaces allow us to tag every pointer with an integer that must be preserved through optimizations. The compiler may not insert casts between address spaces that did not exist in the original program, and it must never change the address space of a pointer in a load, store, or similar operation. This lets us annotate which pointers are GC-tracked in an optimizer-resistant way. Note that metadata could not serve the same purpose. Metadata is supposed to always be discardable without altering the semantics of the program. However, failing to identify a GC-tracked pointer alters the resulting program behavior dramatically: it will probably crash or return wrong results. We currently use the following address spaces (their numbers are defined in `src/codegen_shared.cpp`):
- GC-tracked pointers (currently 10): these are pointers to boxed values that may be put into a GC frame. They are loosely equivalent to a `jl_value_t*` pointer on the C side. N.B. It is illegal to ever have a pointer in this address space that may not be stored to a GC slot.
- Derived pointers (currently 11): these are pointers derived from some GC-tracked pointer. Uses of these pointers generate uses of the original pointer. However, they need not themselves be known to the GC. The GC root placement pass MUST always find the GC-tracked pointer from which this pointer is derived and use that as the pointer to root.
- Callee-rooted pointers (currently 12): this is a utility address space for expressing the notion of a callee-rooted value. All values in this address space MUST be storable to a GC root (although this condition may be relaxed in the future), but unlike the other pointers, they need not be rooted when passed to a call (they do still need to be rooted if they are live across another safepoint between the definition and the call).
- Pointers loaded from a tracked object (currently 13): this is used by arrays, which themselves contain a pointer to the managed data. This data area is owned by the array but is not itself a GC-tracked object. The compiler guarantees that as long as this pointer is live, the object it was loaded from will be kept alive.
Invariants
The GC root placement pass relies on several invariants, which must be observed by the frontend and preserved by the optimizer.
First, only the following address space casts are allowed:
- 0->{Tracked,Derived,CalleeRooted}: an untracked pointer may decay to any of the others. However, note that the optimizer has broad license not to root such a value. It is never safe to have a value in address space 0 in any part of the program if it is (or is derived from) a value that requires a GC root.
- Tracked->Derived: this is the standard decay route for interior values. The placement pass looks for these casts to identify the base pointer for any use.
- Tracked->CalleeRooted: the CalleeRooted address space serves merely as a hint that a GC root is not required. However, note that the Derived->CalleeRooted decay is prohibited, since pointers should generally be storable to a GC slot even in this address space.
Now let us consider what constitutes a use:
- loads whose loaded value is in one of the address spaces;
- stores of a value in one of the address spaces to a location;
- stores to a pointer in one of the address spaces;
- calls for which a value in one of the address spaces is an operand;
- calls in the jlcall ABI for which the argument array contains the value;
- return instructions.
We explicitly allow loads, stores, and simple calls in the Tracked and Derived address spaces. Elements of jlcall argument arrays must always be in the Tracked address space (the ABI requires them to be valid `jl_value_t*` pointers). The same is true for return instructions (though note that struct return arguments may be in any of the address spaces). The only allowable use of a pointer in the CalleeRooted address space is to pass it to a call (which must have an appropriately typed operand).
Further, we disallow `getelementptr` in the Tracked address space. This is because, unless the operation is a no-op, the resulting pointer cannot validly be stored to a GC slot and therefore may not be in this address space. If such a pointer is required, it should first be decayed to the Derived address space.
Lastly, we disallow `inttoptr` and `ptrtoint` instructions in these address spaces. Having such instructions would mean that some `i64` values are really GC-tracked. This is problematic because it breaks the stated requirement that we be able to identify GC-relevant pointers. This invariant is enforced using LLVM's "non-integral pointers" feature, which was introduced in LLVM 5.0. It prohibits the optimizer from performing optimizations that would introduce these operations. Note that we can still insert static constants at JIT time by using `inttoptr` in address space 0 and then decaying to the appropriate address space.
Supporting ccall
One important aspect not yet discussed is the handling of `ccall`. `ccall` has the peculiar feature that the location and scope of a use do not coincide. Consider the following example:
A = randn(1024)
ccall(:foo, Cvoid, (Ptr{Float64},), A)
During lowering, the compiler inserts a conversion from the array to a pointer, which drops the reference to the array value. However, we of course need to make sure that the array stays alive for the duration of the `ccall`. To understand how this is done, let's look at a hypothetical approximate lowering of the above code:
return $(Expr(:foreigncall, :(:foo), Cvoid, svec(Ptr{Float64}), 0, :(:ccall), Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), 0, :(:ccall), :(A)), :(A)))
The last `:(A)` is an extra argument list inserted during lowering that tells the code generator which Julia-level values need to be kept alive for the duration of this `ccall`. We then take this information and represent it as an "operand bundle" at the IR level. An operand bundle is essentially a fake use attached to the call site. At the IR level, it looks like this:
call void inttoptr (i64 ... to void (double*)*)(double* %5) [ "jl_roots"(%jl_value_t addrspace(10)* %A) ]
The GC root placement pass treats the `jl_roots` operand bundle as if it were a regular operand. However, as a final step, after the GC roots have been inserted, it drops the operand bundle to avoid confusing instruction selection.
Supporting pointer_from_objref
`pointer_from_objref` is special because it requires the user to take explicit control of GC rooting. By the invariants above, this function is illegal, because it performs an address space cast from 10 to 0. However, it can be useful in certain situations, so we provide a special intrinsic:
declared %jl_value_t *julia.pointer_from_objref(%jl_value_t addrspace(10)*)
It is lowered to the corresponding address space cast after GC root lowering. Note, however, that by using this intrinsic the caller assumes all responsibility for making sure that the value in question is rooted. Further, this intrinsic is not considered a use, so the GC root placement pass will not provide a GC root for it. As a result, external rooting must be arranged while the value is still tracked by the system. In other words, it is not valid to use the result of this operation to establish a global root, because the optimizer may already have dropped the value.
Keeping values alive in the absence of uses
In certain cases an object must be kept alive even though there is no compiler-visible use of it. This may be the case for low-level code that operates directly on the memory representation of an object, or for code that needs to interface with C code. To allow this, we provide the following intrinsics at the LLVM level:
token @llvm.julia.gc_preserve_begin(...)
void @llvm.julia.gc_preserve_end(token)
(The `llvm.` prefix in the name is required in order to use the `token` type.) The semantics of these intrinsics are as follows: at any safepoint that is dominated by a `gc_preserve_begin` call but not dominated by a corresponding `gc_preserve_end` call (that is, a call whose argument is the token returned by the `gc_preserve_begin` call), the values passed as arguments to that `gc_preserve_begin` call are kept alive. Note that `gc_preserve_begin` still counts as a regular use of those values, so the standard lifetime semantics ensure that the values are kept alive before entering the preserve region.