Engee documentation

Compilation before execution


This document describes the design and structure of the ahead-of-time (AOT) compilation system in Julia. This system is used when creating system images and package images. Most of the implementation described here lives in the files `aotcompile.cpp`, `staticdata.c`, and `processor.cpp`.

Introduction

Although Julia normally compiles code just in time (JIT), code can also be compiled ahead of time and the result saved to a file. This can be useful for several reasons:

  1. To reduce the time required to start the Julia process.

  2. To reduce the time spent in the JIT compiler instead of executing code (time to first execution, TTFX).

  3. To reduce the amount of memory used by the JIT compiler.
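As a concrete illustration of these benefits, a custom system image can be built with the PackageCompiler.jl package and then loaded with the `--sysimage` flag. This is a sketch, not part of the core AOT machinery described below; the package name `Foo` is a placeholder, and PackageCompiler.jl must be installed in the active environment.

```shell
# Build a custom system image containing the (hypothetical) package Foo.
# create_sysimage runs the AOT pipeline described in this document.
julia -e 'using PackageCompiler; create_sysimage(["Foo"]; sysimage_path="sys_foo.so")'

# Start Julia with the prebuilt image: `using Foo` now loads
# without spending time in the JIT compiler.
julia --sysimage sys_foo.so -e 'using Foo'
```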

General overview

The following is an overview of the complete process that takes place internally when a user compiles a new AOT module, for example by running `using Foo`. This is likely to change over time as more efficient ways of handling this process are implemented, so the current implementation may not exactly match the data flow and functions described below.

Compilation of code images

First, it is necessary to define the methods that should be compiled into machine code. This can only be done when the compiled code is actually running, because the set of methods that need to be compiled depends on the types of arguments passed to the methods, and method calls with certain combinations of types may not be known until runtime. During this process, the exact methods that the compiler sees are tracked for subsequent compilation, creating a compilation trace.

Currently, when compiling images, Julia runs trace generation in a separate process from the one performing the AOT compilation. This can complicate attempts to attach a debugger during precompilation. The best way to debug precompilation with a debugger is to use the rr debugger: record the entire process tree, use `rr ps` to identify the relevant failing process, and then use `rr replay -p PID` to replay only that process.
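The rr workflow described above can be sketched as the following shell session (the package name `Foo` is a placeholder, and `PID` must be substituted with a process id from the `rr ps` output):

```shell
# Record the whole process tree while precompiling; rr follows child processes,
# so the separate trace-generation process is captured too.
rr record julia -e 'using Pkg; Pkg.precompile("Foo")'

# List the recorded processes and find the one that exited abnormally.
rr ps

# Replay only the failing process, substituting its PID from `rr ps`.
rr replay -p PID
```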

After the methods to be compiled have been determined, they are passed to the `jl_create_system_image` function. This function sets up a number of data structures that will be used when serializing the machine code to a file, and then calls `jl_create_native` with the array of methods. `jl_create_native` runs code generation (codegen) on the methods and produces one or more LLVM modules. `jl_create_system_image` then records some useful information about what codegen produced from the modules.

The modules are then passed to `jl_dump_native`, along with the information recorded by `jl_create_system_image`. `jl_dump_native` contains the code needed to serialize the modules into bitcode, object, and/or assembly files, depending on the command-line options passed to Julia. The serialized code and information are then written out to a file as an archive.

The final step is to run the system linker on the object files in the archive created by `jl_dump_native`. Once this step is complete, a shared library containing the compiled code has been produced.

Loading code images

When loading a code image, the shared library produced by the linker is loaded into memory. The system image data is then loaded from the shared library. This data contains information about the types, methods, and code instances that were compiled into the shared library. It is used to restore the runtime state to what it was when the code image was compiled.

If the code image was compiled with multiversioning enabled, the loader will pick the appropriate version of each function to use based on the CPU features available on the current machine.

For system images, since no other code has been loaded, the runtime state is now the same as it was when the code image was compiled. For package images, the environment may have changed relative to when the code was compiled, so each method must be validated against the global method table.

Compilation of methods

Tracing compiled methods

Julia has a command-line flag, `--trace-compile=filename`, for recording all methods that are compiled by the JIT compiler. When a function is compiled and this flag is set to a filename, Julia writes a precompile statement to that file naming the method and the argument types it was called with. This produces a precompile script that can be used later in the AOT compilation process. The https://julialang.github.io/PrecompileTools.jl/stable/[PrecompileTools] package contains tools that make this process easier for package developers.
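A minimal sketch of generating a trace file (the traced session here, using `Statistics.mean`, is only illustrative):

```shell
# Record precompile statements for everything the JIT compiles in this session.
julia --trace-compile=precompile_statements.jl -e 'using Statistics; mean([1, 2, 3])'

# The resulting file contains lines such as:
#   precompile(Tuple{typeof(Statistics.mean), Vector{Int64}})
# Executing these statements during package precompilation compiles
# the same methods ahead of time.
```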

jl_create_system_image

`jl_create_system_image` saves all of the Julia-specific metadata necessary to later restore the runtime state. This includes data such as code instances, method instances, method tables, and type information. It also sets up the data structures needed to serialize the machine code to a file. Finally, it calls `jl_create_native` to create one or more LLVM modules containing machine code for the passed methods. `jl_create_native` is responsible for running code generation (codegen) on the methods passed to it.

jl_dump_native

`jl_dump_native` is responsible for serializing the LLVM module containing the machine code into a file. In addition to the module, the system image data created by `jl_create_system_image` is compiled in as a global variable. The output of this function is an archive of bitcode, object, and/or assembly files containing the machine code and the system image data.

`jl_dump_native` is usually one of the most expensive steps when producing machine code, with most of the time spent optimizing the LLVM IR and emitting machine code. Thus, this function can perform optimization and machine code emission using multiple threads. The number of threads used depends on the size of the module by default, but it can be explicitly overridden with the `JULIA_IMAGE_THREADS` environment variable. The default maximum is half the number of available threads; setting a lower value can reduce peak memory consumption during compilation.
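For example, image-serialization parallelism can be capped to trade longer compile times for lower peak memory (the value 2 here is arbitrary):

```shell
# Limit jl_dump_native to two threads when optimizing the IR
# and emitting machine code during package precompilation.
JULIA_IMAGE_THREADS=2 julia -e 'using Pkg; Pkg.precompile()'
```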

`jl_dump_native` can also produce machine code optimized for multiple architectures, when integrated with the Julia loader. This is triggered by setting the `JULIA_CPU_TARGET` environment variable and is mediated by the multiversioning pass in the optimization pipeline. To make this work with multithreading, before the module is split into submodules that are optimized on their own threads, an annotation step uses module-level information to decide which functions should be cloned for which architectures. After annotation, individual threads can generate code for the different architectures in parallel, relying on the guarantee that another submodule will produce any function that a cloned function needs to call.
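A typical multiversioning target string looks like the one used for the official x86-64 Julia binaries; this is a sketch, and the exact feature lists depend on the Julia version:

```shell
# Compile for a generic x86-64 baseline plus two cloned feature tiers.
# clone_all forces full cloning for the sandybridge tier; the haswell tier
# (relative to tier 1 via base(1)) clones only functions that are expected
# to benefit from the additional features.
export JULIA_CPU_TARGET="generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"
julia -e 'using Pkg; Pkg.precompile()'
```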

The archive also contains some other metadata about how the module was serialized, such as the number of threads used to serialize the module and the number of functions that were compiled.

Static linking

The final step of the AOT compilation process is to run a linker on the object files in the archive created by `jl_dump_native`. This produces a shared library containing the compiled code, which can then be loaded into Julia to restore the runtime state. When compiling a system image, the linker used by the C compiler creates the final shared library. For package images, the LLVM linker LLD is used instead, to provide a more consistent linking interface.
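Conceptually, the system-image link step resembles a manual invocation of the C compiler's linker driver. The file names and flags below are illustrative, not the exact invocation Julia uses:

```shell
# Extract the object files from the archive produced by jl_dump_native...
ar x sys.a

# ...and link them into a shared library, pulling in libjulia so that
# runtime symbols referenced by the compiled code resolve at load time.
cc -shared -o sys.so sys_0.o sys_1.o -L"$JULIA_LIB_DIR" -ljulia
```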

Loading code images

Loading the shared library

The first step in loading a code image is to load the shared library produced by the linker. This is done by calling the `jl_dlopen` function on the path to the shared library, which loads the library and resolves all of its symbols.

Loading machine code

First, the loader must determine whether the compiled machine code is valid for the architecture it is running on. This is necessary to avoid executing instructions that older CPUs do not recognize. It is done by comparing the CPU features available on the current machine with the CPU features the code was compiled for. If multiversioning is enabled, the loader selects the appropriate version of each function based on the CPU features available on the current machine. If none of the multiversioned feature sets can run on the current machine, the loader will raise an error.

Multiversioning produces several global arrays of all of the module's functions. If compilation was multithreaded, there is one such array per thread, and the loader concatenates them into a single large array containing all the functions compiled for this architecture. A similar process happens for the module's global variables.

Setting up the Julia state

The loader then uses the global variables and functions obtained from loading the machine code to set up the core Julia runtime data structures in the current process. This involves adding types and methods to the Julia runtime and making the cached compiled code available for use by other Julia functions and the interpreter. For package images, each method must be validated: the state of the global method table must match the state the package image was compiled against. In particular, if a different set of methods exists at load time than existed when the package image was compiled, the method must be invalidated and recompiled on first use. This ensures that execution semantics remain the same regardless of whether a package was precompiled or the code was run directly. System images do not need this validation, since the global method table is empty at load time; this is one reason system images load faster than package images.