Building a system image
Building a Julia system image
Julia ships with a preparsed system image containing the contents of the `Base` module, named `sys.ji`. This file is also precompiled into a shared library called `sys.{so,dll,dylib}` on as many platforms as possible, which significantly improves startup time. On systems that do not ship with a precompiled system image file, one can be generated from the source files located in Julia's `DATAROOTDIR/julia/base` folder.
By default, Julia generates its system image using half of the available system threads. This can be controlled with the `JULIA_IMAGE_THREADS` environment variable.
This operation is useful for several reasons. A user may:
- Build a precompiled shared library system image on a platform that does not ship with one, thereby improving startup time.
- Modify `Base`, rebuild the system image, and use the new `Base` the next time Julia is started.
- Include a `userimg.jl` file that loads packages into the system image, thereby creating a system image with those packages embedded in the startup environment.
The `PackageCompiler.jl` package contains convenient wrapper functions to automate this process.
A system image optimized for multiple microarchitectures
The system image can be compiled simultaneously for multiple CPU microarchitectures under the same instruction set architecture (ISA). Multiple versions of the same function may be created, with minimal dispatch points inserted into shared functions, in order to take advantage of different ISA extensions or other microarchitecture features. The version that offers the best performance is selected automatically at runtime based on the available CPU features.
Specifying multiple system image targets
A multi-microarchitecture system image can be enabled by passing multiple targets during system image compilation. This can be done either with the `JULIA_CPU_TARGET` make option or with the `-C` command line option when running the compilation command manually. Multiple targets are separated by `;` in the option string. The syntax for each target is a CPU name followed by features separated by `,`. All features supported by LLVM are supported, and a feature can be disabled with a `-` prefix. (A `+` prefix is also allowed and ignored, for consistency with LLVM syntax.) Additionally, a few special features are supported to control the function cloning behavior.
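It is recommended to specify either `clone_all` or `base(<n>)` for every target other than the first one, so that it is explicit which targets have all functions cloned and which targets are based on other targets.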
- `clone_all`
  By default, only the functions most likely to benefit from the microarchitecture features are cloned. When `clone_all` is specified for a target, however, all functions in the system image are cloned for that target. The negative form `-clone_all` can be used to prevent the built-in heuristic from cloning all functions.
- `base(<n>)`
  `<n>` is a placeholder for a non-negative number (for example, `base(0)`, `base(1)`). By default, a partially cloned (i.e. not `clone_all`) target uses functions from the default target (the first one specified) when a function is not cloned. This behavior can be changed by specifying a different base with the `base(<n>)` option. The `n`th target (0-based) is then used as the base target instead of the default (`0`th) one. The base target must be either `0` or another `clone_all` target. Specifying a non-`clone_all` target as the base target results in an error.
- `opt_size`
  This causes the function for the target to be optimized for size when doing so has no significant impact on runtime performance. This corresponds to the `-Os` option of GCC and Clang.
- `min_size`
  This causes the function for the target to be optimized for size even when doing so may have a significant impact on runtime performance. This corresponds to the `-Oz` option of Clang.
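The base-target rule described for `base(<n>)` can be illustrated with a small Python sketch. This is not Julia's implementation (which lives in the C++ sources). The names `TargetFlags` and `resolve_bases` are hypothetical, and the out-of-range check is a defensive choice of this sketch rather than documented behavior.

```python
from typing import List, NamedTuple, Optional


class TargetFlags(NamedTuple):
    """Cloning-related flags of one target (hypothetical helper type)."""
    clone_all: bool = False
    base: Optional[int] = None  # None means "use the default base, target 0"


def resolve_bases(targets: List[TargetFlags]) -> List[int]:
    """Return, for each target, the index of the target it falls back to
    for functions that were not cloned."""
    bases = []
    for i, target in enumerate(targets):
        base = 0 if target.base is None else target.base
        # Assumption of this sketch: an index past the end is rejected.
        if base >= len(targets):
            raise ValueError(f"target {i}: base({base}) does not exist")
        # Rule from the text: the base must be target 0 or a clone_all target.
        if base != 0 and not targets[base].clone_all:
            raise ValueError(f"target {i}: base({base}) is not a clone_all target")
        bases.append(base)
    return bases


# Three targets: a default one, a clone_all one, and one based on target 1.
flags = [TargetFlags(), TargetFlags(clone_all=True), TargetFlags(base=1)]
print(resolve_bases(flags))
```

Replacing the second target with a plain `TargetFlags()` makes the same call raise `ValueError`, mirroring the error described for a non-`clone_all` base.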
For example, at the time of writing, the following string is used to create the official `x86_64` Julia binaries downloaded from julialang.org:
generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)
This creates a system image with three separate targets: one for a generic `x86_64` processor, one with the `sandybridge` ISA (explicitly excluding `xsaveopt`) that explicitly clones all functions, and one targeting the `haswell` ISA, based on the `sandybridge` sysimg version and also excluding `rdrnd`. When a Julia implementation loads the generated sysimg, it checks the host processor for matching CPU capability flags and enables the highest ISA level possible. Note that the base level (`generic`) requires the `cx16` instruction, which is disabled in some virtualization software and must be enabled for the `generic` target to be loaded. Alternatively, a sysimg can be generated with the target `generic,-cx16` for greater compatibility; note, however, that this may cause performance and stability problems in some code.
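To make the target string syntax concrete, here is a minimal Python sketch that splits such a string into targets and features. It only illustrates the syntax described above and is not the parser Julia uses. The `Target` class and `parse_targets` function are hypothetical names, and unknown features are passed through unchecked.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Target:
    """One parsed target (hypothetical helper type for this sketch)."""
    name: str
    enabled: List[str] = field(default_factory=list)   # no prefix or "+" prefix
    disabled: List[str] = field(default_factory=list)  # "-" prefix
    clone_all: Optional[bool] = None  # True, False (-clone_all), or unspecified
    base: Optional[int] = None        # n from base(<n>), or unspecified
    opt_size: bool = False
    min_size: bool = False


def parse_targets(spec: str) -> List[Target]:
    targets = []
    for part in spec.split(";"):          # targets are separated by ";"
        name, *features = [f.strip() for f in part.split(",")]
        target = Target(name=name)
        for feat in features:             # features are separated by ","
            if not feat:
                continue
            negated = feat[0] == "-"
            bare = feat[1:] if feat[0] in "+-" else feat  # "+" is ignored
            base_match = re.fullmatch(r"base\((\d+)\)", bare)
            if bare == "clone_all":
                target.clone_all = not negated
            elif base_match:
                target.base = int(base_match.group(1))
            elif bare == "opt_size":
                target.opt_size = True
            elif bare == "min_size":
                target.min_size = True
            elif negated:
                target.disabled.append(bare)
            else:
                target.enabled.append(bare)
        targets.append(target)
    return targets


for t in parse_targets("generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)"):
    print(t)
```

Applied to the string used for the official binaries, the sketch yields three targets, with `xsaveopt` and `rdrnd` recorded as disabled features and `base` set to `1` on the `haswell` target.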
Implementation overview
This is a brief overview of the different parts involved in the implementation. See the code comments in each component for more implementation details.
- System image compilation
  The parsing and cloning decisions are made in `src/processor*`. Cloning of a function is currently supported based on the presence of loops, SIMD instructions, or other math operations (for example, fastmath, fma, muladd). This information is passed on to `src/llvm-multiversioning.cpp`, which does the actual cloning. In addition to cloning and inserting dispatch slots (see the comments in `MultiVersioning::runOnModule` for how this is done), the pass also generates metadata so that the runtime can load and initialize the system image correctly. A detailed description of the metadata is available in `src/processor.h`.
- System image loading
  The loading and initialization of the system image is done in `src/processor*` by parsing the metadata saved during system image generation. Host feature detection and the selection decision are done in `src/processor_*.cpp`, depending on the ISA. Target selection prefers an exact CPU name match, a larger vector register size, and a larger number of features. An overview of this process is in `src/processor.cpp`.
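The selection preferences named above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the logic in `src/processor_*.cpp`: it assumes a target is usable only if the host has all of its features, and that the three preferences are applied in the order listed. The `Candidate` type, the `select_target` function, and the feature sets in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    """One target stored in the system image (hypothetical helper type)."""
    name: str
    vector_bits: int          # vector register size, in bits
    features: FrozenSet[str]  # features the target's code relies on


def select_target(candidates: List[Candidate], host_name: str,
                  host_features: FrozenSet[str]) -> int:
    """Return the index of the preferred candidate for this host."""
    # Assumption: a target is usable only if the host has all its features.
    usable = [i for i, c in enumerate(candidates) if c.features <= host_features]
    if not usable:
        raise RuntimeError("no target in the image is compatible with the host")
    # Assumed priority: exact name match, then vector size, then feature count.
    return max(usable, key=lambda i: (candidates[i].name == host_name,
                                      candidates[i].vector_bits,
                                      len(candidates[i].features)))


# Illustrative feature sets only; they are not LLVM's CPU definitions.
image = [
    Candidate("generic", 128, frozenset({"sse2"})),
    Candidate("sandybridge", 256, frozenset({"sse2", "avx"})),
    Candidate("haswell", 256, frozenset({"sse2", "avx", "avx2", "fma"})),
]
host = frozenset({"sse2", "avx", "avx2", "fma", "rdrnd"})
print(image[select_target(image, "skylake", host)].name)
```

With these illustrative inputs, a host that has no exact name match but supports every listed feature is assigned the `haswell` candidate, because it ties with `sandybridge` on vector size and has more features.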