Engee documentation

Building a system image

Building a Julia system image

Julia ships with a preparsed system image containing the contents of the Base module, named `sys.ji`. This file is also precompiled into a shared library called `sys.{so,dll,dylib}` on as many platforms as possible, which significantly improves startup time. On systems that do not ship with a precompiled system image file, one can be generated from the source files located in Julia's `DATAROOTDIR/julia/base` folder.

By default, Julia generates the system image using half of the available system threads. This can be controlled with the environment variable JULIA_IMAGE_THREADS.

Building a system image is useful for several reasons. A user can:

  • Build a precompiled shared library system image on a platform that does not ship with one, thereby improving startup time.

  • Modify the Base module, rebuild the system image, and use the new Base module the next time Julia is launched.

  • Include a `userimg.jl` file that loads packages into the system image, thereby creating a system image with those packages embedded in the startup environment.

The package `PackageCompiler.jl` contains convenient wrapper functions to automate this process.

A system image optimized for multiple microarchitectures

The system image can be compiled for several CPU microarchitectures of the same instruction set architecture (ISA) at once. Multiple versions of the same function may be created, with minimal dispatch points inserted into shared functions, in order to take advantage of different ISA extensions or other microarchitecture features. The version that offers the best performance is selected automatically at runtime based on the available CPU features.

Specifying multiple system image targets

A multi-microarchitecture system image can be enabled by passing multiple targets during system image compilation. This can be done either with the JULIA_CPU_TARGET make option or with the -C command-line option when running the compilation command manually. Multiple targets are separated by ; in the option string. The syntax for each target is a CPU name followed by any number of features separated by ,. All features supported by LLVM are supported. A feature can be disabled with a - prefix. (A + prefix is also allowed and ignored, for consistency with LLVM syntax.) In addition, a few special features are supported to control the function cloning behavior.
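The syntax above can be made concrete with a small parser. This is a minimal sketch in Python, not the code Julia uses: the `Target` class and `parse_targets` function are hypothetical names introduced for illustration, and the sketch only applies the separator and prefix rules stated in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """One entry of a target string (illustrative model, not Julia's own type)."""
    cpu: str
    enabled: list = field(default_factory=list)
    disabled: list = field(default_factory=list)


def parse_targets(spec):
    """Split a target string on ';' into targets and on ',' into features."""
    targets = []
    for chunk in spec.split(";"):
        # The first comma-separated item is the CPU name, the rest are features.
        name, *features = [part.strip() for part in chunk.split(",")]
        target = Target(cpu=name)
        for feature in features:
            if feature.startswith("-"):
                # A leading '-' disables the feature.
                target.disabled.append(feature[1:])
            elif feature.startswith("+"):
                # A leading '+' is accepted and ignored.
                target.enabled.append(feature[1:])
            else:
                target.enabled.append(feature)
        targets.append(target)
    return targets


targets = parse_targets("generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)")
for index, target in enumerate(targets):
    print(index, target.cpu, target.enabled, target.disabled)
```

In this model the special features (clone_all, base(1)) simply appear in the enabled list next to ordinary LLVM features; how they are interpreted is described in the list that follows.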

It is recommended to specify either clone_all or base(<n>) for every target other than the first one. This makes it explicit which targets have all functions cloned and which targets are based on other targets. If neither is specified, the default behavior is to clone only some functions and to use the first target's function definition as the fallback for any function that is not cloned.

  1. clone_all

    By default, only the functions most likely to benefit from the microarchitecture features are cloned. If clone_all is specified for a target, however, all functions in the system image are cloned for that target. The negative form -clone_all can be used to prevent the built-in heuristic from cloning all functions.

  2. base(<n>)

    <n> is a placeholder for a non-negative number (for example, base(0), base(1)). By default, a partially cloned target (i.e. one without clone_all) uses functions from the default target (the first one specified) for any function that is not cloned. This behavior can be changed by specifying a different base with base(<n>). The n-th target (0-based) is then used as the base target instead of the default (`0`-th) one. The base target must be either `0` or another `clone_all` target. Specifying a non-`clone_all` target as the base results in an error.

  3. opt_size

    This causes functions for the target to be optimized for size where doing so has no significant impact on runtime performance. This corresponds to the -Os option of GCC and Clang.

  4. min_size

    This causes functions for the target to be optimized for size even where doing so may have a significant impact on runtime performance. This corresponds to the -Oz option of Clang.
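The interaction between clone_all and base(<n>) can be summarized as a small resolution rule. The following Python sketch models only what is stated above. `fallback_targets` is a hypothetical helper, not part of Julia, and the assumption that the first target needs no fallback follows from it being the default target.

```python
import re


def fallback_targets(specs):
    """Return, per target, the index used for functions that are not cloned.

    `specs` holds one feature list per target, in declaration order.
    None means the target has no fallback: it is either the default
    (first) target or a clone_all target, for which everything is cloned.
    """
    clone_all = ["clone_all" in features for features in specs]
    result = []
    for index, features in enumerate(specs):
        if index == 0 or clone_all[index]:
            result.append(None)
            continue
        base = 0  # default base: the first target specified
        for feature in features:
            match = re.fullmatch(r"base\((\d+)\)", feature)
            if match:
                base = int(match.group(1))
        # The base must be target 0 or another clone_all target.
        if base != 0 and not (base < len(specs) and clone_all[base]):
            raise ValueError(f"target {index}: base({base}) is not a clone_all target")
        result.append(base)
    return result


# Feature lists for three targets: a default one, a clone_all one,
# and a partially cloned one based on target 1.
print(fallback_targets([[], ["-xsaveopt", "clone_all"], ["-rdrnd", "base(1)"]]))
```

With these inputs the third target falls back to target 1. If target 1 did not specify clone_all, the same call would raise an error, which mirrors the rule for base targets given above.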

For example, at the time of writing, the following string is used when building the official x86_64 Julia binaries downloaded from julialang.org:

generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)

This creates a system image with three separate targets: one for a generic x86_64 processor, one for the sandybridge ISA (explicitly excluding xsaveopt) that explicitly clones all functions, and one for the haswell ISA, based on the sandybridge sysimg version and also excluding `rdrnd`. When a Julia implementation loads the generated sysimg, it checks the host processor for matching CPU capability flags and enables the highest ISA level possible. Note that the base level (`generic`) requires the cx16 instruction, which is disabled in some virtualization software and must be enabled for the generic target to be loaded. Alternatively, a sysimg can be generated with the target generic,-cx16 for greater compatibility. Be aware, however, that this may cause performance and stability problems in some code.

Implementation overview

This is a brief overview of the parts involved in the implementation. Details are given in the code comments of each component.

  1. Compiling a system image

    Parsing and cloning decisions are made in src/processor*. Currently, functions are cloned based on the presence of loops, SIMD instructions, or other math operations (for example, fastmath, fma, muladd). This information is passed to `src/llvm-multiversioning.cpp`, which performs the actual cloning. In addition to cloning and inserting dispatch slots (see the comments in `MultiVersioning::runOnModule` for how this is done), the pass also generates metadata so that the runtime can load and initialize the system image correctly. A detailed description of the metadata is available in `src/processor.h`.

  2. Loading the system image

    Loading and initialization of the system image are performed in src/processor* by parsing the metadata saved during system image generation. Host feature detection and the selection decision are made in src/processor_*.cpp, depending on the ISA. Target selection prefers an exact CPU name match, a larger vector register size, and a larger number of features. An overview of this process is provided in `src/processor.cpp`.
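The selection preference described above can be sketched as a ranking over the targets the host is able to run. This Python model is an assumption-laden illustration, not the logic in src/processor_*.cpp: the `select_target` function, the dictionary layout, the feature lists, and the vector sizes are all hypothetical, and the order of the tie-breakers is taken from the order in which the text lists them.

```python
def select_target(candidates, host_name, host_features):
    """Pick the index of the preferred candidate that the host can run."""
    host = set(host_features)
    # Keep only targets whose required features are all present on the host.
    usable = [
        (index, candidate)
        for index, candidate in enumerate(candidates)
        if set(candidate["features"]) <= host
    ]
    if not usable:
        raise RuntimeError("no compatible target in the system image")
    # Rank by: exact CPU name match, then vector register size,
    # then number of features.
    index, _ = max(
        usable,
        key=lambda item: (
            item[1]["name"] == host_name,
            item[1]["vector_bits"],
            len(item[1]["features"]),
        ),
    )
    return index


# Illustrative data only: feature sets and vector sizes are assumptions.
candidates = [
    {"name": "generic", "vector_bits": 128, "features": ["cx16"]},
    {"name": "sandybridge", "vector_bits": 256, "features": ["cx16", "avx"]},
    {"name": "haswell", "vector_bits": 256, "features": ["cx16", "avx", "avx2", "fma"]},
]

print(select_target(candidates, "skylake", ["cx16", "avx", "avx2", "fma"]))
print(select_target(candidates, "sandybridge", ["cx16", "avx"]))
```

A host that lacks a feature required by every target, including the first one, has nothing to select from. In this model that case raises an error.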