
Custom LLVM Passes


There are a number of custom LLVM passes in Julia. In general, they can be divided into passes that need to be performed to maintain Julia semantics, and passes that take advantage of Julia semantics to optimize LLVM’s IR.

Semantic Passes

These passes are required to transform LLVM IR into code that can legally run on a CPU. Their main purpose is to allow code generation to emit simpler IR, which other LLVM passes can then optimize using common patterns.

CPUFeatures

  • File name: `llvm-cpufeatures.cpp`

  • Class name: CPUFeaturesPass

  • Optimization name: module(CPUFeatures)

This pass lowers the julia.cpu.have_fma.(f32|f64) intrinsic to true or false, depending on the target architecture and the target features present on the function. This intrinsic is often used to decide whether algorithms that rely on fast https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Fused_multiply%E2%80%93add [fused multiply-add] operations are faster than standard algorithms that do not depend on such instructions.
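
As a rough illustration, the sketch below assumes that Core.Intrinsics.have_fma is the Julia-level entry point that lowers to the julia.cpu.have_fma.f64 intrinsic; once this pass folds it to a constant, one of the two branches is eliminated entirely.

```julia
# Hedged sketch: `Core.Intrinsics.have_fma` is assumed to lower to the
# `julia.cpu.have_fma.f64` intrinsic that this pass folds to true or false.
function fused_or_fallback(a::Float64, b::Float64, c::Float64)
    if Core.Intrinsics.have_fma(Float64)
        return fma(a, b, c)      # branch kept when the target has native FMA
    else
        return muladd(a, b, c)   # branch kept otherwise; LLVM picks the lowering
    end
end
```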

DemoteFloat16

  • File name: `llvm-demote-float16.cpp`

  • Class name: DemoteFloat16Pass

  • Optimization name: function(DemoteFloat16)

This pass replaces https://en.wikipedia.org/wiki/Half-precision_floating-point_format [float16] operations with float32 operations on architectures that do not natively support float16 arithmetic. It does so by inserting fpext and fptrunc instructions around any float16 operation. On architectures with native float16 support, this pass is a no-op.
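
For intuition, the transformation is roughly equivalent to rewriting each Float16 operation at the source level as in this sketch (on targets with native Float16 arithmetic nothing is rewritten):

```julia
# What `a + b` on Float16 effectively becomes on targets without native
# Float16 arithmetic: widen (fpext), operate in Float32, narrow (fptrunc).
demoted_add(a::Float16, b::Float16) = Float16(Float32(a) + Float32(b))
```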

LateGCLowering

  • File name: `llvm-late-gc-lowering.cpp`

  • Class name: LateLowerGCPass

  • Optimization name: function(LateLowerGCFrame)

This pass does most of the garbage-collection rooting work needed to track pointers between GC safepoints. It also lowers several intrinsics to their corresponding instruction translations and is permitted to violate the previously established non-integral invariants (pointer_from_objref is lowered here to a ptrtoint instruction). This pass typically takes the most time of all the custom Julia passes, due to its dataflow algorithm, which minimizes the number of objects live at any safepoint.
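
A minimal Julia-level sketch of code that exercises this pass: the GC.@preserve region below becomes rooting information that LateLowerGCFrame turns into GC-frame stores, and pointer_from_objref is the intrinsic lowered here to a ptrtoint.

```julia
mutable struct Node
    value::Int
end

# `GC.@preserve` keeps `n` rooted across the raw-pointer use; this pass
# computes the set of roots live at each safepoint and lowers
# `pointer_from_objref` to a plain ptrtoint.
node_address(n::Node) = GC.@preserve n UInt(pointer_from_objref(n))
```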

FinalGCLowering

  • File name: `llvm-final-gc-lowering.cpp`

  • Class name: FinalLowerGCPass

  • Optimization name: module(FinalLowerGC)

This pass lowers the last few intrinsics to their final form, targeting functions in the libjulia library. Separating this pass from LateGCLowering allows other backends (e.g. GPU compilation) to supply their own lowerings for these intrinsics, so that the Julia pipeline can be used on those backends as well.

LowerHandlers

  • File name: `llvm-lower-handlers.cpp`

  • Class name: LowerExcHandlersPass

  • Optimization name: function(LowerExcHandlers)

This pass lowers the exception-handling intrinsics into calls to the runtime functions that are actually invoked during exception handling.
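
For context, ordinary try/catch blocks are what generate these intrinsics; a minimal sketch:

```julia
# The try/catch below compiles to exception-handling intrinsics that
# LowerExcHandlers rewrites into calls to the runtime's handler functions.
function parse_or_nothing(s::AbstractString)
    try
        parse(Int, s)
    catch
        nothing
    end
end
```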

RemoveNI

  • File name: `llvm-remove-ni.cpp`

  • Class name: RemoveNIPass

  • Optimization name: module(RemoveNI)

This pass removes the non-integral address spaces from the module's datalayout string. This lets the backend lower Julia's custom address spaces directly to machine code, rather than requiring an expensive rewrite of every pointer operation into address space 0.

SIMDLoop

  • File name: `llvm-simdloop.cpp`

  • Class name: LowerSIMDLoopPass

  • Optimization name: loop(LowerSIMDLoop)

This pass acts as the main driver of the @simd annotation. Code generation inserts a !llvm.loopid marker at the back branch of the loop, which this pass uses to identify loops that were originally annotated with @simd. The pass then looks for a chain of floating-point operations that form a reduction and adds the contract and reassoc fast-math flags to allow reassociation (and therefore vectorization). This pass preserves neither loop information nor inference correctness, so it may violate Julia semantics in surprising ways. If the loop was also annotated with ivdep, the pass marks the loop as having no loop-carried dependencies (the resulting behavior is undefined if the user annotation was incorrect or applied to the wrong loop).
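
The sketch below shows the two user-level forms this pass responds to; both functions are illustrations only, not code taken from the pass itself.

```julia
# A floating-point reduction under @simd: the additions into `s` receive the
# contract and reassoc fast-math flags, allowing LLVM to vectorize the loop.
function simd_sum(xs::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(xs)
        s += xs[i]
    end
    return s
end

# With ivdep the loop is additionally marked as free of loop-carried memory
# dependencies; this is undefined behaviour if the promise is wrong.
function scale!(dst::Vector{Float64}, src::Vector{Float64}, a::Float64)
    @inbounds @simd ivdep for i in eachindex(dst, src)
        dst[i] = a * src[i]
    end
    return dst
end
```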

LowerPTLS

  • File name: `llvm-ptls.cpp`

  • Class name: LowerPTLSPass

  • Optimization name: module(LowerPTLSPass)

This pass lowers Julia's thread-local storage (TLS) intrinsics to assembly instructions. Julia relies on thread-local storage for garbage collection and multithreaded task scheduling. When compiling code for system images and package images, this pass replaces the intrinsic calls with loads from global variables that are initialized at load time.

If code generation produces a function with a swiftself argument and calling convention, this pass assumes the swiftself argument is the pgcstack and replaces the intrinsics with that argument. Doing so provides speedups on architectures with slow thread-local storage access.

RemoveAddrspaces

  • File name: `llvm-remove-addrspaces.cpp`

  • Class name: RemoveAddrspacesPass

  • Optimization name: module(RemoveAddrspaces)

This pass renames pointers from one address space to another. It is used to remove Julia-specific address spaces from LLVM’s IR.

RemoveJuliaAddrspaces

  • File name: `llvm-remove-addrspaces.cpp`

  • Class name: RemoveJuliaAddrspacesPass

  • Optimization name: module(RemoveJuliaAddrspaces)

This pass removes Julia-specific address spaces from LLVM’s IR. It is mainly used to make printed LLVM IR easier to read. Internally, it is implemented on top of the RemoveAddrspaces pass.
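
For example, the difference is visible when printing IR from the REPL; the `raw` keyword of code_llvm below is assumed to control whether this cleanup is applied.

```julia
using InteractiveUtils

# With the default (cleaned-up) output Julia's address spaces are stripped;
# `raw=true` is assumed to skip that cleanup, so addrspace(10) pointers and
# the related addrspacecasts remain visible.
code_llvm(stdout, first, Tuple{Vector{Int}})             # cleaned-up IR
code_llvm(stdout, first, Tuple{Vector{Int}}; raw=true)   # raw IR
```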

MultiVersioning

  • File name: `llvm-multiversioning.cpp`

  • Class name: MultiVersioningPass

  • Optimization name: module(JuliaMultiVersioning)

This pass modifies the module to create functions that are optimized for running on different architectures (see sysimg.md and pkgimg.md for more details). Implementation-wise, it clones functions and applies different target-specific attributes to them, allowing the optimizer to use advanced features such as vectorization and instruction scheduling for each platform. It also creates the infrastructure that lets the Julia image loader select the appropriate version of a function to call, depending on the architecture the loader is running on. The target-specific attributes are controlled by the julia.mv.specs module flag, which is derived from the JULIA_CPU_TARGET environment variable during compilation. To enable the pass, set the julia.mv.enable module flag to 1.
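
As a purely illustrative sketch, the environment variable could be set in a parent session before an image build is spawned; the target string follows the documented JULIA_CPU_TARGET format (semicolon-separated targets, feature subtraction, clone_all), but the specific targets here are only an example.

```julia
# Hypothetical target list; julia.mv.specs is derived from this string when an
# image build spawned from this session (e.g. a PackageCompiler job) reads it,
# and julia.mv.enable = 1 turns the pass on for that build.
ENV["JULIA_CPU_TARGET"] = "generic;sandybridge,-xsaveopt,clone_all;icelake-server,clone_all"
```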

Using llvmcall together with multiversioning is dangerous. llvmcall gives access to features that are not normally exposed by the Julia APIs, and which are therefore usually not available on all architectures. If multiversioning is enabled and code generation is requested for a target architecture that does not support the feature required by an llvmcall expression, LLVM will probably error out, likely with an abort and the message LLVM ERROR: Do not know how to split the result of this operator!.

GCInvariantVerifier

  • File name: `llvm-gc-invariant-verifier.cpp`

  • Class name: GCInvariantVerifierPass

  • Optimization name: module(GCInvariantVerifier)

This pass is used to verify Julia’s invariants about LLVM IR. This includes, for example, the nonexistence of ptrtoint in Julia’s https://llvm.org/docs/LangRef.html#non-integral-pointer-type [non-integral address spaces] [1] and the existence of only blessed addrspacecast instructions (Tracked -> Derived, 0 -> Tracked, etc.). It performs no transformations on the IR.

Optimization Passes

These passes are used to perform transformations on LLVM IR that LLVM will not perform itself, such as fast-math flag propagation, escape analysis, and optimizations on Julia-specific internal functions. They use knowledge of Julia’s semantics to perform these optimizations.

CombineMulAdd

  • File name: `llvm-muladd.cpp`

  • Class name: CombineMulAddPass

  • Optimization name: function(CombineMulAdd)

This pass optimizes the particular combination of a regular fmul with a fast fadd into a contract fmul with a fast fadd. The backend later optimizes this into a https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Fused_multiply%E2%80%93add [fused multiply-add] instruction, which can be significantly faster at the cost of more https://simonbyrne.github.io/notes/fastmath/ [unpredictable semantics].

This optimization only occurs if the fmul has a single use, and that use is the fast fadd.
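
A common source of this pattern (an illustrative assumption, not code from the pass itself) is a multiply-accumulate inside an @simd reduction: the multiplication is an ordinary fmul, while the accumulation carries fast-math flags from LowerSIMDLoop, so the pair may be contracted into a fused multiply-add.

```julia
# The product xs[i] * ys[i] is a regular fmul whose single use is the fast
# fadd accumulating into `s`; CombineMulAdd may mark the fmul as `contract`,
# letting the backend emit a fused multiply-add.
function dot_sketch(xs::Vector{Float64}, ys::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(xs, ys)
        s += xs[i] * ys[i]
    end
    return s
end
```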

AllocOpt

  • File name: `llvm-alloc-opt.cpp`

  • Class name: AllocOptPass

  • Optimization name: function(AllocOpt)

Julia does not have the concept of a program stack as a place to allocate mutable objects. However, allocating objects on the stack reduces GC pressure and is critical for GPU compilation. Thus, AllocOpt performs heap-to-stack conversion of objects that it can prove do not https://en.wikipedia.org/wiki/Escape_analysis [escape] the current function. It also performs a number of other optimizations on allocations, such as removing allocations that are never used, optimizing typeof calls on freshly allocated objects, and removing stores to allocations that are immediately overwritten. The escape analysis implementation is in `llvm-alloc-helpers.cpp`. Currently, this pass does not use information from EscapeAnalysis.jl, though that may change in the future.
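
A hypothetical example of an allocation this pass can act on: the mutable object below never escapes the function, so it is a candidate for heap-to-stack conversion (or outright removal).

```julia
mutable struct Counter
    n::Int
end

# `c` is freshly allocated and provably never escapes `count_evens`, so
# AllocOpt may move it from the heap to the stack or delete it entirely.
function count_evens(xs::Vector{Int})
    c = Counter(0)
    for x in xs
        iseven(x) && (c.n += 1)
    end
    return c.n      # only the field value escapes, not the object itself
end
```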

PropagateJuliaAddrspaces

  • File name: `llvm-propagate-addrspaces.cpp`

  • Class name: PropagateJuliaAddrspacesPass

  • Optimization name: function(PropagateJuliaAddrspaces)

This pass is used to propagate Julia-specific address spaces through pointer operations. LLVM optimizations are not allowed to introduce or remove addrspacecast instructions, so this pass eliminates redundant addrspacecast instructions by replacing operations with their equivalents in a Julia address space. For more information on Julia’s address spaces, see (TODO link to llvm.md).

JuliaLICM

  • File name: `llvm-julia-licm.cpp`

  • Class name: JuliaLICMPass

  • Optimization name: loop(JuliaLICM)

This pass is used to hoist Julia-specific intrinsics out of loops. Specifically, it performs the following transformations:

  1. Hoisting of gc_preserve_begin and sinking of gc_preserve_end out of loops when the preserved objects are loop-invariant. Since objects preserved within a loop are likely preserved for the duration of the loop, this transformation can reduce the number of gc_preserve_begin/gc_preserve_end pairs in the IR, making it easier for LateLowerGCPass to identify where particular objects are preserved. (A sketch of this pattern is shown after the list.)

  2. Hoisting of write barriers with invariant objects. Here we assume that there are only two generations an object can belong to. Given that, a write barrier only needs to be executed once for any pair of the same objects. Thus, we can hoist write barriers out of loops when the object being written to is loop-invariant.

  3. Hoisting of allocations out of loops when they do not escape the loop.

    1. Here we use a very conservative definition of escape, the same one used in AllocOptPass. This transformation can reduce the number of allocations in the IR, even when an allocation escapes the function altogether.
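
A sketch of the first transformation, using illustrative names: buf is loop-invariant, so the preserve region emitted for it can be hoisted to span the whole loop.

```julia
# The gc_preserve_begin/gc_preserve_end pair emitted for GC.@preserve sits
# inside the loop body here, but `buf` is loop-invariant, so JuliaLICM can
# hoist the begin above the loop and sink the end below it.
function fill_raw!(buf::Vector{Float64}, v::Float64)
    p = pointer(buf)
    for i in eachindex(buf)
        GC.@preserve buf unsafe_store!(p, v, i)
    end
    return buf
end
```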