IR representation in SSA form in Julia
Julia uses a static single-assignment intermediate representation (https://en.wikipedia.org/wiki/Static_single-assignment_form [SSA IR]) to perform optimization. This IR is distinct from LLVM's IR and is unique to Julia; Julia-level optimizations are performed on it.
- Basic blocks (regions of code with no control flow) are explicitly annotated.
- if/else statements and loops are converted to goto statements.
- Lines with multiple operations are split into multiple lines by introducing temporary variables.
For example, the following Julia code:
function foo(x)
    y = sin(x)
    if x > 5.0
        y = y + cos(x)
    end
    return exp(2) + y
end
when called with a Float64 argument, is converted into the following:
using InteractiveUtils
@code_typed foo(1.0)
CodeInfo(
1 ─ %1 = invoke Main.sin(x::Float64)::Float64
│ %2 = Base.lt_float(x, 5.0)::Bool
└── goto #3 if not %2
2 ─ %4 = invoke Main.cos(x::Float64)::Float64
└── %5 = Base.add_float(%1, %4)::Float64
3 ┄ %6 = φ (#2 => %5, #1 => %1)::Float64
│ %7 = Base.add_float(7.38905609893065, %6)::Float64
└── return %7
) => Float64
In this example, we can see all of these changes:
- The first basic block is everything in
1 ─ %1 = invoke Main.sin(x::Float64)::Float64
│   %2 = Base.lt_float(x, 5.0)::Bool
└── goto #3 if not %2
- The if statement is converted to the expression goto #3 if not %2, which jumps to the third basic block if x > 5 is false and otherwise falls through to the second basic block.
- %2 is the SSA value introduced to represent x > 5.
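The same goto conversion applies to loops. As a minimal sketch (count_down is an illustrative function, not part of the example above), the goto-based lowered form can be inspected with @code_lowered:

```julia
using InteractiveUtils

function count_down(n)
    while n > 0       # the loop condition becomes a conditional goto
        n -= 1
    end
    return n
end

# The lowered form contains explicit goto statements for the loop:
@code_lowered count_down(3)
```

The printed CodeInfo shows the loop test as a conditional goto back to the loop head.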
Background
Starting with Julia 0.7, parts of the compiler use a new intermediate representation (IR) in https://en.wikipedia.org/wiki/Static_single_assignment_form [SSA form]. Previously, the compiler generated LLVM IR directly from Julia's lowered AST. That form had most syntactic abstractions removed, but still looked very much like an abstract syntax tree. Over time, to facilitate optimizations, SSA values were introduced into this IR and the IR was linearized (that is, function arguments could only be SSA values or constants). However, non-SSA values (slots) remained in the IR because the IR lacked phi nodes (which are necessary for back edges and for re-merging conditional control flow). This negated many of the advantages of SSA form when performing mid-level optimizations. Heroic efforts were made to make these optimizations work without a complete SSA representation, but the lack of one eventually proved to be an insurmountable obstacle.
Categories of IR nodes
There are four categories of IR nodes in the SSA IR: phi nodes, pi nodes, PhiC nodes, and upsilon nodes (the latter two are used only for exception handling).
Phi and pi nodes
Phi nodes are part of the generic SSA abstraction (if this is a new concept for you, see the link above). In Julia's IR, these nodes are represented as follows:
struct PhiNode
    edges::Vector{Int32}
    values::Vector{Any}
end
where both vectors are guaranteed to always have the same length. In the canonical representation (the one processed by the code generator and the interpreter), the edge values indicate statement numbers (that is, if an edge has the value 15, there must be a goto, gotoifnot, or implicit fall-through from statement 15 that targets this phi node). Values may be either SSA values or constants. A value may also be unassigned if the variable is not defined along the corresponding path. However, undefinedness checks are made explicit and represented as Boolean values after mid-level optimizations, so code generators may assume that any use of a phi node has an assigned value in the corresponding slot. Incomplete phi nodes are also allowed, i.e. a phi node may lack some incoming edges; in that case it must be dynamically guaranteed that the corresponding value is never used.
Note that, semantically, phi nodes take their values "on the edge", i.e. after the terminator of the corresponding predecessor block. Therefore, if several phi nodes appear at the beginning of a basic block, they are executed simultaneously. This means that in the following IR fragment, if we came from block 23, %46 takes the value associated with %45 before we entered this block.
%45 = φ (#18 => %23, #23 => %50)
%46 = φ (#18 => 1.0, #23 => %45)
Pi nodes encode statically proven information that may be implicitly assumed in the basic blocks dominated by the given pi node. They are conceptually equivalent to the technique introduced in the paper https://dl.acm.org/citation.cfm?id=358438.349342 [ABCD: Eliminating Array Bounds Checks on Demand], or to predicate info nodes in LLVM. To see how they work, consider the following example.
%x::Union{Int, Float64} # %x is some SSA value of type Union{Int, Float64}
if isa(x, Int)
    # use x
else
    # use x
end
We can perform predicate insertion and convert this code to the following:
%x::Union{Int, Float64} # %x is some SSA value of type Union{Int, Float64}
if isa(x, Int)
    %x_int = PiNode(x, Int)
    # use %x_int
else
    %x_float = PiNode(x, Float64)
    # use %x_float
end
Pi nodes are generally ignored by the interpreter, since they have no effect on values, but they sometimes cause code generation in the compiler (for example, to convert from an implicitly split union representation to a plain unboxed representation). The main benefit of pi nodes is that path conditions on values can be accumulated simply by walking def-use chains, which most optimizations that care about these conditions do anyway.
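As a small sketch, a pi node can likewise be constructed directly via Core.PiNode, which pairs a value with the type asserted for it (field names as in recent Julia versions; the SSA value number is hypothetical):

```julia
# Assert that SSA value %2 may be assumed to have type Int in the
# blocks dominated by this node.
pi_node = Core.PiNode(Core.SSAValue(2), Int)
```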
PhiC and upsilon nodes
Exception handling complicates the SSA story moderately, because it introduces additional control-flow edges into the IR along which values must be tracked. One approach, used in LLVM, is to make calls that may throw exceptions basic-block terminators and to add an explicit control-flow edge to the catch handler:
invoke @function_that_may_throw() to label %regular unwind to %catch

regular:
# regular control flow continues here

catch:
# exceptions arrive here
However, this is problematic in a language like Julia, where at the start of the optimization pipeline we do not know which calls throw. We would have to conservatively assume that every call (that is, every statement in Julia) throws an exception. This would have several negative effects. First, the scope of each basic block would effectively be reduced to a single call, defeating the purpose of working at the basic-block level. Second, every catch basic block would have n*m phi node arguments (where n is the number of statements in the critical region and m is the number of live values through the catch block).
To work around this, we use a combination of Upsilon and PhiC nodes (the C standing for catch; written as φᶜ in the IR pretty printer, because the unicode subscript c is unavailable). The semantics of these nodes can be described in several ways, but it is perhaps easiest to think of each PhiC node as a load from a unique store-many, read-once slot, and of each Upsilon node as a corresponding store. A PhiC node takes as operands a list of all the upsilon nodes that store to its implicit slot. The Upsilon nodes, however, do not record which PhiC node they store to. This makes for a more natural integration with the rest of the SSA IR: for example, if a PhiC node is no longer used, it can safely be deleted, and the same is true of an Upsilon node. In most IR passes, PhiC nodes can be treated like Phi nodes: def-use chains can be followed through them, and they can be lifted to new PhiC and new Upsilon nodes (in the same places as the original Upsilon nodes). The result of this scheme is that the number of Upsilon nodes (and PhiC arguments) is proportional to the number of values assigned to a particular variable (before SSA conversion), rather than to the number of statements in the critical region.
To see this scheme in action, consider the following function.
@noinline opaque() = invokelatest(identity, nothing) # Something opaque
function foo()
    local y
    x = 1
    try
        y = 2
        opaque()
        y = 3
        error()
    catch
    end
    (x, y)
end
The corresponding IR (with irrelevant types stripped) looks like this:
1 ─ nothing::Nothing
2 ─ %2 = $(Expr(:enter, #4))
3 ─ %3 = ϒ (false)
│   %4 = ϒ (#undef)
│   %5 = ϒ (1)
│   %6 = ϒ (true)
│   %7 = ϒ (2)
│   invoke Main.opaque()::Any
│   %9 = ϒ (true)
│   %10 = ϒ (3)
│   invoke Main.error()::Union{}
└── $(Expr(:unreachable))::Union{}
4 ┄ %13 = φᶜ (%3, %6, %9)::Bool
│   %14 = φᶜ (%4, %7, %10)::Core.Compiler.MaybeUndef(Int64)
│   %15 = φᶜ (%5)::Core.Const(1)
└── $(Expr(:leave, Core.SSAValue(2)))
5 ─ $(Expr(:pop_exception, :(%2)))::Any
│   $(Expr(:throw_undef_if_not, :y, :(%13)))::Any
│   %19 = Core.tuple(%15, %14)
└── return %19
Note in particular that every value live through the critical region gets an upsilon node at the top of the critical region. This is because catch blocks are considered to have an invisible control-flow edge from outside the function. As a result, no SSA value dominates the catch blocks, and all incoming values have to come through a φᶜ node.
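For completeness, the corresponding node types can also be constructed directly (a sketch assuming the Core.UpsilonNode and Core.PhiCNode types of recent Julia versions; the SSA value numbers are hypothetical):

```julia
# An upsilon node storing the constant 2 into its PhiC node's implicit slot.
ups = Core.UpsilonNode(2)

# A PhiC node reading from the implicit slot; its operands list the
# upsilon stores it may observe (here, hypothetical SSA values %3 and %9).
phic = Core.PhiCNode(Any[Core.SSAValue(3), Core.SSAValue(9)])
```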
Basic SSA Data Structure
The main SSAIR data structure is worth some discussion. It draws inspiration from LLVM and WebKit's B3 IR. The core of the data structure is a flat vector of statements. Each statement is implicitly assigned an SSA value based on its position in the vector (i.e. the result of the statement at index 1 can be accessed using SSAValue(1), etc.). For each SSA value, we additionally maintain its type. Since SSA values are, by definition, assigned only once, that type is also the result type of the expression at the corresponding index. However, while this representation is rather efficient (since assignments do not need to be explicitly encoded), it has the obvious drawback that order is semantically significant, so statement numbers change on reorderings and insertions. Additionally, we do not maintain use lists (i.e. it is impossible to walk from a definition to all its uses without explicitly computing that map; def lists, however, are trivial, since the defining statement can be looked up by index), so LLVM-style RAUW ("replace all uses with") operations are unavailable.
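The flat statement vector and the implicit SSA numbering can be observed on an ordinary typed CodeInfo object (a sketch; add_one is an illustrative function):

```julia
using InteractiveUtils

add_one(x) = x + 1

# @code_typed returns a Pair of the CodeInfo and the return type.
ci, rettype = @code_typed add_one(1)

# ci.code is the flat vector of statements; the statement at index i
# defines SSAValue(i), and its type is stored at ci.ssavaluetypes[i].
length(ci.code) == length(ci.ssavaluetypes)
```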
Instead, we do the following.

- We keep a separate buffer of nodes to insert (including the position to insert them at, the type of the corresponding value, and the node itself). These nodes are numbered by their order in the insertion buffer, allowing their values to be used directly elsewhere in the IR (i.e. if there are 12 statements in the original statement list, the first new statement will be accessible as SSAValue(13)).
- RAUW-style operations are performed by setting the corresponding statement index to the replacement value.
- Statements are deleted by setting the corresponding statement index to nothing (this is essentially just a special case of the convention above).
- If there are any uses of the statement being deleted, they are set to nothing as well.
There is a compact! function that compacts the above data structure by inserting nodes in the appropriate places, trivially copy-propagating, and renaming uses to the new SSA values. However, a notable feature of this scheme is that compaction can be performed lazily, as part of a subsequent pass. Most optimization passes need to walk over the entire list of statements, performing analysis or modifications along the way. For this we provide the IncrementalCompact iterator, which performs any necessary compaction and returns the new index of each node, as well as the node itself. At this point it is legal to walk def-use chains, as well as to make modifications or deletions to the IR (insertions, however, are disallowed).
The rationale is that, since the optimization passes need to touch the corresponding memory anyway and incur the associated memory-access penalty, the incremental cost of also performing the housekeeping is relatively small (and it saves the cost of maintaining these data structures while the IR is being modified).