Abstract Syntax Trees (AST) in Julia
There are two code representations in Julia. First comes the AST tree of the surface syntax returned by the analyzer (for example, the function Meta.parse
) and macro-driven. This is a structured representation of the code in the form in which it is written, built using `julia-parser.scm' from a stream of characters. Next comes the reduced form, or IR (intermediate representation), which is used in type inference and code generation. In this form, there are fewer node types, all macros are expanded, and the entire control order is transformed into explicit branches and sequences of operators. It is built using `julia-syntax.scm'.
First, we will look at the AST tree, since it is necessary for writing macros.
The AST of the surface syntax
Interface AST consists almost entirely of expressions (Expr
) and small units (for example, characters, numbers). As a rule, each visually distinguishable syntactic form has its own vertex of expression. Examples will be given in the syntax of the s-expression. Each parenthesized list corresponds to an expression where the first element is a vertex. For example, (call f x)
corresponds to Expr(:call, :f, :x)
in Julia.
Challenges
Input | AST |
---|---|
|
|
|
|
|
|
|
|
the do
syntax:
f(x) do a,b
body
end
it is analyzed as (do (call f x) (-> (tuple a b) (block body)))
.
Operators
Most often, operators are used simply for function calls, so they are analyzed using a vertex call. However, some operators are special forms (not necessarily function calls), and in these cases the operator itself is the top of the expression. In julia-parser.scm they are called "syntactic operators". Some operators (+
and *
) N-ary analysis is used. Call chains are analyzed as a single N-argument call. Finally, comparison chains have their own special expression structure.
Input | AST |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Forms enclosed in square brackets
Input | AST |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Macros
Input | AST |
---|---|
|
|
|
|
|
|
Lines
Input | AST |
---|---|
|
|
|
|
|
|
|
|
|
|
Syntax of documentation strings:
"some docs"
f(x) = x
it is analyzed as (macrocall (|.|Core'@doc) (line) "some docs" (= (call f x) (block x)))
.
Import and other
Input | AST |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
using
has the same representation as import', but with the top of the expression `:using
instead of :import
.
Numbers
Julia supports more numeric types than many schema implementations, so not all numbers are represented in the AST directly as schema numbers.
Input | AST |
---|---|
|
|
|
|
|
|
Block Shapes
The operator block is analyzed as (block stmt1 stmt2 ...)
.
The if operator:
if a
b
elseif c
d
else
e
end
it is analyzed as:
(if a (block (line 2) b) (elseif (block (line 3) c) (block (line 4) d) (block (line 6 e))))
The while
loop is analyzed as (while condition body)
.
The for
loop is analyzed as (for (= var iter) body)
. If there are several iteration specifications, they are analyzed as a block: (for (block (= v1 iter1) (= v2 iter2)) body)
.
break
and continue
are analyzed as null-argument expressions (break)
and `(continue)'.
let
is analyzed as (let (= var val) body)
or (let (block (= var1 val1) (= var2 val2) ...) body)
is similar to the for
loops.
The basic definition of a function is analyzed as (function (call f x) body)
. Here is a more complex example:
function f(x::T; k = 1) where T
return x+1
end
it is analyzed as:
(function (where (call f (parameters (kw k 1)) (:: x T)) T) (block (line 2) (return (call + x 1))))
Type definition:
mutable struct Foo{T<:S}
x::T
end
it is analyzed as:
(struct true (curly Foo (<: T S)) (block (line 2) (:: x T)))
The first argument is logical and indicates whether the type is modifiable.
The 'try` blocks are analyzed as (try try_block var catch_block finally_block)
. If there is no variable after catch
, then the variable (var
) will look like #f'. If there is no `finally
clause, then the last argument is missing.
Expressions with quotes
The Julia source syntax forms for enclosing code in quotation marks (quote
and :( )
) support interpolation with $
. In Lisp terminology, this means that they are actually forms of using backquotes. At the internal level, there is also a need to enclose the code in quotation marks without interpolation. In the Julia schema code, a non-interpolating quotation mark is represented by the vertex of the expression inertia
.
The inert
expressions are converted to Julia QuoteNode
objects. These objects encapsulate a single value of any type and simply return it when evaluated.
The `quote' expression, whose argument is a small element, is also converted to a `QuoteNode'.
Line numbers
Information about the location of the source code is presented in the form of (line line_num file_name)
, where the third component is optional (and omitted when the current line number changes, but not the file name).
These expressions are represented in Julia as `LineNumberNode'.
Macros
The hygiene of the macro is represented through a pair of vertices of the expression escape
and hygienic-scope'. The result of macro expansion is automatically a `(hygienic-scope block module)
to represent the result of a new area. The user can insert (escape block)
inside to interpolate the code from the caller.
Reduced form
The reduced form (IR) is more important to the compiler because it is used for type inference, optimization such as embedding, and code generation. It is also less obvious to humans, as it results from a significant reordering of the input syntax.
In addition to symbols (Symbol
) and some numeric types, the following data types exist in reduced form.
-
Expr
It has the node type specified by the
head
field and theargs
field, which is aVector{Any}
subexpressions . Whereas almost every part of the surface AST is represented by theExpr
type, IR uses only a limited number ofExprs
and mostly only for calls and some top-level forms. -
SlotNumber
Identifies arguments and local variables using sequential numbering. It has an
id' field with an integer value representing the slot index. The types of these slots can be found in the `slottypes
field of theirCodeInfo
object. -
Argument
The same as `SlotNumber', but appears only after optimization. Indicates that the slot referenced is an argument of the enabling function.
-
CodeInfo
Wraps the IR of a group of operators. Its
code
field is an array of expressions to execute. -
GotoNode
The unconditional branch. The argument is the branch target object, represented as an index in the array of code to be moved to.
-
GotoIfNot
A conditional branch. If the
cond
field is set to false, it goes to the index defined by thedest
field. -
ReturnNode
Returns its argument (the
val
field) as the value of the enclosing function. If the field isval
undefined, represents an unavailable operator. -
QuoteNode
Wraps an arbitrary value that will be referenced as data. For example, the function
f() = :a
contains aQuoteNode
, thevalue
field of which is the charactera
, in order to return the character itself, rather than defining it. -
GlobalRef
Refers to the global variable
name
in themod
module. -
SSAValue
Refers to a sequentially numbered (starting from 1) static single assignment variable (SSA) inserted by the compiler. The
SSAValue
number (id
) is the index of the array of expression code whose value it represents. -
NewvarNode
Marks the point where the variable (slot) is created. This causes the variable to be reset to undefined.
Types of Expr
These characters are displayed in the 'head` field of expressions (Expr
) in reduced form.
-
call
Function call (dynamic dispatching).
args[1]
is the function being called,args[2:end]
are the arguments. -
invoke
Function call (static dispatching).
args[1]
is the MethodInstance being called,args[2:end]
are the arguments (including the function being called withargs[2]
). -
static_parameter
Indicates a static parameter by index.
-
=
Assignment. In IR, the first argument is always
SlotNumber
or `GlobalRef'. -
method
Adds a method to a universal function and assigns the result if necessary.
It has a one-document form and a three-document form. The source of the single-argument form is the syntax `function foo end'. In single-argument form, an argument is a symbol. If this symbol already names a function in the current scope, nothing happens. If the symbol is not defined, a new function is created and assigned to the identifier specified by the symbol. If the symbol is defined but does not name the function, an error occurs. The definition of "names a function" is that the binding is permanent, and refers to an object of a single type. This is because an instance of a single type identifies the type to which the method needs to be added. When a type has fields, it will be unclear whether the method is added to the instance or its type.
The three-argument form has the following arguments:
* `args[1]` Имя функции или `nothing`, если неизвестно или не требуется. Если символ, то выражение сначала ведет себя как одноаргументная форма выше. В дальнейшем этот аргумент игнорируется. Может быть `nothing`, когда методы добавляются строго по типу, `(::T)(x) = x`, или когда метод добавляется к существующей функции, `MyModule.f(x) = x`. * `args[2]` `SimpleVector` данных типа аргумента. `args[2][1]` — это `SimpleVector` типов аргументов, а `args[2][2]` — это `SimpleVector` переменных типов, соответствующих статическим параметрам метода. * `args[3]` `CodeInfo` самого метода. Для определений методов «вне области» (добавление метода к функции, которая также имеет методы, определенные в других областях) это выражение, которое вычисляется в выражение `:lambda`.
-
struct_type
A semiargument expression that defines a new structure (
struct
).* `args[1]` Имя структуры `struct`. * `args[2]` Выражение `call`, которое создает `SimpleVector`, указывая его параметры. * `args[3]` Выражение `call`, которое создает `SimpleVector`, указывая его имена полей. * `args[4]` `Symbol`, `GlobalRef` или `Expr`, указывающие супертип (например, `:Integer`, `GlobalRef(Core, :Any)` или `:(Core.apply_type(AbstractArray, T, N))`). * `args[5]` Выражение `call`, которое создает `SimpleVector`, указывая его типы полей. * `args[6]` Логический, имеет значение true, если является изменяемым (`mutable`). * `args[7]` The number of arguments to initialize. This will be the number of fields or the minimum number of fields called by the `new` operator of the internal constructor.
-
abstract_type
A three-argument expression that defines a new abstract type. The arguments match arguments 1, 2, and 4 of the
struct_type
expressions. -
primitive_type
A four-argument expression that defines a new primitive type. Arguments 1, 2, and 4 are the same as in `struct_type'. Argument 3 is the number of bits.
!!! compat "Julia 1.5"
struct_type
,abstract_type
andprimitive_type
were removed in Julia 1.5 and replaced with calls to new embedded objects. -
global
Declares a global binding.
-
const
Declares a (global) variable as a constant.
-
new
Highlights a new structure-like object. The first argument is the type. The pseudo-function level
new
is downgraded to this, and the type is always inserted by the compiler. It’s pretty much just an internal function that doesn’t perform validation. Defining arbitrarynew
expressions can easily lead to failure. -
splatnew
It is similar to
new
, except that the field values are passed as a single tuple. It works the same way assplat(new)
ifnew
was a first-class function. Hence the name. -
isdefined
Expr(:isdefined, :x)
returns a boolean value indicating whetherx
has already been defined in the current scope. -
the_exception
Throws a caught exception inside the
catch
block as returned by `jl_current_exception(ct)'. -
enter
Introduces an event handler ('setjmp`).
args[1]
is the label of the catch block to switch to in case of an error . Issues the token used by `pop_exception'. -
leave
Returns exception handlers.
args[1]
is the number of handlers to return. -
pop_exception
Returns the state of the current exceptions to the state in the associated
enter
argument when exiting the catch block. 'args[1]` contains a token from the associatedenter
argument.!!! compat "Julia 1.1" The
pop_exception
argument is new in Julia 1.1. -
inbounds
Controls whether border checks are enabled or disabled. A stack is being maintained. If the first argument of this expression is true or false (
true
means that bounds checking is disabled), it is sent to the stack. If the first argument is:pop
, the stack is removed. -
boundscheck
It has the value
false
if it is inserted into a section of code marked with the macro `@inbounds', otherwise it has the value `true'. -
loopinfo
Marks the end of the cycle. It contains metadata that is passed to the
LowerSimdLoop
to either mark the inner loop of the expression@simd
or to distribute information in the LLVM loop transmission. -
copyast
Part of the quasi-quotation mark implementation. The argument is the surface syntax of the AST, which is simply recursively copied and returned at runtime.
-
meta
Metadata. 'args[1]` is usually a character indicating the type of metadata, and the rest of the arguments are in free form. The following types of metadata are commonly used.
* `:inline` and `:noinline`: Inlining hints.
-
foreigncall
A statically calculated container for information on the keyword `ccall'. The following fields are used.
* `args[1]` : name Выражение, которое будет проанализировано для внешней функции. * `args[2]::Type` : RT (Литеральный) возвращаемый тип, вычисленный статически, когда был определен содержащий метод. * `args[3]::SimpleVector` (of Types) : AT (Литеральный) вектор типов аргументов, вычисленный статически, когда был определен содержащий метод. * `args[4]::Int` : nreq Количество необходимых аргументов для определения функции с переменным количеством аргументов. * `args[5]::QuoteNode{Symbol}` : calling convention Соглашение о вызовах для вызова. * `args[6:5+length(args[3])]` : arguments Значения для всех аргументов (типы каждого из них указаны в args[3]). * `args[6+length(args[3])+1:end]` : gc-roots Дополнительные объекты, которым может потребоваться предоставить права суперпользователя при сборке мусора на время вызова. Сведения об источнике этих объектов и способе их обработки см. в главе [Работа с LLVM](@ref Working-with-LLVM).
-
new_opaque_closure
Creates a new opaque closure. The following fields are used.
* `args[1]` : signature Сигнатура функции для непрозрачного замыкания. Непрозрачные замыкания не используются в диспетчеризации, но входные типы могут быть ограничены. * `args[2]` : isva Указывает, принимает ли замыкание переменное количество аргументов. * `args[3]` : lb Нижняя граница для типа вывода. (По умолчанию имеет значение `Union{}`) * `args[4]` : ub Верхняя граница для типа вывода. (По умолчанию имеет значение `Any`) * `args[5]` : method Фактический метод как выражение `opaque_closure_method`. * `args[6:end]` : captures Значения, записываемые непрозрачным замыканием.
!!! compat "Julia 1.7" Opaque closures were added in Julia 1.7.
Method
A unique container describing common metadata for a single method.
-
name
,module
,file
,line
,sig
Metadata for a unique method definition for computer and human.
-
ambig
A cache of other methods that may be ambiguous with this.
-
specializations
A cache of all MethodInstance ever created for this Method, used to ensure uniqueness. Uniqueness is necessary to ensure efficiency, especially for incremental pre-compilation and tracking method invalidations.
-
source
The original source code (if available, usually in compressed form).
-
generator
A callable object that can be executed to obtain a specialized source for a specific method signature.
-
roots
Pointers to non-AST objects that have been interpolated into the AST, required for AST compression, type inference, or native code generation.
-
nargs
,isva
,called
,is_for_opaque_closure
,Descriptive bit fields for the source code of this Method.
-
primary_world
The "Age of the world" (hierarchy of method definitions) for this Method.
MethodInstance
A unique container describing a single callable signature for a Method. For important information about safely changing these fields, see Proper maintenance of multithreaded locks.
-
specTypes
The primary key for this MethodInstance. Uniqueness is guaranteed by searching for `def.specializations'.
-
def
The method (`Method') whose specialization this function describes. Or a module (`Module'), if it is a top-level lambda extended in Module and which is not part of the Method.
-
sparam_vals
Values of static parameters in
specTypes'. For the `MethodInstance
container inMethod.unspecified
is an empty vectorSimpleVector
. But for theMethodInstance
container of the runtime environment from theMethodTable
cache, this value will always be defined and indexed. -
uninferred
Uncompressed source code for the top-level converter. In addition, for the generated function, this is one of the many places where the source code can be located.
-
backedges
We keep a reverse list of cache dependencies to effectively track additional reanalysis or recompilation work that may be required after the new method definitions. To do this, we keep a list of other
MethodInstance
containers that have been derived or optimized to contain a possible call to thisMethodInstance
container. Optimization results may be stored somewhere in the cache (cache
), or it may be the result of something that does not need to be cached, such as constant propagation. Thus, we combine all this information into various cache entries (there is almost always only one applicable cache entry with a reference value for max_world). -
cache
Cache of `CodeInstance' objects for which this template instance is shared.
CodeInstance
-
def
The `MethodInstance' container from which this cache entry was obtained.
-
owner
A token representing the owner of this
CodeInstance'. Will use `jl_egal
to match. -
rettype
/rettype_const
The output return type for the 'specFunctionObject` field, which (in most cases) is also the calculated return type for the function as a whole.
-
inferred
It may contain a cache of the output source code for this function, or it may have the value
nothing
to simply indicate that nothing is output ('rettype'). -
ftpr
The universal jlcall entry point.
-
jlcall_api
The ABI used when calling `fptr'. Below are some important values.
* 0 - Not compiled yet * 1 - `JL_CALLABLE` `jl_value_t *(*)(jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)` * 2 - Constant (value stored in `rettype_const`) * 3 - With Static-parameters forwarded `jl_value_t *(*)(jl_svec_t *sparams, jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)` * 4 - Run in interpreter `jl_value_t *(*)(jl_method_instance_t *meth, jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)`
-
min_world
/max_world
The range of "ages of the world" (hierarchies of method definitions) for which this method instance is valid to call. If max_world is a special value of the -1 token, its value is not yet known. You can continue to use it until there is information that needs to be reviewed.
CodeInfo
A (usually temporary) container for storing downgraded source code.
-
code
An array of
Any
operators. -
slotnames
An array of characters specifying names for each slot (argument or local variable).
-
slotflags
An array of
UInt8
slot properties represented as bit flags:* 0x02 - assigned (only false if there are *no* assignment statements with this var on the left) * 0x08 - used (if there is any read or write of the slot) * 0x10 - statically assigned once * 0x20 - might be used before assigned. This flag is only valid after type inference.
-
ssavaluetypes
Any array or integer (
Int
).If an integer (
Int
), it specifies the number of temporary locations inserted by the compiler in the function (the length of thecode
array). If it is an array, it sets the type for each location. -
ssaflags
32-bit operator-level flags for each expression in the function. For more information, see the definition of
jl_code_info_t
in the julia.h file. -
linetable
An array of source code location objects.
-
codelocs
An array of integer indexes in a
linetable
indicating the location associated with each operator.
Optional fields:
-
slottypes
An array of types for slots.
-
rettype
The output is a reduced-form return type (IR). The default value is `Any'.
-
method_for_inference_limit_heuristics
method_for_inference_heuristics
extends the generator of this method, if necessary during output. -
parent
The `MethodInstance' container that "owns" this object (if applicable).
-
edges
Passing edges to method instances that should be invalidated.
-
min_world
/max_world
The range of "ages of the world" (hierarchies of method definitions) for which this code was valid at the time of its output.
Logical properties:
-
inferred
Whether it was obtained by type inference.
-
inlineable
Whether it is suitable for embedding.
-
propagate_inbounds
Should
@inbounds
be distributed when embedding in order to skip the@boundscheck
blocks.
UInt8
parameters:
-
constprop
-
0 — use heuristics
-
1 — aggressive mode
-
2 — none
-
-
purity
Consists of 5-bit flags:-
0x01 << 0
— this method is guaranteed to return control or terminate execution consistently (:consistent
) -
0x01 << 1
— this method is devoid of externally semantically visible side effects (:effect_free
) -
0x01 << 2
— this method is guaranteed not to raise an exception (:nothrow
) -
0x01 << 3
— this method is guaranteed to terminate execution (:terminates_globally
) -
0x01 << 4
— the syntactic order of execution inside this method is guaranteed to terminate execution (:terminates_locally
)For more information, see the
Base' documentation.@assume_effects
.
-