Abstract Syntax Trees (AST) in Julia
There are two code representations in Julia. First comes the AST tree of the surface syntax returned by the analyzer (for example, the function Meta.parse) and macro-driven. This is a structured representation of the code in the form in which it is written, built using `julia-parser.scm' from a stream of characters. Next comes the reduced form, or IR (intermediate representation), which is used in type inference and code generation. In this form, there are fewer node types, all macros are expanded, and the entire control order is transformed into explicit branches and sequences of operators. It is built using `julia-syntax.scm'.
First, we will look at the AST tree, since it is necessary for writing macros.
The AST of the surface syntax
Interface AST consists almost entirely of expressions (Expr) and small units (for example, characters, numbers). As a rule, each visually distinguishable syntactic form has its own vertex of expression. Examples will be given in the syntax of the s-expression. Each parenthesized list corresponds to an expression where the first element is a vertex. For example, (call f x) corresponds to Expr(:call, :f, :x) in Julia.
Challenges
| Input | AST |
|---|---|
|
|
|
|
|
|
|
|
the do syntax:
f(x) do a,b
body
end
it is analyzed as (do (call f x) (-> (tuple a b) (block body))).
Operators
Most often, operators are used simply for function calls, so they are analyzed using a vertex call. However, some operators are special forms (not necessarily function calls), and in these cases the operator itself is the top of the expression. In julia-parser.scm they are called "syntactic operators". Some operators (+ and *) N-ary analysis is used. Call chains are analyzed as a single N-argument call. Finally, comparison chains have their own special expression structure.
| Input | AST |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Forms enclosed in square brackets
| Input | AST |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Macros
| Input | AST |
|---|---|
|
|
|
|
|
|
Lines
| Input | AST |
|---|---|
|
|
|
|
|
|
|
|
|
|
Syntax of documentation strings:
"some docs"
f(x) = x
it is analyzed as (macrocall (|.|Core'@doc) (line) "some docs" (= (call f x) (block x))).
Import and other
| Input | AST |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
using has the same representation as import', but with the top of the expression `:using instead of :import.
Numbers
Julia supports more numeric types than many schema implementations, so not all numbers are represented in the AST directly as schema numbers.
| Input | AST |
|---|---|
|
|
|
|
|
|
Block Shapes
The operator block is analyzed as (block stmt1 stmt2 ...).
The if operator:
if a
b
elseif c
d
else
e
end
it is analyzed as:
(if a (block (line 2) b)
(elseif (block (line 3) c) (block (line 4) d)
(block (line 6 e))))
The while loop is analyzed as (while condition body).
The for loop is analyzed as (for (= var iter) body). If there are several iteration specifications, they are analyzed as a block: (for (block (= v1 iter1) (= v2 iter2)) body).
break and continue are analyzed as null-argument expressions (break) and `(continue)'.
let is analyzed as (let (= var val) body) or (let (block (= var1 val1) (= var2 val2) ...) body) is similar to the for loops.
The basic definition of a function is analyzed as (function (call f x) body). Here is a more complex example:
function f(x::T; k = 1) where T
return x+1
end
it is analyzed as:
(function (where (call f (parameters (kw k 1))
(:: x T))
T)
(block (line 2) (return (call + x 1))))
Type definition:
mutable struct Foo{T<:S}
x::T
end
it is analyzed as:
(struct true (curly Foo (<: T S))
(block (line 2) (:: x T)))
The first argument is logical and indicates whether the type is modifiable.
The 'try` blocks are analyzed as (try try_block var catch_block finally_block). If there is no variable after catch, then the variable (var) will look like #f'. If there is no `finally clause, then the last argument is missing.
Expressions with quotes
The Julia source syntax forms for enclosing code in quotation marks (quote and :( )) support interpolation with $. In Lisp terminology, this means that they are actually forms of using backquotes. At the internal level, there is also a need to enclose the code in quotation marks without interpolation. In the Julia schema code, a non-interpolating quotation mark is represented by the vertex of the expression inertia.
The inert expressions are converted to Julia QuoteNode objects. These objects encapsulate a single value of any type and simply return it when evaluated.
The `quote' expression, whose argument is a small element, is also converted to a `QuoteNode'.
Line numbers
Information about the location of the source code is presented in the form of (line line_num file_name), where the third component is optional (and omitted when the current line number changes, but not the file name).
These expressions are represented in Julia as `LineNumberNode'.
Macros
The hygiene of the macro is represented through a pair of vertices of the expression escape and hygienic-scope'. The result of macro expansion is automatically a `(hygienic-scope block module) to represent the result of a new area. The user can insert (escape block) inside to interpolate the code from the caller.
Reduced form
The reduced form (IR) is more important to the compiler because it is used for type inference, optimization such as embedding, and code generation. It is also less obvious to humans, as it results from a significant reordering of the input syntax.
In addition to symbols (Symbol) and some numeric types, the following data types exist in reduced form.
-
ExprIt has the node type specified by the
headfield and theargsfield, which is aVector{Any}subexpressions . Whereas almost every part of the surface AST is represented by theExprtype, IR uses only a limited number ofExprsand mostly only for calls and some top-level forms. -
SlotNumberIdentifies arguments and local variables using sequential numbering. It has an
id' field with an integer value representing the slot index. The types of these slots can be found in the `slottypesfield of theirCodeInfoobject. -
ArgumentThe same as `SlotNumber', but appears only after optimization. Indicates that the slot referenced is an argument of the enabling function.
-
CodeInfoWraps the IR of a group of operators. Its
codefield is an array of expressions to execute. -
GotoNodeThe unconditional branch. The argument is the branch target object, represented as an index in the array of code to be moved to.
-
GotoIfNotA conditional branch. If the
condfield is set to false, it goes to the index defined by thedestfield. -
ReturnNodeReturns its argument (the
valfield) as the value of the enclosing function. If the field isvalundefined, represents an unavailable operator. -
QuoteNodeWraps an arbitrary value that will be referenced as data. For example, the function
f() = :acontains aQuoteNode, thevaluefield of which is the charactera, in order to return the character itself, rather than defining it. -
GlobalRefRefers to the global variable
namein themodmodule. -
SSAValueRefers to a sequentially numbered (starting from 1) static single assignment variable (SSA) inserted by the compiler. The
SSAValuenumber (id) is the index of the array of expression code whose value it represents. -
NewvarNodeMarks the point where the variable (slot) is created. This causes the variable to be reset to undefined.
Types of Expr
These characters are displayed in the 'head` field of expressions (Expr) in reduced form.
-
callFunction call (dynamic dispatching).
args[1]is the function being called,args[2:end]are the arguments. -
invokeFunction call (static dispatching).
args[1]is the MethodInstance being called,args[2:end]are the arguments (including the function being called withargs[2]). -
static_parameterIndicates a static parameter by index.
-
=Assignment. In IR, the first argument is always
SlotNumberor `GlobalRef'. -
methodAdds a method to a universal function and assigns the result if necessary.
It has a one-document form and a three-document form. The source of the single-argument form is the syntax `function foo end'. In single-argument form, an argument is a symbol. If this symbol already names a function in the current scope, nothing happens. If the symbol is not defined, a new function is created and assigned to the identifier specified by the symbol. If the symbol is defined but does not name the function, an error occurs. The definition of "names a function" is that the binding is permanent, and refers to an object of a single type. This is because an instance of a single type identifies the type to which the method needs to be added. When a type has fields, it will be unclear whether the method is added to the instance or its type.
The three-argument form has the following arguments:
* `args[1]` Имя функции или `nothing`, если неизвестно или не требуется. Если символ, то выражение сначала ведет себя как одноаргументная форма выше. В дальнейшем этот аргумент игнорируется. Может быть `nothing`, когда методы добавляются строго по типу, `(::T)(x) = x`, или когда метод добавляется к существующей функции, `MyModule.f(x) = x`. * `args[2]` `SimpleVector` данных типа аргумента. `args[2][1]` — это `SimpleVector` типов аргументов, а `args[2][2]` — это `SimpleVector` переменных типов, соответствующих статическим параметрам метода. * `args[3]` `CodeInfo` самого метода. Для определений методов «вне области» (добавление метода к функции, которая также имеет методы, определенные в других областях) это выражение, которое вычисляется в выражение `:lambda`.
-
struct_typeA semiargument expression that defines a new structure (
struct).* `args[1]` Имя структуры `struct`. * `args[2]` Выражение `call`, которое создает `SimpleVector`, указывая его параметры. * `args[3]` Выражение `call`, которое создает `SimpleVector`, указывая его имена полей. * `args[4]` `Symbol`, `GlobalRef` или `Expr`, указывающие супертип (например, `:Integer`, `GlobalRef(Core, :Any)` или `:(Core.apply_type(AbstractArray, T, N))`). * `args[5]` Выражение `call`, которое создает `SimpleVector`, указывая его типы полей. * `args[6]` Логический, имеет значение true, если является изменяемым (`mutable`). * `args[7]` The number of arguments to initialize. This will be the number of fields or the minimum number of fields called by the `new` operator of the internal constructor.
-
abstract_typeA three-argument expression that defines a new abstract type. The arguments match arguments 1, 2, and 4 of the
struct_typeexpressions. -
primitive_typeA four-argument expression that defines a new primitive type. Arguments 1, 2, and 4 are the same as in `struct_type'. Argument 3 is the number of bits.
!!! compat "Julia 1.5"
struct_type,abstract_typeandprimitive_typewere removed in Julia 1.5 and replaced with calls to new embedded objects. -
globalDeclares a global binding.
-
constDeclares a (global) variable as a constant.
-
newHighlights a new structure-like object. The first argument is the type. The pseudo-function level
newis downgraded to this, and the type is always inserted by the compiler. It’s pretty much just an internal function that doesn’t perform validation. Defining arbitrarynewexpressions can easily lead to failure. -
splatnewIt is similar to
new, except that the field values are passed as a single tuple. It works the same way assplat(new)ifnewwas a first-class function. Hence the name. -
isdefinedExpr(:isdefined, :x)returns a boolean value indicating whetherxhas already been defined in the current scope. -
the_exceptionThrows a caught exception inside the
catchblock as returned by `jl_current_exception(ct)'. -
enterIntroduces an event handler ('setjmp`).
args[1]is the label of the catch block to switch to in case of an error . Issues the token used by `pop_exception'. -
leaveReturns exception handlers.
args[1]is the number of handlers to return. -
pop_exceptionReturns the state of the current exceptions to the state in the associated
enterargument when exiting the catch block. 'args[1]` contains a token from the associatedenterargument.!!! compat "Julia 1.1" The
pop_exceptionargument is new in Julia 1.1. -
inboundsControls whether border checks are enabled or disabled. A stack is being maintained. If the first argument of this expression is true or false (
truemeans that bounds checking is disabled), it is sent to the stack. If the first argument is:pop, the stack is removed. -
boundscheckIt has the value
falseif it is inserted into a section of code marked with the macro `@inbounds', otherwise it has the value `true'. -
loopinfoMarks the end of the cycle. It contains metadata that is passed to the
LowerSimdLoopto either mark the inner loop of the expression@simdor to distribute information in the LLVM loop transmission. -
copyastPart of the quasi-quotation mark implementation. The argument is the surface syntax of the AST, which is simply recursively copied and returned at runtime.
-
metaMetadata. 'args[1]` is usually a character indicating the type of metadata, and the rest of the arguments are in free form. The following types of metadata are commonly used.
* `:inline` and `:noinline`: Inlining hints.
-
foreigncallA statically calculated container for information on the keyword `ccall'. The following fields are used.
* `args[1]` : name Выражение, которое будет проанализировано для внешней функции. * `args[2]::Type` : RT (Литеральный) возвращаемый тип, вычисленный статически, когда был определен содержащий метод. * `args[3]::SimpleVector` (of Types) : AT (Литеральный) вектор типов аргументов, вычисленный статически, когда был определен содержащий метод. * `args[4]::Int` : nreq Количество необходимых аргументов для определения функции с переменным количеством аргументов. * `args[5]::QuoteNode{Symbol}` : calling convention Соглашение о вызовах для вызова. * `args[6:5+length(args[3])]` : arguments Значения для всех аргументов (типы каждого из них указаны в args[3]). * `args[6+length(args[3])+1:end]` : gc-roots Дополнительные объекты, которым может потребоваться предоставить права суперпользователя при сборке мусора на время вызова. Сведения об источнике этих объектов и способе их обработки см. в главе [Работа с LLVM](@ref Working-with-LLVM). -
new_opaque_closureCreates a new opaque closure. The following fields are used.
* `args[1]` : signature Сигнатура функции для непрозрачного замыкания. Непрозрачные замыкания не используются в диспетчеризации, но входные типы могут быть ограничены. * `args[2]` : isva Указывает, принимает ли замыкание переменное количество аргументов. * `args[3]` : lb Нижняя граница для типа вывода. (По умолчанию имеет значение `Union{}`) * `args[4]` : ub Верхняя граница для типа вывода. (По умолчанию имеет значение `Any`) * `args[5]` : method Фактический метод как выражение `opaque_closure_method`. * `args[6:end]` : captures Значения, записываемые непрозрачным замыканием.!!! compat "Julia 1.7" Opaque closures were added in Julia 1.7.
Method
A unique container describing common metadata for a single method.
-
name,module,file,line,sigMetadata for a unique method definition for computer and human.
-
ambigA cache of other methods that may be ambiguous with this.
-
specializationsA cache of all MethodInstance ever created for this Method, used to ensure uniqueness. Uniqueness is necessary to ensure efficiency, especially for incremental pre-compilation and tracking method invalidations.
-
sourceThe original source code (if available, usually in compressed form).
-
generatorA callable object that can be executed to obtain a specialized source for a specific method signature.
-
rootsPointers to non-AST objects that have been interpolated into the AST, required for AST compression, type inference, or native code generation.
-
nargs,isva,called,is_for_opaque_closure,Descriptive bit fields for the source code of this Method.
-
primary_worldThe "Age of the world" (hierarchy of method definitions) for this Method.
MethodInstance
A unique container describing a single callable signature for a Method. For important information about safely changing these fields, see Proper maintenance of multithreaded locks.
-
specTypesThe primary key for this MethodInstance. Uniqueness is guaranteed by searching for `def.specializations'.
-
defThe method (`Method') whose specialization this function describes. Or a module (`Module'), if it is a top-level lambda extended in Module and which is not part of the Method.
-
sparam_valsValues of static parameters in
specTypes'. For the `MethodInstancecontainer inMethod.unspecifiedis an empty vectorSimpleVector. But for theMethodInstancecontainer of the runtime environment from theMethodTablecache, this value will always be defined and indexed. -
uninferredUncompressed source code for the top-level converter. In addition, for the generated function, this is one of the many places where the source code can be located.
-
backedgesWe keep a reverse list of cache dependencies to effectively track additional reanalysis or recompilation work that may be required after the new method definitions. To do this, we keep a list of other
MethodInstancecontainers that have been derived or optimized to contain a possible call to thisMethodInstancecontainer. Optimization results may be stored somewhere in the cache (cache), or it may be the result of something that does not need to be cached, such as constant propagation. Thus, we combine all this information into various cache entries (there is almost always only one applicable cache entry with a reference value for max_world). -
cacheCache of `CodeInstance' objects for which this template instance is shared.
CodeInstance
-
defThe `MethodInstance' container from which this cache entry was obtained.
-
ownerA token representing the owner of this
CodeInstance'. Will use `jl_egalto match. -
rettype/rettype_constThe output return type for the 'specFunctionObject` field, which (in most cases) is also the calculated return type for the function as a whole.
-
inferredIt may contain a cache of the output source code for this function, or it may have the value
nothingto simply indicate that nothing is output ('rettype'). -
ftprThe universal jlcall entry point.
-
jlcall_apiThe ABI used when calling `fptr'. Below are some important values.
* 0 - Not compiled yet * 1 - `JL_CALLABLE` `jl_value_t *(*)(jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)` * 2 - Constant (value stored in `rettype_const`) * 3 - With Static-parameters forwarded `jl_value_t *(*)(jl_svec_t *sparams, jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)` * 4 - Run in interpreter `jl_value_t *(*)(jl_method_instance_t *meth, jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)`
-
min_world/max_worldThe range of "ages of the world" (hierarchies of method definitions) for which this method instance is valid to call. If max_world is a special value of the -1 token, its value is not yet known. You can continue to use it until there is information that needs to be reviewed.
CodeInfo
A (usually temporary) container for storing downgraded source code.
-
codeAn array of
Anyoperators. -
slotnamesAn array of characters specifying names for each slot (argument or local variable).
-
slotflagsAn array of
UInt8slot properties represented as bit flags:* 0x02 - assigned (only false if there are *no* assignment statements with this var on the left) * 0x08 - used (if there is any read or write of the slot) * 0x10 - statically assigned once * 0x20 - might be used before assigned. This flag is only valid after type inference.
-
ssavaluetypesAny array or integer (
Int).If an integer (
Int), it specifies the number of temporary locations inserted by the compiler in the function (the length of thecodearray). If it is an array, it sets the type for each location. -
ssaflags32-bit operator-level flags for each expression in the function. For more information, see the definition of
jl_code_info_tin the julia.h file. -
linetableAn array of source code location objects.
-
codelocsAn array of integer indexes in a
linetableindicating the location associated with each operator.
Optional fields:
-
slottypesAn array of types for slots.
-
rettypeThe output is a reduced-form return type (IR). The default value is `Any'.
-
method_for_inference_limit_heuristicsmethod_for_inference_heuristicsextends the generator of this method, if necessary during output. -
parentThe `MethodInstance' container that "owns" this object (if applicable).
-
edgesPassing edges to method instances that should be invalidated.
-
min_world/max_worldThe range of "ages of the world" (hierarchies of method definitions) for which this code was valid at the time of its output.
Logical properties:
-
inferredWhether it was obtained by type inference.
-
inlineableWhether it is suitable for embedding.
-
propagate_inboundsShould
@inboundsbe distributed when embedding in order to skip the@boundscheckblocks.
UInt8 parameters:
-
constprop-
0 — use heuristics
-
1 — aggressive mode
-
2 — none
-
-
purityConsists of 5-bit flags:-
0x01 << 0— this method is guaranteed to return control or terminate execution consistently (:consistent) -
0x01 << 1— this method is devoid of externally semantically visible side effects (:effect_free) -
0x01 << 2— this method is guaranteed not to raise an exception (:nothrow) -
0x01 << 3— this method is guaranteed to terminate execution (:terminates_globally) -
0x01 << 4— the syntactic order of execution inside this method is guaranteed to terminate execution (:terminates_locally)For more information, see the
Base' documentation.@assume_effects.
-