Engee documentation

Abstract Syntax Trees (AST) in Julia

There are two code representations in Julia. First comes the AST tree of the surface syntax returned by the analyzer (for example, the function Meta.parse) and macro-driven. This is a structured representation of the code in the form in which it is written, built using `julia-parser.scm' from a stream of characters. Next comes the reduced form, or IR (intermediate representation), which is used in type inference and code generation. In this form, there are fewer node types, all macros are expanded, and the entire control order is transformed into explicit branches and sequences of operators. It is built using `julia-syntax.scm'.

First, we will look at the AST tree, since it is necessary for writing macros.

The AST of the surface syntax

Interface AST consists almost entirely of expressions (Expr) and small units (for example, characters, numbers). As a rule, each visually distinguishable syntactic form has its own vertex of expression. Examples will be given in the syntax of the s-expression. Each parenthesized list corresponds to an expression where the first element is a vertex. For example, (call f x) corresponds to Expr(:call, :f, :x) in Julia.

Challenges

Input AST

f(x)

(call f x)

f(x, y=1, z=2)

(call f x (kw y 1) (kw z 2))

f(x; y=1)

(call f (parameters (kw y 1)) x)

f(x...)

(call f (... x))

the do syntax:

f(x) do a,b
    body
end

it is analyzed as (do (call f x) (-> (tuple a b) (block body))).

Operators

Most often, operators are used simply for function calls, so they are analyzed using a vertex call. However, some operators are special forms (not necessarily function calls), and in these cases the operator itself is the top of the expression. In julia-parser.scm they are called "syntactic operators". Some operators (+ and *) N-ary analysis is used. Call chains are analyzed as a single N-argument call. Finally, comparison chains have their own special expression structure.

Input AST

x+y

(call + x y)

a+b+c+d

(call + a b c d)

2x

(call * 2 x)

a&&b;

(&& a b)

x += 1

(+= x 1)

a ? 1 : 2

(if a 1 2)

a,b

(tuple a b)

a==b

(call == a b)

1<i<=n

(comparison 1 < i <= n)

a.b

(. a (quote b))

a.(b)

(. a (tuple b))

Forms enclosed in square brackets

Input AST

a[i]

(ref a i)

t[i;j]

(typed_vcat t i j)

t[i j]

(typed_hcat t i j)

t[a b; c d]

(typed_vcat t (row a b) (row c d))

t[a b;;; c d]

(typed_ncat t 3 (row a b) (row c d))

a{b}

(curly a b)

a{b;c}

(curly a (parameters c) b)

[x]

(vect x)

[x,y]

(vect x y)

[x;y]

(vcat x y)

[x y]

(hcat x y)

[x y; z t]

(vcat (row x y) (row z t))

[x;y;; z;t;;;]

(ncat 3 (nrow 2 (nrow 1 x y) (nrow 1 z t)))

[x for y in z, a in b]

(comprehension (generator x (= y z) (= a b)))

T[x for y in z]

(typed_comprehension T (generator x (= y z)))

(a, b, c)

(tuple a b c)

(a; b; c)

(block a b c)

Macros

Input AST

@m x y

(macrocall @m (line) x y)

Base.@m x y

(macrocall (. Base (quote @m)) (line) x y)

@Base.m x y

(macrocall (. Base (quote @m)) (line) x y)

Lines

Input AST

"a"

"a"

x"y"

(macrocall @x_str (line) "y")

x"y"z

(macrocall @x_str (line) "y" "z")

"x = $x"

(string "x = " x)

a b c

(macrocall @cmd (line) "a b c")

Syntax of documentation strings:

"some docs"
f(x) = x

it is analyzed as (macrocall (|.|Core'@doc) (line) "some docs" (= (call f x) (block x))).

Import and other

Input AST

import a

(import (. a))

import a.b.c

(import (. a b c))

import ...a

(import (. . . . a))

import a.b, c.d

(import (. a b) (. c d))

import Base: x

(import (: (. Base) (. x)))

import Base: x, y

(import (: (. Base) (. x) (. y)))

export a, b

(export a b)

using has the same representation as import', but with the top of the expression `:using instead of :import.

Numbers

Julia supports more numeric types than many schema implementations, so not all numbers are represented in the AST directly as schema numbers.

Input AST

11111111111111111111

(macrocall @int128_str nothing "11111111111111111111")

0xfffffffffffffffff

(macrocall @uint128_str nothing "0xfffffffffffffffff")

1111...many digits...

(macrocall @big_str nothing "1111....")

Block Shapes

The operator block is analyzed as (block stmt1 stmt2 ...).

The if operator:

if a
    b
elseif c
    d
else
    e
end

it is analyzed as:

(if a (block (line 2) b)
    (elseif (block (line 3) c) (block (line 4) d)
            (block (line 6 e))))

The while loop is analyzed as (while condition body).

The for loop is analyzed as (for (= var iter) body). If there are several iteration specifications, they are analyzed as a block: (for (block (= v1 iter1) (= v2 iter2)) body).

break and continue are analyzed as null-argument expressions (break) and `(continue)'.

let is analyzed as (let (= var val) body) or (let (block (= var1 val1) (= var2 val2) ...) body) is similar to the for loops.

The basic definition of a function is analyzed as (function (call f x) body). Here is a more complex example:

function f(x::T; k = 1) where T
    return x+1
end

it is analyzed as:

(function (where (call f (parameters (kw k 1))
                       (:: x T))
                 T)
          (block (line 2) (return (call + x 1))))

Type definition:

mutable struct Foo{T<:S}
    x::T
end

it is analyzed as:

(struct true (curly Foo (<: T S))
        (block (line 2) (:: x T)))

The first argument is logical and indicates whether the type is modifiable.

The 'try` blocks are analyzed as (try try_block var catch_block finally_block). If there is no variable after catch, then the variable (var) will look like #f'. If there is no `finally clause, then the last argument is missing.

Expressions with quotes

The Julia source syntax forms for enclosing code in quotation marks (quote and :( )) support interpolation with $. In Lisp terminology, this means that they are actually forms of using backquotes. At the internal level, there is also a need to enclose the code in quotation marks without interpolation. In the Julia schema code, a non-interpolating quotation mark is represented by the vertex of the expression inertia.

The inert expressions are converted to Julia QuoteNode objects. These objects encapsulate a single value of any type and simply return it when evaluated.

The `quote' expression, whose argument is a small element, is also converted to a `QuoteNode'.

Line numbers

Information about the location of the source code is presented in the form of (line line_num file_name), where the third component is optional (and omitted when the current line number changes, but not the file name).

These expressions are represented in Julia as `LineNumberNode'.

Macros

The hygiene of the macro is represented through a pair of vertices of the expression escape and hygienic-scope'. The result of macro expansion is automatically a `(hygienic-scope block module) to represent the result of a new area. The user can insert (escape block) inside to interpolate the code from the caller.

Reduced form

The reduced form (IR) is more important to the compiler because it is used for type inference, optimization such as embedding, and code generation. It is also less obvious to humans, as it results from a significant reordering of the input syntax.

In addition to symbols (Symbol) and some numeric types, the following data types exist in reduced form.

  • Expr

    It has the node type specified by the head field and the args field, which is a Vector{Any} subexpressions . Whereas almost every part of the surface AST is represented by the Expr type, IR uses only a limited number of Exprs and mostly only for calls and some top-level forms.

  • SlotNumber

    Identifies arguments and local variables using sequential numbering. It has an id' field with an integer value representing the slot index. The types of these slots can be found in the `slottypes field of their CodeInfo object.

  • Argument

    The same as `SlotNumber', but appears only after optimization. Indicates that the slot referenced is an argument of the enabling function.

  • CodeInfo

    Wraps the IR of a group of operators. Its code field is an array of expressions to execute.

  • GotoNode

    The unconditional branch. The argument is the branch target object, represented as an index in the array of code to be moved to.

  • GotoIfNot

    A conditional branch. If the cond field is set to false, it goes to the index defined by the dest field.

  • ReturnNode

    Returns its argument (the val field) as the value of the enclosing function. If the field is val undefined, represents an unavailable operator.

  • QuoteNode

    Wraps an arbitrary value that will be referenced as data. For example, the function f() = :a contains a QuoteNode, the value field of which is the character a, in order to return the character itself, rather than defining it.

  • GlobalRef

    Refers to the global variable name in the mod module.

  • SSAValue

    Refers to a sequentially numbered (starting from 1) static single assignment variable (SSA) inserted by the compiler. The SSAValue number (id) is the index of the array of expression code whose value it represents.

  • NewvarNode

    Marks the point where the variable (slot) is created. This causes the variable to be reset to undefined.

Types of Expr

These characters are displayed in the 'head` field of expressions (Expr) in reduced form.

  • call

    Function call (dynamic dispatching). args[1] is the function being called, args[2:end] are the arguments.

  • invoke

    Function call (static dispatching). args[1] is the MethodInstance being called, args[2:end] are the arguments (including the function being called with args[2]).

  • static_parameter

    Indicates a static parameter by index.

  • =

    Assignment. In IR, the first argument is always SlotNumber or `GlobalRef'.

  • method

    Adds a method to a universal function and assigns the result if necessary.

    It has a one-document form and a three-document form. The source of the single-argument form is the syntax `function foo end'. In single-argument form, an argument is a symbol. If this symbol already names a function in the current scope, nothing happens. If the symbol is not defined, a new function is created and assigned to the identifier specified by the symbol. If the symbol is defined but does not name the function, an error occurs. The definition of "names a function" is that the binding is permanent, and refers to an object of a single type. This is because an instance of a single type identifies the type to which the method needs to be added. When a type has fields, it will be unclear whether the method is added to the instance or its type.

    The three-argument form has the following arguments:

    * `args[1]`
    
      Имя функции или `nothing`, если неизвестно или не требуется. Если символ, то выражение сначала ведет себя как одноаргументная форма выше. В дальнейшем этот аргумент игнорируется. Может быть `nothing`, когда методы добавляются строго по типу, `(::T)(x) = x`, или когда метод добавляется к существующей функции, `MyModule.f(x) = x`.
    
    * `args[2]`
    
      `SimpleVector` данных типа аргумента. `args[2][1]` — это `SimpleVector` типов аргументов, а `args[2][2]` — это `SimpleVector` переменных типов, соответствующих статическим параметрам метода.
    
    * `args[3]`
    
      `CodeInfo` самого метода. Для определений методов «вне области» (добавление метода к функции, которая также имеет методы, определенные в других областях) это выражение, которое вычисляется в выражение `:lambda`.
  • struct_type

    A semiargument expression that defines a new structure (struct).

    * `args[1]`
    
      Имя структуры `struct`.
    
    * `args[2]`
    
      Выражение `call`, которое создает `SimpleVector`, указывая его параметры.
    
    * `args[3]`
    
      Выражение `call`, которое создает `SimpleVector`, указывая его имена полей.
    
    * `args[4]`
    
      `Symbol`, `GlobalRef` или `Expr`, указывающие супертип (например, `:Integer`, `GlobalRef(Core, :Any)` или `:(Core.apply_type(AbstractArray, T, N))`).
    
    * `args[5]`
    
      Выражение `call`, которое создает `SimpleVector`, указывая его типы полей.
    
    * `args[6]`
    
      Логический, имеет значение true, если является изменяемым (`mutable`).
    
    * `args[7]`
    
      The number of arguments to initialize. This will be the number of fields or the minimum number of fields called by the `new` operator of the internal constructor.
  • abstract_type

    A three-argument expression that defines a new abstract type. The arguments match arguments 1, 2, and 4 of the struct_type expressions.

  • primitive_type

    A four-argument expression that defines a new primitive type. Arguments 1, 2, and 4 are the same as in `struct_type'. Argument 3 is the number of bits.

    !!! compat "Julia 1.5" struct_type, abstract_type and primitive_type were removed in Julia 1.5 and replaced with calls to new embedded objects.

  • global

    Declares a global binding.

  • const

    Declares a (global) variable as a constant.

  • new

    Highlights a new structure-like object. The first argument is the type. The pseudo-function level new is downgraded to this, and the type is always inserted by the compiler. It’s pretty much just an internal function that doesn’t perform validation. Defining arbitrary new expressions can easily lead to failure.

  • splatnew

    It is similar to new, except that the field values are passed as a single tuple. It works the same way as splat(new) if new was a first-class function. Hence the name.

  • isdefined

    Expr(:isdefined, :x) returns a boolean value indicating whether x has already been defined in the current scope.

  • the_exception

    Throws a caught exception inside the catch block as returned by `jl_current_exception(ct)'.

  • enter

    Introduces an event handler ('setjmp`). args[1] is the label of the catch block to switch to in case of an error . Issues the token used by `pop_exception'.

  • leave

    Returns exception handlers. args[1] is the number of handlers to return.

  • pop_exception

    Returns the state of the current exceptions to the state in the associated enter argument when exiting the catch block. 'args[1]` contains a token from the associated enter argument.

    !!! compat "Julia 1.1" The pop_exception argument is new in Julia 1.1.

  • inbounds

    Controls whether border checks are enabled or disabled. A stack is being maintained. If the first argument of this expression is true or false (true means that bounds checking is disabled), it is sent to the stack. If the first argument is :pop, the stack is removed.

  • boundscheck

    It has the value false if it is inserted into a section of code marked with the macro `@inbounds', otherwise it has the value `true'.

  • loopinfo

    Marks the end of the cycle. It contains metadata that is passed to the LowerSimdLoop to either mark the inner loop of the expression @simd or to distribute information in the LLVM loop transmission.

  • copyast

    Part of the quasi-quotation mark implementation. The argument is the surface syntax of the AST, which is simply recursively copied and returned at runtime.

  • meta

    Metadata. 'args[1]` is usually a character indicating the type of metadata, and the rest of the arguments are in free form. The following types of metadata are commonly used.

    * `:inline` and `:noinline`: Inlining hints.
  • foreigncall

    A statically calculated container for information on the keyword `ccall'. The following fields are used.

    * `args[1]` : name
    
      Выражение, которое будет проанализировано для внешней функции.
    
    * `args[2]::Type` : RT
    
      (Литеральный) возвращаемый тип, вычисленный статически, когда был определен содержащий метод.
    
    * `args[3]::SimpleVector` (of Types) : AT
    
      (Литеральный) вектор типов аргументов, вычисленный статически, когда был определен содержащий метод.
    
    * `args[4]::Int` : nreq
    
      Количество необходимых аргументов для определения функции с переменным количеством аргументов.
    
    * `args[5]::QuoteNode{Symbol}` : calling convention
    
      Соглашение о вызовах для вызова.
    
    * `args[6:5+length(args[3])]` : arguments
    
      Значения для всех аргументов (типы каждого из них указаны в args[3]).
    
    * `args[6+length(args[3])+1:end]` : gc-roots
    
      Дополнительные объекты, которым может потребоваться предоставить права суперпользователя при сборке мусора на время вызова. Сведения об источнике этих объектов и способе их обработки см. в главе [Работа с LLVM](@ref Working-with-LLVM).
  • new_opaque_closure

    Creates a new opaque closure. The following fields are used.

    * `args[1]` : signature
    
      Сигнатура функции для непрозрачного замыкания. Непрозрачные замыкания не используются в диспетчеризации, но входные типы могут быть ограничены.
    
    * `args[2]` : isva
    
      Указывает, принимает ли замыкание переменное количество аргументов.
    
    * `args[3]` : lb
    
      Нижняя граница для типа вывода. (По умолчанию имеет значение `Union{}`)
    
    * `args[4]` : ub
    
      Верхняя граница для типа вывода. (По умолчанию имеет значение `Any`)
    
    * `args[5]` : method
    
      Фактический метод как выражение `opaque_closure_method`.
    
    * `args[6:end]` : captures
    
      Значения, записываемые непрозрачным замыканием.

    !!! compat "Julia 1.7" Opaque closures were added in Julia 1.7.

Method

A unique container describing common metadata for a single method.

  • name, module, file, line, sig

    Metadata for a unique method definition for computer and human.

  • ambig

    A cache of other methods that may be ambiguous with this.

  • specializations

    A cache of all MethodInstance ever created for this Method, used to ensure uniqueness. Uniqueness is necessary to ensure efficiency, especially for incremental pre-compilation and tracking method invalidations.

  • source

    The original source code (if available, usually in compressed form).

  • generator

    A callable object that can be executed to obtain a specialized source for a specific method signature.

  • roots

    Pointers to non-AST objects that have been interpolated into the AST, required for AST compression, type inference, or native code generation.

  • nargs, isva, called, is_for_opaque_closure,

    Descriptive bit fields for the source code of this Method.

  • primary_world

    The "Age of the world" (hierarchy of method definitions) for this Method.

MethodInstance

A unique container describing a single callable signature for a Method. For important information about safely changing these fields, see Proper maintenance of multithreaded locks.

  • specTypes

    The primary key for this MethodInstance. Uniqueness is guaranteed by searching for `def.specializations'.

  • def

    The method (`Method') whose specialization this function describes. Or a module (`Module'), if it is a top-level lambda extended in Module and which is not part of the Method.

  • sparam_vals

    Values of static parameters in specTypes'. For the `MethodInstance container in Method.unspecified is an empty vector SimpleVector. But for the MethodInstance container of the runtime environment from the MethodTable cache, this value will always be defined and indexed.

  • uninferred

    Uncompressed source code for the top-level converter. In addition, for the generated function, this is one of the many places where the source code can be located.

  • backedges

    We keep a reverse list of cache dependencies to effectively track additional reanalysis or recompilation work that may be required after the new method definitions. To do this, we keep a list of other MethodInstance containers that have been derived or optimized to contain a possible call to this MethodInstance container. Optimization results may be stored somewhere in the cache (cache), or it may be the result of something that does not need to be cached, such as constant propagation. Thus, we combine all this information into various cache entries (there is almost always only one applicable cache entry with a reference value for max_world).

  • cache

    Cache of `CodeInstance' objects for which this template instance is shared.

CodeInstance

  • def

    The `MethodInstance' container from which this cache entry was obtained.

  • owner

    A token representing the owner of this CodeInstance'. Will use `jl_egal to match.

  • rettype/rettype_const

    The output return type for the 'specFunctionObject` field, which (in most cases) is also the calculated return type for the function as a whole.

  • inferred

    It may contain a cache of the output source code for this function, or it may have the value nothing to simply indicate that nothing is output ('rettype').

  • ftpr

    The universal jlcall entry point.

  • jlcall_api

    The ABI used when calling `fptr'. Below are some important values.

    * 0 - Not compiled yet
    * 1 - `JL_CALLABLE` `jl_value_t *(*)(jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)`
    * 2 - Constant (value stored in `rettype_const`)
    * 3 - With Static-parameters forwarded `jl_value_t *(*)(jl_svec_t *sparams, jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)`
    * 4 - Run in interpreter `jl_value_t *(*)(jl_method_instance_t *meth, jl_function_t *f, jl_value_t *args[nargs], uint32_t nargs)`
  • min_world / max_world

    The range of "ages of the world" (hierarchies of method definitions) for which this method instance is valid to call. If max_world is a special value of the -1 token, its value is not yet known. You can continue to use it until there is information that needs to be reviewed.

CodeInfo

A (usually temporary) container for storing downgraded source code.

  • code

    An array of Any operators.

  • slotnames

    An array of characters specifying names for each slot (argument or local variable).

  • slotflags

    An array of UInt8 slot properties represented as bit flags:

    * 0x02 - assigned (only false if there are *no* assignment statements with this var on the left)
    * 0x08 - used (if there is any read or write of the slot)
    * 0x10 - statically assigned once
    * 0x20 - might be used before assigned. This flag is only valid after type inference.
  • ssavaluetypes

    Any array or integer (Int).

    If an integer (Int), it specifies the number of temporary locations inserted by the compiler in the function (the length of the code array). If it is an array, it sets the type for each location.

  • ssaflags

    32-bit operator-level flags for each expression in the function. For more information, see the definition of jl_code_info_t in the julia.h file.

  • linetable

    An array of source code location objects.

  • codelocs

    An array of integer indexes in a linetable indicating the location associated with each operator.

Optional fields:

  • slottypes

    An array of types for slots.

  • rettype

    The output is a reduced-form return type (IR). The default value is `Any'.

  • method_for_inference_limit_heuristics

    method_for_inference_heuristics extends the generator of this method, if necessary during output.

  • parent

    The `MethodInstance' container that "owns" this object (if applicable).

  • edges

    Passing edges to method instances that should be invalidated.

  • min_world/max_world

    The range of "ages of the world" (hierarchies of method definitions) for which this code was valid at the time of its output.

Logical properties:

  • inferred

    Whether it was obtained by type inference.

  • inlineable

    Whether it is suitable for embedding.

  • propagate_inbounds

    Should @inbounds be distributed when embedding in order to skip the @boundscheck blocks.

UInt8 parameters:

  • constprop

    • 0 — use heuristics

    • 1 — aggressive mode

    • 2 — none

  • purity Consists of 5-bit flags:

    • 0x01 << 0 — this method is guaranteed to return control or terminate execution consistently (:consistent)

    • 0x01 << 1 — this method is devoid of externally semantically visible side effects (:effect_free)

    • 0x01 << 2 — this method is guaranteed not to raise an exception (:nothrow)

    • 0x01 << 3 — this method is guaranteed to terminate execution (:terminates_globally)

    • 0x01 << 4 — the syntactic order of execution inside this method is guaranteed to terminate execution (:terminates_locally)

      For more information, see the Base' documentation.@assume_effects.