Additional information about the types

If you are already using Julia, then you understand the fundamental role that types play. Here we will try to consider this issue in more detail, paying special attention to parametric types.

Types and sets (both `Any` and `Union{}`/`Bottom`)

Perhaps the easiest way to think of Julia’s type system is in terms of sets. While programs work with individual values, a type refers to a set of values. It’s not the same as a collection. For example, the set (Set) of values is itself a single Set value. Rather, a type describes a set of possible values, expressing uncertainty about what value we have.

concret type T describes a set of values whose direct label is returned by the function typeof, looks like a `T'. The abstract type describes some possible larger set of values.

Any describes the totality of possible values. Integer is a subset of Any that contains Int, Int8 and other specific types. Internally, Julia also actively uses another type, known as Bottom, which can also be written as `Union{}'. It corresponds to an empty set.

Julia types support standard set theory operations: you can ask if T1 is a 'subset' (subtype) of T2 with T1 <: T2. In the same way, you intersect the two types with the function typeintersect, take their union using the type Union and calculate the type that contains their union using the function typejoin:

julia> typeintersect(Int, Float64)
Union{}

julia> Union{Int, Float64}
Union{Float64, Int64}

julia> typejoin(Int, Float64)
Real

julia> typeintersect(Signed, Union{UInt8, Int8})
Int8

julia> Union{Signed, Union{UInt8, Int8}}
Union{UInt8, Signed}

julia> typejoin(Signed, Union{UInt8, Int8})
Integer

julia> typeintersect(Tuple{Integer, Float64}, Tuple{Int, Real})
Tuple{Int64, Float64}

julia> Union{Tuple{Integer, Float64}, Tuple{Int, Real}}
Union{Tuple{Int64, Real}, Tuple{Integer, Float64}}

julia> typejoin(Tuple{Integer, Float64}, Tuple{Int, Real})
Tuple{Integer, Real}

Although these operations may seem abstract, they are at the core of Julia. For example, method dispatch is performed by walking through the elements in the method list step by step until the one for which the tuple type of arguments is a subtype of the method signature is reached. For this algorithm to work, it is important that the methods are sorted by their specificity and the search starts with the most specific methods. Consequently, Julia also implements partial type order. This uses functionality similar to <:, but with differences that will be discussed below.

Types of unionAll

The Julia type system can also express an iterable type union: a union of types over all values of a certain variable. This is necessary to describe parametric types in which the values of some parameters are unknown.

For example, the type Array has two parameters, as shown in Array{Int,2}. If we didn’t know the element type, we could write Array{T,2} where T, which is a union of Array{T,2} for all values of T: Union{Array{Int8,2}, Array{Int16,2}, ...}.

This type is represented by a unionAll object, which contains a variable (in this example, T, of type TypeVar) and a wrapper type (in this example, Array{T,2}).

Consider the following methods.

f1(A::Array) = 1
f2(A::Array{Int}) = 2
f3(A::Array{T}) where {T<:Any} = 3
f4(A::Array{Any}) = 4

Signature (as described in the section Function calls) f3 is a unionAll type that encapsulates the tuple type: Tuple{typeof(f3), Array{T}} where T. All but f4 can be called using a = [1,2]. Everything except f2 can be called using b = Any[1,2].

Let’s look at these types in more detail.

julia> dump(Array)
UnionAll
  var: TypeVar
    name: Symbol T
    lb: Union{}
    ub: Any
  body: UnionAll
    var: TypeVar
      name: Symbol N
      lb: Union{}
      ub: Any
    body: Array{T, N} <: DenseArray{T, N}
      ref::MemoryRef{T}
      size::NTuple{N, Int64}

This indicates that Array actually names the type unionAll'. There is one nested type `unionAll for each parameter. The Array' syntax{Int,2}`equivalent to `+Array{Int}{2}+. Internally, an instance of each unionAll type is created with a specific variable value, one at a time, starting from the outermost one. In this case, omitting the final type parameters takes on a natural meaning. Array{Int} outputs a type equivalent to Array{Int,N} where N.

TypeVar' is not a type in itself, but rather should be considered as part of a `unionAll type structure. Type variables have lower and upper bounds for their values (in the fields lb and ub). The symbolic name name is purely decorative. Internally, TypeVar types are compared by address, so they are defined as mutable so that variables of "different" types can be distinguished. However, by convention they should not be mutable.

The TypeVar types can be built manually:

julia> TypeVar(:V, Signed, Real)
Signed<:V<:Real

There are convenient versions that allow you to omit any of these arguments except the name character.

The syntax of Array{T} where T<:Integer is lowered to the following:

let T = TypeVar(:T,Integer)
    UnionAll(T, Array{T})
end

therefore, TypeVar rarely needs to be created manually (moreover, it should be avoided).

Free variables

The concept of a variable of any type is extremely important in the type system. We say that the variable V has the free type T if T does not contain the type unionAll, which represents the variable V'. For example, the type `+Array{Array{V} where V<:Integer}+ has no free variables, but part of Array{V} inside it has a free variable V.

A type with free variables is, in a sense, not a type at all. Consider the type Array{Array{T}} where T, which refers to all homogeneous arrays of arrays. It may seem that the internal type is +Array{T}The +, considered by itself, refers to any kind of array. However, each element of the external array must have _ this_ array type, so Array{T} cannot refer to any old array. We can say that Array{T} actually "occurs" several times, and 'T` should have the same value every "time".

For this reason, the jl_has_free_typevars function in the C API is very important. The types for which it returns true will not give intelligible answers in subtyping and other type functions.

TypeNames

The following two types Array are functionally equivalent, but are output in different ways:

julia> TV, NV = TypeVar(:T), TypeVar(:N)
(T, N)

julia> Array
Array

julia> Array{TV, NV}
Array{T, N}

They can be distinguished by examining the type name field, which is an object of type TypeName.:

julia> dump(Array{Int,1}.name)
TypeName
  name: Symbol Array
  module: Module Core
  names: empty SimpleVector
  wrapper: UnionAll
    var: TypeVar
      name: Symbol T
      lb: Union{}
      ub: Any
    body: UnionAll
      var: TypeVar
        name: Symbol N
        lb: Union{}
        ub: Any
      body: Array{T, N} <: DenseArray{T, N}
  cache: SimpleVector
    ...

  linearcache: SimpleVector
    ...

  hash: Int64 -7900426068641098781
  mt: MethodTable
    name: Symbol Array
    defs: Nothing nothing
    cache: Nothing nothing
    max_args: Int64 0
    module: Module Core
    : Int64 0
    : Int64 0

In this case, the corresponding field is wrapper, which contains a reference to the top-level type used to create new types of `Array'.

julia> pointer_from_objref(Array)
Ptr{Cvoid} @0x00007fcc7de64850

julia> pointer_from_objref(Array.body.body.name.wrapper)
Ptr{Cvoid} @0x00007fcc7de64850

julia> pointer_from_objref(Array{TV,NV})
Ptr{Cvoid} @0x00007fcc80c4d930

julia> pointer_from_objref(Array{TV,NV}.name.wrapper)
Ptr{Cvoid} @0x00007fcc7de64850

The wrapper field in Array points to itself, but for Array{TV,NV} it points to the original type definition.

What about the other fields? hash assigns an integer value to each type. To study the cache field, it is recommended to choose a type that is used less actively than Array. First, let’s create our own type:

julia> struct MyType{T,N} end

julia> MyType{Int,2}
MyType{Int64, 2}

julia> MyType{Float32, 5}
MyType{Float32, 5}

When creating an instance of a parametric type, each specific type is stored in the type cache (`MyType.body.body.name.cache'). However, instances containing free type variables are not cached.

Types of tuples

Tuple types are an interesting special case. For dispatching to work with declarations like `x::Tuple', the type must be able to accommodate any tuple. Let’s check the parameters:

julia> Tuple
Tuple

julia> Tuple.parameters
svec(Vararg{Any})

Unlike other types, tuple types are covariant in their parameters, so this definition allows a Tuple to match any tuple type.:

julia> typeintersect(Tuple, Tuple{Int,Float64})
Tuple{Int64, Float64}

julia> typeintersect(Tuple{Vararg{Any}}, Tuple{Int,Float64})
Tuple{Int64, Float64}

However, if the tuple type with a variable number of arguments (Vararg) has free variables, it can describe different types of tuples.:

julia> typeintersect(Tuple{Vararg{T} where T}, Tuple{Int,Float64})
Tuple{Int64, Float64}

julia> typeintersect(Tuple{Vararg{T}} where T, Tuple{Int,Float64})
Union{}

Note that when the type T is free relative to the type Tuple (i.e. its binding type unionAll is outside the type Tuple), only one value T should work with the entire type. Therefore, dissimilar tuples do not match.

Finally, it’s worth noting that Tuple{} is different.:

julia> Tuple{}
Tuple{}

julia> Tuple{}.parameters
svec()

julia> typeintersect(Tuple{}, Tuple{Int})
Union{}

What is the "main" tuple type?

julia> pointer_from_objref(Tuple)
Ptr{Cvoid} @0x00007f5998a04370

julia> pointer_from_objref(Tuple{})
Ptr{Cvoid} @0x00007f5998a570d0

julia> pointer_from_objref(Tuple.name.wrapper)
Ptr{Cvoid} @0x00007f5998a04370

julia> pointer_from_objref(Tuple{}.name.wrapper)
Ptr{Cvoid} @0x00007f5998a04370

therefore, Tuple == Tuple{Vararg{Any}} is indeed the main type.

Diagonal types

Consider the Tuple' type{T,T} where T. A method with this signature will look like this:

f(x::T, y::T) where {T} = ...

According to the usual interpretation of the type unionAll, this type T applies to all types, including Any, so it should be equivalent to Tuple'.{Any,Any}. However, this interpretation leads to a number of practical problems.

First, the value of T must be available inside the method definition. For a call like f(1, 1.0), it is unclear what the type of T should be. It can be a Union{Int,Float64}, or maybe, Real. Intuitively, we expect the declaration x::T to mean T === typeof(x)'. To make sure that the invariant is preserved, we need `typeof(x) === typeof(y)===T in this method. This means that the method should only be called for arguments of exactly the same type.

It turns out that the ability to send data about whether two values have the same type is very useful (it is used, for example, in the promotion system), so we have several reasons to get a different interpretation of the Tuple{T,T} where T. To do this, we add the following rule to subtyping: if a variable occurs more than once in a covariant position, it is limited to a range of specific types only. ("Covariant position" means that only the types Tuple and Union occur between the occurrence of a variable and the type unionAll entered by it.) Such variables are called "diagonal variables" or "specific variables".

For example, Tuple{T,T} where T can be considered as Union{Tuple{Int8,Int8}, Tuple{Int16,Int16}...}, where the type T covers all specific types. This leads to some interesting subtyping results. For example, Tuple{Real,Real} is not a subtype of the Tuple' type{T,T} where T, because it includes some types such as Tuple{Int8,Int16}, where these two elements have different types. Tuple{Real,Real} and Tuple{T,T} where T have a non-trivial intersection Tuple{T,T} where T<:Real. However, Tuple{Real} _ is a subtype of the type Tuple{T} where T, because in this case the type T occurs only once and therefore is not diagonal.

Next, consider the signature of the following form:

f(a::Array{T}, x::T, y::T) where {T} = ...

In this case, the type T appears in an invariant position inside the Array'.{T}. This means that whatever type of array is passed, it uniquely determines the value of type T — we say that type T has an equality constraint. Therefore, the diagonal rule is not necessary in this case, since the array defines the type T, and you can allow x and y to be any subtypes of the type `T'. Therefore, variables that occur in an invariant position are never considered diagonal. This choice of behavior is somewhat controversial — some believe that this definition should be written as

f(a::Array{T}, x::S, y::S) where {T, S<:T} = ...

to clarify whether x and y should have the same type. In this version of the signature, they will have the same type, or you can enter a third variable for the type y if x and y may have different types.

The next difficulty is the interaction of unions and diagonal variables, for example:

f(x::Union{Nothing,T}, y::T) where {T} = ...

Let’s look at what this ad means. 'y` is of type T'. Then `x can have either the same type T, or the type Nothing. Thus, all of the following calls must match:

f(1, 1)
f("", "")
f(2.0, 2.0)
f(nothing, 1)
f(nothing, "")
f(nothing, 2.0)

These examples show the following: when x has the value nothing'.::Nothing, there are no additional restrictions for y. It’s as if the method signature has `y::Any'. In fact, we have the following type equivalence:

(Tuple{Union{Nothing,T},T} where T) == Union{Tuple{Nothing,Any}, Tuple{T,T} where T}

The general rule is that a specific variable in a covariant position behaves as non-specific if the subtyping algorithm uses it only once. When x is of type Nothing, you don’t need to use type T in Union'.{Nothing,T}, it is only used in the second slot. This is due to the fact that in Tuple{T} where T limiting the type of T to specific types does not matter. The type is equal to Tuple{Any} anyway.

However, having appeared in an invariant position, the variable loses the right to be specific, regardless of whether this occurrence is used or not. Otherwise, types may behave differently depending on which other types they are compared to, making subtyping non-transitive. For example, consider the following.

Tuple{Int,Int8,Vector{Integer}} <: Tuple{T,T,Vector{Union{Integer,T}}} where T

If the type T inside Union' is ignored, then the type `T will be specific, and the answer will be false, since the first two types are not the same. Now consider the following.

Tuple{Int,Int8,Vector{Any}} <: Tuple{T,T,Vector{Union{Integer,T}}} where T

Right now we can’t ignore the type T in Union (we should have T == Any), so the type T is not specific and the answer is true. In this case, the concreteness of T will depend on another type, which is unacceptable, since the type must have a clear meaning on its own. Therefore, the appearance of a T inside a Vector is considered in both cases.

Subtyping of diagonal variables

The subtyping algorithm for diagonal variables consists of two components: (1) determining the occurrences of variables and (2) ensuring that diagonal variables cover only specific types.

The first task is solved by keeping counters occurs_inv and occur_cov (in the file src/subtype.c) for each variable in the environment, tracking the number of invariant and covariant occurrences, respectively. The variable is diagonal when occurs_inv == 0 && occurs_cov > 1.

The second problem is solved by imposing a condition on the lower bound of the variable. As the subtyping algorithm works, the boundaries of each variable are narrowed (with increasing lower and decreasing upper bounds) in order to track the range of variable values for which the subtype ratio will be performed. Having completed the calculation of a body of type unionAll, the variable of which is diagonal, let’s look at the final values of the boundaries. Since a variable must be specific, a contradiction arises if its lower bound cannot be a subtype of a specific type. For example, an abstract type like AbstractArray cannot be a subtype of a specific type, but a specific type like Int' can, and the empty type `Bottom can too. If the lower bound fails this test, the algorithm stops with the answer `false'.

For example, in the Tuple' task{Int,String} <: Tuple{T,T} where T we get that the ratio would be correct if T were a supertype of Union{Int,String}. However, the Union{Int,String} is an abstract type, so the relation does not hold.

This specificity test is performed using the 'is_leaf_bound` function. Note that this test is slightly different from the jl_is_leaf_type function, as it also returns true for the lower bound (Bottom). Currently, this function is heuristic and does not intercept all possible specific types. The difficulty is that the concreteness of the lower bound may depend on the values of the bounds of other variables of the type. For example, Vector{T}+`is equivalent to a specific type `+Vector{Int}, only if the upper and lower bounds of T are equal to `Int'. We have not yet developed a complete algorithm for dealing with this situation.

Getting to know the internal mechanism

Most operations for working with types are found in the files jltypes.c and subtype.c. It’s best to start by studying typing in action. Build Julia using make debug and run Julia in the debugger. In the chapter gdb Debugging Tips you will find useful tips.

Since the subtyping code is actively used in the REPL itself, and therefore breakpoints in this code are triggered frequently, it will be easier if you create the following definition:

julia> function mysubtype(a,b)
           ccall(:jl_breakpoint, Cvoid, (Any,), nothing)
           a <: b
       end

and then set a breakpoint in the jl_breakpoint function. After this point is triggered, you can set breakpoints in other functions.

As a warm-up, try the following.

mysubtype(Tuple{Int, Float64}, Tuple{Integer, Real})

Let’s raise the level and consider a more complex case.:

mysubtype(Tuple{Array{Int,2}, Int8}, Tuple{Array{T}, T} where T)

Subtyping and sorting methods

The type_morespecific functions are used to set a partial order for functions in method tables (from the most specific to the least). Specificity is strict; if a is more specific than b, then a is not equal to b and b is not more specific than `a'.

If a is a strict subtype of b, then it is automatically considered more specific. Based on this, type_morespecific uses some less formal rules. For example, subtype depends on the number of arguments, while type_morespecific may not. In particular, the Tuple{Int,AbstractFloat} is more specific than Tuple{Integer}, even if it is not a subtype. (Neither Tuple{Int,AbstractFloat}, nor Tuple{Integer,Float64} is not more specific than the other one.) In particular, Tuple{Int,Vararg{Int}} is not a subtype of Tuple{Integer}, but is considered more specific. However, morespecific gets a length bonus: specifically, Tuple{Int,Int}`more specific than `+Tuple{Int,Vararg{Int}}+.

If you are debugging ways to sort methods, it may be advisable to define a function.:

type_morespecific(a, b) = ccall(:jl_type_morespecific, Cint, (Any,Any), a, b)

which allows you to check whether the tuple type a is more specific than the tuple type `b'.