Data types in Julia

A script about data types that accompanies the article on habr.com.

Types in Julia

  • Primitive type: a type defined using the keyword primitive type. Objects of a primitive type have a fixed memory size specified in the type definition. 📝 Int64, Bool, Char

  • Composite type: a type defined using the keyword struct. Composite types consist of zero or more fields referencing other objects (of primitive or composite type). 📝 Complex, Rational (with fields re, im and num, den, respectively), Tuple

  • Concrete type: primitive or composite type

  • Abstract type: a type defined using the keyword abstract type. Abstract types have no fields, and objects cannot be created (instantiated) from them. In addition, they cannot be declared as subtypes of a concrete type. Abstract types also include all non-concrete types. 📝 Number, AbstractFloat

  • Mutable type: a composite type defined using the keyword mutable struct. The fields of a mutable type can be rebound to objects other than those they were bound to at initialisation. 📝 String, Dict

  • Immutable type: all types except those defined with mutable struct.

  • Parametric type: a family of (mutable or immutable) composite or abstract types that share the same type name and field names regardless of the parameter types. A specific type is then uniquely identified by the parametric type name and the type(s) of its parameter(s). 📝 Rational{Int8}(1,2); see also AbstractArray{T,N} and Array{T,N} below

  • Source types: a type whose definition is contained in Julia Base or in the Julia standard library

  • Bits type: a primitive type, or an immutable composite type whose fields are all bits types.

  • Singleton: an object created from a composite type consisting of zero fields. 📝nothing, missing

  • Container: a composite type (not necessarily mutable) designed to reference a variable number of objects and to provide methods to access, enumerate, and possibly modify the references to those objects.
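
Most of the terms above can be checked directly with introspection predicates. The sketch below applies them to the examples from the list; note that Base.issingletontype is not exported, so treat it as an implementation detail that may change between Julia versions, and that ismutabletype assumes Julia 1.7 or newer.

```julia
# Introspection predicates applied to the glossary examples.
@show isprimitivetype(Int64)          # defined with `primitive type`
@show isstructtype(Complex{Int})      # defined with `struct`
@show isconcretetype(Int64)           # can be instantiated
@show isabstracttype(Number)          # defined with `abstract type`
@show ismutabletype(Dict{Int,Int})    # defined with `mutable struct`
@show isbitstype(Rational{Int8})      # immutable, and all fields are bits types
@show Base.issingletontype(Nothing)   # `nothing` is its only instance
```

Every line is expected to print true.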

Primitive type

Although the documentation does not recommend using the primitive type construct, I suggest starting our acquaintance with types from primitive types.

The reason is that here we go down to the lowest level and see how data is ultimately represented in memory.

As an example, let's introduce our own "space-protected" Bool, in which every available bit is set to either 0 or 1.

When creating a primitive type, you must explicitly specify how many bits are needed to store it (8 in our case).

In [ ]:
primitive type FilledBool  8 end

function FilledBool(x::Int)
    if iszero(x)
        reinterpret(FilledBool,0b00000000)
    elseif x == 1
        reinterpret(FilledBool,0b11111111)
    else 
        error("Only 0 and 1 are allowed as arguments")
    end
end 

Base.show(io :: IO, x :: FilledBool) = print(io, bitstring(x))

@show tr = FilledBool(1)
@show fls = FilledBool(0)
println("Regular Bool true: ", bitstring(true))
tr = FilledBool(1) = 11111111
fls = FilledBool(0) = 00000000
Regular Bool true: 00000001

Let's check whether our type is a bits type:

In [ ]:
isbitstype(FilledBool)
Out[0]:
true

The documentation says that instead of creating your own primitive types, it is better to make a wrapper over them in the form of a composite type. Let's get to know it better!
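
As a minimal sketch of that recommendation, the same idea can be expressed as a composite type wrapping the existing primitive UInt8. The name WrappedBool and its constructor are hypothetical and are not part of the original notebook.

```julia
# Hypothetical wrapper: a composite type around the existing primitive UInt8.
struct WrappedBool
    bits::UInt8
end

# Convenience constructor: all bits set for true, all bits cleared for false.
WrappedBool(flag::Bool) = WrappedBool(flag ? 0xff : 0x00)

Base.show(io::IO, x::WrappedBool) = print(io, bitstring(x.bits))

wrapped_true = WrappedBool(true)
wrapped_false = WrappedBool(false)
println(wrapped_true, " ", wrapped_false)
@show isbitstype(WrappedBool)
@show sizeof(WrappedBool)
```

The wrapper occupies the same single byte as FilledBool and is still a bits type, but it needs no reinterpret calls.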

Composite type

Immutable composite type

It is important to realise that a composite type can consist of multiple fields, a single field, or no fields at all. Unlike many other programming languages, where both fields and methods are associated with an object, in Julia only the fields and the constructors are attached to a composite type. The relationship between OOP and Julia is interestingly explained here.

But we will focus on types for now.

Let's say we have a type "Mountain". We specify 2 characteristics of objects of this type:

  • year of conquest (the year can be positive or negative)
  • mountain height (assume that all mountains are above sea level)

Once an object of an immutable type has been created, its fields cannot be changed.

In [ ]:
struct Mountain
    first_ascent_year::Int16
    height::UInt16
end

Everest = Mountain(1953,8848)
Int(Everest.height)

try
    Everest.height = 9000  # the fields of a Mountain cannot be changed
catch e 
e
end
Out[0]:
ErrorException("setfield!: immutable struct of type Mountain cannot be changed")

To have a closer look at how the structure is organised, you can use:

In [ ]:
dump(Everest)
Mountain
  first_ascent_year: Int16 1953
  height: UInt16 0x2290

Every field of the immutable struct Mountain is a bits type, so Mountain itself is a bits type.

In [ ]:
@show sizeof(Mountain) # 2 fields of 2 bytes each = 4
isbitstype(Mountain)
sizeof(Mountain) = 4
Out[0]:
true
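
The size of a struct equals the sum of its field sizes only when no padding is needed. As a sketch, the hypothetical struct Padded below (not part of the original notebook) mixes a 1-byte and an 8-byte field; on typical 64-bit platforms the compiler inserts padding so that the larger field is properly aligned, and fieldoffset shows where each field starts.

```julia
# Hypothetical struct with fields of different sizes.
struct Padded
    flag::UInt8   # 1 byte
    value::Int64  # 8 bytes, placed at an aligned offset
end

@show sizeof(UInt8) + sizeof(Int64)   # sum of the field sizes
@show sizeof(Padded)                  # size of the struct, including padding
@show fieldoffset(Padded, 1)          # byte offset of `flag`
@show fieldoffset(Padded, 2)          # byte offset of `value`
@show isbitstype(Padded)              # padding does not affect this property
```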

Let's consider the case when a field of an immutable struct is not a bits type.

A String field is not stored inline as an array of Char elements; the struct holds a pointer to the string data, which is a sequence of UTF-8 bytes. Therefore, the size of the struct is 8 bytes (the size of the pointer), while the size of the string is 6 bytes. (Although sizeof(Char) is 4, each ASCII character takes only 1 byte inside a string.)

In [ ]:
struct City
    name::String
end

Moscow = City("Moscow")

Moscow.name

@show sizeof(Moscow)
@show sizeof(Moscow.name)
@show Base.summarysize(Moscow);
sizeof(Moscow) = 8
sizeof(Moscow.name) = 6
Base.summarysize(Moscow) = 22
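
To illustrate the difference between the number of characters and the number of bytes, here is a small sketch that compares an ASCII string with the same city name written in Cyrillic. The variable names are introduced only for this example.

```julia
ascii_name = "Moscow"
cyrillic_name = "Москва"   # the same city name written in Cyrillic

@show length(ascii_name)       # number of characters
@show sizeof(ascii_name)       # number of bytes: 1 byte per ASCII character
@show length(cyrillic_name)    # number of characters
@show sizeof(cyrillic_name)    # number of bytes: 2 bytes per Cyrillic letter in UTF-8
@show sizeof(Char)             # a standalone Char always occupies 4 bytes
```

Both strings have 6 characters, but the Cyrillic one occupies 12 bytes.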

If you want the string to be stored inline, you can use static strings from the StaticStrings package:

In [ ]:
import Pkg; Pkg.add("StaticStrings")
using StaticStrings
struct StaticCity
    name::StaticString{10}
end
Moscow = StaticCity(static"Moscow"10) # padded with \0 up to 10 bytes
@show sizeof(Moscow)
@show sizeof(Moscow.name)
@show Base.summarysize(Moscow);
sizeof(Moscow) = 10
sizeof(Moscow.name) = 10
Base.summarysize(Moscow) = 10

Although we cannot change a String, the City type is not a bits type, whereas StaticCity is.

That is, it is important to understand the difference between immutable types and bits types.

The unusual behaviour of ismutable("123") is explained here.

In [ ]:
@show isbitstype(City)
@show isbitstype(StaticCity);
isbitstype(City) = false
isbitstype(StaticCity) = true

It is worth noting separately that an immutable type can have fields of a mutable type. The object a field refers to can be modified, but the field cannot be rebound to another object.

As an analogy: suppose we have a string with a balloon tied to it. We can change the balloon: stretch it, inflate it, fill it with water. But we cannot tear the balloon off the string and attach a green one instead.

In [ ]:
struct Student
    name::String
    grade::UInt8        # school year
    grades::Vector{Int} # marks
end
Alex = Student("Alex", 1, [5,5,5])
@show sizeof(Alex)  # 8 + 1 + 8 = 17, rounded up to 24 so that x % 8 == 0
sizeof(Alex) = 24
Out[0]:
24
In [ ]:
pointer(Alex.grades)
Out[0]:
Ptr{Int64} @0x00007f18d58967a0
In [ ]:
push!(Alex.grades,4)
Alex.grades
Out[0]:
4-element Vector{Int64}:
 5
 5
 5
 4
In [ ]:
@show pointer(Alex.grades)
pointer(Alex.grades) = Ptr{Int64} @0x00007f18d370a0e0
Out[0]:
Ptr{Int64} @0x00007f18d370a0e0

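As we can see, push! changed the contents of the vector. The address of its first element changed too, because the data buffer was reallocated as the vector grew. The grades field, however, still refers to the same Vector object.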
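
Since the address returned by pointer may change when the vector grows, a more reliable way to confirm that the field still refers to the same Vector is to compare identities. A minimal sketch, using a hypothetical Pupil type so that the example is self-contained:

```julia
# Hypothetical immutable type with a field of a mutable type.
struct Pupil
    name::String
    marks::Vector{Int}
end

pupil = Pupil("Alex", [5, 5, 5])
marks_before = pupil.marks            # keep a reference to the original vector
id_before = objectid(pupil.marks)

push!(pupil.marks, 4)                 # modifies the vector in place

@show pupil.marks === marks_before    # still the very same Vector object
@show objectid(pupil.marks) == id_before
@show pupil.marks
```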

In [ ]:
# dereference the pointer to the vector's data (the 1st element of the vector)
unsafe_load(pointer(Alex.grades)) 
Out[0]:
5

But if we try to rebind the field to a different vector, rather than change the elements of the existing one, an error occurs.

In [ ]:
try
    Alex.grades = [1, 2, 3] # here we try to rebind the field to a different vector
catch e
    e
end
Out[0]:
ErrorException("setfield!: immutable struct of type Student cannot be changed")

Mutable composite type

In the case of a mutable type, we can change the fields.

In [ ]:
mutable struct MutableStudent
    const name::String
    grade::UInt8        # school year
    grades::Vector{Int} # marks
end
Peter = MutableStudent("Peter", 1, [5,5,5])
Peter.grade = 2
Out[0]:
2

It is also possible to make some fields of a mutable struct constant by marking them with const. Even though the struct is mutable, such a field cannot be changed.

In [ ]:
try
    Peter.name = "Alex"
catch e
    e
end
Out[0]:
ErrorException("setfield!: const field .name of type MutableStudent cannot be changed")

Now we can replace the vector with another vector:

In [ ]:
@show pointer(Peter.grades)
@show Peter.grades = [2,2,2]
@show pointer(Peter.grades)
pointer(Peter.grades) = Ptr{Int64} @0x00007f18572077e0
Peter.grades = [2, 2, 2] = [2, 2, 2]
pointer(Peter.grades) = Ptr{Int64} @0x00007f185714e430
Out[0]:
Ptr{Int64} @0x00007f185714e430

The difference between an immutable struct and a mutable struct with constant fields

Although neither the fields of an immutable struct nor the constant fields of a mutable struct can be changed, there is a significant difference between objects of such types with the same fields.

For an immutable type, objects with the same field values are indistinguishable: === compares them by content, so from the program's point of view they are one and the same object.

For a mutable struct, each object has its own identity and its own address, even if all of its constant fields hold the same values.

In [ ]:
struct Immutable
    a::Int32
    b::Int32
end

mutable struct ConstMutable
    const a::Int32
    const b::Int32
end

im_obj_1 = Immutable(1,2)
im_obj_2 = Immutable(1,2)

const_mut_obj_1 = ConstMutable(1,2)
const_mut_obj_2 = ConstMutable(1,2)
# === checks identity: by content for immutable objects, by address for mutable ones
@show im_obj_1 === im_obj_2  
@show const_mut_obj_1 === const_mut_obj_2
im_obj_1 === im_obj_2 = true
const_mut_obj_1 === const_mut_obj_2 = false
Out[0]:
false
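
If two mutable objects with equal fields should still compare as equal, == can be given a method for the type, while === keeps distinguishing them. The sketch below uses a hypothetical type Pair2D, analogous to ConstMutable; like the notebook's own code, it assumes Julia 1.8 or newer for const fields.

```julia
# Hypothetical type with the same layout as ConstMutable.
mutable struct Pair2D
    const a::Int32
    const b::Int32
end

# Opt in to comparison by field values.
Base.:(==)(p::Pair2D, q::Pair2D) = p.a == q.a && p.b == q.b

first_obj = Pair2D(1, 2)
second_obj = Pair2D(1, 2)

@show first_obj === second_obj   # identity: two separate objects
@show first_obj == second_obj    # equality: uses the method defined above
```

In real code, a matching Base.hash method is usually defined together with ==, so that such objects behave consistently as dictionary keys.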

Immutable structs may be less convenient to work with. Their advantage, however, is that they can be placed "on the stack", while mutable structs are usually stored "on the heap".

In [ ]:
println(@allocations (a = Immutable(3,4); b = Immutable(3,4)))
println(@allocations (a = ConstMutable(3,4); b = ConstMutable(3,4)))
0
2

However, this statement should not be taken literally.

For example, the compiler can optimise away the allocation of mutable structs inside a function that returns a number rather than the mutable struct itself:

In [ ]:
function foo(x,y)
    obj1 = Immutable(x,y)
    obj2 = Immutable(y,x)
    c = obj1.a + obj2.b
end
function bar(x,y)
    obj1 = ConstMutable(x,y)
    obj2 = ConstMutable(y,x)
    c = obj1.a + obj2.b
end
println(@allocations foo(1,2))
println(@allocations bar(1,2))
0
0

Abstract type

What are abstract types for? Abstract types are needed to:

  • group concrete types
  • define interfaces for functions
  • constrain which types can be created through parameterisation (see below)

Grouping concrete types

Thanks to abstract types, type hierarchies can be organised.

Let's consider the classic and most intuitive example: Number.

Using A <: B, we can declare or check that the type A is a subtype of B.

In [ ]:
Int8 <: Integer || Int16 <: Integer
Out[0]:
true
In [ ]:
subtypes(Signed)
Out[0]:
6-element Vector{Any}:
 BigInt
 Int128
 Int16
 Int32
 Int64
 Int8

We can also work the other way round: B >: A checks that B is a supertype of A.

The supertypes function returns a tuple of supertypes, ordered from left to right from the type itself up to Any.

In [ ]:
supertypes(Int8)
Out[0]:
(Int8, Signed, Integer, Real, Number, Any)
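
The same chain can be built by hand with the supertype function, which returns the immediate parent of a type. The helper supertype_chain below is a hypothetical name introduced only for this sketch.

```julia
# Hypothetical helper: walk up the hierarchy until Any is reached.
function supertype_chain(T::Type)
    chain = Type[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

@show supertype_chain(Int8)
@show Int8 <: Integer      # Int8 is a subtype of Integer
@show Integer >: Int8      # the same check written from the supertype's side
```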

A more visually pleasing option is the AbstractTrees package, which lets us draw the hierarchy as a familiar tree.

In [ ]:
using AbstractTrees
AbstractTrees.children(t::Type) = subtypes(t)
print_tree(Number) # types defined by Engee packages are visible here too
Number
├─ MultiplicativeInverse
│  ├─ SignedMultiplicativeInverse
│  └─ UnsignedMultiplicativeInverse
├─ Complex
├─ Measurement
├─ Quaternion
├─ Real
│  ├─ AbstractFloat
│  │  ├─ BigFloat
│  │  ├─ DecimalFloatingPoint
│  │  │  ├─ Dec128
│  │  │  ├─ Dec32
│  │  │  └─ Dec64
│  │  ├─ Float16
│  │  ├─ Float32
│  │  └─ Float64
│  ├─ AbstractIrrational
│  │  ├─ Irrational
│  │  └─ IrrationalConstant
│  │     ├─ Fourinvπ
│  │     ├─ Fourπ
│  │     ├─ Halfπ
│  │     ├─ Inv2π
│  │     ├─ Inv4π
│  │     ├─ Invsqrt2
│  │     ├─ Invsqrt2π
│  │     ├─ Invsqrtπ
│  │     ├─ Invπ
│  │     ├─ Log2π
│  │     ├─ Log4π
│  │     ├─ Loghalf
│  │     ├─ Logten
│  │     ├─ Logtwo
│  │     ├─ Logπ
│  │     ├─ Quartπ
│  │     ├─ Sqrt2
│  │     ├─ Sqrt2π
│  │     ├─ Sqrt3
│  │     ├─ Sqrt4π
│  │     ├─ Sqrthalfπ
│  │     ├─ Sqrtπ
│  │     ├─ Twoinvπ
│  │     └─ Twoπ
│  ├─ FixedPoint
│  │  ├─ Fixed
│  │  └─ Normed
│  ├─ Dual
│  ├─ Percentile
│  ├─ Integer
│  │  ├─ Bool
│  │  ├─ UpperBoundedInteger
│  │  ├─ ChainedVectorIndex
│  │  ├─ Signed
│  │  │  ├─ BigInt
│  │  │  ├─ Int128
│  │  │  ├─ Int16
│  │  │  ├─ Int32
│  │  │  ├─ Int64
│  │  │  └─ Int8
│  │  └─ Unsigned
│  │     ├─ VarUInt
│  │     ├─ UInt128
│  │     ├─ UInt16
│  │     ├─ UInt32
│  │     ├─ UInt64
│  │     └─ UInt8
│  ├─ Rational
│  ├─ SimpleRatio
│  ├─ StaticFloat64
│  ├─ PValue
│  ├─ TestStat
│  ├─ LiteralReal
│  ├─ SafeReal
│  ├─ Num
│  ├─ Struct
│  └─ AbstractSIMD
│     ├─ AbstractSIMDVector{W, T} where {W, T<:(Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, Bit, var"#s3"} where var"#s3"<:StaticInt)}
│     │  ├─ AbstractMask
│     │  │  ├─ EVLMask{W, U} where {W, U<:Union{UInt128, UInt16, UInt32, UInt64, UInt8}}
│     │  │  └─ Mask{W, U} where {W, U<:Union{UInt128, UInt16, UInt32, UInt64, UInt8}}
│     │  ├─ MM
│     │  └─ Vec{W, T} where {W, T<:(Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, Bit, var"#s3"} where var"#s3"<:StaticInt)}
│     └─ VecUnroll{N, W, T, V} where {N, W, T<:(Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, Bit, var"#s3"} where var"#s3"<:StaticInt), V<:Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, Bit, AbstractSIMD{W, T}}}
├─ StaticInteger
│  ├─ StaticBool
│  │  ├─ False
│  │  └─ True
│  └─ StaticInt
├─ AbstractQuantity
│  └─ Quantity
├─ LogScaled
│  ├─ Gain{L} where L<:LogInfo
│  └─ Level{L} where L<:LogInfo
├─ Double
└─ LazyMulAdd

However, I recommend running print_tree(Any) and diving into the marvellous world of Julia types.

Abstract types and multiple dispatch

To conclude the discussion of numbers, note that it is only logical that any two numbers can be added together.

That is why promotion.jl contains the following line:

+(x::Number, y::Number) = +(promote(x,y)...)

(using methods(+), you can see which types can be added to which, and by what rules)
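
As a small illustration of what promote does before the actual addition takes place, the sketch below converts arguments of different numeric types to a common type. All functions used here are from Base.

```julia
@show promote(1, 2.5)                 # both values converted to Float64
@show promote_type(Int8, Float64)     # the common type chosen for the pair
@show typeof(Int8(1) + 2.5)           # the sum has the promoted type
@show 1 + 2//3                        # Int and Rational promote to Rational
```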

The least common multiple, on the other hand, should be defined only for integers and rationals. We will discuss it at the end.

And for those who are tired of numbers, I suggest we turn to animals instead.

In [ ]:
abstract type Pet end
struct Dog <: Pet; name::String end
struct Cat <: Pet; name::String end

function encounter(a::Pet, b::Pet)
    verb = meets(a, b)
    println("$(a.name) meets $(b.name) and $verb.")
end


meets(a::Dog, b::Dog) = "sniffs"
meets(a::Dog, b::Cat) = "chases"
meets(a::Cat, b::Dog) = "hisses"
meets(a::Cat, b::Cat) = "purrs"

fido = Dog("Rex")
rex = Dog("Mukhtar")
whiskers = Cat("Matroskin")
spots = Cat("Behemoth")

encounter(fido, rex)       
encounter(fido, whiskers)  
encounter(whiskers, rex)   
encounter(whiskers, spots) 
Rex meets Mukhtar and sniffs.
Rex meets Matroskin and chases.
Matroskin meets Mukhtar and hisses.
Matroskin meets Behemoth and purrs.

The convenience is that we do not have to specify how each animal greets every other one: we can define a common "greeting" interface for all pets.

In [ ]:
meets(a::Pet, b::Pet) = "says hello"

struct Cow <: Pet; name::String end

encounter(rex,Cow("Buryonka"))
Mukhtar meets Buryonka and says hello.

Not every language makes this so convenient. You can learn more about it in the video from which this code was taken.

DataType

But before we get to the last section, let's stir up some confusion!

For now, let's just take a look at a strange beast: DataType

In [ ]:
123 isa Integer
Out[0]:
true
In [ ]:
Vector isa DataType || Dict isa DataType
Out[0]:
false
In [ ]:
Function isa DataType
Out[0]:
true

Don't worry, explanations will be forthcoming.

Parameterised types

Composite, abstract, and even primitive types can all be parameterised.

Let's start with the more obvious type:

Parameterised composite types

They are useful when the logic of a structure must be preserved, but the types of its fields may vary.

For example, you can guess how a complex number is defined:

struct Complex{T<:Real} <: Number
    re::T
    im::T
end
In [ ]:
ci8 = Int8(1)+Int8(2)im
@show typeof(ci8)
sizeof(ci8)
typeof(ci8) = Complex{Int8}
Out[0]:
2
In [ ]:
cf64 = 1.5+2im
@show typeof(cf64)
sizeof(cf64)
typeof(cf64) = ComplexF64
Out[0]:
16

As you can see, depending on what parameters we passed, we get objects of different types. They occupy different amounts of memory and may work differently.
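
The same pattern works for user-defined types. The sketch below defines a hypothetical parametric struct GridPoint (not part of the original notebook) whose two fields share a single parameter T constrained to Real.

```julia
# Hypothetical parametric composite type: both fields share the parameter T.
struct GridPoint{T<:Real}
    x::T
    y::T
end

small = GridPoint(Int8(1), Int8(2))
precise = GridPoint(1.0, 2.0)

@show typeof(small) sizeof(small)
@show typeof(precise) sizeof(precise)

# Mixed argument types do not match the default constructor,
# because there is no single T that fits both arguments.
mixed_fails = try
    GridPoint(1, 2.0)
    false
catch err
    err isa MethodError
end
@show mixed_fails
```

One definition thus yields a 2-byte type and a 16-byte type, depending on the parameter.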

Parameterised abstract types

An example of a parameterised abstract type is AbstractDict

abstract type AbstractDict{K,V} end

A dictionary in turn is:

mutable struct Dict{K,V} <: AbstractDict{K,V}
    slots::Memory{UInt8}
    keys::Memory{K}
    vals::Memory{V}
    ...
    ...
end

This is needed to implement efficient interfaces. For example, ENV, the set of environment variables, is not a Dict, but it is an AbstractDict.

What matters is that ENV is specifically a parametric AbstractDict{String,String}. This is why parametric abstract types can be very convenient.

In [ ]:
ENV isa Dict
Out[0]:
false
In [ ]:
ENV isa AbstractDict
Out[0]:
true
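
A function written against the abstract parametric type accepts both an ordinary Dict and ENV. The sketch below uses a hypothetical helper, count_empty_values, introduced only for illustration.

```julia
# Hypothetical helper: works with any dictionary mapping String to String.
count_empty_values(d::AbstractDict{String,String}) = count(isempty, values(d))

settings = Dict("EDITOR" => "vim", "PAGER" => "")

@show count_empty_values(settings)            # an ordinary Dict{String,String}
@show count_empty_values(ENV) isa Int         # ENV is accepted as well
@show ENV isa AbstractDict{String,String}
```

A Dict{String,Int} would be rejected by this signature, which is exactly the kind of constraint the type parameters provide.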

FINALLY, ARRAYS!

It is only now, having reached parametric abstract types, that we can understand what arrays are.

Even though arrays are implemented in C, we can work out what the following objects are:

[1,2,3]

[1 2 3;
 4 5 6;
 7 8 9;]

 and rand(3,3,3,3)

It's all about the definition of the type:

   abstract type AbstractArray{T,N} end

N is the number of dimensions. It is an integer value rather than a type, but informally it could be thought of as

abstract type AbstractArray{T,N<:Unsigned} end
In [ ]:
Array <: DenseArray <: AbstractArray
Out[0]:
true
In [ ]:
Array{Int8,1}(undef,4)
Out[0]:
4-element Vector{Int8}:
 -32
  40
  89
  59
In [ ]:
Array{Int8,2}(undef,2,3)
Out[0]:
2×3 Matrix{Int8}:
 0  0  0
 0  0  0
In [ ]:
Array{Int8,3}(undef,3,3,3)
Out[0]:
3×3×3 Array{Int8, 3}:
[:, :, 1] =
 1  0  0
 0  0  0
 0  0  8

[:, :, 2] =
 0  0     0
 0  0   -48
 0  0  -126

[:, :, 3] =
 77  127    16
 83    0  -125
 27    0    77

And here is the answer to why ranges in Julia support the array interface:

In [ ]:
1:0.5:1000 isa StepRangeLen <: AbstractArray
Out[0]:
true
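
Any type can join the array interface in the same way. As a minimal sketch, the hypothetical type Squares below stores no elements at all, yet behaves like a read-only vector once size and getindex are defined for it.

```julia
# Hypothetical lazy vector: element i is computed as i^2 on demand.
struct Squares <: AbstractVector{Int}
    n::Int
end

Base.size(s::Squares) = (s.n,)
Base.IndexStyle(::Type{Squares}) = IndexLinear()
function Base.getindex(s::Squares, i::Int)
    @boundscheck checkbounds(s, i)
    return i^2
end

sq = Squares(4)
@show sq[3]
@show sum(sq)         # generic functions for AbstractArray work automatically
@show collect(sq)
@show sq isa AbstractArray
```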

That is, parametric types are similar to templates in C++.

But it is important to understand the peculiarities of types in Julia:

Parametric types in Julia are invariant.

In [ ]:
 Vector{Int} <: Vector{Real}        # false: parametric types are invariant


 Vector{Int} <: Vector{<:Real}      # true


 Vector{Complex} <: Vector{<:Real}  # false: Complex is not a subtype of Real


 Vector{Complex} <: Vector          # true
Out[0]:
true
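
Invariance matters most in method signatures. In the sketch below, two hypothetical functions differ only in how the element type is written; applicable reports whether a method exists for the given arguments.

```julia
# Accepts only vectors whose element type is exactly Real.
total_strict(v::Vector{Real}) = sum(v)

# Accepts vectors whose element type is any subtype of Real.
total_generic(v::Vector{<:Real}) = sum(v)

ints = [1, 2, 3]                       # Vector{Int}

@show applicable(total_strict, ints)   # no matching method
@show applicable(total_generic, ints)  # matches
@show total_generic(ints)
@show total_strict(Real[1, 2.5])       # works for an explicit Vector{Real}
```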

This is where DataType can help us. DataType lets us tell whether a type is fully "declared".

All concrete types are DataTypes.

  1. Most non-parametric types are DataTypes, including abstract ones such as
abstract type Number end;
  2. If all parameters of a parametric type are specified, it is also a DataType.

To understand what a DataType is, it is easier to start from what is not a DataType.

In [ ]:
Union{Int32,Char} isa DataType
Vector{<:Real} isa DataType # also a kind of "union of all vectors whose element type is a subtype of Real"
Out[0]:
false
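
Types that are not DataTypes are instances of either UnionAll or Union. The following sketch puts the cases side by side; every line is expected to print true.

```julia
@show Int64 isa DataType            # concrete, non-parametric
@show Number isa DataType           # abstract, non-parametric
@show Vector{Int} isa DataType      # parametric, all parameters specified
@show Vector isa UnionAll           # parameters left free
@show Vector{<:Real} isa UnionAll   # parameter only bounded from above
@show Union{Int32,Char} isa Union   # explicit union of types
```

This also explains the earlier result: Vector and Dict written without parameters are UnionAll types, not DataTypes.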

And how is all this applied?

Roughly speaking, multiple dispatch picks the most specific matching method, in the following order of priority:

  1. Concrete type
  2. Abstract type
  3. Parameterised type

We finish by looking at how the least common multiple function lcm is arranged, with its several "overloaded" methods:

#1
function lcm(a::T, b::T) where T<:Integer
#2
function lcm(x::Rational, y::Rational)
#3
lcm(a::Real, b::Real) = lcm(promote(a,b)...)
#4
lcm(a::T, b::T) where T<:Real = throw(MethodError(lcm, (a,b)))

For integers, method 1 is called.

If we pass rational arguments, method 2 is called.

If we call lcm(2, 2//3), method 3 is called first and type promotion takes place. After that, method 2 is called.

But if we call lcm(2, 1.5), then after type promotion we end up in method 4, the "template" version, which throws an error.

Type promotion itself works like this:

In [ ]:
promote(2, 2//3)
Out[0]:
(2//1, 2//3)
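
The dispatch chain described above can be reproduced with a runnable sketch. The function mylcm is hypothetical: it mirrors the four signatures shown earlier, but its bodies are simplified and should not be read as the actual Base implementation.

```julia
# 1: integers of the same type delegate to Base.lcm
mylcm(a::T, b::T) where {T<:Integer} = lcm(a, b)
# 2: rationals: lcm of the numerators over gcd of the denominators
mylcm(x::Rational, y::Rational) =
    lcm(numerator(x), numerator(y)) // gcd(denominator(x), denominator(y))
# 3: mixed real arguments are promoted to a common type first
mylcm(a::Real, b::Real) = mylcm(promote(a, b)...)
# 4: same-typed reals that are neither integers nor rationals are rejected
mylcm(a::T, b::T) where {T<:Real} = throw(MethodError(mylcm, (a, b)))

@show mylcm(4, 6)         # method 1
@show mylcm(2//3, 4//9)   # method 2
@show mylcm(2, 2//3)      # method 3, then method 2

float_fails = try
    mylcm(2, 1.5)         # method 3, then method 4
    false
catch err
    err isa MethodError
end
@show float_fails
```

Without method 4, the call mylcm(2, 1.5) would keep re-entering method 3 after promotion, so the explicit error also guards against infinite recursion.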

See you soon!