Engee documentation
Notebook

Data types in Julia

A script about data types that accompanies the [article](https://habr.com/ru/companies/etmc_exponenta/articles/882178/) on habr.com.

In [ ]:
import Pkg
Pkg.add("AbstractTrees")

Types in Julia

  • Primitive type: a type defined with the keyword primitive type. Objects of a primitive type have a fixed memory size, which is given in the type definition. 📝 Int64, Bool, Char

  • Composite type: a type defined with the keyword struct. Composite types consist of zero or more fields that refer to other objects (of primitive or composite type). 📝 Complex, Rational (with fields re, im and num, den, respectively), Tuple

  • Concrete type: a primitive or composite type.

  • Abstract type: a type defined with the keyword abstract type. Abstract types have no fields, and objects cannot be created (instantiated) from them. In addition, they cannot be declared as subtypes of a concrete type. Types that are not concrete also count as abstract. 📝 Number, AbstractFloat

  • Mutable type: a composite type defined with the keyword mutable struct. The fields of a mutable type can be rebound to objects other than the ones they were bound to at initialization. 📝 String, Dict

  • Immutable type: any type that is not defined with mutable struct.

  • Parametric type: a family of (mutable or immutable) composite or abstract types that share the same type name and field names and differ only in the types of their parameters. A particular type is then uniquely identified by the name of the parametric type and the type(s) of its parameter(s). 📝 Rational{Int8}(1,2); see also AbstractArray{T,N} and Array{T,N} below

  • Built-in (native) type: a type whose definition is contained in Julia Base or in the Julia standard library.

  • Bit type: a primitive type, or an immutable composite type all of whose fields are bit types.

  • Singleton: an object created from a composite type that has zero fields. 📝 nothing, missing

  • Container: a composite type (not necessarily mutable) designed to reference a variable number of objects and to provide methods for accessing, iterating over, and possibly changing the references to those objects.
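
Most of the categories in this glossary can be checked with Julia's reflection functions. The following is only an illustrative sketch: the example types are chosen arbitrarily, and Base.issingletontype is an internal, unexported helper that is assumed to be available in your Julia version.

```julia
# Primitive vs. composite
isprimitivetype(Int64)           # true
isstructtype(Rational{Int})      # true

# Concrete vs. abstract
isconcretetype(Int64)            # true
isconcretetype(Number)           # false
isabstracttype(AbstractFloat)    # true

# Mutable vs. immutable
ismutabletype(Dict{Int,Int})     # true
ismutabletype(Complex{Int})      # false

# Bit types
isbitstype(Complex{Int8})        # true
isbitstype(Dict{Int,Int})        # false

# Singletons: the type has no fields, so there is exactly one instance
Base.issingletontype(Nothing)    # true (internal helper, assumed available)

# A parametric type is identified by its name plus its parameters
typeof(Rational{Int8}(1, 2))     # Rational{Int8}
```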

Primitive type

Although the documentation does not recommend defining your own types with primitive type, I suggest starting our tour of types with the primitive ones.

The reason is that here we drop to the lowest level and see how data is ultimately represented in memory.

As an example, let's introduce our own "cosmic-interference-proof" Bool, which fills every available bit with either 0 or 1.

When creating a primitive type, you must explicitly specify how many bits are needed to store it (8 in our case).

In [ ]:
primitive type FilledBool  8 end

function FilledBool(x::Int)
    if iszero(x)
        reinterpret(FilledBool,0b00000000)
    elseif x == 1
        reinterpret(FilledBool,0b11111111)
    else 
        error("Only 0 and 1 are allowed as parameters.")
    end
end 

Base.show(io :: IO, x :: FilledBool) = print(io, bitstring(x))

@show tr = FilledBool(1)
@show fls = FilledBool(0)
println("Regular Bool true: ", bitstring(true))
tr = FilledBool(1) = 11111111
fls = FilledBool(0) = 00000000
Regular Bool true: 00000001

Let's check whether our type is a bit type:

In [ ]:
isbitstype(FilledBool)
Out[0]:
true

The documentation says that instead of creating your own primitive types, it is better to wrap an existing primitive type in a composite type.
Let's get to know composite types better!
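
Before moving on, here is a minimal sketch of that recommendation: the same "filled" boolean, but stored in a composite type with a single UInt8 field instead of a custom primitive type. The name WrappedBool and its constructor are hypothetical and exist only for this illustration.

```julia
# Hypothetical wrapper: one UInt8 field instead of a custom primitive type
struct WrappedBool
    bits::UInt8
    # inner constructor: accepts only a Bool and fills all 8 bits
    WrappedBool(x::Bool) = new(x ? 0xff : 0x00)
end

# print the stored bits, as in the FilledBool example
Base.show(io::IO, x::WrappedBool) = print(io, bitstring(x.bits))

wtr = WrappedBool(true)
wfls = WrappedBool(false)
println(wtr, " ", wfls)

sizeof(WrappedBool)     # still a single byte
isbitstype(WrappedBool) # still a bit type
```

With this approach no reinterpret is needed, and invalid inputs such as WrappedBool(2) are rejected by dispatch rather than by a manual check.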

Composite type

Immutable composite type

It is important to understand that a composite type can consist of several fields, of a single field, or of no fields at all.
Unlike many other programming languages, where both fields and methods are attached to an object, in Julia only the fields and the [constructor](https://engee.com/helpcenter/stable/ru/julia/manual/constructors.html) are associated with a composite type.
An interesting discussion of how OOP and Julia relate can be found here.

But for now we'll focus on the types.

Let's say we have a "Mountain" type. We specify two characteristics for objects of this type:

  • the year of the first ascent (the year can be positive or negative)
  • the height of the mountain (we assume that all mountains are above sea level)

The fields of an immutable type cannot be changed after the object is created.

In [ ]:
struct Mountain
    first_ascent_year::Int16
    height::UInt16
end

Everest = Mountain(1953,8848)
Int(Everest.height)

try
    Everest.height = 9000  # you cannot change the values of the Mountain fields
catch e 
e
end
Out[0]:
ErrorException("setfield!: immutable struct of type Mountain cannot be changed")

To take a closer look at how the structure is laid out, you can use dump:

In [ ]:
dump(Everest)
Mountain
  first_ascent_year: Int16 1953
  height: UInt16 0x2290
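
The same information can also be queried programmatically. The sketch below repeats the Mountain definition so that it is self-contained, and uses only standard reflection functions. Note that the constructor converts its arguments to the declared field types, so a value that does not fit is rejected.

```julia
# Same definition as above, repeated to keep the example self-contained
struct Mountain
    first_ascent_year::Int16
    height::UInt16
end

fieldnames(Mountain)        # (:first_ascent_year, :height)
fieldtypes(Mountain)        # (Int16, UInt16)
fieldoffset(Mountain, 2)    # byte offset of the second field

# dump prints unsigned integers in hexadecimal: 0x2290 is 8848
0x2290 == 8848

# 70000 does not fit into UInt16, so the conversion in the constructor fails
overflow = try
    Mountain(1953, 70000)
catch e
    e
end
overflow isa InexactError
```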

Every field of the immutable Mountain structure is a bit type, so Mountain itself is a bit type.

In [ ]:
@show sizeof(Mountain) # 2 fields of 2 bytes each = 4
isbitstype(Mountain)
sizeof(Mountain) = 4
Out[0]:
true

Now consider the case where a field of an immutable structure is not a bit type.

A String is not stored inside the structure; the field holds a pointer to the string data.
Therefore, the size of the structure is 8 bytes (the size of the pointer), while the size of the string itself is 6 bytes.
(Although sizeof(Char) == 4, strings are stored in UTF-8, so each ASCII character takes up 1 byte.)

In [ ]:
struct City
    name::String
end

Moscow = City("Moscow")

Moscow.name

@show sizeof(Moscow)
@show sizeof(Moscow.name)
@show Base.summarysize(Moscow);
sizeof(Moscow) = 8
sizeof(Moscow.name) = 6
Base.summarysize(Moscow) = 22

If you want the string data to be stored inside the structure, you can use static strings:

In [ ]:
import Pkg; Pkg.add("StaticStrings")
using StaticStrings
struct StaticCity
    name::StaticString{10}
end
Moscow = StaticCity(static"Moscow"10) # padded with \0 up to 10 bytes
@show sizeof(Moscow)
@show sizeof(Moscow.name)
@show Base.summarysize(Moscow);
sizeof(Moscow) = 10
sizeof(Moscow.name) = 10
Base.summarysize(Moscow) = 10

Even though we cannot change a String, City is not a bit type, whereas StaticCity is.

That is, it is important to understand the difference between immutable types and bit types.

The unusual behavior of ismutable("123") is explained here.

In [ ]:
@show isbitstype(City)
@show isbitstype(StaticCity);
isbitstype(City) = false
isbitstype(StaticCity) = true
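
To see why inline storage makes a type a bit type, here is a rough sketch that uses only Base: the name is kept as a fixed-size tuple of bytes inside the structure. InlineCity and cityname are hypothetical names for this illustration, and this is not how StaticStrings is implemented.

```julia
# Hypothetical type: the name is stored inline as 10 bytes
struct InlineCity
    name::NTuple{10,UInt8}
end

# Convenience constructor: copy the UTF-8 bytes and pad with zeros
function InlineCity(s::AbstractString)
    bytes = codeunits(s)
    length(bytes) <= 10 || throw(ArgumentError("name is longer than 10 bytes"))
    return InlineCity(ntuple(i -> i <= length(bytes) ? bytes[i] : 0x00, 10))
end

# Recover the string by reading bytes up to the first zero
cityname(c::InlineCity) = String(collect(Iterators.takewhile(!=(0x00), c.name)))

msk = InlineCity("Moscow")
cityname(msk)           # "Moscow"
sizeof(InlineCity)      # 10
isbitstype(InlineCity)  # true
```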

I would like to note separately that an immutable type can have fields that refer to objects of a mutable type. The reference itself is fixed, but the object it points to can still change.

As an analogy,
imagine a thread with a balloon tied to it. We can change the balloon: stretch it, inflate it, fill it with water.
But we cannot untie it and attach a different, green balloon to the thread.

In [ ]:
struct Student
    name::String
    grade::UInt8        # school year
    grades::Vector{Int} # marks
end
Alex = Student("Alex", 1, [5,5,5])
@show sizeof(Alex)  # 8 + 1 (+ 7 bytes of padding) + 8 = 24 because of 8-byte alignment
sizeof(Alex) = 24
Out[0]:
24
In [ ]:
pointer(Alex.grades)
Out[0]:
Ptr{Int64} @0x00007f3817033d80
In [ ]:
push!(Alex.grades,4)
Alex.grades
Out[0]:
4-element Vector{Int64}:
 5
 5
 5
 4
In [ ]:
@show pointer(Alex.grades)
pointer(Alex.grades) = Ptr{Int64} @0x00007f3a4ac9bf80
Out[0]:
Ptr{Int64} @0x00007f3a4ac9bf80

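As you can see, we can change the contents of the vector. The field still refers to the same Vector object; only the address of its data buffer, which is what pointer returns, may change when push! has to reallocate memory.

In [ ]:
# dereference the pointer to the vector's data (this reads the 1st element)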
unsafe_load(pointer(Alex.grades)) 
Out[0]:
5
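
A check that does not depend on raw addresses is to compare the object held by the field before and after the modification with ===. The sketch below uses a hypothetical one-field type Pupil to keep it short.

```julia
# Hypothetical immutable type with a single mutable field
struct Pupil
    grades::Vector{Int}
end

p = Pupil([5, 5, 5])
same_vector = p.grades       # keep a reference to the object the field points to

# grow the vector enough that its data buffer will likely be reallocated
for g in 1:100
    push!(p.grades, g)
end

p.grades === same_vector     # true: the field still refers to the same Vector
length(same_vector)          # 103: the change is visible through both references
```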

But if we try to replace the vector itself, that is, rebind the field to another object, an error occurs.

In [ ]:
try
    Alex.grades = [1, 2, 3] # here we try to rebind the field to a new vector
catch e
    e
end
Out[0]:
ErrorException("setfield!: immutable struct of type Student cannot be changed")

Mutable composite type

With a mutable type, we can change the fields.

In [ ]:
mutable struct MutableStudent
    const name::String
    grade::UInt8        # school year
    grades::Vector{Int} # marks
end
Peter = MutableStudent("Peter", 1, [5,5,5])
Peter.grade = 2
Out[0]:
2

However, it is possible to make some fields of a mutable structure immutable (constant).
In this case, despite the fact that the structure is mutable, this field cannot be changed.

In [ ]:
try
    Peter.name = "Alex"
catch e
    e
end
Out[0]:
ErrorException("setfield!: const field .name of type MutableStudent cannot be changed")

Now we can see that the vector can be replaced with another one:

In [ ]:
@show pointer(Peter.grades)
@show Peter.grades = [2,2,2]
@show pointer(Peter.grades)
pointer(Peter.grades) = Ptr{Int64} @0x00007f38bdc85ee0
Peter.grades = [2, 2, 2] = [2, 2, 2]
pointer(Peter.grades) = Ptr{Int64} @0x00007f38bdefaac0
Out[0]:
Ptr{Int64} @0x00007f38bdefaac0

The difference between an immutable struct and a mutable struct with constant fields

Although neither the fields of an immutable structure nor the constant fields of a mutable structure can be changed, there is a significant difference between objects of these two kinds of types with the same fields.

For an immutable type, objects with the same field values are indistinguishable: they have no identity of their own, and === compares them by content, so they are effectively the same object.

For a mutable struct, each object with the same constant field values is allocated separately and has its own unique address.

In [ ]:
struct Immutable
    a::Int32
    b::Int32
end

mutable struct ConstMutable
    const a::Int32
    const b::Int32
end

im_obj_1 = Immutable(1,2)
im_obj_2 = Immutable(1,2)

const_mut_obj_1 = ConstMutable(1,2)
const_mut_obj_2 = ConstMutable(1,2)
# === compares immutable values by content and mutable objects by address
@show im_obj_1 === im_obj_2
@show const_mut_obj_1 === const_mut_obj_2
im_obj_1 === im_obj_2 = true
const_mut_obj_1 === const_mut_obj_2 = false
Out[0]:
false
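
A practical consequence: for mutable structs the default == also falls back to ===, so two objects with equal fields compare as unequal unless you define equality yourself. A minimal sketch, with the hypothetical type ConstPair:

```julia
# Hypothetical mutable type with constant fields
mutable struct ConstPair
    const a::Int32
    const b::Int32
end

# Field-wise equality, plus a matching hash so that Set and Dict stay consistent
Base.:(==)(x::ConstPair, y::ConstPair) = x.a == y.a && x.b == y.b
Base.hash(x::ConstPair, h::UInt) = hash((x.a, x.b), h)

p1 = ConstPair(1, 2)
p2 = ConstPair(1, 2)

p1 == p2             # true: equal by content
p1 === p2            # false: still two distinct objects
length(Set([p1, p2])) # 1: the set treats them as the same element
```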

Immutable structures may be less convenient to work with.
Their advantage, however, is that they can be placed "on the stack", whereas mutable structures are usually stored "on the heap".

In [ ]:
println(@allocations (a = Immutable(3,4); b = Immutable(3,4)))
println(@allocations (a = ConstMutable(3,4); b = ConstMutable(3,4)))
0
2

However, this statement should not be taken literally.

For example, the compiler can optimize away the allocation of mutable structures inside a function that returns a number rather than the mutable structure itself:

In [ ]:
function foo(x,y)
    obj1 = Immutable(x,y)
    obj2 = Immutable(y,x)
    c = obj1.a + obj2.b
end
function bar(x,y)
    obj1 = ConstMutable(x,y)
    obj2 = ConstMutable(y,x)
    c = obj1.a + obj2.b
end
println(@allocations foo(1,2))
println(@allocations bar(1,2))
0
0

Abstract type

What are abstract types for?
Abstract types are needed in order to:

  • group concrete types
  • define interfaces for functions
  • restrict which types can be created through parameterization (see below)

Grouping concrete types

Abstract types make it possible to organize types into hierarchies.

Let's consider the classic and most intuitive example: Number.

With A <: B we can declare or check that type A is a subtype of B.

In [ ]:
Int8 <: Integer || Int16 <: Integer
Out[0]:
true
In [ ]:
subtypes(Signed)
Out[0]:
6-element Vector{Any}:
 BigInt
 Int128
 Int16
 Int32
 Int64
 Int8

You can also go in the opposite direction:
B >: A checks that B is a supertype of A.

The supertypes function returns a tuple of supertypes, ordered from the type itself up to Any.

In [ ]:
supertypes(Int8)
Out[0]:
(Int8, Signed, Integer, Real, Number, Any)
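
The same tools work for your own hierarchies. Below is a small sketch with hypothetical types Vehicle, Car and Bicycle. InteractiveUtils is loaded explicitly because subtypes and supertypes come from that standard library, which is only preloaded in interactive sessions.

```julia
using InteractiveUtils  # provides subtypes and supertypes outside the REPL

# Hypothetical hierarchy: one abstract type with two concrete subtypes
abstract type Vehicle end
struct Car <: Vehicle
    seats::Int
end
struct Bicycle <: Vehicle
    gears::Int
end

Car <: Vehicle        # true
Vehicle >: Car        # true, the same check in the opposite direction
supertype(Car)        # Vehicle
supertypes(Car)       # (Car, Vehicle, Any)
subtypes(Vehicle)     # Car and Bicycle

# An abstract type cannot be instantiated
abstract_error = try
    Vehicle()
catch e
    e
end
abstract_error isa MethodError
```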

A more visual option is the AbstractTrees package, which lets us draw the familiar tree picture.

In [ ]:
using AbstractTrees
AbstractTrees.children(t::Type) = subtypes(t)
print_tree(Number) # here you can see the types from Engee as well.
Number
├─ MultiplicativeInverse
│  ├─ SignedMultiplicativeInverse
│  └─ UnsignedMultiplicativeInverse
├─ Complex
├─ Measurement
├─ Quaternion
├─ Real
│  ├─ AbstractFloat
│  │  ├─ BigFloat
│  │  ├─ DecimalFloatingPoint
│  │  │  ├─ Dec128
│  │  │  ├─ Dec32
│  │  │  └─ Dec64
│  │  ├─ Float16
│  │  ├─ Float32
│  │  └─ Float64
│  ├─ AbstractIrrational
│  │  ├─ Irrational
│  │  └─ IrrationalConstant
│  │     ├─ Fourinvπ
│  │     ├─ Fourπ
│  │     ├─ Halfπ
│  │     ├─ Inv2π
│  │     ├─ Inv4π
│  │     ├─ Invsqrt2
│  │     ├─ Invsqrt2π
│  │     ├─ Invsqrtπ
│  │     ├─ Invπ
│  │     ├─ Log2π
│  │     ├─ Log4π
│  │     ├─ Loghalf
│  │     ├─ Logten
│  │     ├─ Logtwo
│  │     ├─ Logπ
│  │     ├─ Quartπ
│  │     ├─ Sqrt2
│  │     ├─ Sqrt2π
│  │     ├─ Sqrt3
│  │     ├─ Sqrt4π
│  │     ├─ Sqrthalfπ
│  │     ├─ Sqrtπ
│  │     ├─ Twoinvπ
│  │     └─ Twoπ
│  ├─ FixedPoint
│  │  └─ Fixed
│  ├─ FixedPoint
│  │  ├─ Fixed
│  │  └─ Normed
│  ├─ Dual
│  ├─ Percentile
│  ├─ Integer
│  │  ├─ Bool
│  │  ├─ NodeType
│  │  ├─ ReaderType
│  │  ├─ UpperBoundedInteger
│  │  ├─ ChainedVectorIndex
│  │  ├─ Signed
│  │  │  ├─ BigInt
│  │  │  ├─ Int128
│  │  │  ├─ Int16
│  │  │  ├─ Int32
│  │  │  ├─ Int64
│  │  │  └─ Int8
│  │  └─ Unsigned
│  │     ├─ VarUInt
│  │     ├─ UInt128
│  │     ├─ UInt16
│  │     ├─ UInt32
│  │     ├─ UInt64
│  │     └─ UInt8
│  ├─ Rational
│  ├─ SimpleRatio
│  ├─ StaticFloat64
│  ├─ PValue
│  ├─ TestStat
│  ├─ LiteralReal
│  ├─ SafeReal
│  ├─ Num
│  ├─ Struct
│  └─ AbstractSIMD
│     ├─ AbstractSIMDVector{W, T} where {W, T<:(Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, Bit, var"#s3"} where var"#s3"<:StaticInt)}
│     │  ├─ AbstractMask
│     │  │  ├─ EVLMask{W, U} where {W, U<:Union{UInt128, UInt16, UInt32, UInt64, UInt8}}
│     │  │  └─ Mask{W, U} where {W, U<:Union{UInt128, UInt16, UInt32, UInt64, UInt8}}
│     │  ├─ MM
│     │  └─ Vec{W, T} where {W, T<:(Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, Bit, var"#s3"} where var"#s3"<:StaticInt)}
│     └─ VecUnroll{N, W, T, V} where {N, W, T<:(Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, Bit, var"#s3"} where var"#s3"<:StaticInt), V<:Union{Bool, Float16, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, Bit, AbstractSIMD{W, T}}}
├─ StaticInteger
│  ├─ StaticBool
│  │  ├─ False
│  │  └─ True
│  └─ StaticInt
├─ AbstractQuantity
│  └─ Quantity
├─ LogScaled
│  ├─ Gain{L} where L<:LogInfo
│  └─ Level{L} where L<:LogInfo
├─ Double
└─ LazyMulAdd

I also recommend running print_tree(Any) and immersing yourself in the wonderful world of Julia types :)

Abstract types and multiple dispatch

To wrap up the discussion of numbers, note that it is only logical that any two numbers can be added together.

That is why promotion.jl contains the following line:

+(x::Number, y::Number) = +(promote(x,y)...)

(with methods(+) you can see what can be added to what, and by which rules)
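
The effect of promotion is easy to observe directly. This is only an illustrative sketch; whether a particular call goes through the generic Number method or through a more specific method of + depends on the argument types, which you can check with @which.

```julia
# promote converts all arguments to a common type
promote(1, 2.5)               # (1.0, 2.5)
promote_type(Int8, Float64)   # Float64

# mixed-type arithmetic therefore yields the common type
typeof(Int8(1) + 2.5)         # Float64
typeof(1 + 2//3)              # Rational{Int64} on a 64-bit system
```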

The least common multiple, on the other hand, should be defined only for integers and rationals. We will discuss it at the end.

And for those who are tired of numbers, let's turn to animals.

In [ ]:
abstract type Pet end
struct Dog <: Pet; name::String end
struct Cat <: Pet; name::String end

function encounter(a::Pet, b::Pet)
    verb = meets(a, b)
    println("$(a.name) meets $(b.name) and $verb.")
end

# one method per combination of concrete argument types
meets(a::Dog, b::Dog) = "sniffs"
meets(a::Dog, b::Cat) = "chases"
meets(a::Cat, b::Dog) = "hisses"
meets(a::Cat, b::Cat) = "purrs"

fido = Dog("Rex")
rex = Dog("Mukhtar")
whiskers = Cat("Matroskin")
spots = Cat("Hippopotamus")

encounter(fido, rex)
encounter(fido, whiskers)
encounter(whiskers, rex)
encounter(whiskers, spots)
Rex meets Mukhtar and sniffs.
Rex meets Matroskin and chases.
Matroskin meets Mukhtar and hisses.
Matroskin meets Hippopotamus and purrs.

The convenience is that we do not have to specify for every animal how it greets every other one; instead, we can create a common "greeting" interface for all pets.

In [ ]:
meets(a::Pet, b::Pet) = "says hello"

struct Cow <: Pet; name::String end

encounter(rex,Cow("Burenka"))
Mukhtar meets Burenka and says hello.

This is not equally convenient in every language. More details can be found in the video from which this code was taken.

DataType

But before moving on to the last section,

**let's stir up some trouble!**

For now, just take a look at a strange beast: DataType

In [ ]:
123 isa Integer
Out[0]:
true
In [ ]:
Vector isa DataType || Dict isa DataType
Out[0]:
false
In [ ]:
Function isa DataType
Out[0]:
true

Don't worry, explanations will be given.

Parameterized types

Composite, abstract, and even primitive types can all be parameterized.

Let's start with the most obvious variety:

Parameterized composite types

They are useful when the logic of the structure must stay the same, but the types of its fields may vary.

For example, you can guess how a complex number is defined:

struct Complex{T<:Real} <: Number
    re::T
    im::T
end
In [ ]:
ci8 = Int8(1)+Int8(2)im
@show typeof(ci8)
sizeof(ci8)
typeof(ci8) = Complex{Int8}
Out[0]:
2
In [ ]:
cf64 = 1.5+2im
@show typeof(cf64)
sizeof(cf64)
typeof(cf64) = ComplexF64
Out[0]:
16

As you can see, depending on which parameters we passed, we get objects of different types.
They take up different amounts of memory and can work in different ways.
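
You can define such a type yourself in exactly the same way. The sketch below uses a hypothetical Point2D type. It shows that the parameter is inferred from the arguments, that both fields must share one type unless the parameter is given explicitly, and that the constraint T<:Real is enforced.

```julia
# Hypothetical parametric type, modeled on the Complex definition above
struct Point2D{T<:Real}
    x::T
    y::T
end

pi8 = Point2D(Int8(1), Int8(2))   # T is inferred as Int8
pf64 = Point2D(1.5, 2.5)          # T is inferred as Float64

typeof(pi8)     # Point2D{Int8}
sizeof(pi8)     # 2
sizeof(pf64)    # 16

# With an explicit parameter the arguments are converted
Point2D{Float64}(1, 2)

# Mixed argument types do not match the default constructor
mixed = try Point2D(1, 2.5) catch e; e end
mixed isa MethodError

# The bound on T is checked when the type is formed
bad = try Point2D{String} catch e; e end
bad isa TypeError
```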

Parameterized abstract types

An example of a parameterized abstract type is AbstractDict:

abstract type AbstractDict{K,V} end

The dictionary Dict, in turn, is defined as:

mutable struct Dict{K,V} <: AbstractDict{K,V}
    slots::Memory{UInt8}
    keys::Memory{K}
    vals::Memory{V}
    ...
    ...
end

This makes it possible to implement efficient interfaces.
For example, the set of environment variables ENV is not a Dict, but it is an AbstractDict.

What matters is that ENV is a parametric AbstractDict{String,String}.
This is why parametric abstract types can be very convenient.

In [ ]:
ENV isa Dict
Out[0]:
false
In [ ]:
ENV isa AbstractDict
Out[0]:
true
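
In practice this means that a function written against AbstractDict{String,String} accepts both an ordinary Dict and ENV. A minimal sketch, where count_empty_values is a hypothetical helper:

```julia
# Hypothetical helper: counts entries whose value is an empty string.
# It relies only on iteration, which every AbstractDict supports.
count_empty_values(d::AbstractDict{String,String}) = count(p -> isempty(p.second), d)

settings = Dict("HOME" => "/home/user", "EDITOR" => "")

count_empty_values(settings)   # 1
count_empty_values(ENV)        # works too; the result depends on your environment

# A dictionary with other parameters does not match the signature
applicable(count_empty_values, Dict(1 => 2))   # false
```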

FINALLY, ARRAYS!

And only now, having reached parametric abstract types, can we understand what arrays are.

Although the implementation of arrays is written in C, we can still see what they are.

[1,2,3]

[1 2 3;
 4 5 6;
 7 8 9;]

 and rand(3,3,3)

It all comes down to the definition of this type:

   abstract type AbstractArray{T,N} end

Here N is the number of dimensions. Conceptually, it could be written as follows (this is not the actual definition, since N is an integer value rather than a type):

abstract type AbstractArray{T,N<:Unsigned} end
In [ ]:
Array <: DenseArray <: AbstractArray
Out[0]:
true
In [ ]:
Array{Int8,1}(undef,4)
Out[0]:
4-element Vector{Int8}:
   0
  71
 -72
  35
In [ ]:
Array{Int8,2}(undef,2,3)
Out[0]:
2×3 Matrix{Int8}:
  -96  -36   58
 -108   35  127
In [ ]:
Array{Int8,3}(undef,3,3,3)
Out[0]:
3×3×3 Array{Int8, 3}:
[:, :, 1] =
 0  0  0
 0  0  0
 0  0  8

[:, :, 2] =
 0  9   0
 0  0  13
 0  0   0

[:, :, 3] =
  0  0  16
  0  0   0
 15  0   0

And here is the answer to why a range in Julia supports the array interface:

In [ ]:
1:0.5:1000 isa StepRangeLen <: AbstractArray
Out[0]:
true
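
The same mechanism is open to user code: a type that subtypes AbstractVector and defines size and getindex gets the rest of the array interface from generic fallbacks. The sketch below follows the well-known pattern from the Julia manual; the type name Squares is hypothetical.

```julia
# Hypothetical lazy vector: the i-th element is i^2, nothing is stored
struct Squares <: AbstractVector{Int}
    count::Int
end

Base.size(s::Squares) = (s.count,)
Base.IndexStyle(::Type{Squares}) = IndexLinear()
function Base.getindex(s::Squares, i::Int)
    @boundscheck checkbounds(s, i)
    return i * i
end

sq = Squares(4)

collect(sq)   # [1, 4, 9, 16]
sum(sq)       # 30
sq[end]       # 16
sq .+ 1       # broadcasting works as well
```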

In other words, a parametric type is similar to templates in C++.

But it is important to understand the specifics of types in Julia.

Parametric types in Julia are [invariant](https://habr.com/ru/articles/218753/).

In [ ]:
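Vector{Int} <: Vector{Real}        # false: parametric types are invariant

Vector{Int} <: Vector{<:Real}      # true: Int <: Real

Vector{Complex} <: Vector{<:Real}  # false: Complex is not a subtype of Real

Vector{Complex} <: Vector          # true; only this last result is displayed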
Out[0]:
true
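
Invariance matters most in method signatures. In the sketch below, the two function names are hypothetical; applicable is used to check which arguments each signature accepts.

```julia
# Accepts only vectors whose element type is exactly Real
only_real_vectors(v::Vector{Real}) = "Vector{Real}"

# Accepts vectors of any element type that is a subtype of Real
any_real_vectors(v::Vector{<:Real}) = "Vector{<:Real}"

ints = [1, 2, 3]              # Vector{Int}
reals = Real[1, 2.5, 3//4]    # Vector{Real}

applicable(only_real_vectors, ints)    # false: Vector{Int} is not a Vector{Real}
applicable(any_real_vectors, ints)     # true
applicable(only_real_vectors, reals)   # true
applicable(any_real_vectors, reals)    # true
```

For function arguments, the second form is usually what you want.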

This is where DataType can help us.
DataType lets you tell whether a type is "declared".

All concrete types are DataType.

  1. Most non-parametric types are DataType, for example a type declared as
abstract type Number end;
  2. A parametric type with all of its parameters specified is also a DataType.

To understand what a DataType is, it is easier to start from what is not a DataType.

In [ ]:
Union{Int32,Char} isa DataType  # false: this is a Union
Vector{<:Real} isa DataType     # false: a kind of "union of all vectors whose element type is a subtype of Real"
Out[0]:
false
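
Asking typeof for the "type of a type" makes the picture clearer, and it also explains the puzzling results from the DataType section above.

```julia
typeof(Int64)              # DataType
typeof(Number)             # DataType: abstract types are DataType too
typeof(Function)           # DataType
typeof(Vector{Int})        # DataType: all parameters are specified

typeof(Vector)             # UnionAll: the element type is still free
typeof(Dict)               # UnionAll: both K and V are still free
typeof(Vector{<:Real})     # UnionAll
typeof(Union{Int32,Char})  # Union
```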

And how is all of this applied?

Multiple dispatch has the following priority:

  1. Concrete type
  2. Abstract type
  3. Parameterized type

We'll finish by looking at how the least common multiple function lcm is organized; it has several "overloaded" methods:

#1
function lcm(a::T, b::T) where T<:Integer
#2
function lcm(x::Rational, y::Rational)
#3
lcm(a::Real, b::Real) = lcm(promote(a,b)...)
#4
lcm(a::T, b::T) where T<:Real = throw(MethodError(lcm, (a,b)))

For integers, method 1 is called.

If we pass rational arguments, method 2 is called.

If we call lcm(2, 2//3), method 3 is called first and [type promotion](https://engee.com/helpcenter/stable/ru/julia/manual/conversion-and-promotion.html#продвижение) takes place. After that, method 2 is called.

But if we call lcm(2, 1.5), then after type promotion we end up in method 4, the "template" version, which throws an error.

This is what promotion looks like:

In [ ]:
promote(2, 2//3)
Out[0]:
(2//1, 2//3)
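
The four cases described above can be tried directly. This sketch assumes a Julia version in which lcm supports Rational arguments, as in the method list shown earlier.

```julia
lcm(4, 6)          # 12: integer method
lcm(2//1, 2//3)    # 2//1: rational method
lcm(2, 2//3)       # 2//1: promoted to rationals first

# Promotion gives (2.0, 1.5); there is no lcm for floats, so an error is thrown
float_case = try
    lcm(2, 1.5)
catch e
    e
end
float_case isa MethodError   # true
```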

See you soon!