Data types in Julia
A companion script on data types, supplementing the [article](https://habr.com/ru/companies/etmc_exponenta/articles/882178/) on habr.com.
import Pkg; Pkg.add("AbstractTrees")
Types in Julia
- Primitive type: a type defined using the keyword primitive type. Objects of a primitive type have a fixed memory size, specified in the type definition. 📝 Int64, Bool, Char
- Composite type: a type defined using the keyword struct. Composite types consist of zero or more fields that refer to other objects (of primitive or composite type). 📝 Complex, Rational (with fields re, im and num, den, respectively), Tuple
- Concrete type: a primitive or composite type.
- Abstract type: a type defined using the keyword abstract type. Abstract types have no fields, and objects cannot be created (instantiated) from them. In addition, they cannot be declared as subtypes of a concrete type. Non-concrete types in general are also counted as abstract. 📝 Number, AbstractFloat
- Mutable type: a composite type defined using the keyword mutable struct. Fields of a mutable type can be rebound to objects other than the ones they were bound to at initialization. 📝 String, Dict
- Immutable type: any type except those defined with mutable struct.
- Parametric type: a family of (mutable or immutable) composite or abstract types that share the same type name and field names, regardless of the parameter types. A particular type is then uniquely identified by the name of the parametric type and the type(s) of its parameter(s). 📝 Rational{Int8}(1,2); see also AbstractArray{T,N} and Array{T,N} below
- Built-in type: a type whose definition is contained in Julia Base or in the Julia standard library.
- Bits type: a primitive type, or an immutable composite type all of whose fields are bits types.
- Singleton: an object created from a composite type with zero fields. 📝 nothing, missing
- Container: a composite type (not necessarily mutable) designed to reference a variable number of objects and to provide methods for accessing, iterating over, and possibly changing the references to those objects.
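Several of the glossary terms above can be checked with Julia's reflection functions. The following is a minimal sketch, assuming Julia 1.7 or newer (ismutabletype appeared in 1.7); Base.issingletontype is an unexported helper, so treat its use here as an assumption about your Julia version.

```julia
# Primitive types: fixed size, no fields.
@show isprimitivetype(Int64) sizeof(Int64)

# Composite types expose their field names.
@show isstructtype(Rational{Int}) fieldnames(Rational{Int})
@show fieldnames(Complex{Float64})

# Abstract vs. concrete types.
@show isabstracttype(Number) isconcretetype(Int64)

# Mutable types and bits types.
@show ismutabletype(Dict{Int,Int}) isbitstype(Int64)

# Singletons: the only instance of a zero-field type.
@show Base.issingletontype(Nothing) (Nothing() === nothing)
```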
Primitive type
Although the documentation does not recommend using the primitive type construct, I suggest starting our acquaintance with types from the primitive ones.
The reason is that here we drop to the lowest level and see how data is ultimately represented in memory.
As an example, let's introduce our own Bool that is "resistant to interference from space": it fills all of its available bits with either 0 or 1.
When creating a primitive type, you must explicitly specify how many bits are needed to store it (8 in our case).
primitive type FilledBool 8 end
function FilledBool(x::Int)
    if iszero(x)
        reinterpret(FilledBool, 0b00000000)
    elseif x == 1
        reinterpret(FilledBool, 0b11111111)
    else
        error("Only 0 and 1 are allowed as parameters.")
    end
end
Base.show(io::IO, x::FilledBool) = print(io, bitstring(x))
@show tr = FilledBool(1)
@show fls = FilledBool(0)
println("Regular Bool true: ", bitstring(true))
Let's check whether our type is a bits type:
isbitstype(FilledBool)
The documentation says that instead of creating your own primitive types, it is better to wrap existing ones in a composite type.
Let's get to know composite types better!
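The wrapping advice from the documentation can be sketched as follows. WrappedBool is a hypothetical name introduced only for this illustration: it stores the same bit pattern as FilledBool, but in a UInt8 field of an ordinary struct.

```julia
# Hypothetical wrapper type: a struct around UInt8 instead of a new primitive type.
struct WrappedBool
    bits::UInt8
    # The inner constructor accepts only 0 or 1, like the FilledBool constructor.
    function WrappedBool(x::Integer)
        x == 0 && return new(0x00)
        x == 1 && return new(0xff)
        throw(ArgumentError("Only 0 and 1 are allowed."))
    end
end

Base.show(io::IO, x::WrappedBool) = print(io, bitstring(x.bits))

@show WrappedBool(1)
@show WrappedBool(0)
@show sizeof(WrappedBool) isbitstype(WrappedBool)
```

The wrapper is still one byte in size and still a bits type, while reusing all the machinery that already exists for UInt8.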
Composite type
Immutable composite type
It is important to understand that a composite type can consist of several fields, of one field, or of no fields at all.
Unlike many other programming languages, where fields and methods are bound to an object, in Julia only the fields and the [constructor](https://engee.com/helpcenter/stable/ru/julia/manual/constructors.html) are associated with a composite type.
An interesting discussion of how OOP and Julia relate can be found here.
But we'll focus on types for now.
Let's say we have a "Mountain" type. We specify 2 characteristics for objects of this type:
- the year of the first ascent (the year can be positive or negative)
- the height of the mountain (assume that all mountains are above sea level)
Fields of an immutable type cannot be changed after the object is created.
struct Mountain
    first_ascent_year::Int16
    height::UInt16
end
Everest = Mountain(1953,8848)
Int(Everest.height)
try
    Everest.height = 9000 # the fields of a Mountain cannot be changed
catch e
    e
end
To take a closer look at how the structure works, you can use:
dump(Everest)
Every field type of the immutable Mountain struct is a bits type, so Mountain is a bits type as well.
@show sizeof(Mountain) # 2 fields of 2 bytes each = 4
isbitstype(Mountain)
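The size of a struct is not always the plain sum of its field sizes, because fields are aligned in memory. A minimal sketch, assuming a hypothetical struct named Padded (not part of the article):

```julia
# Hypothetical struct: a 1-byte field followed by an 8-byte field.
struct Padded
    flag::UInt8
    value::Int64
end

@show sizeof(UInt8) + sizeof(Int64)   # sum of the field sizes
@show sizeof(Padded)                  # larger than the sum because of padding
@show fieldoffset(Padded, 1) fieldoffset(Padded, 2)
@show isbitstype(Padded)
```

On a typical 64-bit platform the second field starts at offset 8, so the struct occupies 16 bytes rather than 9.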
Now consider the case where a field of an immutable struct is not a bits type.
A String field is not stored inline in the struct; the struct holds a pointer to the string data.
Therefore, the size of the struct is 8 bytes (the size of a pointer on a 64-bit system), while the size of the string itself is 6 bytes.
(Although sizeof(Char) == 4, a String is UTF-8 encoded, so each ASCII character takes up 1 byte.)
struct City
    name::String
end
Moscow = City("Moscow")
Moscow.name
@show sizeof(Moscow)
@show sizeof(Moscow.name)
@show Base.summarysize(Moscow);
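The difference between the number of characters and the number of bytes becomes visible with non-ASCII text. A short sketch (the variable names are illustrative):

```julia
ascii_name = "Moscow"
cyrillic_name = "Москва"

# length counts characters, sizeof counts bytes of the UTF-8 encoding.
@show length(ascii_name) sizeof(ascii_name)
@show length(cyrillic_name) sizeof(cyrillic_name)

# A Char always occupies 4 bytes, regardless of how it is encoded inside a String.
@show sizeof(Char)
```

Both strings have 6 characters, but each Cyrillic letter takes 2 bytes in UTF-8, so the second string occupies 12 bytes.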
If you want fixed-length strings that are stored inline, you can use the StaticStrings package:
import Pkg; Pkg.add("StaticStrings")
using StaticStrings
struct StaticCity
    name::StaticString{10}
end
Moscow = StaticCity(static"Moscow"10) # padded with \0 up to a length of 10
@show sizeof(Moscow)
@show sizeof(Moscow.name)
@show Base.summarysize(Moscow);
Even though we cannot change a String, City is not a bits type.
That is, it is important to understand the difference between immutable types and bits types.
The unusual behavior of ismutable("123") is explained here.
@show isbitstype(City)
@show isbitstype(StaticCity);
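The distinction between "immutable" and "bits type" can be checked directly. A minimal sketch, assuming Julia 1.5 or newer (for ismutable) and a hypothetical struct Holder introduced only for this example:

```julia
# Hypothetical immutable struct whose only field is a (mutable) Vector.
struct Holder
    items::Vector{Int}
end

h = Holder([1, 2, 3])

@show ismutable(h)         # the struct itself is immutable
@show isbitstype(Holder)   # but it is not a bits type: it holds a reference
@show ismutable(h.items)   # the referenced vector is mutable
@show isbits((1, 2.0))     # a tuple of bits values is itself a bits value
@show ismutable("123")     # special case: reported as mutable for technical reasons
```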
I would like to note separately that an immutable type can have fields that hold objects of a mutable type; the fields themselves still cannot be rebound.
As an analogy, imagine a string with a balloon tied to it. We can change the balloon itself: stretch it, inflate it, fill it with water.
But we cannot untie it and attach a different, green balloon to the string.
struct Student
    name::String
    grade::UInt8 # school year
    grades::Vector{Int} # marks
end
Alex = Student("Alex", 1, [5,5,5])
@show sizeof(Alex) # 8 + 1 + 8 = 17, padded up to 24 (a multiple of 8)
pointer(Alex.grades)
push!(Alex.grades,4)
Alex.grades
@show pointer(Alex.grades)
Here we change the contents of the vector, while the field still refers to the same Vector object. (The data pointer itself may move if push! has to reallocate the buffer.)
# dereference the pointer to the vector's data (yields the 1st element)
unsafe_load(pointer(Alex.grades))
But if we try to rebind the field to a different vector rather than change its elements, an error occurs.
try
    Alex.grades = [1, 2, 3] # here we try to rebind the field to a new vector
catch e
    e
end
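The same point can be checked without raw pointers by comparing object identity with ===. A minimal sketch with a hypothetical struct Classroom (an assumption for illustration, mirroring Student):

```julia
# Hypothetical immutable struct with a mutable field value.
struct Classroom
    grades::Vector{Int}
end

room = Classroom([5, 5, 5])
before = room.grades            # remember which Vector object the field refers to

push!(room.grades, 4)           # mutate the contents
@show room.grades
@show room.grades === before    # still the very same Vector object

# Rebinding the field of an immutable struct is rejected.
err = try
    room.grades = [1, 2, 3]
catch e
    e
end
@show typeof(err)
```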
Mutable composite type
With a mutable type, we can change the fields.
mutable struct MutableStudent
    const name::String
    grade::UInt8 # school year
    grades::Vector{Int} # marks
end
Peter = MutableStudent("Peter", 1, [5,5,5])
Peter.grade = 2
However, some fields of a mutable struct can be declared constant with const (available since Julia 1.8).
Such a field cannot be changed even though the struct itself is mutable.
try
    Peter.name = "Alex"
catch e
    e
end
Note that now we can replace the vector with another one:
@show pointer(Peter.grades)
@show Peter.grades = [2,2,2]
@show pointer(Peter.grades)
The difference between an immutable struct and a mutable struct with constant fields
Although neither the fields of an immutable struct nor the constant fields of a mutable struct can be changed, there is a significant difference between objects of these types with the same fields.
For an immutable type, objects with the same field values are indistinguishable: === compares them by content, so they are effectively the same object.
For a mutable struct, each object has its own identity (its own address), even if all of its constant fields hold the same values.
struct Immutable
    a::Int32
    b::Int32
end
mutable struct ConstMutable
    const a::Int32
    const b::Int32
end
im_obj_1 = Immutable(1,2)
im_obj_2 = Immutable(1,2)
const_mut_obj_1 = ConstMutable(1,2)
const_mut_obj_2 = ConstMutable(1,2)
# === tests identity: immutable objects are compared by content, mutable ones by address in memory
@show im_obj_1 === im_obj_2
@show const_mut_obj_1 === const_mut_obj_2
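A practical consequence concerns ==. By default, == falls back to ===, so two mutable objects with equal fields are not equal until a comparison method is defined. A minimal sketch, assuming Julia 1.8 or newer (for const fields); ImmPoint and MutPoint are hypothetical names:

```julia
struct ImmPoint
    x::Int32
    y::Int32
end

mutable struct MutPoint
    const x::Int32
    const y::Int32
end

imm_identical = ImmPoint(1, 2) === ImmPoint(1, 2)   # compared by content
mut_identical = MutPoint(1, 2) === MutPoint(1, 2)   # compared by address
mut_equal_default = MutPoint(1, 2) == MutPoint(1, 2) # default == falls back to ===
@show imm_identical mut_identical mut_equal_default

# Define field-wise equality for the mutable type.
Base.:(==)(p::MutPoint, q::MutPoint) = p.x == q.x && p.y == q.y
mut_equal_custom = MutPoint(1, 2) == MutPoint(1, 2)
@show mut_equal_custom
```

A complete implementation would usually define a matching Base.hash method as well; that is omitted here to keep the sketch short.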
Immutable structs can be less convenient to work with.
Their advantage, however, is that they can be placed "on the stack", whereas mutable structs are usually stored "on the heap".
println(@allocations (a = Immutable(3,4); b = Immutable(3,4)))
println(@allocations (a = ConstMutable(3,4); b = ConstMutable(3,4)))
However, this statement should not be taken literally.
For example, the compiler can optimize away the allocation of mutable structs inside a function that returns a number rather than the struct itself:
function foo(x,y)
    obj1 = Immutable(x,y)
    obj2 = Immutable(y,x)
    return obj1.a + obj2.b
end
function bar(x,y)
    obj1 = ConstMutable(x,y)
    obj2 = ConstMutable(y,x)
    return obj1.a + obj2.b
end
println(@allocations foo(1,2))
println(@allocations bar(1,2))
Abstract type
What are abstract types for?
Abstract types are needed in order to:
- group concrete types
- define interfaces for functions
- restrict which types can be created through parameterization (see below)
Grouping concrete types
Abstract types make it possible to organize hierarchies of types.
Let's consider the classic and most intuitive example: Number.
With A <: B we can declare or check that type A is a subtype of B:
Int8 <: Integer && Int16 <: Integer
using InteractiveUtils # provides subtypes and supertypes outside the REPL
subtypes(Signed)
You can also work in the opposite direction:
B >: A checks that B is a supertype of A.
The supertypes function returns a tuple of supertypes, ordered from the type itself up to Any.
supertypes(Int8)
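The same chain can be collected by hand by calling supertype repeatedly until Any is reached. A minimal sketch; ancestry is a hypothetical helper written for this illustration, not a Base function:

```julia
# Hypothetical helper: walk from a type up to Any, one supertype at a time.
function ancestry(T::Type)
    chain = Type[T]
    while T !== Any
        T = supertype(T)
        push!(chain, T)
    end
    return chain
end

@show ancestry(Int8)
@show Int8 <: Signed <: Integer <: Real <: Number
```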
For a more visual result, there is the AbstractTrees package, which lets us draw the familiar tree picture.
using AbstractTrees
AbstractTrees.children(t::Type) = subtypes(t)
print_tree(Number) # types defined by Engee show up here as well
However, I recommend running print_tree(Any) and immersing yourself in the wonderful world of Julia types :)
Abstract types and multiple dispatch
To wrap up the discussion of numbers, note that it is natural to be able to add any two numbers together.
Therefore, promotion.jl contains the following line:
+(x::Number, y::Number) = +(promote(x,y)...)
(with methods(+) you can see what can be added to what, and by which rules)
The least common multiple, on the other hand, should be defined only for integers and rationals. We will discuss it at the end.
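What promote does for mixed arguments can be seen on a few small cases. This sketch uses only Base functions:

```julia
# promote converts its arguments to a common type.
@show promote(1, 2.5)

# promote_type reports the common type without converting anything.
@show promote_type(Int8, Float32)

# Mixed-type arithmetic therefore yields the promoted type.
@show typeof(Int8(1) + 2.5f0)
@show 1 + 2//3
```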
And for those who are tired of numbers, let's turn to animals.
abstract type Pet end
struct Dog <: Pet; name::String end
struct Cat <: Pet; name::String end
function encounter(a::Pet, b::Pet)
    verb = meets(a, b)
    println("$(a.name) meets $(b.name) and $verb.")
end
meets(a::Dog, b::Dog) = "sniffs"
meets(a::Dog, b::Cat) = "chases"
meets(a::Cat, b::Dog) = "hisses"
meets(a::Cat, b::Cat) = "purrs"
fido = Dog("Rex")
rex = Dog("Mukhtar")
whiskers = Cat("Matroskin")
spots = Cat("Hippopotamus")
encounter(fido, rex)
encounter(fido, whiskers)
encounter(whiskers, rex)
encounter(whiskers, spots)
The convenience is that we do not have to specify for every pair of animals how one greets the other; we can define a general "greeting" fallback for all pets.
meets(a::Pet, b::Pet) = "greets"
struct Cow <: Pet; name::String end
encounter(rex,Cow("Burenka"))
Not every language makes this so convenient. More information can be found in the video from which this code was taken.
DataType
But before moving on to the last section,
**let's stir up some trouble!**
For now, just take a look at this strange beast: DataType
123 isa Integer
Vector isa DataType || Dict isa DataType
Function isa DataType
Don't worry, explanations will follow.
Parameterized types
Composite, abstract, and even primitive types can all be parameterized.
Let's start with the most obvious variety:
Parameterized composite types
They are useful when the logic of the struct must stay the same while the types of its fields may vary.
For example, you can guess how a complex number is defined:
# definition from Base, shown for illustration only (Complex already exists)
struct Complex{T<:Real} <: Number
    re::T
    im::T
end
ci8 = Int8(1)+Int8(2)im
@show typeof(ci8)
sizeof(ci8)
cf64 = 1.5+2im
@show typeof(cf64)
sizeof(cf64)
As you can see, depending on the parameters we pass, we get objects of different types.
They occupy different amounts of memory and can behave differently.
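The same pattern works for user-defined types. A minimal sketch with a hypothetical struct Point{T} (not from the article), showing how the parameter determines the concrete type, the size, and which values are accepted:

```julia
# Hypothetical parametric struct: both fields share one type parameter.
struct Point{T<:Real}
    x::T
    y::T
end

p_small = Point(Int8(1), Int8(2))   # T is inferred as Int8
p_float = Point{Float64}(1, 2)      # arguments are converted to Float64

@show typeof(p_small) sizeof(p_small)
@show typeof(p_float) sizeof(p_float)
@show Point{Int8} <: Point

# The constraint T<:Real rejects other parameters.
bad = try
    Point{String}
catch e
    e
end
@show typeof(bad)
```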
Parameterized abstract types
An example of a parameterized abstract type is AbstractDict:
abstract type AbstractDict{K,V} end
The dictionary Dict, in turn, is defined as:
mutable struct Dict{K,V} <: AbstractDict{K,V}
    slots::Memory{UInt8}
    keys::Memory{K}
    vals::Memory{V}
    ...
    ...
end
This is needed in order to implement interfaces efficiently.
For example, the set of environment variables ENV is not a Dict, but it is an AbstractDict.
Importantly, the type of ENV is a subtype of the parametric type AbstractDict{String,String}.
That is why parametric abstract types can be very convenient.
ENV isa Dict
ENV isa AbstractDict
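In practice this means that a function written against AbstractDict{String,String} accepts both a Dict and ENV. A minimal sketch; total_chars is a hypothetical helper introduced only for this example:

```julia
# Hypothetical helper: total number of characters in all keys and values.
function total_chars(d::AbstractDict{String,String})
    n = 0
    for (k, v) in d
        n += length(k) + length(v)
    end
    return n
end

settings = Dict("lang" => "julia", "os" => "linux")
@show total_chars(settings)
@show total_chars(ENV) >= 0          # ENV is accepted as well

# A dictionary with other parameters does not match the signature.
@show applicable(total_chars, Dict("a" => 1))
```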
FINALLY, ARRAYS!
Only now, having reached parametric abstract types, can we understand what arrays are.
Even though arrays are implemented in C, we can figure out what the following are:
[1,2,3]
[1 2 3;
4 5 6;
7 8 9;]
and rand(3,3,3)
It all comes down to the definition of this type:
abstract type AbstractArray{T,N} end
Conceptually, N here is the number of dimensions, which could be denoted as follows (this is pseudocode: N is a value, not a type):
abstract type AbstractArray{T,N<:Unsigned} end
Array <: DenseArray <: AbstractArray
Array{Int8,1}(undef,4)
Array{Int8,2}(undef,2,3)
Array{Int8,3}(undef,3,3,3)
And here is why a range in Julia supports the array interface:
1:0.5:1000 isa StepRangeLen && StepRangeLen <: AbstractArray
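The role of the parameters T and N, and the array-like behavior of ranges, can be checked on small examples. This sketch uses only Base functions:

```julia
# Vector and Matrix are aliases that fix the dimension parameter N.
@show Vector{Int} === Array{Int,1}
@show Matrix{Int} === Array{Int,2}
@show ndims([1, 2, 3]) ndims([1 2 3; 4 5 6]) ndims(rand(3, 3, 3))

# A range stores only start, step and length, yet behaves like a vector.
r = 1:0.5:3
@show r isa AbstractVector{Float64}
@show r[2] length(r) sum(r)
@show collect(r)
```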
In other words, parametric types are similar to templates in C++.
But it is important to understand the specifics of types in Julia.
Parametric types in Julia are [invariant](https://habr.com/ru/articles/218753/)
Vector{Int} <: Vector{Real} # false
Vector{Int} <: Vector{<:Real} # true
Vector{Complex} <: Vector{<:Real} # false
Vector{Complex} <: Vector # true
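Invariance matters most when writing method signatures. A minimal sketch with two hypothetical functions, total_strict and total, that differ only in how the element type is constrained:

```julia
# Accepts only vectors whose element type is exactly Real.
total_strict(v::Vector{Real}) = sum(v)

# Accepts vectors of any element type that is a subtype of Real.
total(v::Vector{<:Real}) = sum(v)

@show applicable(total_strict, [1, 2, 3])   # Vector{Int} is not a Vector{Real}
@show applicable(total, [1, 2, 3])
@show total_strict(Real[1, 2.5])            # an explicitly Real-typed vector works
@show total([1.5, 2.5])
```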
This is where DataType can help us.
DataType lets you tell whether a type has been explicitly "declared".
All concrete types are DataType.
- Most non-parametric types are DataType, for example:
abstract type Number end;
- A parametric type with all of its parameters specified is also a DataType.
To understand what a DataType is, it is easier to start from what is not a DataType.
Union{Int32,Char} isa DataType # false
Vector{<:Real} isa DataType # false: it is effectively the "union of all vectors whose element type is a subtype of Real"
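Asking typeof about a type shows which kind of type object it is. This sketch uses only Base and covers the cases discussed above:

```julia
@show typeof(Int64)               # concrete type
@show typeof(Number)              # declared abstract type
@show typeof(Vector{Int})         # all parameters specified
@show typeof(Function)

@show typeof(Vector)              # parameters left open
@show typeof(Vector{<:Real})      # a constrained family of types
@show typeof(Union{Int32,Char})   # a union of types
```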
And how is all this applied?
Roughly speaking, multiple dispatch selects the most specific applicable method, in the following order of priority:
- concrete type
- abstract type
- parameterized type
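The priority order above can be observed on a toy function. This is a minimal sketch; describe is a hypothetical function with one method per kind of signature:

```julia
describe(x::Int) = "concrete Int"
describe(x::Integer) = "abstract Integer"
describe(x::T) where {T<:Real} = "parametric Real"

@show describe(1)         # an exact match on the concrete type wins
@show describe(Int8(1))   # no Int8 method, so the Integer method is chosen
@show describe(1.5)       # only the parametric method applies
```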
We'll finish by looking at how the least common multiple function lcm is organized, with its several "overloaded" methods (signatures only, bodies omitted):
#1
function lcm(a::T, b::T) where T<:Integer
#2
function lcm(x::Rational, y::Rational)
#3
lcm(a::Real, b::Real) = lcm(promote(a,b)...)
#4
lcm(a::T, b::T) where T<:Real = throw(MethodError(lcm, (a,b)))
For integers, method 1 is called.
If we pass rational arguments, method 2 is called.
If we call lcm(2, 2//3), method 3 is called first and [type promotion](https://engee.com/helpcenter/stable/ru/julia/manual/conversion-and-promotion.html#продвижение) takes place. After that, method 2 is called.
But if we call lcm(2, 1.5), then after type promotion we end up in method 4, the "template" version, which throws an error.
Promotion looks like this:
promote(2, 2//3)
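The four cases described above can be tried directly. A minimal sketch, assuming a Julia version in which lcm is defined for rationals (1.4 or newer):

```julia
@show lcm(4, 6)          # method 1: both arguments are integers of the same type
@show lcm(1//2, 2//3)    # method 2: both arguments are rationals
@show lcm(2, 2//3)       # method 3 promotes, then method 2 runs

# Method 3 promotes to (2.0, 1.5), and method 4 then throws.
err = try
    lcm(2, 1.5)
catch e
    e
end
@show typeof(err)
```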
See you soon!