Multithreading
A presentation of Julia’s multithreading features is given in this https://julialang.org/blog/2019/07/multithreading/[blog post].
Running Julia with multiple threads
By default, Julia runs with a single thread of execution. You can check this using the command Threads.nthreads().
julia> Threads.nthreads()
1
The number of execution threads is controlled either by the command line argument -t/--threads or by the environment variable JULIA_NUM_THREADS. When both are specified, -t/--threads takes priority.
The number of threads can be set either as an integer (--threads=4) or as auto (--threads=auto), where auto tries to choose a sensible default number of threads to use (for more information, see the page Command Line Options).
Compatibility: Julia 1.5
The -t/--threads command line argument requires at least Julia 1.5. In older versions, use the environment variable instead.
Compatibility: Julia 1.7
Using auto as the value of the environment variable JULIA_NUM_THREADS requires at least Julia 1.7.
$ julia --threads 4
Let’s check that 4 threads are available to us.
julia> Threads.nthreads()
4
However, we are currently in the main thread. To check this, use the function Threads.threadid().
julia> Threads.threadid()
1
If you prefer to use an environment variable, you can set it as follows in Bash (Linux/macOS):
export JULIA_NUM_THREADS=4
In C shell on Linux/macOS, or in CMD on Windows:
set JULIA_NUM_THREADS=4
In PowerShell on Windows:
$env:JULIA_NUM_THREADS=4
Note that this must be done *before* starting Julia.
The number of threads specified with -t/--threads is propagated to worker processes spawned using the -p/--procs or --machine-file command line options.
Multiple garbage collector threads
The garbage collector can use multiple threads. The number used is either half the number of compute worker threads, or can be configured with the --gcthreads command line argument or the JULIA_NUM_GC_THREADS environment variable.
Compatibility: Julia 1.10
The --gcthreads command line argument requires at least Julia 1.10.
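As an illustrative sketch (the value 4 is arbitrary), the number of garbage collector threads can be set at startup:

```
$ julia --gcthreads 4
```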
Thread pools
When program threads are busy with many tasks, tasks can experience delays, which can hurt the responsiveness and interactivity of the program. To address this, you can mark a task as interactive when scheduling it with Threads.@spawn:
using Base.Threads
@spawn :interactive f()
Interactive tasks should avoid performing high-latency operations, and if they are long-running tasks, they should yield frequently.
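As a minimal sketch, an interactive task can wrap a latency-sensitive piece of work (the computation here is a placeholder):

```julia
using Base.Threads

# schedule a small, latency-sensitive task on the :interactive pool
t = @spawn :interactive begin
    sum(1:100)   # placeholder for quick, responsive work
end

fetch(t)   # returns 5050
```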
Julia can be started with one or more threads reserved for running interactive tasks:
$ julia --threads 3,1
The environment variable JULIA_NUM_THREADS can be used in the same way:
export JULIA_NUM_THREADS=3,1
This starts Julia with 3 threads in the :default thread pool and 1 thread in the :interactive thread pool:
julia> using Base.Threads
julia> nthreadpools()
2
julia> threadpool() # the main thread is in the interactive thread pool
:interactive
julia> nthreads(:default)
3
julia> nthreads(:interactive)
1
julia> nthreads()
3
The zero-argument version of nthreads returns the number of threads in the default pool.
Depending on whether Julia was started with interactive threads, the main thread is in either the default or the interactive thread pool.
Either or both of these numbers can be replaced with the word auto, which causes Julia to choose a reasonable default.
The `@threads` macro
Let’s look at a simple example using native threads. Create an array of zeros:
julia> a = zeros(10)
10-element Vector{Float64}:
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
We will operate on this array simultaneously using 4 threads, with each thread writing its thread ID into its assigned positions.
Julia supports parallel loops using the Threads.@threads macro. This macro is placed in front of a for loop to indicate to Julia that the loop is a multithreaded region.
julia> Threads.@threads for i = 1:10
a[i] = Threads.threadid()
end
The iteration space is split among the threads, after which each thread writes its thread ID to its assigned positions.
julia> a
10-element Vector{Float64}:
1.0
1.0
1.0
2.0
2.0
2.0
3.0
3.0
4.0
4.0
Note that Threads.@threads does not have an optional reduction parameter like @distributed does.
Using @threads without data races
The concept of a data race is discussed in detail in the section on communication and data races between threads. For now, just know that a data race can result in incorrect results and dangerous errors.
Suppose we want to make the function sum_single below multithreaded.
julia> function sum_single(a)
s = 0
for i in a
s += i
end
s
end
sum_single (generic function with 1 method)
julia> sum_single(1:1_000_000)
500000500000
Simply adding @threads causes a data race, with multiple threads reading and writing s simultaneously.
julia> function sum_multi_bad(a)
s = 0
Threads.@threads for i in a
s += i
end
s
end
sum_multi_bad (generic function with 1 method)
julia> sum_multi_bad(1:1_000_000)
70140554652
Note that the result is not 500000500000 as it should be, and will most likely change on each evaluation.
To fix this, you can use task-local buffers to divide the sum into race-free chunks. Here sum_single is reused, with its own internal buffer s. The input vector a is split into nthreads() chunks for parallel work. We then use Threads.@spawn to create tasks that sum each chunk individually. Finally, we sum the results from each task, again using sum_single:
julia> function sum_multi_good(a)
chunks = Iterators.partition(a, length(a) ÷ Threads.nthreads())
tasks = map(chunks) do chunk
Threads.@spawn sum_single(chunk)
end
chunk_sums = fetch.(tasks)
return sum_single(chunk_sums)
end
sum_multi_good (generic function with 1 method)
julia> sum_multi_good(1:1_000_000)
500000500000
Buffers should not be managed based on threadid(), i.e. buffers = zeros(Threads.nthreads()), because concurrent tasks can yield, meaning that multiple concurrent tasks may use the same buffer on a given thread, introducing the risk of data races.
Another option is to use atomic operations on variables shared between tasks/threads, which may be more performant depending on the characteristics of the operations.
Communication and data races between threads
Although Julia’s threads can communicate through shared memory, it is notoriously difficult to write correct multithreaded code that is free of data races. Julia’s Channels are thread-safe and can be used for safe communication. The sections below explain how to use locks and atomic operations to avoid data races.
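For instance, here is a minimal sketch of two tasks communicating safely through a Channel, without any shared mutable state (the squaring worker is purely illustrative):

```julia
using Base.Threads

jobs = Channel{Int}(32)
results = Channel{Int}(32)

# worker task: squares every job it receives until `jobs` is closed
worker = @spawn for x in jobs
    put!(results, x^2)
end

foreach(i -> put!(jobs, i), 1:4)
close(jobs)       # signal that no more jobs are coming
wait(worker)
close(results)
collect(results)  # [1, 4, 9, 16]
```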
Eliminating data races
You are solely responsible for ensuring that your program is free of data races. If you do not comply with this requirement, none of the promises made here are guaranteed, and the observed results may be completely counterintuitive.
In the presence of a data race, Julia does not guarantee memory safety. Be very careful when reading any data if another thread might write to it, as this can lead to segmentation faults or worse. Below are a couple of unsafe ways to access global variables from different threads.
Thread 1:
global b = false
global a = rand()
global b = true
Thread 2:
while !b; end
bad_read1(a) # accessing `a` here is NOT safe!
Thread 3:
while !@isdefined(a); end
bad_read2(a) # accessing `a` here is NOT safe.
Using locks to avoid data races
Locking is an important tool for avoiding data races, and thereby for writing thread-safe code. A lock can be locked and unlocked. If a thread has locked a lock and has not unlocked it, it is said to be "holding" the lock. If there is only one lock, and we write code that requires holding that lock in order to access some data, we can ensure that multiple threads will never access the same data simultaneously. Note that the link between a lock and a variable is established by the programmer, not by the program.
For example, we can create a lock my_lock and lock it while changing the variable my_variable. The easiest way to do this is with the @lock macro:
julia> my_lock = ReentrantLock();
julia> my_variable = [1, 2, 3];
julia> @lock my_lock my_variable[1] = 100
100
By using the same pattern with the same lock and variable, but on another thread, the operations are free of data races.
The operation above could also be performed with the functional version of lock, in either of the following two ways:
julia> lock(my_lock) do
my_variable[1] = 100
end
100
julia> begin
lock(my_lock)
try
my_variable[1] = 100
finally
unlock(my_lock)
end
end
100
All three options are equivalent. Note that the last version requires an explicit try block to ensure that the lock is always unlocked, whereas the first two versions do this internally. The locking pattern shown above should always be used when changing data (such as assigning to a global or closure variable) that is accessed by other threads. Failure to do so can have unforeseen and serious consequences.
Atomic operations
Julia supports accessing and modifying values atomically, that is, in a thread-safe way that avoids https://en.wikipedia.org/wiki/Race_condition[race conditions]. A value (which must be of a primitive type) can be wrapped in Threads.Atomic to indicate that it must be accessed in this way. See the following example.
julia> i = Threads.Atomic{Int}(0);
julia> ids = zeros(4);
julia> old_is = zeros(4);
julia> Threads.@threads for id in 1:4
old_is[id] = Threads.atomic_add!(i, id)
ids[id] = id
end
julia> old_is
4-element Vector{Float64}:
0.0
1.0
7.0
3.0
julia> i[]
10
julia> ids
4-element Vector{Float64}:
1.0
2.0
3.0
4.0
Had we tried to do the addition without the atomic wrapper, we might have gotten the wrong answer due to a race condition. Here is an example of what happens if we don’t avoid the race:
julia> using Base.Threads
julia> Threads.nthreads()
4
julia> acc = Ref(0)
Base.RefValue{Int64}(0)
julia> @threads for i in 1:1000
acc[] += 1
end
julia> acc[]
926
julia> acc = Atomic{Int64}(0)
Atomic{Int64}(0)
julia> @threads for i in 1:1000
atomic_add!(acc, 1)
end
julia> acc[]
1000
Per-field atomic operations
Atomic operations can also be used at a more granular level via the @atomic, @atomicswap, @atomicreplace, and @atomiconce macros.
Detailed information about the memory model and other aspects of their design is provided in the https://gist.github.com/vtjnash/11b0031f2e2a66c9c24d33e810b34ec0[Julia Atomics Manifest], which will be formally published later.
Any field in a struct declaration can be annotated with @atomic; every write to such a field must then also be marked @atomic, and must use one of the defined atomic orderings (:monotonic, :acquire, :release, :acquire_release, or :sequentially_consistent). Any read of an atomic field can also be annotated with an atomic-ordering constraint; if none is specified, a relaxed monotonic ordering is used.
Compatibility: Julia 1.7
Per-field atomic operations require at least Julia 1.7.
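A minimal sketch of a struct with an atomic field and the corresponding reads and writes:

```julia
mutable struct AtomicCounter
    @atomic count::Int
end

c = AtomicCounter(0)
@atomic c.count += 1            # atomic read-modify-write (sequentially consistent by default)
@atomic :monotonic c.count      # atomic read with an explicit relaxed ordering
```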
Side effects and mutable function arguments
When using multithreading, be careful with functions that are not https://en.wikipedia.org/wiki/Pure_function[pure], as you may get a wrong answer. For example, functions that have a name ending with !, by convention, modify their arguments and are therefore not pure.
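For example, sort! mutates its argument, so two tasks calling it on the same array race with each other; a sketch of the safe alternative is to use the non-mutating sort, which returns a new array:

```julia
using Base.Threads

data = [3, 1, 2]

# UNSAFE: both tasks would mutate the same array concurrently
# @spawn sort!(data); @spawn sort!(data)

# safe: `sort` (without `!`) leaves `data` untouched
t1 = @spawn sort(data)
t2 = @spawn sort(data; rev=true)
fetch(t1), fetch(t2)   # ([1, 2, 3], [3, 2, 1])
```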
@threadcall
External libraries, such as those called via ccall, pose a problem for Julia’s task-based I/O mechanism. If a C library performs a blocking operation, the Julia scheduler cannot execute any other tasks until the call returns. (Exceptions are calls into custom C code that makes callbacks into Julia, which may then yield, and C code that calls jl_yield(), the C equivalent of yield.)
The @threadcall macro provides a way to avoid stalling execution in such situations. It schedules the C function for execution on a separate thread, using a pool of four threads by default. The size of this thread pool is controlled by the environment variable UV_THREADPOOL_SIZE. While waiting for a free thread, and while the function executes once a thread becomes available, the requesting task (on the main Julia event loop) yields to other tasks. Note that @threadcall does not return until the execution is complete; from the user’s point of view it is therefore a blocking call, like other Julia APIs.
It is very important that the called function does not call back into Julia, as this will cause the program to crash.
The @threadcall macro may be removed or changed in future versions of Julia.
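As a sketch, a blocking C call can be routed through @threadcall so that other Julia tasks keep running in the meantime; the (:usleep, "libc") target is an assumption about the platform’s C library:

```julia
# sleep for 500 ms inside libc without blocking the Julia scheduler
# (usleep takes microseconds; assumes a POSIX libc)
@threadcall((:usleep, "libc"), Cint, (Cuint,), 500_000)
```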
Warnings
At present, most operations in the Julia runtime and standard libraries are thread-safe, provided that user code is free of data races. However, thread support is still being stabilized in some areas. Multithreaded programming has many inherent difficulties, so if a program using threads exhibits unusual or undesirable behavior (for example, crashes or mysterious results), thread interactions should typically be suspected first.
There are a number of specific limitations and warnings to be aware of when using threads in Julia:
- If multiple threads use a Base collection type simultaneously and at least one thread modifies the collection (common examples: push! on arrays, or inserting items into a Dict), manual locking is required.
- The schedule used by @spawn is nondeterministic and should not be relied on.
- Compute-bound tasks that do not allocate memory can prevent garbage collection from running on other threads that do allocate. In such cases it may be necessary to insert a manual call to GC.safepoint() to allow GC to run. This limitation will be removed in the future.
- Avoid running top-level operations, such as include or eval for defining types, methods, and modules, in parallel.
- Be aware that enabling threads may break finalizers registered by a library; additional ecosystem work may be needed before threads can be used freely with them. For more information, see Safe use of finalizers.
Task migration
After a task starts running on a particular thread, it can move to another thread if the task yields.
Such tasks may be started with @spawn or @threads, although the :static schedule option to @threads does freeze the thread ID.
This means that in most cases threadid() should not be treated as constant within a task, and therefore should not be used to index into a vector of buffers or stateful objects.
Compatibility: Julia 1.7
Task migration was introduced in Julia 1.7. Before that, tasks always remained on the thread on which they were started.
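A sketch of why threadid() must not be treated as constant within a task (the observed IDs depend on the scheduler):

```julia
using Base.Threads

t = @spawn begin
    before = threadid()
    sleep(0.1)           # a yield point; the task may migrate to another thread
    after = threadid()   # not guaranteed to equal `before`
    (before, after)
end
fetch(t)
```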
Safe use of finalizers
Since finalizers can interrupt the execution of any code, they must be very careful in how they interact with any global state. Unfortunately, the main reason finalizers are used is to update global state (it usually makes no sense to use a pure function as a finalizer). This puts us in a bit of a bind. There are a few approaches to dealing with this problem.
- When single-threaded, code can call the internal jl_gc_enable_finalizers C function to prevent finalizers from being scheduled inside a critical region. This is used internally by a number of functions (such as our C locks) to prevent recursion when performing certain operations (incremental package loading, code generation, etc.). The combination of a lock and this flag can be used to make finalizers safe.
- A second strategy, employed in some parts of the Base module, is to explicitly delay a finalizer until it can acquire its lock non-recursively. The following example shows how this strategy is applied in Distributed.finalize_ref:
function finalize_ref(r::AbstractRemoteRef)
    if r.where > 0 # Check if the finalizer has already run
        if islocked(client_refs) || !trylock(client_refs)
            # Delay the finalizer if we can't acquire the lock
            finalizer(finalize_ref, r)
            return nothing
        end
        try # `lock` should always be followed by `try`
            if r.where > 0 # Must check again here
                # Do the actual cleanup here
                r.where = 0
            end
        finally
            unlock(client_refs)
        end
    end
    nothing
end
- A related third strategy is to use a yield-free queue. We don't currently have a lock-free queue implemented in Base, but `Base.IntrusiveLinkedListSynchronized{T}` is suitable. This can frequently be a good strategy to use for code with event loops. For example, this strategy is employed by `Gtk.jl` to manage lifetime ref-counting. In this approach, we don't do any explicit work inside the `finalizer`, and instead add it to a queue to run at a safer time. In fact, Julia's task scheduler already uses this, so defining the finalizer as `x -> @spawn do_cleanup(x)` is one example of this approach. Note however that this doesn't control which thread `do_cleanup` runs on, so `do_cleanup` would still need to acquire a lock. That doesn't need to be true if you implement your own queue, as you can explicitly only drain that queue from your thread.