Engee documentation

Proper maintenance of multithreaded locks

The following strategies are used to keep the code free of deadlocks (generally by addressing the fourth Coffman condition: circular wait).

  1. Structure the code so that only one lock needs to be acquired at a time.

  2. Always acquire shared locks in the same order, as given by the lock levels listed below.

  3. Avoid constructs that are expected to need unrestricted recursion.
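Strategy 2 can be illustrated with a minimal Python sketch. The runtime described here is written in C and C++, so this is only a model of the idea: `OrderedLock` and `acquire_in_order` are hypothetical names invented for the example, and the "highest level first" order is an assumption that mirrors the level numbering used in this document.

```python
import threading
from contextlib import ExitStack, contextmanager


class OrderedLock:
    """A lock tagged with a level that fixes its place in the global order."""

    def __init__(self, name, level):
        self.name = name
        self.level = level
        self._lock = threading.Lock()

    def locked(self):
        return self._lock.locked()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc_info):
        self._lock.release()
        return False


@contextmanager
def acquire_in_order(*locks):
    """Take all locks highest level first, regardless of argument order.

    Ties are broken by name so that every caller computes the same order.
    """
    ordered = sorted(locks, key=lambda lock: (-lock.level, lock.name))
    with ExitStack() as stack:
        for lock in ordered:
            stack.enter_context(lock)
        # ExitStack releases the locks in reverse order on exit.
        yield ordered
```

Because every caller sorts the same way, two threads that need the same pair of locks never hold them in opposite orders, which is what removes the circular wait.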

Locks

Below are all the locks that exist in the system, together with the mechanisms for using them that avoid potential deadlocks (the ostrich algorithm is not allowed here).

The following are definitely leaf locks (level 1) and must not try to acquire any other lock.

  • safepoint

    Note that this lock is acquired implicitly by `JL_LOCK` and `JL_UNLOCK`. Use the `_NOGC` variants to avoid that for level 1 locks.

    While holding this lock, the code must not perform any allocation or hit any safepoints. Note that there are safepoints when allocating, enabling or disabling garbage collection, entering or restoring exception frames, and taking or releasing locks.

  • shared_map

  • finalizers

  • pagealloc

  • gc_perm_lock

  • flisp

  • jl_in_stackwalk (Win32)

  • ResourcePool<?>::mutex

  • RLST_mutex

  • llvm_printing_mutex

  • jl_locked_stream::mutex

  • debuginfo_asyncsafe

  • inference_timing_mutex

  • ExecutionEngine::SessionLock

    flisp itself is already thread-safe; its lock protects only the `jl_ast_context_list_t` pool. Likewise, the ResourcePool<?>::mutex locks protect only the associated resource pool.

The following are leaf locks (level 2), which internally acquire only level 1 locks (safepoint).

  • global_roots_lock

  • Module->lock

  • JLDebuginfoPlugin::PluginMutex

  • newly_inferred_mutex

The following are level 3 locks, which can internally acquire only level 1 or level 2 locks.

  • Method->writelock

  • typecache

The following is a level 4 lock, which can recurse only to acquire level 1, 2, or 3 locks.

  • MethodTable->writelock

No Julia code may be called while holding any lock listed above this point.
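The level rule described so far (a thread holding a lock may only go on to acquire locks of a lower level) lends itself to a runtime check. The following Python sketch is a model only, not how the runtime enforces the hierarchy: `LeveledLock`, `LevelViolation`, and the per-thread bookkeeping are hypothetical, and treating two locks of the same level as incompatible is an assumption of the sketch.

```python
import threading

_held = threading.local()  # per-thread list of locks currently held


class LevelViolation(RuntimeError):
    """Raised when a lock is requested out of hierarchy order."""


class LeveledLock:
    def __init__(self, name, level):
        self.name = name
        self.level = level
        self._lock = threading.Lock()

    def __enter__(self):
        held = getattr(_held, "locks", None)
        if held is None:
            held = _held.locks = []
        # Assumed rule: the new lock must sit strictly below every lock
        # this thread already holds.
        lowest = min((lock.level for lock in held), default=None)
        if lowest is not None and self.level >= lowest:
            raise LevelViolation(
                f"cannot take {self.name} (level {self.level}) "
                f"while holding a level {lowest} lock"
            )
        self._lock.acquire()
        held.append(self)
        return self

    def __exit__(self, *exc_info):
        _held.locks.remove(self)
        self._lock.release()
        return False
```

With this model, taking a level 3 lock inside a level 4 lock succeeds, while the reverse nesting raises `LevelViolation` before the second lock is touched, so the mistake surfaces as an error instead of an occasional deadlock.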

orc::ThreadSafeContext (TSCtx) locks occupy a special place in the lock hierarchy. They protect LLVM's global non-thread-safe state, but there can be any number of them. By default, all of these locks can be treated as level 5 locks for the purpose of comparison with the rest of the hierarchy. A TSCtx should be acquired only from the JIT's pool of TSCtxs, and all locks on that TSCtx must be released before it is returned to the pool. If multiple TSCtx locks must be held at the same time (due to recursive compilation), they should be acquired in the order in which the TSCtxs were borrowed from the pool.
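The two pool rules (release before returning, and lock in borrow order) can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the JIT's actual pool: `ContextPool`, `PooledContext`, `give_back`, `lock_borrowed`, and the global borrow counter are all hypothetical names and mechanisms chosen for illustration.

```python
import itertools
import threading
from contextlib import ExitStack, contextmanager


class PooledContext:
    """Stand-in for a TSCtx: a lock plus the sequence number of its borrow."""

    def __init__(self):
        self.lock = threading.Lock()
        self.borrow_seq = None


class ContextPool:
    def __init__(self, size):
        self._free = [PooledContext() for _ in range(size)]
        self._seq = itertools.count()
        self._guard = threading.Lock()

    def borrow(self):
        with self._guard:
            ctx = self._free.pop()  # a real pool would grow or wait when empty
            ctx.borrow_seq = next(self._seq)
            return ctx

    def give_back(self, ctx):
        # Rule from the text: every lock on the context must be released
        # before the context goes back to the pool.
        if ctx.lock.locked():
            raise RuntimeError("context returned while still locked")
        with self._guard:
            ctx.borrow_seq = None
            self._free.append(ctx)


@contextmanager
def lock_borrowed(*contexts):
    """Lock several borrowed contexts in the order they were borrowed."""
    with ExitStack() as stack:
        for ctx in sorted(contexts, key=lambda c: c.borrow_seq):
            stack.enter_context(ctx.lock)
        yield
```

Using the borrow sequence as the sort key gives all of these same-level locks a single agreed order, which is the property the rest of the hierarchy obtains from level numbers.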

The following is a level 5 lock:

  • JuliaOJIT::EmissionMutex

The following are level 6 locks, which can recurse only to acquire locks at lower levels.

  • codegen

  • jl_modules_mutex

The following is an almost-root lock (second-to-last level), meaning that only the root lock may be held when trying to acquire it.

  • typeinf

    This lock is perhaps one of the trickiest, since type inference can be invoked from many points.

    Currently, this lock is merged with the codegen lock, because type inference and code generation call each other recursively.

The following lock synchronizes I/O operations. Be aware that performing any I/O (for example, printing warning messages or debug information) while holding any other lock listed above can result in pernicious, hard-to-find deadlocks. BE VERY CAREFUL!

  • iolock

  • Individual ThreadSynchronizers locks

    These may continue to be held after releasing the iolock, or be acquired without it, but never attempt to acquire the iolock while holding one of them.

  • Libdl.LazyLibrary lock
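One common way to respect the I/O warning above is to collect messages while a lock is held and print them only after it has been released. The Python sketch below shows that pattern under stated assumptions: `codegen_lock` and `io_lock` are plain stand-ins for the locks named in this document, and `compile_unit` with its warning rule is a hypothetical function invented for the example.

```python
import sys
import threading

codegen_lock = threading.Lock()  # stand-in for a lock from the hierarchy
io_lock = threading.Lock()       # stand-in for iolock


def compile_unit(name, out=sys.stderr):
    pending = []  # messages to emit once no other lock is held
    with codegen_lock:
        # Work done under the lock records messages instead of printing them.
        if name.startswith("_"):
            pending.append(f"warning: {name} looks private")
        result = name.upper()
    # codegen_lock is released here, so io_lock is never nested inside it.
    with io_lock:
        for message in pending:
            print(message, file=out)
    return result
```

The trade-off is that messages appear slightly later than the event that caused them, which is usually acceptable for warnings and debug output.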

The following lock is the root lock, meaning that no other lock may be held when trying to acquire it.

  • toplevel

    This lock should be held while attempting a top-level action (for example, creating a new type or defining a new method). Trying to acquire this lock inside a staged function will cause a deadlock.

    In addition, it is unclear whether any code can safely run in parallel with an arbitrary top-level expression, so it may be necessary for all threads to reach a safepoint first.

Broken locks

The following locks are broken.

  • toplevel

    Does not exist right now. Fix: create it.

  • Module->lock

    This lock is vulnerable to deadlocks because it cannot be certain that it is acquired in sequence. Some operations (for example, `import_module`) are missing a lock. Fix: replace with `jl_modules_mutex`?

  • loading.jl: require and register_root_module

    This file potentially has numerous problems. Fix: it needs locks.

Shared global data structures

Each of these data structures needs a lock because it is shared mutable global state. This list is the inverse of the lock priority list above. It does not include the level 1 leaf resources, because they are simple.

MethodTable modifications (def, cache): MethodTable->writelock

Type declarations: toplevel lock

Type application: typecache lock

Tables of global variables: Module->lock

Module serializer: toplevel lock

JIT and type inference: codegen lock

Updates to MethodInstance/CodeInstance: Method->writelock, codegen lock

  • These are set at construction and are immutable:

    • specTypes

    • sparam_vals

    • def

    • owner

  • These are set by `jl_type_infer` (while holding the codegen lock):

    • cache

    • rettype

    • inferred

    • valid ages

  • The `inInference` flag:

    • An optimization to quickly avoid recursing into `jl_type_infer` while it is already running.

    • The actual state (of setting `inferred`, then `fptr`) is protected by the codegen lock.

  • Function pointers:

    • These transition once, from NULL to a value, while the codegen lock is held.

  • The code generator cache (the contents of `functionObjectsDecls`):

    • These can transition multiple times, but only while the codegen lock is held.

    • It is valid to use an old version of this cache, or to block waiting for a new version, so races are benign, as long as the code does not reference other data in the method instance (for example, rettype) and assume that it is coherent without holding the codegen lock.
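The "transition once, from NULL to a value, while the lock is held" rule for function pointers can be sketched as follows. This is an illustration in Python, not the runtime's C implementation: `MethodInstanceSketch`, `ensure_compiled`, and `compile_fn` are hypothetical names, `codegen_lock` is a stand-in, and the unlocked fast-path read is an assumption that is safe in this Python model (a C implementation would need an atomic load there).

```python
import threading

codegen_lock = threading.Lock()  # stand-in for the codegen lock


class MethodInstanceSketch:
    def __init__(self):
        self.fptr = None  # moves from None to a callable exactly once


def ensure_compiled(mi, compile_fn):
    """Return mi.fptr, compiling it at most once across all threads."""
    fptr = mi.fptr
    if fptr is not None:
        # Fast path: the value never changes again once it is set.
        return fptr
    with codegen_lock:
        # Re-check under the lock: another thread may have won the race.
        if mi.fptr is None:
            mi.fptr = compile_fn()
        return mi.fptr
```

Because the field only ever moves from empty to set, a reader that sees a value can use it without the lock, while a reader that sees nothing falls back to the locked path.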

LLVMContext: codegen lock

Method: Method->writelock

  • roots array (serializer and codegen)

  • invoke / specializations / tfunc modifications