Engee documentation

Instrumentation of Julia using DTrace and bpftrace

The DTrace and bpftrace tools provide lightweight process instrumentation. Instrumentation can be enabled and disabled during the process, and its costs are minimal.

Compatibility: Julia 1.8

Probe support was added in Julia 1.8.

This document is intended for Linux, but the content should be mostly true for Mac OS/Darwin and FreeBSD.

Enabling support

On Linux, install the systemtap package with the dtrace version and create a Make.user file with the following contents:

WITH_DTRACE=1

to enable the USDT probes.

Check

> readelf -n usr/lib/libjulia-internal.so.1

Displaying notes found in: .note.gnu.build-id
  Owner                Data size  Description
  GNU                  0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 57161002f35548772a87418d2385c284ceb3ead8

Displaying notes found in: .note.stapsdt
  Owner                Data size  Description
  stapsdt              0x00000029 NT_STAPSDT (SystemTap probe descriptors)
    Provider: julia
    Name: gc__begin
    Location: 0x000000000013213e, Base: 0x00000000002bb4da, Semaphore: 0x0000000000346cac
    Arguments:
  stapsdt              0x00000032 NT_STAPSDT (SystemTap probe descriptors)
    Provider: julia
    Name: gc__stop_the_world
    Location: 0x0000000000132144, Base: 0x00000000002bb4da, Semaphore: 0x0000000000346cae
    Arguments:
  stapsdt              0x00000027 NT_STAPSDT (SystemTap probe descriptors)
    Provider: julia
    Name: gc__end
    Location: 0x000000000013214a, Base: 0x00000000002bb4da, Semaphore: 0x0000000000346cb0
    Arguments:
  stapsdt              0x0000002d NT_STAPSDT (SystemTap probe descriptors)
    Provider: julia
    Name: gc__finalizer
    Location: 0x0000000000132150, Base: 0x00000000002bb4da, Semaphore: 0x0000000000346cb2
    Arguments:

Adding probes to libjulia

The probes are declared in the dtraces format in the file src/uprobes.d. The generated header file is included in the file src/julia_internal.h: when adding probes, it is necessary to provide an empty implementation in it.

The header will contain the *_ENABLED semaphore and the actual probe call. If it takes a lot of resources to calculate the probe arguments, you should first check if the probe is enabled, and then calculate the arguments and invoke the probe.

  if (JL_PROBE_{PROBE}_ENABLED())
    auto expensive_arg = ...;
    JL_PROBE_{PROBE}(expensive_arg);

If the probe has no arguments, it is better not to enable the semaphore check. When the USDT probes are turned on, the semaphore puts a load on the memory, regardless of whether the probe is turned on.

#define JL_PROBE_GC_BEGIN_ENABLED() __builtin_expect (julia_gc__begin_semaphore, 0)
__extension__ extern unsigned short julia_gc__begin_semaphore __attribute__ ((unused)) __attribute__ ((section (".probes")));

Since the probe itself is an idle operation, control will be transferred to the probe handler.

Available probes

Garbage Collection probes

  1. julia:gc__begin: Garbage collection begins in one of the threads and initiates program execution suspension.

  2. julia:gc__stop_the_world: All threads have reached a safe point, and garbage collection is underway.

  3. julia:gc__mark__begin: the beginning of the marking phase.

  4. julia:gc__mark_end(scanned_bytes, perm_scanned): Marking phase is completed.

  5. julia:gc__sweep_begin(full): start of the cleaning phase.

  6. julia:gc__sweep_end: The cleaning phase is completed.

  7. julia:gc__end: Garbage collection is completed, other threads continue to run.

  8. julia:gc__finalizer: The initial garbage collection thread has completed the execution of finalizers.

Task Environment Probes

  1. julia:rt__run__task(task): Switches to the task task in the current thread.

  2. julia:rt__pause__task(task): Switches from the task task in the current thread.

  3. julia:rt__new__task(parent, child): the parent task has created the child task in the current thread.

  4. julia:rt__start__task(task): The task is running for the first time with a new stack.

  5. julia:rt__finish__task(task): The task has completed and will no longer be executed.

  6. julia:rt__start__process__events(task): The task task has started processing libuv events.

  7. julia:rt__finish__process__events(task): The `task' task has completed processing libuv events.

Task Queue Probes

  1. julia:rt__taskq__insert(ptls, task): The ptls thread tried to add the task task to the multiq PARTR.

  2. julia:rt__taskq__get(ptls, task): The 'ptls` thread extracted the task task from PARTR multiq.

Flow stop and wake-up probes

  1. julia:rt__sleep__check__wake(ptls, old_state): the stream (PTLS ptls) is waking up, the previous state is `old_state'.

  2. julia:rt__sleep__check__wakeup(ptls): The stream (PTLS ptls) has awakened itself.

  3. julia:rt__sleep__check__sleep(ptls): The thread (PTLS ptls) is trying to switch to an inactive state.

  4. julia:rt__sleep__check__taskq__wake(ptls): The thread (PTLS ptls) failed to switch to an inactive state due to tasks in PARTR multiq.

  5. julia:rt__sleep__check__task__wake(ptls): The thread (PTLS ptls) failed to switch to an inactive state due to tasks in the Base work queue.

  6. julia:rt__sleep__check__uv__wake(ptls): The thread (PTLS ptls) failed to enter an inactive state due to libuv waking up.

Examples of using probes

Delay in suspending program execution during garbage collection

The 'contrib/gc_stop_the_world_latency.bt` provides an example of the bpftrace script, which creates a histogram of the delay in reaching the safe point for all threads.

Run this Julia code with the julia -t 2 parameter.

using Base.Threads

fib(x) = x <= 1 ? 1 : fib(x-1) + fib(x-2)

beaver = @spawn begin
    while true
        fib(30)
        # Эта заданная вручную безопасная точка необходима, иначе данный
        # цикл может никогда не подвергаться сборке мусора.
        GC.safepoint()
    end
end

allocator = @spawn begin
    while true
        zeros(1024)
    end
end

wait(allocator)

In the second terminal:

> sudo contrib/bpftrace/gc_stop_the_world_latency.bt
Attaching 4 probes...
Tracing Julia GC Stop-The-World Latency... Hit Ctrl-C to end.
^C


@usecs[1743412]:
[4, 8)               971 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8, 16)              837 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        |
[16, 32)             129 |@@@@@@                                              |
[32, 64)              10 |                                                    |
[64, 128)              1 |                                                    |

We can see the distribution of the delay in the program suspension phase for the Julia process.

Task Generation Monitor

Sometimes it can be useful to know when a task spawns other tasks. This is easily done using the `rt__new__task'. The first argument of the probe, `parent', is an existing task that creates another task. Thus, if you know the address of the task that needs to be tracked, you can easily find out which tasks it has generated. Let’s see how it’s done. First, let’s start the Julia session and get the PID and the address of the REPL task.

> julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

1> getpid()
997825

2> current_task()
Task (runnable) @0x00007f524d088010

Now we can run bpftrace and make the rt__new__task probe monitor only the parent task.

sudo bpftrace -p 997825 -e 'usdt:usr/lib/libjulia-internal.so:julia:rt__new__task /arg0==0x00007f524d088010/{ printf("Task: %x\n", arg0); }'

(Note that in the above code, the first argument, i.e. parent, is arg0.)

If we generate one task:

@async 1+1

then we will see that it has been created.:

Task: 4d088010

However, if we generate some more tasks from this new task:

@async for i in 1:10
   @async 1+1
end

using bpftrace we will see only one task.:

Task: 4d088010

And it’s still the same task that we’re tracking! Of course, we can easily remove this filter to see all the created tasks.

sudo bpftrace -p 997825 -e 'usdt:usr/lib/libjulia-internal.so:julia:rt__new__task { printf("Task: %x\n", arg0); }'

Task: 4d088010
Task: 4dc4e290
Task: 4dc4e290
Task: 4dc4e290
Task: 4dc4e290
Task: 4dc4e290
Task: 4dc4e290
Task: 4dc4e290
Task: 4dc4e290
Task: 4dc4e290
Task: 4dc4e290

We see our root task and the newly spawned task as the parent for ten more new tasks.

Detecting a mass awakening

Sometimes tasks face the problem of mass awakening (thundering herd): when an inactive task execution environment receives a request to perform any work, all threads can be awakened, even if only a part of them would be enough. Because of this, there may be an additional delay, as CPU resources are consumed in waking up all threads (and stopping them due to lack of sufficient work).

Using bpftrace you can easily illustrate this problem. First, in one terminal, we run Julia with several threads (six in this example) and get the process PID.

> julia -t 6
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.2 (2021-07-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

1> getpid()
997825

In another terminal, we run bpftrace to monitor the process, in particular using the probe rt__sleep__check__wake.

sudo bpftrace -p 997825 -e 'usdt:usr/lib/libjulia-internal.so:julia:rt__sleep__check__wake { printf("Thread wake up! %x\n", arg0); }'

Now we will create and complete one task in Julia.

Threads.@spawn 1+1

In 'bpftrace`, the output will be something like this.

Thread wake up! 3f926100
Thread wake up! 3ebd5140
Thread wake up! 3f876130
Thread wake up! 3e2711a0
Thread wake up! 3e312190

Although we have created only one task (which can be handled by only one thread at a time), all threads have been awakened! In the future, perhaps this problem will be solved and the runtime will wake up only one thread (or not at all if there is enough spawning thread to complete the task).

Task monitoring using BPFnative.jl

BPFnative.jl can be connected to USDT sensing points in the same way as bpftrace. https://github.com/jpsamaroo/BPFnative.jl/blob/master/examples/task-runtime.jl [Here] you can find a demonstration of monitoring the task execution environment, garbage collection, as well as threads going to sleep and waking up.

Notes on using bpftrace

An example of a probe in the bpftrace format looks like this.

usdt:usr/lib/libjulia-internal.so:julia:gc__begin
{
  @start[pid] = nsecs;
}

The probe declaration takes the type usdt, followed by either the library path or the PID, the vendor name julia, and the probe name `gc__begin'. Note that the relative path to is used here. `libjulia-internal.so but in a production environment, an absolute path may be required.