Engee documentation
Notebook

Distributed computing

Engee provides facilities for distributed computing through a library called Distributed. The tools in this library let you execute tasks in separate worker processes, which may run on different compute cores.

parallel_computing1.png

Importing a library for distributed computing:

In [ ]:
import Pkg
Pkg.add(["LinearAlgebra", "Distributed"])
In [ ]:
using Distributed

Use the nworkers function of the Distributed library to find out how many worker processes are currently available (when no workers have been added, the master process itself counts as the single worker):

In [ ]:
nworkers()
Out[0]:
1

With addprocs you can add some number of worker processes:

In [ ]:
addprocs(2)
nworkers()
Out[0]:
2
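Beyond adding workers, the pool can be inspected and shrunk with workers, nprocs, and rmprocs, all from the same Distributed API. A short sketch:

```julia
using Distributed

addprocs(2)              # start two worker processes
println(workers())       # ids of the worker processes, e.g. [2, 3]
println(nprocs())        # total process count, master included
println(nworkers())      # worker processes only

rmprocs(workers())       # shut all workers down again
println(nworkers())      # 1: the master acts as the sole worker again
```

rmprocs waits for the workers to terminate before returning, so nworkers immediately reflects the smaller pool.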

Using pmap

One way to make use of all available worker processes is the pmap function. pmap transforms a collection c by applying a function f to each element, distributing the work across the available workers and tasks.

In [ ]:
pmap(x -> x*2, [1,2,3])
Out[0]:
3-element Vector{Int64}:
 2
 4
 6

By default, an error in any task stops pmap, so the function may not be applied to all elements of the collection. Such errors can be handled with the on_error keyword argument.

In [ ]:
pmap(x -> iseven(x) ? error("foo") : x, 1:4; on_error=identity)
Out[0]:
4-element Vector{Any}:
 1
  ErrorException("foo")
 3
  ErrorException("foo")

Instead of returning the exception objects themselves, on_error can substitute a fallback value:

In [ ]:
even_or_zero = pmap(x->iseven(x) ? error("foo") : x, 1:4; on_error=ex->0)
Out[0]:
4-element Vector{Int64}:
 1
 0
 3
 0
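pmap also accepts a retry_delays keyword: an iterable of pauses (in seconds) between repeated attempts at a failed task. A minimal sketch with a contrived flaky task; distributed=false keeps execution local so the attempt counter below is shared between retries:

```julia
using Distributed

# Fail on the first two attempts, succeed on the third.
attempts = Ref(0)
result = pmap(1:1; distributed=false, retry_delays=[0.1, 0.1]) do x
    attempts[] += 1
    attempts[] < 3 ? error("transient failure") : x * 10
end
println(result)   # [10] after two failed attempts
```

With two delays, each task is attempted up to three times before the error is propagated (or handed to on_error, if given).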

Using the @distributed macro

The @distributed macro in the Julia programming language provides the ability to parallelise operations in a loop. It distributes loop iterations across processes, which can speed up code execution, especially when working with large amounts of data.

An example of using the @distributed macro:

In [ ]:
@elapsed @sync @distributed for _ in 1:2
    sleep(2)
end
Out[0]:
2.450649919

In this case the loop iterations are executed in parallel; without the macro, the two sleep(2) calls would run one after another and the loop would take about 4 seconds.
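Besides plain for loops, @distributed accepts an optional reducer operator that combines the per-iteration results on the calling process. A minimal sketch, summing the integers 1 through 100:

```julia
using Distributed
addprocs(2)

# Each worker sums its share of the range; the (+) reducer then
# combines the partial results on the calling process.
total = @distributed (+) for i in 1:100
    i
end
println(total)  # 5050
```

The reduced form returns a value immediately, so no @sync is needed here.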

Using the @everywhere macro

The @everywhere macro is used to execute code on all available processes in parallel computations without having to address each process explicitly.

This example is executed on the main process and on every worker available for parallel computation; each process loads LinearAlgebra and prints its own random number A:

In [ ]:
@everywhere begin
    using LinearAlgebra
    A = rand()
    println(A)
end
0.7949660890678617
      From worker 2:	0.43047216493869567
      From worker 3:	0.6615365425334665
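A common companion to @everywhere is pmap: a function must be defined on every process before the workers can call it. A minimal sketch (the name square is just an illustration):

```julia
using Distributed
addprocs(2)

# Define square on the master and on every worker process.
@everywhere square(x) = x^2

# Each worker can now evaluate square on its share of the input.
squares = pmap(square, 1:5)
println(squares)  # [1, 4, 9, 16, 25]
```

Without the @everywhere definition, pmap would fail because the workers would not know square.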

Conclusions:

This example demonstrated the use of the Distributed library for distributed computing: the pmap function for parallel mapping over collections, and the @distributed and @everywhere macros for parallel execution of loop iterations and code on all processes.