Advanced block development based on Engee Function

Project Description

We have already learned how to develop our blocks using Engee Function.

But what if the algorithm to be implemented is complex and cumbersome? Two problems arise immediately:

  • The Engee Function code becomes bloated and hard to read.

  • It is unclear how to test such code, and when an error occurs it is difficult to localize.

Solving these problems is not "rocket science" but routine work for programmers. So let's take on that role for a while!

Here is how we will tackle cumbersome code:

  1. Move the algorithm into a separate module

  2. Cover the module with tests

  3. Make the code more robust

As an example, consider an algorithm for finding the distances between two sets of observations. We will need the following packages:

In [ ]:
import Pkg
Pkg.add(["LinearAlgebra", "Test", "BenchmarkTools"])
   Resolving package versions...
  No Changes to `~/.project/Project.toml`
  No Changes to `~/.project/Manifest.toml`

Moving the code into a module

In Julia, a module is code enclosed in its own namespace. Names defined inside the module do not clash with names outside it, and they are reachable from outside only when qualified with the module name or explicitly exported. To learn more about the advantages of modules, please refer to the help.
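To make the idea of a namespace concrete, here is a minimal sketch. The module `Shapes`, the constant `SCALE` and the function `area` are hypothetical names invented for this illustration; they are not part of PDIST2.

```julia
# Hypothetical module used only to illustrate namespaces.
module Shapes

export area             # only `area` is exported

const SCALE = 2.0       # not exported; reachable from outside as Shapes.SCALE

area(r) = SCALE * r^2   # code inside the module sees SCALE directly

end # module Shapes

# Qualified access works for exported and non-exported names alike.
println(Shapes.area(3.0))
println(Shapes.SCALE)

# A global with the same name outside the module is a different variable.
SCALE = 10.0
println(Shapes.area(1.0))   # still computed with Shapes.SCALE
```

The same mechanism is why the notebook calls the algorithm as `PDIST2.EF_pdist2(...)`: the function lives in the module's namespace and is addressed through the module name.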

Let's look at the code of our algorithm enclosed in the module:

In [ ]:
;cat PDIST2.jl
module PDIST2

function EF_pdist2(X::Matrix{Float64}, Y::Matrix{Float64}; metric::String="euclidean")
    m, n = size(X)
    p, n2 = size(Y)
    n == n2 || throw(DimensionMismatch("Number of columns in X and Y must match"))

    if metric == "euclidean"
        XX = sum(X.^2, dims=2)
        YY = sum(Y.^2, dims=2)
        D = XX .+ YY' .- 2 .* (X * Y')
        return sqrt.(max.(D, 0))
    
    elseif metric == "squaredeuclidean"
        XX = sum(X.^2, dims=2)
        YY = sum(Y.^2, dims=2)
        D = XX .+ YY' .- 2 .* (X * Y')
        return max.(D, 0)
    
    elseif metric == "manhattan"
        D = zeros(m, p)
        for j in 1:p
            for i in 1:m
                D[i, j] = sum(abs.(X[i, :] .- Y[j, :]))
            end
        end
        return D
    
    elseif metric == "cosine"
        XX = sqrt.(sum(X.^2, dims=2))
        YY = sqrt.(sum(Y.^2, dims=2))
        norms = XX .* YY'
        XY = X * Y'
        
        # Handle division by zero: set invalid entries to 0, then correct cases where both vectors are zero
        sim = zeros(size(XY))
        valid = norms .> 0
        sim[valid] .= XY[valid] ./ norms[valid]
        
        # Identify pairs where both vectors are zero (cosine similarity = 1)
        both_zero = (XX .== 0) .& (YY' .== 0)
        sim[both_zero] .= 1
        
        return 1 .- sim
    
    else
        throw(ArgumentError("Unknown metric: $metric. Supported metrics are 'euclidean', 'squaredeuclidean', 'manhattan', 'cosine'"))
    end
end


end
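The "euclidean" branch avoids explicit loops by using the identity ‖x − y‖² = ‖x‖² + ‖y‖² − 2·(x · y) for every pair of rows. The sketch below compares that expression with a straightforward double loop on a small hand-picked input. The function names `naive_euclidean` and `expanded_euclidean` are assumptions made for this illustration; only the body of the second one is copied from the module.

```julia
using LinearAlgebra  # standard library, provides norm

# Naive reference: distance between every row of X and every row of Y.
function naive_euclidean(X, Y)
    D = zeros(size(X, 1), size(Y, 1))
    for i in axes(X, 1), j in axes(Y, 1)
        D[i, j] = norm(X[i, :] .- Y[j, :])
    end
    return D
end

# Vectorized form, the same expression as in the "euclidean" branch.
function expanded_euclidean(X, Y)
    XX = sum(X .^ 2, dims=2)          # squared row norms of X, size m×1
    YY = sum(Y .^ 2, dims=2)          # squared row norms of Y, size p×1
    D = XX .+ YY' .- 2 .* (X * Y')    # squared distances, size m×p
    return sqrt.(max.(D, 0))          # clamp small negative values caused by rounding
end

X = [0.0 0.0; 3.0 4.0]
Y = [0.0 0.0; 6.0 8.0; 3.0 0.0]

println(naive_euclidean(X, Y))
println(expanded_euclidean(X, Y))
println(isapprox(naive_euclidean(X, Y), expanded_euclidean(X, Y); atol=1e-9))
```

The `max.(D, 0)` clamp matters because the expanded form subtracts nearly equal numbers, so rounding can push a squared distance slightly below zero.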

Testing

The purpose of testing is to check that the code works correctly and that invalid input raises the expected exceptions.

The tests are written using the Test.jl package (https://engee.com/helpcenter/stable/ru/julia/stdlib/Test.html).

Let's create three simple tests:

  • Simple function call

  • Handling of mismatched dimensions

  • Correctness of calculating distances between identical matrices

Let's put these tests in a test suite, which is defined as:

@testset <setname> begin
<tests>
end

A distinctive feature of Test.jl is that its testing macros check whether a given expression is true. Let's look at an example:

In [ ]:
using Test, LinearAlgebra

demoroot = @__DIR__

include("PDIST2.jl")

X = rand(3,3)
Y = rand(3,3)
Z = PDIST2.EF_pdist2(X,Y);
@testset "EF_pdist2 tests" begin
    @test_nowarn PDIST2.EF_pdist2(X,Y);
    @test_throws DimensionMismatch PDIST2.EF_pdist2(rand(3,3),rand(2,2))
    @test iszero(diag(PDIST2.EF_pdist2(X,X)))
end
Test Summary:   | Pass  Total  Time
EF_pdist2 tests |    3      3  0.8s
Out[0]:
Test.DefaultTestSet("EF_pdist2 tests", Any[], 3, false, false, true, 1.754979865285884e9, 1.754979866072697e9, false, "In[3]")
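One caveat about the third test: because the distance is computed from an expanded formula, the diagonal of `EF_pdist2(X, X)` can in principle come out as a very small nonzero number instead of an exact zero. A more tolerant variant is sketched below. The function `euclid` is a stand-in that repeats the "euclidean" branch so the sketch runs on its own, and the tolerance `1e-6` is an assumed value, not one taken from the project.

```julia
using Test, LinearAlgebra

# Stand-in for the "euclidean" branch of PDIST2.EF_pdist2 (assumption:
# repeated here only to keep the sketch self-contained).
function euclid(X, Y)
    XX = sum(X .^ 2, dims=2)
    YY = sum(Y .^ 2, dims=2)
    D = XX .+ YY' .- 2 .* (X * Y')
    return sqrt.(max.(D, 0))
end

X = rand(5, 3)

@testset "euclid with tolerances" begin
    D = euclid(X, X)
    # Diagonal is compared with zero up to an absolute tolerance.
    @test all(d -> isapprox(d, 0; atol=1e-6), diag(D))
    # The distance matrix of a set with itself should be symmetric.
    @test D ≈ D' atol=1e-6
    # Distances are never negative.
    @test all(>=(0), D)
end
```

An absolute tolerance is used because the default relative tolerance of `isapprox` cannot be satisfied when the reference value is exactly zero.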

Performance evaluation

To evaluate the overall performance of the code, you need to measure several indicators:

  • Speed of execution

  • Amount of allocated memory

  • Number of allocations (specific to Julia)

To do this, we will use the BenchmarkTools.jl package. Its advantage is that you can either fine-tune the experiment or start measuring immediately:

In [ ]:
using BenchmarkTools

@benchmark PDIST2.EF_pdist2(rand(10,10),rand(10,10))
Out[0]:
BenchmarkTools.Trial: 10000 samples with 8 evaluations per sample.
 Range (min … max):  3.033 μs …  9.495 ms   GC (min … max):  0.00% … 99.82%
 Time  (median):     7.843 μs               GC (median):     0.00%
 Time  (mean ± σ):   9.099 μs ± 95.130 μs   GC (mean ± σ):  10.42% ±  1.00%

                      ▁▄▇██▄▂▁                               
  ▂▃▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▅████████▇▆▄▃▃▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁ ▂
  3.03 μs        Histogram: frequency by time        14.2 μs <

 Memory estimate: 6.77 KiB, allocs estimate: 18.

The following measurements are important to us: Time and Memory estimate.

Time is the execution time of a single run. Since @benchmark performs many runs, we get a set of such measurements and can process it statistically. The resulting statistics (minimum, maximum, median, mean and standard deviation) are printed after the @benchmark macro finishes, as shown above.

Memory estimate shows the amount of allocated memory. Let's see how the execution time and the amount of memory grow as the input size increases:

In [ ]:
matrix_size = 80 # @param {type:"slider",min:10,max:100,step:10}
@benchmark PDIST2.EF_pdist2(rand(matrix_size,matrix_size),rand(matrix_size,matrix_size))
Out[0]:
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):   96.698 μs … 23.746 ms   GC (min … max):  0.00% … 98.90%
 Time  (median):     116.496 μs                GC (median):     0.00%
 Time  (mean ± σ):   172.030 μs ± 430.871 μs   GC (mean ± σ):  13.25% ±  6.23%

  ▁▇█▆▆▄▂▁                                            ▁▃▃▂▂▁   ▂
  ██████████▇▇▆▆▄▃▁▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▃▆▇█▇▄▄▁▁▁▃▅▇▇▆▆▄▇███████▇ █
  96.7 μs       Histogram: log(frequency) by time        494 μs <

 Memory estimate: 352.01 KiB, allocs estimate: 25.
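For a quick look at how time and memory grow with the input size without any extra packages, Base Julia provides the `@elapsed` and `@allocated` macros. The sketch below is a rough single-shot measurement, not a replacement for `@benchmark`. The function `euclid` is a stand-in for the "euclidean" branch of the module, and the list of sizes is an arbitrary choice. Unlike the `@benchmark` calls above, the inputs are created before the measurement, so the cost of `rand` is not included.

```julia
# Stand-in for the "euclidean" branch of PDIST2.EF_pdist2 (assumption:
# repeated here only to keep the sketch self-contained).
function euclid(X, Y)
    XX = sum(X .^ 2, dims=2)
    YY = sum(Y .^ 2, dims=2)
    D = XX .+ YY' .- 2 .* (X * Y')
    return sqrt.(max.(D, 0))
end

euclid(rand(2, 2), rand(2, 2))   # warm-up call so that compilation is not measured

for n in (10, 20, 40, 80)
    X, Y = rand(n, n), rand(n, n)       # inputs are built outside the measurement
    t = @elapsed euclid(X, Y)           # seconds spent in one call
    bytes = @allocated euclid(X, Y)     # bytes allocated by one call
    println("n = $n: ", round(t * 1e6, digits=1), " μs, ",
            round(bytes / 1024, digits=2), " KiB")
end
```

A single timing is noisy, which is exactly why `@benchmark` collects thousands of samples; the allocation figures, on the other hand, should be fairly stable from run to run.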

Fault tolerance

The code you write must be robust to errors, which requires working with exceptions.

An exception, in a general sense, is an error that can be handled. Unlike syntax errors, exceptions can also be raised deliberately by the programmer. Let's look at the code of our EF_pdist2 function:

 m, n = size(X)
 p, n2 = size(Y)
 n == n2 || throw(DimensionMismatch("Number of columns in X and Y must match"))

If the dimensions of the matrices do not match, then the throw function is called and an exception is thrown.

An exception can be handled using the try/catch construct:

try
catch
end

The code that may fail is placed in the try block. If it throws an exception, the code in the catch block is executed.
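A minimal sketch of this construct is given below. The helpers `check_columns` and `safe_check` are hypothetical names introduced for the illustration; the check itself mirrors the one in EF_pdist2. The sketch also shows one possible refinement: binding the exception to a variable so that only the expected type is handled and everything else is rethrown.

```julia
# Hypothetical helper that mirrors the dimension check in EF_pdist2.
function check_columns(X, Y)
    size(X, 2) == size(Y, 2) ||
        throw(DimensionMismatch("Number of columns in X and Y must match"))
    return true
end

# Returns a short status string instead of letting the exception propagate.
function safe_check(X, Y)
    try
        check_columns(X, Y)
        return "ok"
    catch err
        if err isa DimensionMismatch
            return "dimension error: " * err.msg
        else
            rethrow()   # unexpected exceptions are passed on unchanged
        end
    end
end

println(safe_check(rand(3, 3), rand(3, 3)))
println(safe_check(rand(3, 3), rand(2, 2)))
```

Checking the exception type keeps the handler from hiding unrelated failures, for example a `MethodError` caused by passing the wrong argument type.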

Using the library in the Engee Function

As an example, we will use the EF_dist_find model:

In [ ]:
mdl = engee.open(joinpath(demoroot,"EF_dist_find.engee"));

Consider the PDIST2 block constructor:

include("/user/start/examples/base_simulation/advanced_block_development/PDIST2.jl")

mutable struct Block <: AbstractCausalComponent
cache::Matrix{Float64};
function Block()
    c = zeros(Float64,INPUT_SIGNAL_ATTRIBUTES[1].dimensions);
    info("Allocated $(Base.summarysize(c)) bytes for pdist2")
    new(c)
end

end

We include our module in the Engee Function, specifying the full path to its code.

Using the info() function, we will display the amount of memory allocated when creating the cache.

Consider the Step method of this block:

function (c::Block)(t::Real, in1, in2)    
    try
        c.cache = PDIST2.EF_pdist2(in1,in2);
    catch
        error("Matrix Dimensions should be equal!")
        stop_simulation()
    end

    return c.cache
end

Note that the call to the function from our module is wrapped in an exception handler. If EF_pdist2 throws any exception, the error "Matrix Dimensions should be equal!" will appear in the diagnostic window, and the simulation will be stopped.

When the model is opened, two 3x3 matrices of random numbers are created. Let's make sure that the model works:
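The definition `function (c::Block)(t::Real, in1, in2)` makes instances of the struct callable, which is an ordinary Julia feature. The sketch below reproduces the pattern in plain Julia so it can be tried outside Engee. `DemoBlock` and its `calls` counter are hypothetical. The Engee-specific pieces (the `AbstractCausalComponent` supertype, `INPUT_SIGNAL_ATTRIBUTES`, `info` and `stop_simulation`) are deliberately left out, and a simple subtraction stands in for EF_pdist2.

```julia
# Plain-Julia stand-in for an Engee Function block (hypothetical type).
mutable struct DemoBlock
    cache::Matrix{Float64}   # last computed output
    calls::Int               # number of times the block has been stepped
end

# Convenience constructor: preallocate the cache for a given output size.
DemoBlock(rows::Int, cols::Int) = DemoBlock(zeros(rows, cols), 0)

# Making the struct callable; this method plays the role of the Step method.
function (b::DemoBlock)(t::Real, in1, in2)
    b.cache = in1 .- in2     # placeholder computation instead of EF_pdist2
    b.calls += 1
    return b.cache
end

blk = DemoBlock(2, 2)
out = blk(0.0, [1.0 2.0; 3.0 4.0], [1.0 1.0; 1.0 1.0])
println(out)
println(blk.calls)
```

Because the state lives in the struct, the object can be created once and then called repeatedly, which matches how the block above keeps its `cache` between steps.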

In [ ]:
engee.run(mdl)
Out[0]:
SimulationResult(
    "PDIST2.1" => WorkspaceArray{Matrix{Float64}}("EF_dist_find/PDIST2.1")

)

Conclusions

This project demonstrated an approach to building an Engee Function-based block in which the algorithm lives in a separate, tested module. This makes debugging easier and improves the quality and reliability of the code.