Classification using a multilayer neural network¶
In this example, we will train a neural network on a classification task and place it inside an Engee Function block, which will let us easily transfer the trained algorithm from one model to another.
Neural network training¶
In this task, we will create a classification algorithm for data from the standard XOR problem. We will create a matrix inputs with input data x1 and x2, distributed between 0 and 1, and a vector truth with the prediction we expect from the neural network (the operation xor(x1>0.5, x2>0.5)).
Let's do all the work in one cell, then transfer the trained neural network to the canvas and explain the code.
import Pkg; Pkg.add(["Statistics", "Flux", "Symbolics", "Plots"])
# Pkg.add( "ChainPlots" ) # Use with caution: there can be incompatibilities between versions
using Flux, Statistics, Random, Plots
Random.seed!( 2 ) # Make the training process reproducible
# Model architecture: two fully connected layers with a small number of neurons in each.
model = Chain(
Dense(2 => 3, tanh),
Dense(3 => 2), # As many outputs as there are classes
softmax )
# Generate the input data
inputs = rand( Float32, 2, 1000 ); # 2×1000 Matrix{Float32}
truth = [ xor(col[1]>0.5, col[2]>0.5) for col in eachcol(inputs) ]; # Vector{Bool} of 1000 elements
# Save the prediction of the "untrained" model for later
probs1 = model( inputs );
# Data preparation and training
targets = Flux.onehotbatch( truth, [true, false] ); # One-hot encode the output variable and create a data loader
data = Flux.DataLoader( (inputs, targets), batchsize=64, shuffle=true );
opt_state = Flux.setup( Adam( 0.01 ), model ); # Optimisation settings and the specific loss function
loss(ỹ, y) = Flux.crossentropy( ỹ, y )
accuracy(ỹ, y) = mean( Flux.onecold( ỹ ) .== Flux.onecold( y ))
loss_history, accuracy_history = [], [] # Run the training, recording the results
for i in 1:5000
Flux.train!( model, data, opt_state) do m, x, y
loss( m(x), y ) # Loss function: the error on each element of the dataset
end
push!( loss_history, loss( model(inputs), targets ) ) # Record the loss value and the prediction accuracy
push!( accuracy_history, accuracy( model(inputs), targets ) )
end
# Model prediction after training
probs2 = model( inputs );
# Plot a graph to assess the quality of training
gr()
plot( [ loss_history, accuracy_history], size=(300,200), label=["loss" "accuracy"], leg=:right )
Training results¶
We deliberately saved the model's predictions from before and after training:
println( "Точность прогноза перед обучением: ", 100 * mean( (probs1[1,:] .> 0.5) .== truth ), "%" )
println( "Точность прогноза после обучения: ", 100 * mean( (probs2[1,:] .> 0.5) .== truth ), "%" )
# Plot the original data
p_true = scatter( inputs[1,:], inputs[2,:], zcolor=truth, title="Original data" );
p_raw = scatter( inputs[1,:], inputs[2,:], zcolor=probs1[1,:], title="Prediction before training" );
p_done = scatter( inputs[1,:], inputs[2,:], zcolor=probs2[1,:], title="After training" );
plot(p_true, p_raw, p_done, layout=(1,3), size=(700,200), titlefont=font(9), ms=3.5, legend=false, cbar=false )
Transferring the neural network to the Engee Function block¶
We will generate Julia code for this neural network by substituting symbolic variables into its input and obtaining a symbolic expression at the output.
Let's put it right inside an Engee Function block to get a block that can be copied and pasted into any other model.
# Generate a new image to place on the face of the block
# (success depends on the stability of the current ChainPlots version)
# using ChainPlots
# p = plot( model,
# titlefontsize=10, size=(300,300),
# xticks=:none, series_annotations="", markersize=8,
# markercolor="white", markerstrokewidth=4, linewidth=1 )
# savefig( p, "$(@__DIR__)/neural_net_block_mask.png");
# Generate the neural network code
using Symbolics
@variables x1 x2
s = model( [x1, x2] );
# Load the model if it is not yet open on the canvas
if "neural_classification" ∉ getfield.(engee.get_all_models(), :name)
engee.load( "$(@__DIR__)/neural_classification.engee");
end
# Template of the code that we will place in the Engee Function block
code_strings = """
struct Block <: AbstractCausalComponent; end
# The neural network has two outputs: s[1] and s[2]
nn(x1, x2) = ($(s[1]), $(s[2]))
# Compute the network outputs and return the classification result: 0 or 1
function (c::Block)(t::Real, x1, x2)
# "Вероятность" каждого из классов
c1, c2 = nn(x1, x2)
# Compute the output value from the classification result
# - if probability c1 is larger, class true (1) is chosen
# - if probability c2 is larger, class false (0) is chosen
return (c1 > c2) ? 1 : 0
end
"""
# Which block of the model should contain the neural network code?
block_address = "neural_classification/Engee Function"
engee.set_param!( block_address, "StepMethodCode" => code_strings)
# Save the model after the change
engee.save( "neural_classification", "$(@__DIR__)/neural_classification.engee"; force = true )
The address of the required block can be copied in that block's settings, from the Path in model field on the Information panel.
Let's run the model and see the result:
model_data = engee.run( "neural_classification" );
# Prepare the output variables
model_x1 = model_data["X1"].value;
model_x2 = model_data["X2"].value;
model_y = vec( hcat( model_data["Y"].value... ));
# Plot the result
scatter( model_x1, model_x2, model_y, ms=2.5, msw=.5, leg=false, zcolor=model_y, c=:viridis,
xlimits=(0,1), ylimits=(0,1), title="Prediction from the Engee Function block", titlefont=font(10) )
Explanation of the code¶
# plot( model, size=(600, 350) )
Let us examine a few features of the process of training neural networks for classification, namely:
- the softmax function,
- one-hot encoding,
- creating a data loader,
- the cross-entropy loss function,
- calculating the prediction accuracy.
model = Chain(
Dense(2 => 3, tanh),
Dense(3 => 2),
softmax )
First of all, we can see that the task of our neural network is to determine which class an object belongs to.
Our neural network does not map the two input arguments to a single output variable: the number of output variables is equal to the number of classes.
Note that the last fully connected layer has linear activation, and after it comes a "softmax layer". Softmax is an operation on numbers that is sometimes presented as an activation function, but the Flux package takes a different approach. What is the essence of the softmax operation?
Softmax translates the output values of the neural network into probabilities, which are more suitable inputs for the cross-entropy loss function (see below).
Suppose the neural network has $N$ outputs in its output layer, followed by the function softmax. It takes each input value $x_i$, exponentiates it ($\epsilon_i = e^{x_i}$) and for each value computes the output $y_i = \frac{\epsilon_i}{\sum_{j=1}^{N}{\epsilon_j}}$. That is, softmax divides each $\epsilon_i$ by the sum of all of them, and at the output we get strictly positive numbers whose sum equals 1 and which can be interpreted as class probabilities.
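As a minimal illustration (the numbers below are arbitrary and are not taken from the model above), softmax can be reproduced in a couple of lines of plain Julia:
x = [1.0, 2.0, 0.5]   # raw outputs of some network
ϵ = exp.( x )         # exponentiate each value
y = ϵ ./ sum( ϵ )     # ≈ [0.231, 0.629, 0.140]: strictly positive, sums to 1
y ≈ softmax( x )      # true: matches the softmax provided by Flux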
targets = Flux.onehotbatch( truth, [true, false] )
Our classification task is organised so that the network returns either [1, 0] or [0, 1]. Why is that?
Imagine that the neural network were supposed to return a class number, one out of ten. If the network makes a mistake and returns 2 instead of 1, MSE reports an error of (2-1)=1. If it returns 10 instead of 1, MSE reports an error of (10-1)=9, although this mistake is generally no grosser than the other. We need to base the loss on something else. One-hot encoding lets us get away from comparing class numbers and instead compare the distributions of the network's "confidence" in one class or another.
But in the training sample the values of the output variable are still scalar: true and false. The function onehot (and its batch version onehotbatch) translates them into vectors of two values: true into [1, 0] and false into [0, 1].
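Here is a small sketch of this encoding on a couple of hand-picked values (not part of the training code above):
Flux.onehot( true, [true, false] )                     # 2-element one-hot vector: [1, 0]
Flux.onehot( false, [true, false] )                    # 2-element one-hot vector: [0, 1]
Flux.onehotbatch( [true, false, true], [true, false] ) # 2×3 one-hot matrix, one column per sample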
data = Flux.DataLoader( (inputs, targets) )
DataLoader is a slightly more elegant way of feeding data into a neural network: you don't have to transpose the feature vectors before passing them to the network. It can also be given the parameters shuffle=true, to shuffle the sample at each training epoch, and batchsize=64, to process the data in mini-batches. Here is what the elements that this object feeds into the neural network look like:
data = Flux.DataLoader( (inputs, targets), batchsize=1 );
first( data )
As we can see, the first element of the DataLoader object consists of two parts:
- a vector of features: the scalar values fed to the input of the neural network,
- a vector of targets: the required class encoded in one-hot form.
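With batchsize=1, as in the cell above, the first batch can be unpacked as follows (a minimal sketch; the exact values depend on the randomly generated data):
x_batch, y_batch = first( data )
size( x_batch )   # (2, 1): one column with the two features of a single sample
size( y_batch )   # (2, 1): the one-hot encoded class of that sample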
loss(ỹ, y) = Flux.crossentropy( ỹ, y )
As already mentioned, there is a rule of thumb not to use mean squared error (MSE) in classification tasks: neural networks learn very slowly with it, especially if the classification is multi-class. What do we use instead?
If the classification task is organised as in our example, cross-entropy (crossentropy) is often used as the loss function, or its binary variant (binarycrossentropy) if there are only two classes. Sometimes the softmax operation is excluded from the neural network to speed it up; in that case you can have it executed inside the loss function by specifying logitcrossentropy or logitbinarycrossentropy.
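A small sketch showing that crossentropy applied after softmax matches logitcrossentropy applied to the raw outputs (the matrix below is arbitrary and serves only as an illustration):
raw = Float32[ 2.0 0.5; -1.0 1.5 ]                      # outputs of a network without softmax
y   = Flux.onehotbatch( [true, false], [true, false] )
Flux.crossentropy( softmax(raw), y ) ≈ Flux.logitcrossentropy( raw, y )  # true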
accuracy(ỹ, y) = mean( Flux.onecold( ỹ ) .== Flux.onecold( y ))
We track the prediction accuracy: the percentage of correctly guessed values. The function onecold performs the inverse operation to onehot. The onehot encoding represents a class label as a vector in which the position of that class is 1 and all other positions are 0. Conversely, the onecold operation finds the largest element of the input vector and returns a single value: the class label corresponding to that element (or its ordinal index if no labels are specified).
This function could be defined much more simply, but high-level functions usually help avoid many errors, or at least produce more informative error messages.
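A couple of hand-picked calls show how onecold behaves (the values and labels are arbitrary):
Flux.onecold( [0.2, 0.7, 0.1], ["cat", "dog", "bird"] )  # "dog": the label of the largest element
Flux.onecold( [0.2, 0.7, 0.1] )                          # 2: the ordinal index when no labels are given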
Not every run leads to a good result, which is why we set a specific seed at the beginning of this example. It can be useful to automatically train several models with different initialisations and select the best one. Or, if training is planned to be done very often and on slightly different samples (as in the case of a digital twin), it is better to spend more time creating a more robust training procedure: for example, set up a cyclical (sinusoidal) learning-rate schedule or add batch normalisation.
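A hedged sketch of such a schedule, reusing the model, data, loss and opt_state defined above (the period and amplitude are chosen arbitrarily):
η₀, period = 0.01, 500
for i in 1:5000
    # the learning rate oscillates between 0 and η₀ with the given period
    Flux.adjust!( opt_state, η₀ * (0.5 + 0.5*cos( 2π*i/period )) )
    Flux.train!( (m, x, y) -> loss( m(x), y ), model, data, opt_state )
end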
Conclusion¶
We trained a neural network for classification and placed it on the canvas as another block within the model.
The code of the training procedure is very concise; it can be reduced to 7 lines. The code of the neural network inside the block was generated automatically.
The neural network training code we studied does not have too many "hyperparameters" (parameters that the designer must configure); they can be selected manually, and the expressive power of a multilayer neural network is very high.