
Building and training a neural network for handwritten digit recognition

In this example we consider data processing and the training of a neural network model for image classification. The MNIST dataset, which contains 70,000 labelled images of handwritten digits, serves as the set of observations. The example uses a .csv file in which each image is flattened into a table row containing the brightness value of every pixel.
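
Before loading the file into a dataframe, its layout can be checked directly; a minimal sketch (assuming mnist_784.csv sits next to the notebook, as in the loading cell below):

header = split(readline("$(@__DIR__)/mnist_784.csv"), ",")
println(length(header))                           # expected: 785 — 784 pixel columns plus the class column
println(first(header, 3), " … ", last(header, 1)) # pixel1, pixel2, pixel3 … class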

Connecting the libraries for data processing:

In [ ]:
import Pkg
Pkg.add(["Colors", "CSV", "Flux"])
In [ ]:
using CSV, DataFrames

Loading data into a variable:

In [ ]:
df = DataFrame(CSV.File("$(@__DIR__)/mnist_784.csv")); 

Displaying the first five rows of the dataframe:

In [ ]:
first(df,5)
Out[0]:

5 rows × 785 columns (omitted printing of 775 columns)

     pixel1  pixel2  pixel3  pixel4  pixel5  pixel6  pixel7  pixel8  pixel9  pixel10
     Int64   Int64   Int64   Int64   Int64   Int64   Int64   Int64   Int64   Int64
  1       0       0       0       0       0       0       0       0       0        0
  2       0       0       0       0       0       0       0       0       0        0
  3       0       0       0       0       0       0       0       0       0        0
  4       0       0       0       0       0       0       0       0       0        0
  5       0       0       0       0       0       0       0       0       0        0

Displaying the first five rows and the last six columns of the dataframe; the final column, class, indicates which class each observation belongs to:

In [ ]:
df[1:5,780:785]
Out[0]:

5 rows × 6 columns

     pixel780  pixel781  pixel782  pixel783  pixel784  class
     Int64     Int64     Int64     Int64     Int64     Int64
  1         0         0         0         0         0      5
  2         0         0         0         0         0      0
  3         0         0         0         0         0      4
  4         0         0         0         0         0      1
  5         0         0         0         0         0      9

Splitting the data set into training and test samples in an 80/20 ratio:

In [ ]:
X_train, y_train = Matrix(df[1:56000, 1:784]), df[1:56000, 785]
X_test, y_test = Matrix(df[56001:end, 1:784]), df[56001:end, 785]
Out[0]:
([0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], [1, 8, 5, 9, 8, 0, 3, 1, 3, 2  …  7, 8, 9, 0, 1, 2, 3, 4, 5, 6])
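
The slice above takes the first 56,000 rows as-is. If the file were ordered by class, a shuffled split would be safer; a minimal sketch using the standard-library Random module (seed chosen arbitrarily):

using Random
Random.seed!(42)                      # arbitrary seed for reproducibility
idx = shuffle(1:nrow(df))             # random permutation of the row indices
train_idx, test_idx = idx[1:56000], idx[56001:end]
X_train, y_train = Matrix(df[train_idx, 1:784]), df[train_idx, 785]
X_test, y_test = Matrix(df[test_idx, 1:784]), df[test_idx, 785]

The rest of the example keeps the sequential split so that the recorded outputs below stay reproducible.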

Converting the samples to Float32, the element type Flux works with by default:

In [ ]:
X_train, X_test = convert(Matrix{Float32}, X_train), convert(Matrix{Float32}, X_test)
y_train, y_test = convert(Vector{Float32}, y_train), convert(Vector{Float32}, y_test)
Out[0]:
(Float32[5.0, 0.0, 4.0, 1.0, 9.0, 2.0, 1.0, 3.0, 1.0, 4.0  …  4.0, 0.0, 9.0, 0.0, 6.0, 1.0, 2.0, 2.0, 3.0, 3.0], Float32[1.0, 8.0, 5.0, 9.0, 8.0, 0.0, 3.0, 1.0, 3.0, 2.0  …  7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

Connecting a library for data visualisation:

In [ ]:
using Plots

Displaying an object and its class:

In [ ]:
test_img = Vector(df[60000, 1:784])
test_img = reshape(test_img, 28, 28) / 256   # scale brightness values into [0, 1)
using Colors
println("Object class: ", df[60000, 785])
plot(Gray.(test_img))
Object class: 8
Out[0]:
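
To get a broader feel for the data, several digits can be shown at once; a small sketch (the first nine rows, chosen arbitrarily):

imgs = [Gray.(reshape(Vector(df[i, 1:784]), 28, 28) / 256) for i in 1:9]
plot([plot(img, ticks=false, showaxis=false) for img in imgs]..., layout=(3, 3))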

Final data transformation: transposing the matrices so that each column is one observation, the layout Flux expects:

In [ ]:
X_train, X_test = X_train', X_test'
y_train, y_test = y_train', y_test'
Out[0]:
(Float32[5.0 0.0 … 3.0 3.0], Float32[1.0 8.0 … 5.0 6.0])
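
Flux processes batches column-wise (features × samples), so after the transpose each column holds one image. A quick check of the resulting shapes:

println(size(X_train)) # expected: (784, 56000)
println(size(X_test))  # expected: (784, 14000)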

Connecting a machine learning library:

In [ ]:
using Flux
using Flux: train!

Defining the structure of a neural network:

In [ ]:
model = Chain(
    Dense(784, 15, elu),
    Dense(15, 10, sigmoid),
    softmax
)
Out[0]:
Chain(
  Dense(784 => 15, elu),                # 11_775 parameters
  Dense(15 => 10, σ),                   # 160 parameters
  NNlib.softmax,
)                   # Total: 4 arrays, 11_935 parameters, 46.871 KiB.
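
The parameter counts in the summary are weights plus biases: 784 × 15 + 15 = 11,775 for the first layer and 15 × 10 + 10 = 160 for the second, giving 11,935 parameters in total.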

Test recognition result (before training the model): since the weights are still random, the output probabilities carry no information about the digits yet:

In [ ]:
predict = model(X_test)
Out[0]:
10×14000 Matrix{Float32}:
 0.0729828  0.133842   0.0592658  …  0.157915   0.133842   0.0600273
 0.0729828  0.133842   0.0592658     0.0580938  0.133842   0.0588685
 0.198388   0.133842   0.161101      0.0768005  0.133842   0.0588685
 0.0729828  0.0492376  0.0592658     0.0580938  0.0492376  0.160021
 0.0729828  0.0492376  0.0592658     0.0580938  0.0492376  0.0588685
 0.154932   0.0492376  0.161101   …  0.0591634  0.0492376  0.160021
 0.0729923  0.133842   0.0592659     0.157915   0.133842   0.160021
 0.0729828  0.133842   0.0592658     0.0580938  0.133842   0.0588747
 0.0729855  0.133842   0.161101      0.157915   0.133842   0.0644081
 0.135789   0.0492376  0.161101      0.157915   0.0492376  0.160021

Defining the training parameters:

In [ ]:
loss(x, y) = Flux.mse(model(x), Flux.onehotbatch(vec(y), 0:9)) # loss function: MSE against one-hot targets

ps = Flux.params(model) # parameters adjusted during training

opt = Adam(0.01) # choice of optimiser
Out[0]:
Adam(0.01, (0.9, 0.999), 1.0e-8, IdDict{Any, Any}())
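
For reference, onehotbatch turns each label into a column with a single 1 in the row of the corresponding digit; this is the target the MSE loss above compares the network output against:

Flux.onehotbatch([5, 0, 4], 0:9) # 10×3 one-hot matrix: a 1 in row (label + 1) of each column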

Defining a function to calculate the accuracy of the model:

In [ ]:
function accuracy()
    correct = 0
    for index in 1:length(y_test)
        probs = model(X_test[:, index])     # class probabilities for one test image
        predicted_digit = argmax(probs) - 1 # indices are 1-based, classes start at 0
        if predicted_digit == y_test[index]
            correct += 1
        end
    end
    return correct / length(y_test)
end
Out[0]:
accuracy (generic function with 1 method)
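
For comparison, an equivalent vectorised version (a sketch assuming the same data layout; onecold is the inverse of onehotbatch) pushes the whole test set through the model in one call:

using Statistics
accuracy_vec() = mean(Flux.onecold(model(X_test), 0:9) .== vec(y_test))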

Iterative model training process:

In [ ]:
loss_history = [] # empty array for recording the loss at each step

epochs = 50 # number of training steps

for epoch in 1:epochs # repeat the training step
    train!(loss, ps, [(X_train, y_train)], opt) # training call that adjusts the model parameters
    train_loss = loss(X_train, y_train) # loss at the current step
    push!(loss_history, train_loss) # record the loss
    acc = accuracy() * 100
    println("Epoch = $epoch : Training Loss = $train_loss, Model Accuracy = $acc %")
end
Epoch = 1 : Training Loss = 0.08882678, Model Accuracy = 21.357142857142858 %
Epoch = 2 : Training Loss = 0.08633855, Model Accuracy = 22.87857142857143 %
Epoch = 3 : Training Loss = 0.08413743, Model Accuracy = 29.114285714285714 %
Epoch = 4 : Training Loss = 0.082765914, Model Accuracy = 34.31428571428572 %
Epoch = 5 : Training Loss = 0.08176625, Model Accuracy = 36.614285714285714 %
Epoch = 6 : Training Loss = 0.08065751, Model Accuracy = 38.121428571428574 %
Epoch = 7 : Training Loss = 0.079435244, Model Accuracy = 45.050000000000004 %
Epoch = 8 : Training Loss = 0.078252606, Model Accuracy = 55.16428571428571 %
Epoch = 9 : Training Loss = 0.07740078, Model Accuracy = 61.79285714285714 %
Epoch = 10 : Training Loss = 0.07679748, Model Accuracy = 66.17857142857143 %
Epoch = 11 : Training Loss = 0.07649854, Model Accuracy = 69.19999999999999 %
Epoch = 12 : Training Loss = 0.07618407, Model Accuracy = 70.92857142857143 %
Epoch = 13 : Training Loss = 0.07574901, Model Accuracy = 72.02142857142857 %
Epoch = 14 : Training Loss = 0.075383395, Model Accuracy = 72.48571428571428 %
Epoch = 15 : Training Loss = 0.07500039, Model Accuracy = 73.20714285714286 %
Epoch = 16 : Training Loss = 0.07469048, Model Accuracy = 73.08571428571429 %
Epoch = 17 : Training Loss = 0.07434183, Model Accuracy = 73.94285714285715 %
Epoch = 18 : Training Loss = 0.07392979, Model Accuracy = 75.37857142857143 %
Epoch = 19 : Training Loss = 0.073610745, Model Accuracy = 76.27142857142857 %
Epoch = 20 : Training Loss = 0.07343323, Model Accuracy = 76.24285714285715 %
Epoch = 21 : Training Loss = 0.07315283, Model Accuracy = 76.32857142857142 %
Epoch = 22 : Training Loss = 0.07284213, Model Accuracy = 76.21428571428571 %
Epoch = 23 : Training Loss = 0.07260198, Model Accuracy = 75.86428571428571 %
Epoch = 24 : Training Loss = 0.0723972, Model Accuracy = 76.44285714285715 %
Epoch = 25 : Training Loss = 0.07218366, Model Accuracy = 78.10000000000001 %
Epoch = 26 : Training Loss = 0.072051615, Model Accuracy = 79.37857142857143 %
Epoch = 27 : Training Loss = 0.07194763, Model Accuracy = 80.12142857142858 %
Epoch = 28 : Training Loss = 0.07184025, Model Accuracy = 80.92857142857143 %
Epoch = 29 : Training Loss = 0.07170713, Model Accuracy = 81.10714285714286 %
Epoch = 30 : Training Loss = 0.0715578, Model Accuracy = 81.35 %
Epoch = 31 : Training Loss = 0.071458206, Model Accuracy = 81.42857142857143 %
Epoch = 32 : Training Loss = 0.071334094, Model Accuracy = 81.78571428571428 %
Epoch = 33 : Training Loss = 0.07123414, Model Accuracy = 82.19285714285715 %
Epoch = 34 : Training Loss = 0.071115755, Model Accuracy = 82.39285714285714 %
Epoch = 35 : Training Loss = 0.07096117, Model Accuracy = 82.54285714285714 %
Epoch = 36 : Training Loss = 0.07084565, Model Accuracy = 82.62142857142857 %
Epoch = 37 : Training Loss = 0.070747726, Model Accuracy = 82.8 %
Epoch = 38 : Training Loss = 0.07065126, Model Accuracy = 83.22857142857143 %
Epoch = 39 : Training Loss = 0.07053765, Model Accuracy = 83.71428571428572 %
Epoch = 40 : Training Loss = 0.07047585, Model Accuracy = 83.82857142857144 %
Epoch = 41 : Training Loss = 0.070413895, Model Accuracy = 83.76428571428572 %
Epoch = 42 : Training Loss = 0.07036532, Model Accuracy = 83.85714285714285 %
Epoch = 43 : Training Loss = 0.07026137, Model Accuracy = 84.17857142857143 %
Epoch = 44 : Training Loss = 0.07019601, Model Accuracy = 84.37142857142858 %
Epoch = 45 : Training Loss = 0.070140265, Model Accuracy = 84.17857142857143 %
Epoch = 46 : Training Loss = 0.07010393, Model Accuracy = 83.99285714285715 %
Epoch = 47 : Training Loss = 0.07005022, Model Accuracy = 83.96428571428571 %
Epoch = 48 : Training Loss = 0.06997585, Model Accuracy = 84.16428571428571 %
Epoch = 49 : Training Loss = 0.06992216, Model Accuracy = 84.42857142857143 %
Epoch = 50 : Training Loss = 0.06987861, Model Accuracy = 85.0142857142857 %

Visualisation of the change in the loss function at each training step:

In [ ]:
plot(1:epochs, loss_history, title="Change in the loss function", xlabel="Training step", ylabel="Loss")
Out[0]:

Displaying results:

In [ ]:
number = 13000
test_img = Vector(df[56000 + number, 1:784])
test_img = reshape(test_img, 28, 28) / 256
using Colors
result = model(X_test[:, number])
println("Known object class: ", df[56000 + number, 785], "\n  ", "Vector characterising the object's class: ", result)
plot(Gray.(test_img))
Known object class: 0
  Vector characterising the object's class: Float32[0.23196934, 0.08533675, 0.08533675, 0.08533675, 0.08533675, 0.08533675, 0.08533675, 0.08533675, 0.08533675, 0.08533675]
Out[0]:
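
The predicted class can be read off this vector directly; argmax is 1-based while the classes start at 0, hence the shift:

println("Predicted class: ", argmax(result) - 1) # the maximum is in position 1, i.e. digit 0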

Conclusion

In this example, the pixel brightness data was preprocessed, and the neural network architecture, optimiser parameters and loss function were defined. The trained model achieves reasonably accurate, though not perfect, classification (about 85% on the test set). To improve recognition quality, the network can be modified by changing the layer architecture or by enlarging the training sample.