Engee documentation
Notebook

Building and training a neural network for handwritten digit recognition

In this example, we walk through data preparation and the training of a neural-network model for image classification. The MNIST dataset, which contains 70,000 labeled images of handwritten digits, serves as the set of observations. The example uses a .csv file in which each image is flattened into tabular data holding the brightness value of every pixel.
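The flattened layout can be sketched as follows: each CSV row stores 28 × 28 = 784 brightness values plus a class label, and an image is recovered by reshaping the pixel columns back into a square matrix (the values below are random stand-ins, not actual dataset rows):

```julia
# Each MNIST row: 784 pixel intensities (0–255) followed by the digit label.
row = rand(0:255, 784)      # stand-in for one CSV row's pixel columns
img = reshape(row, 28, 28)  # recover the 28×28 image
size(img)                   # (28, 28)
```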

Connecting libraries for data processing:

In [ ]:
Pkg.add(["Colors", "CSV", "DataFrames", "Flux", "Optimisers", "Plots"])
In [ ]:
using CSV, DataFrames

Loading data into a variable:

In [ ]:
df = DataFrame(CSV.File("$(@__DIR__)/mnist_784.csv")); 

Displaying the first five rows of the dataframe:

In [ ]:
first(df,5)
Out[0]:
5×785 DataFrame (685 columns omitted)
Columns pixel1 … pixel100 shown (all Int64); every displayed value in the first five rows is 0, since the outer border pixels of MNIST images are blank.

Displaying the first five rows together with the last column, which indicates the class each observation belongs to:

In [ ]:
df[1:5,780:785]
Out[0]:
5×6 DataFrame
 Row │ pixel780  pixel781  pixel782  pixel783  pixel784  class
     │ Int64     Int64     Int64     Int64     Int64     Int64
─────┼────────────────────────────────────────────────────────
   1 │        0         0         0         0         0      5
   2 │        0         0         0         0         0      0
   3 │        0         0         0         0         0      4
   4 │        0         0         0         0         0      1
   5 │        0         0         0         0         0      9

Splitting the data set into training and test samples in an 80/20 ratio:

In [ ]:
X_train, y_train = Matrix(df[1:56000,1:784]), df[1:56000,785]
X_test, y__test = Matrix(df[56001:end,1:784]), df[56001:end,785]
Out[0]:
([0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], [1, 8, 5, 9, 8, 0, 3, 1, 3, 2  …  7, 8, 9, 0, 1, 2, 3, 4, 5, 6])
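The split above simply takes the first 56,000 rows for training. If the file is not pre-shuffled, a random permutation of the indices avoids ordering bias; a minimal sketch (the seed value 42 is an arbitrary choice, and `df` is assumed loaded as above):

```julia
using Random

# Shuffled 80/20 split over the 70,000 row indices.
n = 70_000
idx = shuffle(MersenneTwister(42), 1:n)
train_idx, test_idx = idx[1:56_000], idx[56_001:end]
```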

Converting the samples to types suitable for neural-network processing:

In [ ]:
X_train, X_test = convert(Matrix{Float32}, X_train), convert(Matrix{Float32}, X_test)
y_train, y__test = convert(Vector{Float32}, y_train), convert(Vector{Float32}, y__test)
Out[0]:
(Float32[5.0, 0.0, 4.0, 1.0, 9.0, 2.0, 1.0, 3.0, 1.0, 4.0  …  4.0, 0.0, 9.0, 0.0, 6.0, 1.0, 2.0, 2.0, 3.0, 3.0], Float32[1.0, 8.0, 5.0, 9.0, 8.0, 0.0, 3.0, 1.0, 3.0, 2.0  …  7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
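After conversion, the pixel values still lie in the 0–255 range. Scaling inputs to [0, 1] is an optional but common preprocessing step that often speeds up convergence; a one-line sketch (the function name `scale01` is illustrative):

```julia
# Scale raw brightness values from [0, 255] to [0.0, 1.0].
scale01(X) = X ./ 255f0
```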

Connecting a library for data visualization:

In [ ]:
using Plots

Displaying an object and its class:

In [ ]:
test_img = Vector(df[60000,1:784])
test_img = (reshape(test_img, 28, 28)) / 256
using Colors
println("Object class: ", df[60000,785])
plot(Gray.(test_img))
Object class: 8
Out[0]:

Final transformation of the data for processing by the neural network:

In [ ]:
X_train, X_test = X_train', X_test'
y_train, y__test = y_train', y__test'
Out[0]:
(Float32[5.0 0.0 … 3.0 3.0], Float32[1.0 8.0 … 5.0 6.0])
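The transpose is needed because Flux's `Dense` layers treat each column as one sample: after the adjoint, `X_train` has size 784 × 56,000. A quick sketch of the layout convention with stand-in data:

```julia
# Flux convention: features along rows, samples along columns.
X = rand(Float32, 100, 784)'  # lazy adjoint: now 784 features × 100 samples
size(X)                       # (784, 100)
```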

Connecting the machine learning library:

In [ ]:
using Flux, Optimisers;

Defining the structure of a neural network:

In [ ]:
model = Chain(
    Dense(784, 15, elu),
    Dense(15, 10, sigmoid),
    softmax
)
Out[0]:
Chain(
  Dense(784 => 15, elu),                # 11_775 parameters
  Dense(15 => 10, σ),                   # 160 parameters
  NNlib.softmax,
)                   # Total: 4 arrays, 11_935 parameters, 46.871 KiB.

Defining learning parameters:

In [ ]:
learning_rate = 0.01f0
opt = Optimisers.Adam(learning_rate)
state = Optimisers.setup(opt, model)

function loss(model, x, y)
    y_oh = Flux.onehotbatch(y, 0:9)   # size (10, 1, N), since y is a 1×N row
    y_pred = model(x)                 # size (10, N)

    # Add a singleton dimension so the shape matches y_oh
    y_pred_reshaped = Flux.unsqueeze(y_pred, dims=2)  # now (10, 1, N)

    return Flux.mse(y_pred_reshaped, y_oh)
end
Out[0]:
loss (generic function with 1 method)
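MSE over one-hot targets works, but cross-entropy is the conventional loss for classification, and it is usually computed on raw logits with the final `sigmoid`/`softmax` removed from the model. A hedged sketch of that alternative (the names `logit_model` and `loss_ce` are illustrative, not part of the example above):

```julia
using Flux

# Alternative: cross-entropy on logits; note the head has no sigmoid/softmax.
logit_model = Chain(Dense(784, 15, elu), Dense(15, 10))

# vec(y) flattens the 1×N row of labels; Int.() makes the labels integer digits.
loss_ce(m, x, y) = Flux.logitcrossentropy(m(x), Flux.onehotbatch(Int.(vec(y)), 0:9))
```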

Defining a function to calculate the accuracy of the model:

In [ ]:
function accuracy(model, X, y)
    correct = 0
    for i in 1:length(y)
        # Prepare the input: add a batch dimension
        x_input = reshape(X[:, i], :, 1)  # (features, 1)

        # Model prediction
        probs = model(x_input)  # size (10, 1)

        # Convert the index of the largest probability to a digit
        predicted_digit = argmax(probs)[1] - 1

        # Compare with the true label
        if predicted_digit == y[i]
            correct += 1
        end
    end
    return correct / length(y)
end
Out[0]:
accuracy (generic function with 1 method)
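The per-sample loop above is easy to follow, but with Flux's `onecold` the same accuracy can be computed in a single batched call. A sketch under the same data layout (features × samples, labels 0–9; the name `accuracy_fast` is illustrative):

```julia
using Flux, Statistics

# Batched accuracy: onecold maps each probability column back to a digit 0–9.
accuracy_fast(model, X, y) = mean(Flux.onecold(model(X), 0:9) .== vec(y))
```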

The iterative learning process of the model:

In [ ]:
loss_history = []
epochs = 100

for epoch in 1:epochs
    # Compute gradients
    grads = gradient(model) do m
        loss(m, X_train, y_train)
    end

    # Update the model and optimiser state
    state, model = Optimisers.update(state, model, grads[1])

    # Compute and record the training loss
    current_loss = loss(model, X_train, y_train)
    push!(loss_history, current_loss)

    # Compute test accuracy
    acc = accuracy(model, X_test, y__test) * 100

    # Log every epoch
    println("Epoch $epoch: Training Loss = $current_loss, Accuracy = $acc%")
end
Epoch 1: Training Loss = 0.08964709, Accuracy = 12.314285714285713%
Epoch 2: Training Loss = 0.08764407, Accuracy = 13.814285714285715%
Epoch 3: Training Loss = 0.08620157, Accuracy = 14.399999999999999%
Epoch 4: Training Loss = 0.08494053, Accuracy = 14.371428571428572%
Epoch 5: Training Loss = 0.08356112, Accuracy = 15.828571428571427%
Epoch 6: Training Loss = 0.082328424, Accuracy = 19.071428571428573%
Epoch 7: Training Loss = 0.08158612, Accuracy = 23.864285714285714%
Epoch 8: Training Loss = 0.08063085, Accuracy = 28.221428571428568%
Epoch 9: Training Loss = 0.07972546, Accuracy = 31.321428571428573%
Epoch 10: Training Loss = 0.078886844, Accuracy = 34.30714285714286%
Epoch 11: Training Loss = 0.07808643, Accuracy = 38.44285714285714%
Epoch 12: Training Loss = 0.077700436, Accuracy = 41.84285714285714%
Epoch 13: Training Loss = 0.07742899, Accuracy = 44.48571428571428%
Epoch 14: Training Loss = 0.07700219, Accuracy = 47.35714285714286%
Epoch 15: Training Loss = 0.07660475, Accuracy = 49.471428571428575%
Epoch 16: Training Loss = 0.07617868, Accuracy = 51.41428571428571%
Epoch 17: Training Loss = 0.07569719, Accuracy = 53.22857142857143%
Epoch 18: Training Loss = 0.07526767, Accuracy = 54.478571428571435%
Epoch 19: Training Loss = 0.07488683, Accuracy = 55.72142857142857%
Epoch 20: Training Loss = 0.074611954, Accuracy = 56.57142857142857%
Epoch 21: Training Loss = 0.07444562, Accuracy = 57.107142857142854%
Epoch 22: Training Loss = 0.074215636, Accuracy = 58.25714285714285%
Epoch 23: Training Loss = 0.073985055, Accuracy = 59.58571428571429%
Epoch 24: Training Loss = 0.07374545, Accuracy = 60.621428571428574%
Epoch 25: Training Loss = 0.07343423, Accuracy = 61.76428571428572%
Epoch 26: Training Loss = 0.07313286, Accuracy = 63.29285714285714%
Epoch 27: Training Loss = 0.0728745, Accuracy = 64.26428571428572%
Epoch 28: Training Loss = 0.0726323, Accuracy = 64.92142857142858%
Epoch 29: Training Loss = 0.0724468, Accuracy = 65.7%
Epoch 30: Training Loss = 0.07231355, Accuracy = 66.12857142857142%
Epoch 31: Training Loss = 0.07217003, Accuracy = 66.67142857142856%
Epoch 32: Training Loss = 0.07196423, Accuracy = 67.48571428571428%
Epoch 33: Training Loss = 0.07176558, Accuracy = 68.5%
Epoch 34: Training Loss = 0.07160473, Accuracy = 69.43571428571428%
Epoch 35: Training Loss = 0.07146187, Accuracy = 70.53571428571429%
Epoch 36: Training Loss = 0.0712925, Accuracy = 71.92857142857143%
Epoch 37: Training Loss = 0.07111777, Accuracy = 73.78571428571429%
Epoch 38: Training Loss = 0.07095047, Accuracy = 75.75714285714285%
Epoch 39: Training Loss = 0.07079176, Accuracy = 77.77857142857142%
Epoch 40: Training Loss = 0.07066103, Accuracy = 79.02142857142857%
Epoch 41: Training Loss = 0.070562966, Accuracy = 80.27142857142857%
Epoch 42: Training Loss = 0.070510894, Accuracy = 80.86428571428571%
Epoch 43: Training Loss = 0.070434295, Accuracy = 81.25%
Epoch 44: Training Loss = 0.07028215, Accuracy = 81.69285714285715%
Epoch 45: Training Loss = 0.070133194, Accuracy = 82.0%
Epoch 46: Training Loss = 0.07004519, Accuracy = 82.07142857142857%
Epoch 47: Training Loss = 0.06997859, Accuracy = 82.39999999999999%
Epoch 48: Training Loss = 0.06991084, Accuracy = 82.95%
Epoch 49: Training Loss = 0.06985075, Accuracy = 83.32857142857144%
Epoch 50: Training Loss = 0.06978005, Accuracy = 83.51428571428572%
Epoch 51: Training Loss = 0.0697167, Accuracy = 83.62857142857143%
Epoch 52: Training Loss = 0.069677204, Accuracy = 83.85000000000001%
Epoch 53: Training Loss = 0.06961788, Accuracy = 83.77857142857142%
Epoch 54: Training Loss = 0.069562666, Accuracy = 83.89999999999999%
Epoch 55: Training Loss = 0.069532044, Accuracy = 84.07857142857142%
Epoch 56: Training Loss = 0.069504425, Accuracy = 84.26428571428572%
Epoch 57: Training Loss = 0.06948212, Accuracy = 84.41428571428573%
Epoch 58: Training Loss = 0.06941597, Accuracy = 84.68571428571428%
Epoch 59: Training Loss = 0.06936886, Accuracy = 84.85714285714285%
Epoch 60: Training Loss = 0.06936117, Accuracy = 85.07142857142857%
Epoch 61: Training Loss = 0.069321334, Accuracy = 85.28571428571429%
Epoch 62: Training Loss = 0.06929054, Accuracy = 85.22857142857143%
Epoch 63: Training Loss = 0.069279574, Accuracy = 85.21428571428571%
Epoch 64: Training Loss = 0.069251925, Accuracy = 85.13571428571429%
Epoch 65: Training Loss = 0.069236554, Accuracy = 84.95714285714286%
Epoch 66: Training Loss = 0.06921506, Accuracy = 84.95714285714286%
Epoch 67: Training Loss = 0.069184825, Accuracy = 85.15%
Epoch 68: Training Loss = 0.06915006, Accuracy = 85.5142857142857%
Epoch 69: Training Loss = 0.06913231, Accuracy = 85.75%
Epoch 70: Training Loss = 0.06910449, Accuracy = 85.93571428571428%
Epoch 71: Training Loss = 0.069074914, Accuracy = 86.15%
Epoch 72: Training Loss = 0.069066346, Accuracy = 86.05714285714285%
Epoch 73: Training Loss = 0.06902928, Accuracy = 86.27857142857142%
Epoch 74: Training Loss = 0.06902256, Accuracy = 86.4857142857143%
Epoch 75: Training Loss = 0.06904491, Accuracy = 86.77857142857142%
Epoch 76: Training Loss = 0.069038324, Accuracy = 86.9%
Epoch 77: Training Loss = 0.06900674, Accuracy = 86.6%
Epoch 78: Training Loss = 0.068971805, Accuracy = 86.35000000000001%
Epoch 79: Training Loss = 0.0689787, Accuracy = 86.33571428571429%
Epoch 80: Training Loss = 0.06896412, Accuracy = 86.24285714285715%
Epoch 81: Training Loss = 0.06892337, Accuracy = 86.3%
Epoch 82: Training Loss = 0.068898536, Accuracy = 86.49285714285713%
Epoch 83: Training Loss = 0.068888664, Accuracy = 86.70714285714286%
Epoch 84: Training Loss = 0.06886721, Accuracy = 86.74285714285715%
Epoch 85: Training Loss = 0.06888327, Accuracy = 86.67857142857143%
Epoch 86: Training Loss = 0.06883431, Accuracy = 86.78571428571429%
Epoch 87: Training Loss = 0.06884734, Accuracy = 86.87857142857143%
Epoch 88: Training Loss = 0.06882276, Accuracy = 86.92142857142858%
Epoch 89: Training Loss = 0.06881059, Accuracy = 87.02857142857144%
Epoch 90: Training Loss = 0.06878725, Accuracy = 87.13571428571429%
Epoch 91: Training Loss = 0.06878075, Accuracy = 87.15%
Epoch 92: Training Loss = 0.06877423, Accuracy = 87.17857142857143%
Epoch 93: Training Loss = 0.068756245, Accuracy = 87.17857142857143%
Epoch 94: Training Loss = 0.06874979, Accuracy = 87.16428571428571%
Epoch 95: Training Loss = 0.06873655, Accuracy = 87.12142857142857%
Epoch 96: Training Loss = 0.06873606, Accuracy = 87.15%
Epoch 97: Training Loss = 0.06871858, Accuracy = 87.28571428571429%
Epoch 98: Training Loss = 0.068701506, Accuracy = 87.52142857142857%
Epoch 99: Training Loss = 0.06870646, Accuracy = 87.68571428571428%
Epoch 100: Training Loss = 0.06867965, Accuracy = 87.8%
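The loop above performs full-batch gradient descent: one parameter update per epoch over all 56,000 samples. Mini-batch training usually converges in far fewer epochs; a hedged sketch using `Flux.DataLoader` (the batch size of 128 and the 10-epoch count are assumptions, and `loss`, `model`, `state` are taken from the cells above):

```julia
using Flux, Optimisers

# Mini-batch variant: one optimiser update per 128-sample batch.
loader = Flux.DataLoader((X_train, y_train), batchsize=128, shuffle=true)
for epoch in 1:10
    for (xb, yb) in loader
        grads = gradient(m -> loss(m, xb, yb), model)
        state, model = Optimisers.update(state, model, grads[1])
    end
end
```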

Visualization of changes in the loss function at each training step:

In [ ]:
plot((1:epochs), loss_history, title="Loss during training", xlabel="Training step", ylabel="Loss")
Out[0]:

Displaying results:

In [ ]:
number = 3000
test_img = Vector(df[56000+number,1:784])
test_img = (reshape(test_img, 28, 28)) / 256
using Colors
result = model(X_test[:,number])
println("Known object class: ", df[56000+number,785], "\n  ", "Digit recognized by the network: ", (findfirst(x -> x == maximum(result), result)-1))
plot(Gray.(test_img'))
Known object class: 4
  Digit recognized by the network: 4
Out[0]:

Conclusion

In this example, the pixel brightness data was preprocessed, and the neural-network architecture, optimizer parameters, and loss function were defined.
The trained model achieved reasonably accurate, though not perfect, classification. Recognition quality could be improved by modifying the layer architecture or by enlarging the training sample.