Engee Community

Handwritten Digit Recognition

Author: mikhailpetrov
Notebook

Building and training a neural network for handwritten digit recognition

This example walks through preparing data and training a neural-network model for image classification. The dataset is MNIST, which contains 70,000 labeled images of handwritten digits. The example uses a .csv file in which each image is flattened into a table row holding the brightness value of every pixel.

Installing and loading the libraries for data processing:

In [ ]:
import Pkg
Pkg.add(["Colors", "CSV", "DataFrames", "Flux", "Optimisers", "Plots"])
In [ ]:
using CSV, DataFrames

Loading the data into a variable:

In [ ]:
df = DataFrame(CSV.File("$(@__DIR__)/mnist_784.csv")); 

Displaying the first five rows of the data frame:

In [ ]:
first(df,5)
Out[0]:
5×785 DataFrame (685 columns omitted)
Columns pixel1 … pixel100 shown, all Int64; every displayed value in the first five rows is 0, since the border pixels of the images are blank.

Displaying the first five rows together with the last column, which indicates the class each observation belongs to:

In [ ]:
df[1:5,780:785]
Out[0]:
5×6 DataFrame
 Row  pixel780  pixel781  pixel782  pixel783  pixel784  class
   1         0         0         0         0         0      5
   2         0         0         0         0         0      0
   3         0         0         0         0         0      4
   4         0         0         0         0         0      1
   5         0         0         0         0         0      9

Splitting the dataset into training and test sets in an 80/20 ratio:

In [ ]:
X_train, y_train = Matrix(df[1:56000,1:784]), df[1:56000,785]
X_test, y__test = Matrix(df[56001:end,1:784]), df[56001:end,785]
Out[0]:
([0 0 … 0 0; 0 0 … 0 0; … ; 0 0 … 0 0; 0 0 … 0 0], [1, 8, 5, 9, 8, 0, 3, 1, 3, 2  …  7, 8, 9, 0, 1, 2, 3, 4, 5, 6])
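The split above simply takes the first 56,000 rows as the training set. If the file's row order were not already shuffled, a randomized split would be safer. A minimal sketch of that alternative, assuming the `df` data frame from above (the fixed seed is purely illustrative):

```julia
using Random

# Hypothetical alternative: shuffle row indices before splitting, so both
# subsets draw from the same distribution of digits and writing styles.
idx = shuffle(MersenneTwister(42), 1:nrow(df))
train_idx, test_idx = idx[1:56_000], idx[56_001:end]

X_train = Matrix(df[train_idx, 1:784]); y_train = df[train_idx, 785]
X_test  = Matrix(df[test_idx, 1:784]);  y_test  = df[test_idx, 785]
```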

Converting the samples into formats suitable for the neural network:

In [ ]:
X_train, X_test = convert(Matrix{Float32}, X_train), convert(Matrix{Float32}, X_test)
y_train, y__test = convert(Vector{Float32}, y_train), convert(Vector{Float32}, y__test)
Out[0]:
(Float32[5.0, 0.0, 4.0, 1.0, 9.0, 2.0, 1.0, 3.0, 1.0, 4.0  …  4.0, 0.0, 9.0, 0.0, 6.0, 1.0, 2.0, 2.0, 3.0, 3.0], Float32[1.0, 8.0, 5.0, 9.0, 8.0, 0.0, 3.0, 1.0, 3.0, 2.0  …  7.0, 8.0, 9.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
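One step this example skips is input scaling: the pixel intensities stay in the 0–255 range. Networks generally train faster when inputs are scaled to [0, 1], so a common extra step would be (a sketch, assuming the `X_train`/`X_test` matrices defined above):

```julia
# Scale raw 0..255 intensities into [0, 1]; the 255f0 literal keeps
# everything in Float32, matching the rest of the pipeline.
X_train = X_train ./ 255f0
X_test  = X_test  ./ 255f0
```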

Loading the library for data visualization:

In [ ]:
using Plots

Displaying an observation and its class:

In [ ]:
test_img = Vector(df[60000, 1:784])
test_img = reshape(test_img, 28, 28) / 256
using Colors
println("Object class: ", df[60000, 785])
plot(Gray.(test_img))
Object class: 8
Out[0]:

Final reshaping of the data for the neural network (each observation becomes a column):

In [ ]:
X_train, X_test = X_train', X_test'
y_train, y__test = y_train', y__test'
Out[0]:
(Float32[5.0 0.0 … 3.0 3.0], Float32[1.0 8.0 … 5.0 6.0])

Loading the machine-learning library:

In [ ]:
using Flux, Optimisers;

Defining the network architecture:

In [ ]:
model = Chain(
    Dense(784, 15, elu),
    Dense(15, 10, sigmoid),
    softmax
)
Out[0]:
Chain(
  Dense(784 => 15, elu),                # 11_775 parameters
  Dense(15 => 10, σ),                   # 160 parameters
  NNlib.softmax,
)                   # Total: 4 arrays, 11_935 parameters, 46.871 KiB.

Test recognition result (before training the model):

Defining the training parameters:

In [ ]:
learning_rate = 0.01f0
opt = Optimisers.Adam(learning_rate)
state = Optimisers.setup(opt, model)

function loss(model, x, y)
    y_oh = Flux.onehotbatch(y, 0:9)   # size (10, 1, N), since y is a 1×N row
    y_pred = model(x)                 # size (10, N)

    # Add a singleton dimension so the shape matches y_oh
    y_pred_reshaped = Flux.unsqueeze(y_pred, dims=2)  # now (10, 1, N)

    return Flux.mse(y_pred_reshaped, y_oh)
end
Out[0]:
loss (generic function with 1 method)
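MSE on one-hot targets works, but it is an unusual pairing with a softmax output. A common alternative (a sketch, not what this example uses) is cross-entropy on raw logits, dropping both the sigmoid and the trailing softmax from the model:

```julia
# Hypothetical variant: a logits-only model plus Flux.logitcrossentropy,
# which fuses softmax into the loss for numerical stability.
logit_model = Chain(Dense(784, 15, elu), Dense(15, 10))

function xent_loss(m, x, y)
    # vec(y) flattens the 1×N row of labels into a plain vector
    Flux.logitcrossentropy(m(x), Flux.onehotbatch(vec(y), 0:9))
end
```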

Defining a function to compute model accuracy:

In [ ]:
function accuracy(model, X, y)
    correct = 0
    for i in 1:length(y)
        # Prepare the input: add a batch dimension
        x_input = reshape(X[:, i], :, 1)  # (features, 1)

        # Model prediction
        probs = model(x_input)  # size (10, 1)

        # Convert the argmax index back to a digit
        predicted_digit = argmax(probs)[1] - 1

        # Compare with the true label
        if predicted_digit == y[i]
            correct += 1
        end
    end
    return correct / length(y)
end
Out[0]:
accuracy (generic function with 1 method)
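The per-sample loop above is easy to follow but slow; the whole test set can be classified in one forward pass. An equivalent loop-free sketch, assuming the same `model`, `X`, and `y` arguments:

```julia
# Vectorized accuracy using Flux.onecold, which maps each column of
# probabilities back into the label set 0:9.
function accuracy_vec(model, X, y)
    preds = Flux.onecold(model(X), 0:9)   # one predicted digit per column
    return sum(preds .== vec(y)) / length(y)
end
```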

The iterative training loop:

In [ ]:
loss_history = Float32[]
epochs = 100

for epoch in 1:epochs
    # Compute gradients
    grads = gradient(model) do m
        loss(m, X_train, y_train)
    end

    # Update the model and the optimiser state
    state, model = Optimisers.update(state, model, grads[1])

    # Compute and record the training loss
    current_loss = loss(model, X_train, y_train)
    push!(loss_history, current_loss)

    # Compute test accuracy
    acc = accuracy(model, X_test, y__test) * 100

    # Log every epoch
    println("Epoch $epoch: Training Loss = $current_loss, Accuracy = $acc%")
end
Epoch 1: Training Loss = 0.08964709, Accuracy = 12.314285714285713%
Epoch 2: Training Loss = 0.08764407, Accuracy = 13.814285714285715%
Epoch 3: Training Loss = 0.08620157, Accuracy = 14.399999999999999%
⋮
Epoch 98: Training Loss = 0.068701506, Accuracy = 87.52142857142857%
Epoch 99: Training Loss = 0.06870646, Accuracy = 87.68571428571428%
Epoch 100: Training Loss = 0.06867965, Accuracy = 87.8%
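The loop above takes one full-batch gradient step per epoch. Mini-batch training usually reaches the same accuracy in far fewer passes over the data. A sketch using Flux's DataLoader, reusing the `loss`, `state`, and `model` from above (the batch size of 128 is an arbitrary choice):

```julia
# Hypothetical mini-batch variant of the same training loop.
loader = Flux.DataLoader((X_train, vec(y_train)); batchsize=128, shuffle=true)

for epoch in 1:10
    for (xb, yb) in loader
        # yb' restores the 1×B row shape the loss function expects
        grads = gradient(m -> loss(m, xb, yb'), model)
        state, model = Optimisers.update(state, model, grads[1])
    end
end
```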

Visualizing the loss function at each training step:

In [ ]:
plot(1:epochs, loss_history, title="Training loss", xlabel="Training step", ylabel="Loss")
Out[0]:

Displaying the results:

In [ ]:
number = 3000
test_img = Vector(df[56000+number, 1:784])
test_img = reshape(test_img, 28, 28) / 256
using Colors
result = model(X_test[:, number])
println("Known object class: ", df[56000+number, 785], "\n  ",
        "Digit recognized by the network: ", argmax(result) - 1)
plot(Gray.(test_img'))
Known object class: 4
  Digit recognized by the network: 4
Out[0]:

Conclusion

In this example we preprocessed the pixel-brightness data and defined the network architecture, the optimiser parameters, and the loss function. The trained model achieved reasonably accurate, though not perfect, classification. Recognition quality could be improved by modifying the layer architecture and enlarging the training set.
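As one illustration of such an architectural change, a wider and deeper network could replace the 15-unit model; the layer sizes below are illustrative, not tuned:

```julia
# A hypothetical larger model: two hidden layers with relu activations,
# logits passed through a final softmax as in the original example.
model2 = Chain(
    Dense(784, 128, relu),
    Dense(128, 64, relu),
    Dense(64, 10),
    softmax
)
```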