Tutorial: Simple Multi-layer Perceptron
In this example, we create a simple https://en.wikipedia.org/wiki/Multilayer_perceptron[multi-layer perceptron] (MLP) that classifies handwritten digits using the MNIST dataset. An MLP consists of at least three layers of stacked perceptrons: input, hidden, and output. Each neuron of an MLP has parameters (weights and a bias) and uses an activation function to compute its output.
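Concretely, each layer computes an affine transform followed by a nonlinearity. Here is a minimal illustrative sketch (the names W, b, and x are not part of the tutorial code) of what one hidden layer with 32 neurons does to a flattened 28x28 image:
relu(z) = max(z, zero(z))     # the activation used later in the tutorial
W = randn(Float32, 32, 784)   # weights: 32 hidden neurons, 784 inputs
b = zeros(Float32, 32)        # one bias per neuron
x = rand(Float32, 784)        # a flattened 28x28 image
h = relu.(W * x .+ b)         # hidden-layer activations (length 32)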
To run this example, we need the following packages:
using Flux, Statistics
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, params
using Base.Iterators: repeated
using CUDA
using MLDatasets
if has_cuda() # Check if CUDA is available
    @info "CUDA is on"
    CUDA.allowscalar(false)
end
We set default values for learning rate, batch size, epochs, and the usage of a GPU (if available) for our model:
Base.@kwdef mutable struct Args
    rate::Float64 = 3e-4    # learning rate
    batchsize::Int = 1024   # batch size
    epochs::Int = 10        # number of epochs
    device::Function = gpu  # set as gpu, if gpu available
end
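Because Args is defined with Base.@kwdef, any of these defaults can be overridden through keyword arguments, for example:
args = Args(batchsize=256, epochs=5)  # override two defaults
args.rate       # 0.0003 (unchanged default)
args.batchsize  # 256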
If a GPU is available on our local system, then Flux uses it for computing the loss and updating the weights and biases when training our model.
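The gpu function moves arrays and models to the GPU when CUDA is functional and is a no-op otherwise, so the same code also runs on a CPU-only machine. A small illustrative sketch (assuming the packages above are loaded):
x = rand(Float32, 784, 16)
x_dev = gpu(x)              # a CuArray on a GPU machine, a plain Array otherwise
m_dev = gpu(Dense(784, 10)) # layer parameters are moved the same way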
Data
We create the function getdata to load the MNIST train and test data sets from MLDatasets and prepare them for the training process. In addition, we partition the data sets into mini-batches by loading them onto DataLoader objects.
function getdata(args)
    ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

    # Loading Dataset
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)

    # Reshape Data in order to flatten each image into a linear array
    xtrain = Flux.flatten(xtrain)
    xtest = Flux.flatten(xtest)

    # One-hot-encode the labels
    ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

    # Batching
    train_data = DataLoader((xtrain, ytrain), batchsize=args.batchsize, shuffle=true)
    test_data = DataLoader((xtest, ytest), batchsize=args.batchsize)

    return train_data, test_data
end
getdata performs the following steps:
- Loads the MNIST data set: Loads the train and test set tensors. The shape of the train data is 28x28x60000 and of the test data is 28x28x10000.
- Reshapes the train and test data: Uses the flatten function to reshape the train data set into a 784x60000 array and the test data set into a 784x10000 array. Notice that we reshape the data so that we can pass it to the input layer of our model (a simple MLP expects a vector as input).
- One-hot encodes the train and test labels: Creates a batch of one-hot vectors so we can pass the labels of the data to the loss function. For this example, we use the logitcrossentropy function, which expects the labels to be one-hot encoded.
- Creates batches of data: Creates two DataLoader objects (train and test) that handle mini-batches of size 1024 (as defined above), so that the loss function can be evaluated batch by batch when training our model (see the sanity check right after this list). The train DataLoader also shuffles the data points at each iteration (shuffle=true).
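As a quick sanity check (not part of the original example), we can load the data and inspect the shape of one mini-batch:
args = Args()
train_data, test_data = getdata(args)

x, y = first(train_data)
size(x)  # (784, 1024) – flattened images
size(y)  # (10, 1024)  – one-hot labels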
Model
As mentioned above, an MLP consists of at least three fully connected layers. For this example, we define our model with the following layers and dimensions:
- Input: It has 784 perceptrons (the MNIST image size is 28x28). We flatten the train and test data so that we can pass them as arguments to this layer.
- Hidden: It has 32 perceptrons that use the relu activation function.
- Output: It has 10 perceptrons that output the model's score (logit) for each digit from 0 to 9.
We define our model with the build_model function:
function build_model(; imgsize=(28,28,1), nclasses=10)
    return Chain(
        Dense(prod(imgsize), 32, relu),
        Dense(32, nclasses))
end
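As an illustrative check (not in the original tutorial), the model maps a batch of flattened images to 10 output scores per image:
m = build_model()
xb = rand(Float32, 784, 5)  # a dummy batch of 5 flattened "images"
size(m(xb))                 # (10, 5): one score per digit class, per image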
Loss functions
Now, we define the loss function loss_all. It expects a DataLoader object and the model function we defined above as arguments. Notice that this function iterates through the DataLoader object in mini-batches and uses the logitcrossentropy function to compute the difference between the predicted and actual values.
function loss_all(dataloader, model)
    l = 0f0
    for (x,y) in dataloader
        l += logitcrossentropy(model(x), y)
    end
    l/length(dataloader)
end
In addition, we define the accuracy function to report the accuracy of our model during the training process. To compute the accuracy, we need to decode the output of our model using the onecold function.
function accuracy(data_loader, model)
    acc = 0
    for (x,y) in data_loader
        acc += sum(onecold(cpu(model(x))) .== onecold(cpu(y))) / size(x,2)
    end
    acc/length(data_loader)
end
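To see how these two functions are used, here is an illustrative evaluation of an untrained model on the test set (not part of the original example; exact numbers will vary):
_, test_data = getdata(Args())
m = build_model()
loss_all(test_data, m)  # roughly log(10) ≈ 2.3 for random logits
accuracy(test_data, m)  # roughly 0.1 before training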
Train our model
Finally, we create the train function that calls the functions we defined above and trains the model.
function train(; kws...)
    # Initializing Model parameters
    args = Args(; kws...)

    # Load Data
    train_data, test_data = getdata(args)

    # Construct model
    m = build_model()
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)
    loss(x,y) = logitcrossentropy(m(x), y)

    ## Training
    evalcb = () -> @show(loss_all(train_data, m))
    opt = Adam(args.rate)

    for epoch in 1:args.epochs
        @info "Epoch $epoch"
        Flux.train!(loss, params(m), train_data, opt, cb = evalcb)
    end

    @show accuracy(train_data, m)
    @show accuracy(test_data, m)
end
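With everything in place, training can be started with the defaults, or with any field of Args overridden through keyword arguments, for example:
train()                      # use the defaults from Args
train(epochs=5, rate=1e-3)   # or override hyperparameters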
train performs the following steps:
- Initializes the model parameters: Creates the args object that contains the default values for training our model.
- Loads the train and test data: Calls the getdata function we defined above.
- Constructs the model: Builds the model and loads the train and test data sets, as well as the model itself, onto the GPU (if available).
- Trains the model: Defines the callback function evalcb to show the value of the loss_all function during the training process. Then, it sets link:@ref Flux.Optimise.Adam[Adam] as the optimiser for training our model. Finally, it runs the training process for 10 epochs (as defined in the args object) and shows the accuracy value for the train and test data.
To see the full version of this example, see Simple multi-layer perceptron - model-zoo.
Resources
Originally published at fluxml.ai on 26 January 2021. Written by Adarsh Kumar, Mike J Innes, Andrew Dinhobl, Jerry Ling, natema, Zhang Shitian, Liliana Badillo, Dhairya Gandhi.