Tutorial: Simple Multi-layer Perceptron
In this example, we create a simple https://en.wikipedia.org/wiki/Multilayer_perceptron[multi-layer perceptron] (MLP) that classifies handwritten digits using the MNIST dataset. An MLP consists of at least three layers of stacked perceptrons: input, hidden, and output. Each neuron of an MLP has parameters (weights and a bias) and uses an activation function to compute its output.
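Concretely, each layer computes an affine transform followed by a nonlinearity. Here is a minimal illustrative sketch (the names W, b, and x are not part of the tutorial code) of what one hidden layer with 32 neurons does to a flattened 28x28 image:
relu(z) = max(z, zero(z))     # the activation used later in the tutorial
W = randn(Float32, 32, 784)   # weights: 32 hidden neurons, 784 inputs
b = zeros(Float32, 32)        # one bias per neuron
x = rand(Float32, 784)        # a flattened 28x28 image
h = relu.(W * x .+ b)         # hidden-layer activations (length 32)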
To run this example, we need the following packages:
using Flux, Statistics
using Flux.Data: DataLoader
using Flux: onehotbatch, onecold, logitcrossentropy, throttle, params
using Base.Iterators: repeated
using CUDA
using MLDatasets
if has_cuda() # Check if CUDA is available
    @info "CUDA is on"
    CUDA.allowscalar(false)
end
We set default values for learning rate, batch size, epochs, and the usage of a GPU (if available) for our model:
Base.@kwdef mutable struct Args
    rate::Float64 = 3e-4    # learning rate
    batchsize::Int = 1024   # batch size
    epochs::Int = 10        # number of epochs
    device::Function = gpu  # set as gpu, if gpu available
end
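Because Args is defined with Base.@kwdef, any of these defaults can be overridden through keyword arguments, for example:
args = Args(batchsize=256, epochs=5)  # override two defaults
args.rate       # 0.0003 (unchanged default)
args.batchsize  # 256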
If a GPU is available on our local system, then Flux uses it for computing the loss and updating the weights and biases when training our model.
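The gpu function moves arrays and models to the GPU when CUDA is functional and is a no-op otherwise, so the same code also runs on a CPU-only machine. A small illustrative sketch (assuming the packages above are loaded):
x = rand(Float32, 784, 16)
x_dev = gpu(x)              # a CuArray on a GPU machine, a plain Array otherwise
m_dev = gpu(Dense(784, 10)) # layer parameters are moved the same way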
Data
We create the function getdata to load the MNIST train and test data sets from MLDatasets and prepare them for the training process. In addition, we partition the data sets into mini-batches by loading them onto DataLoader objects.
function getdata(args)
    ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

    # Loading Dataset
    xtrain, ytrain = MLDatasets.MNIST.traindata(Float32)
    xtest, ytest = MLDatasets.MNIST.testdata(Float32)

    # Reshape Data in order to flatten each image into a linear array
    xtrain = Flux.flatten(xtrain)
    xtest = Flux.flatten(xtest)

    # One-hot-encode the labels
    ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

    # Batching
    train_data = DataLoader((xtrain, ytrain), batchsize=args.batchsize, shuffle=true)
    test_data = DataLoader((xtest, ytest), batchsize=args.batchsize)

    return train_data, test_data
end
getdata performs the following steps:
- Loads the MNIST data set: Loads the train and test set tensors. The shape of the train data is 28x28x60000 and of the test data is 28x28x10000.
- Reshapes the train and test data: Uses the flatten function to reshape the train data set into a 784x60000 array and the test data set into a 784x10000 array. Notice that we reshape the data so that we can pass it to the input layer of our model (a simple MLP expects a vector as input).
- One-hot encodes the train and test labels: Creates a batch of one-hot vectors so we can pass the labels of the data to the loss function. For this example, we use the logitcrossentropy function, which expects the labels to be one-hot encoded.
- Creates batches of data: Creates two DataLoader objects (train and test) that handle mini-batches of size 1024 (as defined above), so that the loss function can be evaluated batch by batch when training our model (see the sanity check right after this list). The train DataLoader also shuffles the data points at each iteration (shuffle=true).
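As a quick sanity check (not part of the original example), we can load the data and inspect the shape of one mini-batch:
args = Args()
train_data, test_data = getdata(args)

x, y = first(train_data)
size(x)  # (784, 1024) – flattened images
size(y)  # (10, 1024)  – one-hot labels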
Model
As mentioned above, an MLP consists of at least three fully connected layers. For this example, we define our model with the following layers and dimensions:
- Input: It has 784 perceptrons (the MNIST image size is 28x28). We flatten the train and test data so that we can pass them as arguments to this layer.
- Hidden: It has 32 perceptrons that use the relu activation function.
- Output: It has 10 perceptrons that output the model's score (logit) for each digit from 0 to 9.
We define our model with the build_model function:
function build_model(; imgsize=(28,28,1), nclasses=10)
    return Chain(
        Dense(prod(imgsize), 32, relu),
        Dense(32, nclasses))
end
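As an illustrative check (not in the original tutorial), the model maps a batch of flattened images to 10 output scores per image:
m = build_model()
xb = rand(Float32, 784, 5)  # a dummy batch of 5 flattened "images"
size(m(xb))                 # (10, 5): one score per digit class, per image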
Loss functions
Now, we define the loss function loss_all. It expects a DataLoader object and the model function we defined above as arguments. Notice that this function iterates through the DataLoader object in mini-batches and uses the logitcrossentropy function to compute the difference between the predicted and actual values.
function loss_all(dataloader, model)
    l = 0f0
    for (x,y) in dataloader
        l += logitcrossentropy(model(x), y)
    end
    l/length(dataloader)
end
In addition, we define the accuracy function to report the accuracy of our model during the training process. To compute the accuracy, we need to decode the output of our model using the onecold function.
function accuracy(data_loader, model)
    acc = 0
    for (x,y) in data_loader
        acc += sum(onecold(cpu(model(x))) .== onecold(cpu(y))) / size(x,2)
    end
    acc/length(data_loader)
end
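To see how these two functions are used, here is an illustrative evaluation of an untrained model on the test set (not part of the original example; exact numbers will vary):
_, test_data = getdata(Args())
m = build_model()
loss_all(test_data, m)  # roughly log(10) ≈ 2.3 for random logits
accuracy(test_data, m)  # roughly 0.1 before training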
Train our model
Finally, we create the train function that calls the functions we defined above and trains the model.
function train(; kws...)
    # Initializing Model parameters
    args = Args(; kws...)

    # Load Data
    train_data, test_data = getdata(args)

    # Construct model
    m = build_model()
    train_data = args.device.(train_data)
    test_data = args.device.(test_data)
    m = args.device(m)
    loss(x,y) = logitcrossentropy(m(x), y)

    ## Training
    evalcb = () -> @show(loss_all(train_data, m))
    opt = Adam(args.rate)

    for epoch in 1:args.epochs
        @info "Epoch $epoch"
        Flux.train!(loss, params(m), train_data, opt, cb = evalcb)
    end

    @show accuracy(train_data, m)
    @show accuracy(test_data, m)
end
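With everything in place, training can be started with the defaults, or with any field of Args overridden through keyword arguments, for example:
train()                      # use the defaults from Args
train(epochs=5, rate=1e-3)   # or override hyperparameters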
train performs the following steps:
- Initializes the model parameters: Creates the args object that contains the default values for training our model.
- Loads the train and test data: Calls the getdata function we defined above.
- Constructs the model: Builds the model and loads the train and test data sets, as well as the model itself, onto the GPU (if available).
- Trains the model: Defines the callback function evalcb to show the value of the loss_all function during the training process. Then, it sets link:@ref Flux.Optimise.Adam[Adam] as the optimiser for training our model. Finally, it runs the training process for 10 epochs (as defined in the args object) and shows the accuracy value for the train and test data.
To see the full version of this example, see Simple multi-layer perceptron - model-zoo.
Resources
Originally published at fluxml.ai on 26 January 2021. Written by Adarsh Kumar, Mike J Innes, Andrew Dinhobl, Jerry Ling, natema, Zhang Shitian, Liliana Badillo, Dhairya Gandhi.