Using a pre-trained ResNet neural network for image classification

Neural networks are flexible models whose parameters are learned from data. Once trained successfully, the same network can be reused in many different tasks. A well-known example is the ResNet family of neural networks, created for image classification. In part or in whole, such networks are used in a wide variety of tasks that work with numerical representations of images, from image databases to style transfer.

In this example, we will run a shallow ResNet (18 layers) for image classification. We will show the whole chain of data preparation and output processing, which for any image gives a more or less appropriate text label describing the object it depicts.

Preparatory work

At the moment, the Tape object of the Umlaut library is the main computational container into which a neural network in the ONNX format can be unpacked. This loading mechanism is likely to change in the near future, as the ONNX library is being updated.

In [ ]:
import Pkg; Pkg.add(["Umlaut", "ONNX"], io=devnull)
import Umlaut: Tape, play!

We will also need the libraries built into Engee for working with the ONNX format, in which pre-trained neural networks are often stored, and a library for working with images.

In [ ]:
using ONNX
using Images
[ Info: Precompiling ONNX [d0dd6a25-fac6-55c0-abf7-829e0c774d20]

Let's set the working directory to the folder that contains this notebook

In [ ]:
cd( @__DIR__ )

Object classes in ImageNet

The names of all object classes that our pre-trained neural network can recognise are represented as an ordered vector and are loaded from the following file.

In [ ]:
include( "imagenet_classes.jl" );
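The lookup used later in this example subtracts 1 from the position in the network output before reading the class name. A minimal sketch of that convention is given below. `DEMO_CLASSES` and `class_name` are hypothetical names, and the `Dict` layout is an assumption about how such a table can be stored, not a copy of `imagenet_classes.jl`; the three entries are the first three ImageNet classes.

```julia
# Hypothetical excerpt of a class table keyed from 0 (assumed layout).
DEMO_CLASSES = Dict(
    0 => "tench, Tinca tinca",
    1 => "goldfish, Carassius auratus",
    2 => "great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias",
)

# Julia indexes the output vector from 1, so position `idx` maps to key `idx - 1`.
class_name(idx::Integer) = DEMO_CLASSES[idx - 1]

println(class_name(2))  # prints "goldfish, Carassius auratus"
```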

Input-output functions

Let's write three simple auxiliary functions to feed data into the neural network and process the results:

  1. Loading an image: we scale it to a size of 224×224 (normalisation and edge cropping with aspect ratio preservation would be a welcome addition)
  2. Sorting predictions: the neural network returns a vector of scores, one per class of the ImageNet dataset; the higher the score, the more likely it is that the class is present in the image. We select the k highest-scoring classes
  3. A wrapper around these functions that loads an image and outputs the k most likely predictions in a single call
In [ ]:
# Load an image from a file
function imread(path::AbstractString; sz=(224,224))
    img = Images.load(path);
    img = imresize(img, sz);
    x = convert(Array{Float32}, channelview(img))
    # Reorder the dimensions: CHW -> WHC
    x = permutedims(x, (3, 2, 1))
    return x
end

# Return the indices and values of the top k predictions
function maxk(a, k)
    b = partialsortperm(a, 1:k, rev=true)
    return collect(zip(b, a[b]))
end

# Load an image and return the ten most likely classes in decreasing order
function test_image(tape::Tape, path::AbstractString)
    x = imread(path)
    x = reshape(x, size(x)..., 1)
    y = play!(tape, x)
    y = reshape(y, size(y, 1))
    top = maxk(y, 10)
    classes = []
    for (i, (idx, val)) in enumerate(top)
        # idx is a 1-based Julia index; the class table is looked up with a 0-based key
        name = IMAGENET_CLASSES[idx - 1]
        classes = [classes; "$i: $name ($val)"]
    end
    return join(classes, "\n")
end
Out[0]:
test_image (generic function with 1 method)
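The normalisation and centre cropping mentioned in step 1 could be sketched as follows. `normalize_whc` and `center_square` are hypothetical helpers that are not part of the original example, and the channel statistics are the values commonly used for ImageNet-trained models, assumed here to apply to this network. Whether they improve the predictions shown in this notebook has not been checked.

```julia
# Hypothetical helper: normalise a width × height × channel array with values in [0, 1].
# The default statistics are the usual ImageNet ones (an assumption for this model).
function normalize_whc(x::AbstractArray{Float32,3};
                       mu=Float32[0.485, 0.456, 0.406],
                       sigma=Float32[0.229, 0.224, 0.225])
    # Reshape to 1 × 1 × 3 so that broadcasting runs along the channel dimension.
    return (x .- reshape(mu, 1, 1, 3)) ./ reshape(sigma, 1, 1, 3)
end

# Hypothetical helper: index ranges of the largest centred square in an h × w image.
function center_square(h::Integer, w::Integer)
    side = min(h, w)
    top = (h - side) ÷ 2 + 1
    left = (w - side) ÷ 2 + 1
    return top:(top + side - 1), left:(left + side - 1)
end

x = normalize_whc(fill(0.5f0, 224, 224, 3))
rows, cols = center_square(480, 640)   # rows = 1:480, cols = 81:560
```

In `imread`, the crop would be applied as `img[rows, cols]` before `imresize`, and the normalisation after `permutedims`, so that the aspect ratio is preserved and the channels are scaled in WHC layout.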

Downloading the ResNet18 neural network

We will use a neural network stored in a file with the .onnx extension. There are libraries that let you create and load this network with even higher-level commands (e.g., the Metalhead.jl library from the FluxML collection), but for now we will do without additional libraries.

If the pre-trained network is already in the working directory, the command below completes without downloading it again.

In [ ]:
path = "resnet18.onnx"

if !isfile(path)
    download("https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet18-v1-7.onnx", path)
end
In [ ]:
# Create a placeholder array with the shape of the input image
img = rand( Float32, 224, 224, 3, 1 )

# Load the model as an Umlaut.Tape object
resnet = ONNX.load( path, img );
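The placeholder above has the layout width × height × channels × batch. The minimal sketch below shows how a single image array gets the trailing batch dimension that this layout expects, using the same `reshape` call as `test_image`. The random array is only a stand-in for a real image.

```julia
# Stand-in for an image returned by imread: width × height × channels.
x = rand(Float32, 224, 224, 3)

# Append a batch dimension of length 1; reshape does not copy the data.
batch = reshape(x, size(x)..., 1)

println(size(batch))  # (224, 224, 3, 1)
```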

Let's download some images

As with the model, images that have already been downloaded should not be downloaded again

In [ ]:
path = "data/"
mkpath(path)  # make sure the target folder exists

# Download a file only if it is not already on disk
download_once(url, dest) = isfile(dest) ? dest : download(url, dest)

goose_path = download_once( "https://upload.wikimedia.org/wikipedia/commons/3/3f/Snow_goose_2.jpg", path*"goose.jpg");
dog_path = download_once( "https://farm4.staticflickr.com/1301/4694470234_6f27a4f602_o.jpg", path*"dog.jpg");
plane_path = download_once( "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c9/Rossiya%2C_RA-89043%2C_Sukhoi_Superjet_100-95B_%2851271265892%29.jpg/1024px-Rossiya%2C_RA-89043%2C_Sukhoi_Superjet_100-95B_%2851271265892%29.jpg", path*"plane.jpg");

Categorising images

In [ ]:
display( load(plane_path)[1:5:end, 1:5:end] )
print( test_image( resnet, plane_path ))
1: airliner (16.3932)
2: wing (12.387247)
3: warplane, military plane (11.546208)
4: airship, dirigible (10.379374)
5: space shuttle (9.501989)
6: missile (9.399774)
7: projectile, missile (8.82921)
8: tiger shark, Galeocerdo cuvieri (7.278539)
9: aircraft carrier, carrier, flattop, attack aircraft carrier (6.265907)
10: can opener, tin opener (6.131121)
In [ ]:
display( load(goose_path)[1:5:end, 1:5:end] )
print( test_image( resnet, goose_path ))
1: goose (14.927246)
2: crane (11.862924)
3: flamingo (11.377807)
4: spoonbill (11.055638)
5: white stork, Ciconia ciconia (10.838624)
6: American egret, great white heron, Egretta albus (10.136589)
7: pelican (10.013963)
8: bustard (9.504973)
9: peacock (9.44741)
10: albatross, mollymawk (8.912545)
In [ ]:
display( load(dog_path)[1:5:end, 1:5:end] )
print( test_image( resnet, dog_path ))
1: Pembroke, Pembroke Welsh corgi (16.254753)
2: Cardigan, Cardigan Welsh corgi (14.028244)
3: collie (11.081062)
4: golden retriever (10.7889805)
5: dingo, warrigal, warragal, Canis dingo (10.606851)
6: basenji (10.365379)
7: Shetland sheepdog, Shetland sheep dog, Shetland (9.810743)
8: beagle (9.483083)
9: Labrador retriever (9.437691)
10: Eskimo dog, husky (9.249486)
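The values in parentheses above are raw scores rather than probabilities, since they do not lie between 0 and 1. If probabilities are needed, a softmax can be applied to the full output vector before the top classes are selected. The sketch below is one way to do that; `softmax` is defined here by hand and is not part of the original example, and the four scores are made-up stand-ins for the 1000-element model output.

```julia
# Numerically stable softmax: subtract the maximum before exponentiating.
function softmax(scores::AbstractVector{<:Real})
    e = exp.(scores .- maximum(scores))
    return e ./ sum(e)
end

# Made-up stand-in for the model output; a real output vector has 1000 entries.
scores = Float32[2.0, 0.5, 4.0, 1.0]
probs = softmax(scores)

# Indices of the two largest probabilities, in decreasing order.
top2 = partialsortperm(probs, 1:2, rev=true)
```

Because softmax is monotonic, the ranking of the classes is the same as for the raw scores; only the reported numbers change.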

Conclusion

We have shown that it is not difficult to load a pre-trained neural network in Engee and run computations with it.

This mechanism allows us to organise a complex information processing pipeline consisting of high-level components. In particular, we can entrust some stages of information processing to pre-trained neural networks.