
Using a pre-trained ResNet neural network for image classification

Neural networks are a convenient, flexible class of algorithms that learn to perform computations from data. Once trained successfully, the same neural network can be reused in many different tasks. A well-known example is the ResNet family of neural networks, created for image classification. Such networks are used, in whole or in part, in a wide variety of tasks that work with numerical representations of images, from image databases to style transfer.

In this example, we will run a ResNet neural network of small depth (18 layers) for image classification. We will show the entire chain of data preparation and output processing, which for any image gives us a more or less suitable text label describing the object in the picture.

Preparatory work

At the moment, the Tape object from the Umlaut library is the main container for computations into which a neural network can be unpacked from the ONNX format. This launch mechanism is likely to change in the near future, since the ONNX library is currently being updated.

In [ ]:
import Pkg; Pkg.add(["Umlaut", "ONNX"])
In [ ]:
import Pkg; Pkg.add("Umlaut", io=devnull)
import Umlaut: Tape, play!

We will also need the libraries built into Engee for working with the ONNX format, in which pre-trained neural networks are often stored, and a library for working with images.

In [ ]:
using ONNX
using Images
[ Info: Precompiling ONNX [d0dd6a25-fac6-55c0-abf7-829e0c774d20]

Set the working directory

In [ ]:
cd( @__DIR__ )

Classes of objects in ImageNet

The names of all classes of objects that our pre-trained neural network can recognize are represented as an ordered vector and loaded from the following file.

In [ ]:
include( "imagenet_classes.jl" );

Input/output functions

Let's write three simple auxiliary functions for feeding data to the neural network and processing the results:

  1. Image loading: we scale the image to 224×224 (normalizing it and cropping the edges while preserving the aspect ratio would be a welcome addition).
  2. Prediction sorting: the neural network returns a vector of numbers indicating how likely each class from the ImageNet dataset is to be present in the image. We select the k most likely classes.
  3. A wrapper around these functions that loads an image and outputs the k most likely predictions in one call.
In [ ]:
# Load an image from a file
function imread(path::AbstractString; sz=(224,224))
    img = Images.load(path);
    img = imresize(img, sz);
    x = convert(Array{Float32}, channelview(img))
    # Reorder the dimensions: CHW -> WHC
    x = permutedims(x, (3, 2, 1))
    return x
end

# Return the top k predictions as (index, value) pairs
function maxk(a, k)
    b = partialsortperm(a, 1:k, rev=true)
    return collect(zip(b, a[b]))
end

# Load an image and return the ten most likely classes in descending order
function test_image(tape::Tape, path::AbstractString)
    x = imread(path)
    x = reshape(x, size(x)..., 1)
    y = play!(tape, x)
    y = reshape(y, size(y, 1))
    top = maxk(y, 10)
    classes = []
    for (i, (idx, val)) in enumerate(top)
        # idx is a 1-based position in y; the idx - 1 lookup implies class keys that start at 0
        name = IMAGENET_CLASSES[idx - 1]
        classes = [classes; "$i: $name ($val)"]
    end
    return join(classes, "\n")
end
Out[0]:
test_image (generic function with 1 method)
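
The list of helpers above mentions normalization and aspect-preserving cropping as welcome additions. A minimal sketch of both steps follows, operating on plain arrays so that it runs without any image library. The function names center_square and normalize_channels are hypothetical, and the mean and standard deviation are the per-channel ImageNet statistics commonly used with pre-trained ResNet models; whether this particular ONNX file expects them is an assumption that should be checked against the model's own documentation.

```julia
# Index ranges of the largest centered square inside a height×width image.
function center_square(height::Integer, width::Integer)
    side = min(height, width)
    top  = div(height - side, 2) + 1
    left = div(width - side, 2) + 1
    return top:(top + side - 1), left:(left + side - 1)
end

# Normalize a W×H×C Float32 array with values in [0, 1], channel by channel.
# The defaults are the commonly used ImageNet statistics (an assumption here).
function normalize_channels(x::AbstractArray{Float32,3};
                            channel_mean = Float32[0.485, 0.456, 0.406],
                            channel_std  = Float32[0.229, 0.224, 0.225])
    m = reshape(channel_mean, 1, 1, 3)
    s = reshape(channel_std, 1, 1, 3)
    return (x .- m) ./ s
end

# Toy check on a synthetic 4×6 "image" with 3 channels (C×H×W, as channelview returns).
chw = rand(Float32, 3, 4, 6)
rows, cols = center_square(4, 6)        # keeps all 4 rows and the 4 middle columns
cropped = chw[:, rows, cols]            # 3×4×4
whc = permutedims(cropped, (3, 2, 1))   # same CHW -> WHC reordering as in imread
normalized = normalize_channels(whc)
```

In imread, the crop would be applied to img before imresize, so that the resize no longer distorts the proportions, and the normalization would be applied to x after permutedims.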

Loading the ResNet18 neural network

We will use a neural network stored in a file with the *.onnx extension. There are libraries that let you create and load this network with even higher-level commands (for example, the Metalhead.jl library from the FluxML collection), but for now we will manage without additional libraries.

The pre-trained neural network is already located in the specified directory, so the command below will not download it again.

In [ ]:
path = "resnet18.onnx"

if !isfile(path)
    download("https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet18-v1-7.onnx", path)
end
In [ ]:
# Create a placeholder array with the shape of the input image
img = rand( Float32, 224, 224, 3, 1 )

# Load the model as an Umlaut.Tape object
resnet = ONNX.load( path, img );

Download some images

If they have already been downloaded, they will not be downloaded again.

In [ ]:
path = "data/"
mkpath(path)  # create the folder if it does not exist yet

# Download each image only if the file is not already present
goose_path = path*"goose.jpg"
isfile(goose_path) || download( "https://upload.wikimedia.org/wikipedia/commons/3/3f/Snow_goose_2.jpg", goose_path);
dog_path = path*"dog.jpg"
isfile(dog_path) || download( "https://farm4.staticflickr.com/1301/4694470234_6f27a4f602_o.jpg", dog_path);
plane_path = path*"plane.jpg"
isfile(plane_path) || download( "https://upload.wikimedia.org/wikipedia/commons/thumb/c/c9/Rossiya%2C_RA-89043%2C_Sukhoi_Superjet_100-95B_%2851271265892%29.jpg/1024px-Rossiya%2C_RA-89043%2C_Sukhoi_Superjet_100-95B_%2851271265892%29.jpg", plane_path);

Classifying images

In [ ]:
display( load(plane_path)[1:5:end, 1:5:end] )
print( test_image( resnet, plane_path ))
1: airliner (16.3932)
2: wing (12.387247)
3: warplane, military plane (11.546208)
4: airship, dirigible (10.379374)
5: space shuttle (9.501989)
6: missile (9.399774)
7: projectile, missile (8.82921)
8: tiger shark, Galeocerdo cuvieri (7.278539)
9: aircraft carrier, carrier, flattop, attack aircraft carrier (6.265907)
10: can opener, tin opener (6.131121)
In [ ]:
display( load(goose_path)[1:5:end, 1:5:end] )
print( test_image( resnet, goose_path ))
1: goose (14.927246)
2: crane (11.862924)
3: flamingo (11.377807)
4: spoonbill (11.055638)
5: white stork, Ciconia ciconia (10.838624)
6: American egret, great white heron, Egretta albus (10.136589)
7: pelican (10.013963)
8: bustard (9.504973)
9: peacock (9.44741)
10: albatross, mollymawk (8.912545)
In [ ]:
display( load(dog_path)[1:5:end, 1:5:end] )
print( test_image( resnet, dog_path ))
1: Pembroke, Pembroke Welsh corgi (16.254753)
2: Cardigan, Cardigan Welsh corgi (14.028244)
3: collie (11.081062)
4: golden retriever (10.7889805)
5: dingo, warrigal, warragal, Canis dingo (10.606851)
6: basenji (10.365379)
7: Shetland sheepdog, Shetland sheep dog, Shetland (9.810743)
8: beagle (9.483083)
9: Labrador retriever (9.437691)
10: Eskimo dog, husky (9.249486)
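
The values in parentheses above are larger than 1, so they are unnormalized scores rather than probabilities. If probabilities are needed, one common option is to pass the scores through a softmax. The sketch below is an illustration on toy numbers, not part of the original notebook; the names softmax_probs and topk are hypothetical, and topk simply repeats the approach of maxk so that the block runs on its own.

```julia
# Numerically stable softmax: subtracting the maximum avoids overflow in exp.
function softmax_probs(scores::AbstractVector{<:Real})
    exps = exp.(scores .- maximum(scores))
    return exps ./ sum(exps)
end

# Top-k (index, value) pairs in descending order, same approach as maxk.
function topk(values::AbstractVector, k::Integer)
    idx = partialsortperm(values, 1:k, rev=true)
    return collect(zip(idx, values[idx]))
end

# Toy scores for four hypothetical classes, not real network output.
scores = Float32[2.0, 0.5, 3.0, -1.0]
probs = softmax_probs(scores)   # non-negative values that sum to 1
best = topk(probs, 2)           # classes 3 and 1, in that order
```

Softmax preserves the ordering of the scores, so the ranking of the classes stays the same; only the scale of the reported numbers changes.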

Conclusion

We have shown that it is easy to load a neural network in Engee and use it to perform calculations.

This mechanism lets you build a complex information processing pipeline from high-level components. In particular, some stages of information processing can be delegated to pre-trained neural networks.