
C-code generation for neural networks

In this example, we will explore the capabilities of the Symbolics and Flux libraries for generating standalone C code from several kinds of neural networks: fully connected, recurrent and convolutional.

Required libraries

The first thing to check is that you have the required libraries installed.

In [ ]:
import Pkg
Pkg.add(["Flux", "ChainPlots", "Symbolics"])

The Flux library allows you to create neural networks whose code is written in Julia. Because the neural network is written in a high-level language rather than in a third-party library or DSL (domain-specific language, such as PyTorch), we can use Julia's polymorphism to convert the neural network into a mathematical expression from which we can generate code.
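
For illustration, here is a small sketch of that mechanism (the function f and the variable a are made up for this example): any plain Julia function accepts symbolic arguments and returns a symbolic expression instead of a number.

In [ ]:
using Symbolics
@variables a
f(t) = 2t + 1     # an ordinary Julia function
f(a)              # returns the symbolic expression 1 + 2a rather than a number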

Until the Flux library is updated to the latest version, the following code will produce a harmless error message Error during loading of extension DiffEqBaseZygoteExt..., which does not interfere with its operation. Just right-click on the message and select Delete selected (Удалить выбранное).

In [ ]:
using Flux, Symbolics, ChainPlots, Plots
gr();

Careful: neural networks can contain a very large number of parameters. Translating a neural network with hundreds or thousands of parameters into a mathematical expression may take considerable time (several tens of seconds), and the resulting expression is unlikely to be easy to read.
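
To estimate the cost in advance, you can count the trainable parameters of a model before tracing it, for example like this (a sketch; some_model is an arbitrary example, not part of this notebook):

In [ ]:
using Flux
some_model = Chain( Dense(4 => 8, tanh), Dense(8 => 1) )  # an arbitrary example model
sum( length, Flux.params(some_model) )                    # total number of trainable parameters (49 here)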

Code generation from a single neuron

Let's define a "neural network" consisting of a single neuron using the Flux library, and then express it as a mathematical equation and as platform-independent C code.

We will repeat the using statements for the libraries once more, to make it clearer where exactly each of them is used.

In [ ]:
using Flux
model = Dense( 1=>1, relu )
Out[0]:
Dense(1 => 1, relu)  # 2 parameters

This is a very simple trainable algorithm with two parameters, which can be viewed in symbolic form as follows:

In [ ]:
using Symbolics
@variables x
model( [x] )
Out[0]:
$$ \begin{equation} \left[ \begin{array}{c} \mathrm{ifelse}\left( \left( - 0.85796064 x < 0 \right), 0, - 0.85796064 x \right) \\ \end{array} \right] \end{equation} $$

The activation function ReLU, quite expectedly, creates a neuron with a piecewise linear description.
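
For reference, ReLU is defined as a piecewise linear function, which is exactly what the ifelse term above encodes:

$$ \begin{equation} \mathrm{relu}\left( x \right) = \mathrm{ifelse}\left( x < 0, 0, x \right) = \max\left( 0, x \right) \end{equation} $$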

All coefficients of this algorithm are initialised randomly, but for convenience we will assign specific values to them.

In [ ]:
model.weight .= [10]
model.bias .= [20]
model( [x] )
Out[0]:
$$ \begin{equation} \left[ \begin{array}{c} \mathrm{ifelse}\left( \left( 20.0 + 10.0 x < 0 \right), 0, 20.0 + 10.0 x \right) \\ \end{array} \right] \end{equation} $$

Let's generate the code and see what we get:

In [ ]:
c_model_code = build_function( model( [x] ), [x]; target=Symbolics.CTarget(), fname="neural_net", lhsname=:y, rhsnames=[:x] )
println( """$c_model_code""" )
#include <math.h>
void neural_net(double* y, const double* x) {
  y[0] = ifelse(20.0f0 + 10.0f0 * x[0] < 0, 0, 20.0f0 + 10.0f0 * x[0]);
}

We see that, at a minimum, the code contains the non-standard function ifelse, and the numeric literals carry the Julia-style suffix f0, which is not valid C. Since we did not customise the code generation and took the out-of-the-box output of the Symbolics library, we will need to adapt it a little. For example like this:

In [ ]:
c_fixed_model_code = replace( c_model_code, "double" => "float", "f0" => "f" );

Such code can already be placed in a template and compiled:

In [ ]:
c_code_standalone = """
#include <stdio.h>
#include <stdbool.h>

float ifelse(bool cond, float a, float b) { return cond ? a : b; }

$c_fixed_model_code

int main(int argc, char const *argv[]){
    float out[1];
    float in[] = {3.1415};
    
    neural_net( out, in );
    
    printf( "%f\\n", out[0] );
    return 0;
}""";

# Save the adapted code to a file
open("$(@__DIR__)/neural_net_fc.c", "w") do f
    println( f, "$c_code_standalone" )
end
In [ ]:
;gcc -o out_fc neural_net_fc.c -lm 
In [ ]:
;./out_fc
51.415001
In [ ]:
model( [3.1415] )
Out[0]:
1-element Vector{Float32}:
 51.415

At this stage, you may have received a warning that it is better not to use Float64 parameters in the neural network, as this gives no gain in calculation accuracy but hurts performance (especially on a GPU). You can convert all neural network parameters to Float32 with the command model = fmap(f32, model).
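
A minimal sketch of that conversion, using the command mentioned above on the model defined earlier:

In [ ]:
model = fmap(f32, model)   # convert all parameters of the model to Float32
eltype( model.weight )     # should now be Float32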

As we can see, the value calculated by the C code and the value returned by the neural network defined in Julia agree to within rounding error.

Code generation from a multilayer FC neural network

Let's check code generation on the example of a neural network with two layers. For now, this will be an ordinary multilayer fully connected neural network (FC, fully connected, sometimes called a multilayer perceptron).

In [ ]:
using Flux
mlp_model = Chain( Dense( 1=>2), Dense(2=>1) )
Out[0]:
Chain(
  Dense(1 => 2),                        # 4 parameters
  Dense(2 => 1),                        # 3 parameters
)                   # Total: 4 arrays, 7 parameters, 284 bytes.

Here is an illustration of this multilayer neural network.

In [ ]:
using ChainPlots
p = plot( mlp_model, titlefontsize=10, size=(300,300), markersize=8, xticks=((0,1,2),["Input\nlayer", "Hidden\nlayer", "Output\nlayer"]), markerstrokewidth=2, linewidth=1 )
Out[0]:

This model is initialised with random values, but it can already accept vectors, matrices and more as input. Note that a neural network always needs a vector as input, even if its elements are symbolic variables.

In [ ]:
using Symbolics
@variables x
c_model_code = build_function( mlp_model( [x] ), [x]; target=Symbolics.CTarget(), fname="neural_net", lhsname=:y, rhsnames=[:x] )
println( c_model_code )
#include <math.h>
void neural_net(double* y, const double* x) {
  y[0] = 0.0068449117f0 * x[0];
}

What's wrong with this code? Note that the expression for the multilayer network has been reduced to an expression for a single neuron. We should have placed a nonlinear activation function (sigmoid, tanh, ...) after the first layer of the neural network instead of the default linear one (identity). A multilayer neural network consisting only of "linear layers" is equivalent to a single neuron, and the Symbolics library simply simplified the expression.

After some transformations, this code can be placed in the C Function block on the Engee canvas, or you can add a main function to it and compile it into an executable binary.
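
If we place a nonlinear activation after the first layer, the generated expression no longer collapses. A minimal sketch (the names mlp_nonlinear and c_nonlinear_code are introduced here only for illustration):

In [ ]:
mlp_nonlinear = Chain( Dense(1 => 2, tanh), Dense(2 => 1) )
c_nonlinear_code = build_function( mlp_nonlinear( [x] ), [x]; target=Symbolics.CTarget(), fname="neural_net", lhsname=:y, rhsnames=[:x] )
println( c_nonlinear_code )   # the body now contains a tanh(...) term for each hidden neuron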

Code generation from a recurrent neural network

Recurrent neural networks are well suited for analysing time series and generating sequences. They can also be multilayer, but have a slightly different neuron topology. Let's see what kind of code we get.

In [ ]:
input_channels_nb = 2
hidden_state_size = 3
rnn = Flux.RNNCell(input_channels_nb, hidden_state_size)

# These parameters are tuned during training
# - for clarity, we assign specific values to each parameter matrix
# - unfortunately, they cannot be symbolic
rnn.Wh .= 0.1 .+ reshape(1:hidden_state_size*hidden_state_size, hidden_state_size, hidden_state_size) ./ 10
rnn.Wi .= 0.01 .+ reshape(1:hidden_state_size*input_channels_nb, hidden_state_size, input_channels_nb) ./ 100
rnn.b .= 0.001 .+ collect(1:hidden_state_size) ./ 1000

X = Symbolics.variables( :x, 1:input_channels_nb )
H = Symbolics.variables( :h, 1:hidden_state_size )

rnn_model = Flux.Recur(rnn, H)  # An object that manages the state variables of the neural network
y = rnn_model( X ) # This command runs the network and updates its state at the same time
Out[0]:
$$ \begin{equation} \left[ \begin{array}{c} \tanh\left( 0.002 + 0.02 x_1 + 0.05 x_2 + 0.2 h_1 + 0.5 h_2 + 0.8 h_3 \right) \\ \tanh\left( 0.003 + 0.03 x_1 + 0.06 x_2 + 0.3 h_1 + 0.6 h_2 + 0.9 h_3 \right) \\ \tanh\left( 0.004 + h_3 + 0.04 x_1 + 0.07 x_2 + 0.4 h_1 + 0.7 h_2 \right) \\ \end{array} \right] \end{equation} $$
In [ ]:
p = plot( rnn_model, titlefontsize=10, size=(300,300), markersize=8, xticks=((0,1,2),["Input\nlayer", "Recurrent\nlayer"]), markerstrokewidth=2, linewidth=1 )
Out[0]:
In [ ]:
c_model_code = build_function( rnn_model( X ), [X; H]; target=Symbolics.CTarget(), fname="neural_net", lhsname=:LHS, rhsnames=[:RHS] )
println( c_model_code )
#include <math.h>
void neural_net(double* LHS, const double* RHS) {
  LHS[0] = tanh(0.002f0 + 0.02f0 * RHS[0] + 0.05f0 * RHS[1] + 0.2f0 * tanh(0.002f0 + 0.02f0 * RHS[0] + 0.05f0 * RHS[1] + 0.2f0 * RHS[2] + 0.5f0 * RHS[3] + 0.8f0 * RHS[4]) + 0.5f0 * tanh(0.003f0 + 0.03f0 * RHS[0] + 0.06f0 * RHS[1] + 0.3f0 * RHS[2] + 0.6f0 * RHS[3] + 0.9f0 * RHS[4]) + 0.8f0 * tanh(0.004f0 + RHS[4] + 0.04f0 * RHS[0] + 0.07f0 * RHS[1] + 0.4f0 * RHS[2] + 0.7f0 * RHS[3]));
  LHS[1] = tanh(0.003f0 + 0.03f0 * RHS[0] + 0.06f0 * RHS[1] + 0.3f0 * tanh(0.002f0 + 0.02f0 * RHS[0] + 0.05f0 * RHS[1] + 0.2f0 * RHS[2] + 0.5f0 * RHS[3] + 0.8f0 * RHS[4]) + 0.6f0 * tanh(0.003f0 + 0.03f0 * RHS[0] + 0.06f0 * RHS[1] + 0.3f0 * RHS[2] + 0.6f0 * RHS[3] + 0.9f0 * RHS[4]) + 0.9f0 * tanh(0.004f0 + RHS[4] + 0.04f0 * RHS[0] + 0.07f0 * RHS[1] + 0.4f0 * RHS[2] + 0.7f0 * RHS[3]));
  LHS[2] = tanh(0.004f0 + 0.04f0 * RHS[0] + 0.07f0 * RHS[1] + 0.4f0 * tanh(0.002f0 + 0.02f0 * RHS[0] + 0.05f0 * RHS[1] + 0.2f0 * RHS[2] + 0.5f0 * RHS[3] + 0.8f0 * RHS[4]) + 0.7f0 * tanh(0.003f0 + 0.03f0 * RHS[0] + 0.06f0 * RHS[1] + 0.3f0 * RHS[2] + 0.6f0 * RHS[3] + 0.9f0 * RHS[4]) + tanh(0.004f0 + RHS[4] + 0.04f0 * RHS[0] + 0.07f0 * RHS[1] + 0.4f0 * RHS[2] + 0.7f0 * RHS[3]));
}

This particular architecture returns the state vector h as its output (for it, y and h are the same variable).

Other architectures will require a more complex code generation process. For example, the model can be trained using the usual Flux tools, while a surrogate architecture in the form of a plain Julia function (e.g., one that returns both y and h) is written for the code generation phase.
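
For the cell defined above, such a surrogate could look like the following sketch (the function rnn_step is ours, not part of Flux):

In [ ]:
# A sketch of a surrogate written as a plain Julia function that returns both the output and the new state
function rnn_step( x, h )
    h_new = tanh.( rnn.Wi * x .+ rnn.Wh * h .+ rnn.b )
    return h_new, h_new   # for this architecture the output y equals the new state h
end;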

The amount of rework needed to generate code from a recurrent neural network can be judged from the following template, which can hardly be called simple:

In [ ]:
c_fixed_model_code = replace( c_model_code, "double" => "float", "f0" => "f" );

c_code_standalone = """
#include <stdio.h>

$c_fixed_model_code

int main(int argc, char const *argv[]){
    float out[3];
    float in[] = {0, 1, 2, 3, 4};
    
    for( int i=0; i<5; i++ ){
        neural_net( out, in );
        printf( "%f %f %f\\n", out[0], out[1], out[2] );
        
        /* Feed the output values back into the input vector as the state vector */
        in[2] = out[0];
        in[3] = out[1];
        in[4] = out[2];
    }
    
    return 0;
}""";

# Save the adapted code to a file
open("$(@__DIR__)/neural_net_rnn.c", "w") do f
    println( f, "$c_code_standalone" )
end

Let's compile and run the code of this neural network.

In [ ]:
;gcc -o out_rnn neural_net_rnn.c -lm
In [ ]:
;./out_rnn
0.914112 0.952953 0.974463
0.901624 0.944008 0.968435
0.901236 0.943730 0.968246
0.901224 0.943721 0.968240
0.901223 0.943721 0.968239

We called the recurrent neural network 5 times with the same input data and got different results each time, because the state vector of the network was updated on every call.
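
The same experiment can be reproduced on the Julia side (a sketch; since the C program starts from the state {2, 3, 4}, this should print approximately the same sequence):

In [ ]:
rnn_check = Flux.Recur( rnn, Float32[2, 3, 4] )   # a fresh Recur object with a numeric initial state
for i in 1:5
    println( rnn_check( Float32[0, 1] ) )         # same input every call, the state is updated internally
end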

This is not the most illustrative example, partly because the input data does not change and partly because we used the tanh activation function, which tends to saturate the output (the network returns values ever closer to 1, a typical problem for RNNs).

Code generation from a convolutional neural network

In [ ]:
in_channels = 1
out_channels = 1
filter_x_size = 2
filter_y_size = 1
signal_size = 10

cnn_model = Chain(
    Conv((filter_x_size,filter_y_size), in_channels => out_channels, tanh; bias = false),
    Dense(signal_size - filter_x_size + 1 => 1)
    )
Out[0]:
Chain(
  Conv((2, 1), 1 => 1, tanh, bias=false),  # 2 parameters
  Dense(9 => 1),                        # 10 parameters
)                   # Total: 3 arrays, 12 parameters, 496 bytes.

To visualise this neural network, we need to specify an example input vector xs with the correct dimensionality (width, height, channels, batches).

In [ ]:
xs = rand( signal_size );
p = plot( cnn_model, reshape(xs, :, 1, 1, 1), titlefontsize=10, size=(300,300), markersize=8, xticks=((0,1,2),["Input\nlayer", "Convolutional\nlayer", "Output\nlayer"]), markerstrokewidth=2, linewidth=1 )
Out[0]:

For clarity, let's assign specific weights and biases to the last (fully connected) layer of the neural network; we will recognise these values in the generated code.

In [ ]:
cnn_model[2].weight .= reshape( collect(1:(signal_size - filter_x_size + 1)), 1, :);
cnn_model[2].bias .= [0.1];

This model expects its input data in a rather elaborate layout. Let us demonstrate how to execute it:

In [ ]:
cnn_model( reshape(xs, :, 1, 1, 1) )[:] # w, h, channels, batch
Out[0]:
1-element Vector{Float32}:
 33.048317

In this scenario, we have to implement the convolution operation ourselves as a Julia function. If we simply try to substitute an array of symbolic variables into the model, we get the error UndefRefError: access to undefined reference. So let's inspect this neural network and write a function that uses the parameters of the convolution kernel, while the fully connected layer can be taken from the original model unchanged.

In [ ]:
function my_convolution_net( c_in )
    c_out = [tanh( sum(c_in[i:i+1] .* reverse(cnn_model[1].weight[:]) )) for i in 1:length(c_in)-1]
    out = cnn_model[2]( c_out ); # We use the second layer of the model in our code without changes
    return out
end;

What will the code look like after generation?

In [ ]:
X = Symbolics.variables( :x, 1:signal_size )
c_model_code = build_function( my_convolution_net( X ), X; target=Symbolics.CTarget(), fname="neural_net", lhsname=:LHS, rhsnames=[:RHS] )
println( c_model_code )
#include <math.h>
void neural_net(double* LHS, const double* RHS) {
  LHS[0] = 0.1f0 + 9.0f0 * tanh(1.0765818f0 * RHS[9] + 0.64346087f0 * RHS[8]) + 2.0f0 * tanh(0.64346087f0 * RHS[1] + 1.0765818f0 * RHS[2]) + 3.0f0 * tanh(0.64346087f0 * RHS[2] + 1.0765818f0 * RHS[3]) + 4.0f0 * tanh(0.64346087f0 * RHS[3] + 1.0765818f0 * RHS[4]) + 5.0f0 * tanh(0.64346087f0 * RHS[4] + 1.0765818f0 * RHS[5]) + 6.0f0 * tanh(0.64346087f0 * RHS[5] + 1.0765818f0 * RHS[6]) + 7.0f0 * tanh(0.64346087f0 * RHS[6] + 1.0765818f0 * RHS[7]) + 8.0f0 * tanh(0.64346087f0 * RHS[7] + 1.0765818f0 * RHS[8]) + tanh(0.64346087f0 * RHS[0] + 1.0765818f0 * RHS[1]);
}

The code looks quite workable. Let's put it into our established template, execute it, and compare it with the results of the previous functions.

In [ ]:
c_fixed_model_code = replace( c_model_code, "double" => "float", "f0" => "f" );

c_code_standalone = """
#include <stdio.h>
#include <stdbool.h>

float ifelse(bool cond, float a, float b) { return cond ? a : b; }

$c_fixed_model_code

int main(int argc, char const *argv[]){
    float out[1];
    float in[] = {$(join(xs, ","))};
    
    neural_net( out, in );
    
    printf( "%f\\n", out[0] );
    return 0;
}""";

# Save the adapted code to a file
open("$(@__DIR__)/neural_net_cnn.c", "w") do f
    println( f, "$c_code_standalone" )
end
In [ ]:
;gcc -o out_cnn neural_net_cnn.c -lm
In [ ]:
;./out_cnn
33.048321
In [ ]:
cnn_model( reshape(xs, :, 1, 1, 1) )[:] # w, h, channels, batch
Out[0]:
1-element Vector{Float32}:
 33.048317
In [ ]:
my_convolution_net( xs )
Out[0]:
1-element Vector{Float32}:
 33.048317

As expected, the results agree to within rounding error.

Conclusion

We have generated code from the three most popular types of neural networks: fully connected (FC, MLP), recurrent (RNN) and convolutional (CNN).

Only for fully connected neural networks is code generation relatively simple and straightforward, which makes it possible to automate the process to a significant degree.

In the other cases, automating neural network code generation depends on the specific task (1D or 2D convolution, how the hidden state of the recurrent network is used). Judging by the number of manual steps required to set up code generation for CNNs or RNNs, automating this process clearly requires targeting specific scenarios.