
Creating a seq2seq model for a regression task

In this example, we will train a deep neural network to predict the remaining useful life of a gas turbine engine.

Downloading and unpacking data

The open CMAPSSData dataset contains simulated engine operation data: 100 training runs and 100 validation runs. The training data consists of time series of measured parameters recorded from startup to failure. The validation series are cut off before the failure event occurs.

Each row contains 26 variables (a quick format check is sketched right after this list):

  • 1: Engine number
  • 2: Operating time (in cycles)
  • 3-5: Engine settings
  • 6-26: Sensor measurements 1-21
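As an optional sanity check, the hypothetical snippet below (assuming the archive has already been unpacked into the CMAPSSData folder) reads the training file and confirms that each row has 26 whitespace-delimited columns.

In [ ]:
# Optional sanity check: the training file should contain 26 columns per row
using DelimitedFiles
size( readdlm( "CMAPSSData/train_FD001.txt" ), 2 )   # expected: 26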

Let's prepare the working environment and load the dataset.

In [ ]:
Pkg.add(["Statistics", "DelimitedFiles", "JLD2", "Flux"])
   Resolving package versions...
  No Changes to `~/.project/Project.toml`
  No Changes to `~/.project/Manifest.toml`
In [ ]:
using DelimitedFiles
In [ ]:
dataTrain = Float32.( readdlm( "CMAPSSData/train_FD001.txt" ));

Prepare the training data

As a result of this processing, we will obtain arrays XTrain and YTrain containing the time series of features (predictors) and target variables (targets).

In [ ]:
function dataPreparation( dataTable )
    numObservations = Int( maximum( dataTable[:,1] ) )   # the first column holds the engine number

    predictors = []
    responses = []
    
    for i in 1:numObservations
        idx = dataTable[:,1] .== i
        push!( predictors, dataTable[idx, 3:end] )
        timeSteps = dataTable[idx,2]
        push!( responses, reverse(timeSteps) )   # the reversed cycle counter gives the remaining cycles before failure
    end
    
    return predictors, responses
end
Out[0]:
dataPreparation (generic function with 1 method)
In [ ]:
XTrain, YTrain = dataPreparation( dataTrain );

Removing constant features

Constant features do not add anything to the learning process, so we will remove them to speed up the computations. Specifically, we will delete the columns of the dataset in which the minimum and maximum values coincide.

In [ ]:
m = [ minimum( table, dims=1 ) for table in XTrain ];
M = [ maximum( table, dims=1 ) for table in XTrain ];
In [ ]:
idxConstant = [ m[i] .== M[i] for i in 1:length(m) ];
constant_features = vec( maximum( vcat(idxConstant...), dims=1 ) );
In [ ]:
XTrain = [ xtrain[:, .!constant_features] for xtrain in XTrain  ];
numFeatures = size( XTrain[1], 2 )
Out[0]:
16

There are 16 features left for each observation (keep in mind that the observations have different lengths: each engine ran a different number of cycles before failure).
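As a small optional check (a sketch using the XTrain array prepared above), we can look at how much the run length varies across engines:

In [ ]:
# Optional check: the shortest and longest runs in the training set (in cycles)
extrema( size.(XTrain, 1) )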

Normalization of feature values

Another technique that usually speeds up training is to normalize the features so that each series has zero mean and unit variance. We will stack the signals from all engines into a single matrix to compute these statistics.

In [ ]:
using Statistics

mu = vec(mean( vcat([xtrain for xtrain in XTrain]...), dims=1 ));
sig = vec(std( vcat([xtrain for xtrain in XTrain]...), dims=1 ));

XTrain = [ mapslices(row -> (row .- mu) ./ sig, xtrain; dims=2) for xtrain in XTrain ];
In [ ]:
using Plots
gr()
plot(
    heatmap( XTrain[1], cbar=false, yflip=true, title="Features (predictors)", ylabel="Time (cycles)" ),
    heatmap( reshape(YTrain[1],:,1), yflip=true, title="Target: remaining useful life" ),
    size=(800,400)
)
plot!( titlefont = font(9) )
Out[0]:

Limiting the output variable

We will predict a single number: the number of operating cycles remaining before failure. It is more important for us that the predictions are accurate towards the end of the life cycle.

Therefore, we will cap the target variable at 150 operating cycles before failure.

In [ ]:
thr = 150
YTrain = [ min.(ytrain, thr) for ytrain in YTrain ];

This is what the sensor readings look like for one of the engines in the dataset.

In [ ]:
obs_id = 77

dl = size( XTrain[obs_id], 1 );
cl = maximum( findall( YTrain[obs_id] .== thr ))

a = plot( XTrain[obs_id], leg=:false, title="Sensor readings", label=:none )
vline!(a, [dl], lw=3, lc=:red, label=:none )
plot!(a,  [1, cl], [-4,-4], fillrange=[4,4], fillcolor=:springgreen, fillalpha=0.3, linealpha=0.0, label=:none )

b = plot( YTrain[obs_id], title="Time to failure (in cycles)", label="Remaining life" )
vline!(b, [dl], lw=3, lc=:red, label="Failure" )
plot!(b,  [1, cl], [0,0], fillrange=[thr+20,thr+20], fillcolor=:springgreen, fillalpha=0.3, linealpha=0.0, label="Full (capped) life" )

plot(a, b, layout=(2,1) )
Out[0]:

Let's sort the data

This is an optional step. Sorting can later help us split the data into better-balanced batches; for now we just need to find out the length of the longest sequence in the sample.

In [ ]:
# Sort the data by sequence length
sequenceLength = [ size(xtrain,1) for xtrain in XTrain ];
idx = sortperm( sequenceLength, rev=true );
XTrain = XTrain[idx];
YTrain = YTrain[idx];
In [ ]:
bar( sort(sequenceLength), xlabel="Sequence", ylabel="Length (cycles)", title="Sorted data", xflip=true)
Out[0]:

Neural network architecture

We will train two recurrent neural networks and compare the results:

  • an RNN (a plain recurrent layer followed by several fully connected layers)
  • an LSTM network (the basic recurrent layer is replaced by a slightly more complex LSTM layer; the remaining layers are the same)

Until the Flux library is updated to the latest version, the following code may produce a harmless error message Error during loading of extension DiffEqBaseZygoteExt..., which does not interfere with the work. Just right-click on it and select "Delete selected".

In [ ]:
Pkg.add( "Flux" ); # Установка библиотеки на случай ее отсутствия
   Resolving package versions...
  No Changes to `~/.project/Project.toml`
  No Changes to `~/.project/Manifest.toml`
In [ ]:
using Flux

numResponses = size( YTrain[1], 2 )
numHiddenUnits = 100
numHiddenUnits2 = 40

rnn = Chain(
    RNN(numFeatures => numHiddenUnits), 
    Dense(numHiddenUnits => numHiddenUnits2),
    Dropout(0.5),
    Dense(numHiddenUnits2 => numResponses)
) |> f32;

lstm = Chain(
    LSTM(numFeatures => numHiddenUnits),
    Dense(numHiddenUnits => 50),
    Dropout(0.5),
    Dense(50 => numResponses)
) |> f32;

Let's save copies of these two neural networks in their untrained state so that we can later compare them with the trained ones.

In [ ]:
rnn_empty = deepcopy(rnn);
lstm_empty = deepcopy(lstm);

Training loops

First, we will train the RNN neural network.

In [ ]:
epochs = 100;

Training this network for 100 epochs takes about 5-7 minutes.

In [ ]:
using JLD2
if !isfile( "rnn.jld2" )
    model = rnn
    data = zip(XTrain, YTrain)
    opt_state = Flux.setup( OptimiserChain(ClipGrad(1), Adam(1e-2)), model )
    loss( ŷ, y ) = sqrt(Flux.mse( ŷ, y ))

    for epoch in 1:epochs
        print( epoch, " ")
        Flux.train!( model, data, opt_state) do m, x, y
            loss(m(x')', y) # Loss function: the error on each sequence of the dataset
        end
    end

    # Save the trained network to a file
    jldsave("rnn.jld2"; rnn);
end

For comparison, we will train the LSTM neural network (*be careful: training for 100 epochs may take more than 15 minutes*). Note also that the number of neurons in the hidden layers affects the training time, but not as much as the number of epochs does.

In [ ]:
if !isfile( "lstm.jld2" )
    model = lstm
    data = zip(XTrain, YTrain)
    opt_state = Flux.setup( OptimiserChain(ClipGrad(1), Adam(1e-2)), model )
    loss( ŷ, y ) = sqrt(Flux.mse( ŷ, y ))

    for epoch in 1:epochs
        print( epoch, " ")
        Flux.train!( model, data, opt_state) do m, x, y
            loss(m(x')', y) # Loss function: the error on each sequence of the dataset
        end
    end

    jldsave("lstm.jld2"; lstm); # Save the network
end
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 

It is believed that LSTM neural networks can capture longer-term dependencies than plain RNNs.

In conventional recurrent neural networks, the influence of recent measurements on the result is always higher than *the influence of earlier measurements*.

That is, RNNs tend to "forget the distant past". On the other hand, the LSTM network trained 2-3 times slower than the RNN because of the larger number of equations computed in each cell.
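One way to see this difference in complexity is to compare the number of trainable parameters of the two models. The cell below is a minimal sketch using Flux.params; recall that an LSTM layer holds roughly four times as many recurrent weights as a plain RNN layer of the same width.

In [ ]:
# Minimal sketch: count the trainable parameters of both models
n_rnn  = sum( length, Flux.params(rnn) )
n_lstm = sum( length, Flux.params(lstm) )
println( "RNN parameters:  ", n_rnn )
println( "LSTM parameters: ", n_lstm )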

Checking the results

The neural networks predict the training data fairly well:

In [ ]:
using JLD2
@load "rnn.jld2" rnn
@load "lstm.jld2" lstm

test_id = 6

xi = XTrain[test_id]
yi = YTrain[test_id]

# Predictions before training
y_pred_rnn_empty = rnn_empty(xi');
y_pred_lstm_empty = lstm_empty(xi');

# Predictions after training
y_pred_rnn = rnn(xi')
y_pred_lstm = lstm(xi')

plot(
    plot( [y_pred_rnn_empty' y_pred_lstm_empty'], label=["rnn" "lstm"], title="Predictions before training", lw=2 ),
    plot( [y_pred_rnn' y_pred_lstm' yi], label=["rnn" "lstm" "ground truth"], title="Predictions of the trained networks", lw=2 ),
    size=(800,300), titlefont=font(11)
)
Out[0]:

As we can see, before training the networks always produced predictions close to zero. After training, they learned to map a 16-dimensional feature vector to a scalar estimate of the remaining life. We can assume that the model takes the history of the process into account, since the forecast shows a steady downward trend despite the rather chaotic behavior of the signal.
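To make this mapping concrete, here is a minimal sketch that feeds a single time step (one 16-element feature vector) of the engine chosen above into the trained LSTM and reads off a single scalar estimate:

In [ ]:
# Minimal sketch: one time step in, one scalar remaining-life estimate out
Flux.reset!( lstm )
x_step = XTrain[test_id][1, :]     # first time step of the chosen engine: 16 features
lstm( reshape(x_step, :, 1) )      # 1×1 output: the predicted remaining life for this step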

Let's study the forecasts for the training sample.

In [ ]:
function get_plot(id)
    
    xi = XTrain[id]
    yi = YTrain[id]
    
    Flux.reset!( rnn )  # Reset the network state so that information accumulated
    Flux.reset!( lstm ) # during previous runs does not affect future predictions
    
    y_pred_rnn = rnn(xi');
    y_pred_lstm = lstm(xi');
    
    return plot( [y_pred_rnn' y_pred_lstm' yi], legend=false, lw=2, title="Predictions for sequence #$id" )
end

plot( get_plot.(1:16)..., titlefont=font(7), size=(800,600) )
Out[0]:

You can see that the LSTM network has learned to determine the start of the life cycle well and to track the dynamics all the way to failure, while the nonlinearity of the learned characteristic shows up more often in the RNN results (the hump in the middle of the curve).

Let's prepare the test data:

In [ ]:
dataTest = Float32.( readdlm( "CMAPSSData/test_FD001.txt" ));
XTest, YTest = dataPreparation( dataTest );
XTest = [ xtest[:, .!constant_features] for xtest in XTest  ];
mu = vec(mean( vcat([xtest for xtest in XTest]...), dims=1 ));
sig = vec(std( vcat([xtest for xtest in XTest]...), dims=1 ));
XTest = [ mapslices(row -> (row .- mu) ./ sig, xtest; dims=2) for xtest in XTest ];

# Note: the remaining useful life of the engines in the test set is known
YTest = Float32.( readdlm( "CMAPSSData/RUL_FD001.txt" ));
YTest = [ collect((size(XTest[i],1)+(YTest[i])-1):-1:YTest[i]) for i in 1:length(XTest) ];
YTest = [ min.(YTest[i], thr) for i in 1:size(YTest,1) ];

Let's see how the models predict the time to failure for individual measurement series from the test set.

In [ ]:
function get_plot( id )
    
    xi = XTest[id]
    yi = YTest[id]
    
    Flux.reset!( rnn )  # Reset the network state so that information accumulated
    Flux.reset!( lstm ) # during previous runs does not affect future predictions
    
    y_pred_rnn = rnn(xi')
    y_pred_lstm = lstm(xi');
    
    return plot( [y_pred_rnn' y_pred_lstm' yi], legend=false, lw=2, title="Predictions for sequence #$id", c=[4 6 3] )
end

plot( get_plot.(1:16)..., titlefont=font(7), size=(800,600), legendfont=font(8) )
Out[0]:

The neural networks perform somewhat worse on the test data, but it is precisely this information that lets us choose a direction for refining the architecture and the training process. As expected, the prediction quality of both networks decreases on the test data (by a factor of about 2.1 for the LSTM and 2.7 for the RNN).
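This degradation can be quantified directly. The cell below is a sketch that uses the models and arrays defined above (the helper mean_rmse is introduced here only for illustration) to compare the average per-sequence RMSE on the training and test sets:

In [ ]:
# Sketch: average per-sequence RMSE on the training and test sets
function mean_rmse( model, X, Y )
    errs = Float32[]
    for (x, y) in zip(X, Y)
        Flux.reset!( model )   # clear the recurrent state between sequences
        push!( errs, sqrt(Flux.mse( vec(model(x')), y )) )
    end
    return mean(errs)
end

for (name, model) in [ ("RNN", rnn), ("LSTM", lstm) ]
    println( name, ": train RMSE = ", round(mean_rmse(model, XTrain, YTrain), digits=1),
                   ", test RMSE = ",  round(mean_rmse(model, XTest,  YTest),  digits=1) )
end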

Finally, let's examine the histograms of the remaining-life prediction errors at the end of each of the studied life-cycle segments:

In [ ]:
dataset_final_rul = [ ytest[end] for ytest in YTest ]
rnn_final_prediction = [(Flux.reset!( rnn ); rnn(xi')[end]) for xi in XTest]
lstm_final_prediction = [(Flux.reset!( lstm ); lstm(xi')[end]) for xi in XTest]

plot(
    histogram( dataset_final_rul .- rnn_final_prediction, leg=:none, title="RNN prediction error" ),
    histogram( dataset_final_rul .- lstm_final_prediction, leg=:none, title="LSTM prediction error" ),
    size=(900,300), titlefont=font(9)
)
Out[0]:

It can be argued that the average error of the LSTM network is closer to 0 than that of the RNN.
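The same comparison can be made numerically, using the error arrays computed above (a short sketch):

In [ ]:
# Sketch: mean and spread of the final-point prediction errors
err_rnn  = dataset_final_rul .- rnn_final_prediction
err_lstm = dataset_final_rul .- lstm_final_prediction
println( "RNN:  mean error = ", round(mean(err_rnn),  digits=1), ", std = ", round(std(err_rnn),  digits=1) )
println( "LSTM: mean error = ", round(mean(err_lstm), digits=1), ", std = ", round(std(err_lstm), digits=1) )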

We have saved the trained neural networks to files for future use.

To load them back into memory, you will first need to load the Flux library.
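For example, in a new session the models can be restored with the same calls as used in the "Checking the results" section:

In [ ]:
using Flux, JLD2    # Flux must be loaded before the saved Chain objects can be reconstructed
@load "rnn.jld2" rnn
@load "lstm.jld2" lstm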

Conclusion

We have trained neural networks to predict the time to failure; we can now assemble a virtual sensor for indirect measurements, try it out in the Engee modeling environment, generate code for an embedded platform, and run tests in real conditions.

We also note that, for educational purposes and to speed up the training process, we implemented fairly small neural networks and trained them for a modest number of epochs. Training both models took about 10 minutes *on 4 threads across two CPUs (without a GPU)*.