Engee documentation
Notebook

Creating a seq2seq model for a regression problem

In this example, we will train a deep neural network to predict the remaining life of a gas turbine engine.

Loading and unpacking data

The open CMAPSSData dataset stores simulated engine data: 100 training examples and 100 validation examples. The training data contain time series of measured parameters recorded from engine start-up until the occurrence of a failure. The validation data are cut off some time before a failure occurs.

Each row contains 26 variables:

  • 1: Engine number
  • 2: Run time (in cycles)
  • 3-5: Engine operational settings
  • 6-26: Sensor measurements 1-21

Prepare the working environment and load the dataset.

In [ ]:
Pkg.add(["Statistics", "DelimitedFiles", "JLD2", "Flux"])
In [ ]:
using DelimitedFiles
In [ ]:
dataTrain = Float32.( readdlm( "CMAPSSData/train_FD001.txt" ));
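To make sure the file has been read correctly, we can quickly check the size of the loaded matrix and the number of engines it describes (an optional sanity check; the expected values follow from the dataset description above).

In [ ]:
size( dataTrain )                  # (number of rows, 26 columns)
length( unique( dataTrain[:,1] ) ) # number of engines (100 in train_FD001)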

Prepare the training data

As a result of this processing, we obtain the arrays XTrain and YTrain, which contain time series of features (predictors) and target variables (targets).

In [ ]:
function dataPreparation( dataTable )
    # Each engine is identified by the value in the first column
    numObservations = Int( maximum( dataTable[:,1] ) )

    predictors = []
    responses = []
    
    for i in 1:numObservations
        idx = dataTable[:,1] .== i
        # Features: operational settings and sensor readings (columns 3 to 26)
        push!( predictors, dataTable[idx, 3:end] )
        # Target: remaining cycles, obtained by reversing the cycle counter
        timeSteps = dataTable[idx,2]
        push!( responses, reverse(timeSteps) )
    end
    
    return predictors, responses
end
Out[0]:
dataPreparation (generic function with 1 method)
In [ ]:
XTrain, YTrain = dataPreparation( dataTrain );

Removing constant features

Constant features add nothing to the learning process, so to speed up the computations we will get rid of them. Namely, we will remove from the dataset the columns whose minimum and maximum values are equal.

In [ ]:
m = [ minimum( table, dims=1 ) for table in XTrain ];
M = [ maximum( table, dims=1 ) for table in XTrain ];
In [ ]:
idxConstant = [ m[i] .== M[i] for i in 1:length(m) ];
constant_features = vec( maximum( vcat(idxConstant...), dims=1 ) );
In [ ]:
XTrain = [ xtrain[:, .!constant_features] for xtrain in XTrain  ];
numFeatures = size( XTrain[1], 2 )
Out[0]:
16

We are left with 16 features for each observation (remember that the observations have different lengths: each engine ran for a different number of cycles before failure).

Normalisation of feature values

Another technique that usually speeds up training is normalising the features so that each one has a mean of 0 and a variance of 1. To compute these statistics, let's stack the signals from all the engines into one matrix.

In [ ]:
using Statistics

mu = vec(mean( vcat([xtrain for xtrain in XTrain]...), dims=1 ));
sig = vec(std( vcat([xtrain for xtrain in XTrain]...), dims=1 ));

XTrain = [ mapslices(row -> (row .- mu) ./ sig, xtrain; dims=2) for xtrain in XTrain ];
In [ ]:
using Plots
gr()
plot(
    heatmap( XTrain[1], cbar=false, yflip=true, title="Observations (predictors)", ylabel="Time (cycles)" ),
    heatmap( reshape(YTrain[1],:,1), yflip=true, title="Target variable: remaining life (target)" ),
    size=(800,400)
)
plot!( titlefont = font(9) )
Out[0]:

Constraining the output variable

We will predict a single number: the number of operating cycles remaining before failure occurs. It is more important to us that the predictions are accurate near the end of the life cycle.

Therefore, we will clip the target variable at 150 cycles of remaining life.

In [ ]:
thr = 150
YTrain = [ min.(ytrain, thr) for ytrain in YTrain ];

This is how the sensor readings of one of the engines in the dataset look.

In [ ]:
obs_id = 77

dl = size( XTrain[obs_id], 1 );
cl = maximum( findall( YTrain[obs_id] .== thr ))

a = plot( XTrain[obs_id], leg=false, title="Sensor readings", label=:none )
vline!(a, [dl], lw=3, lc=:red, label=:none )
plot!(a,  [1, cl], [-4,-4], fillrange=[4,4], fillcolor=:springgreen, fillalpha=0.3, linealpha=0.0, label=:none )

b = plot( YTrain[obs_id], title="Time to failure (in cycles)", label="Remaining life" )
vline!(b, [dl], lw=3, lc=:red, label="Failure" )
plot!(b,  [1, cl], [0,0], fillrange=[thr+20,thr+20], fillcolor=:springgreen, fillalpha=0.3, linealpha=0.0, label="Full life" )

plot(a, b, layout=(2,1) )
Out[0]:

Let's sort the data

This is an optional step: it will later help us split the data into more uniform batches, and for now we just need to find out the length of the longest sequence in the sample.

In [ ]:
# Sort the data by sequence length
sequenceLength = [ size(xtrain,1) for xtrain in XTrain ];
idx = sortperm( sequenceLength, rev=true );
XTrain = XTrain[idx];
YTrain = YTrain[idx];
In [ ]:
bar( sort(sequenceLength), xlabel="Sequence", ylabel="Duration (cycles)", title="Sorted data", xflip=true)
Out[0]:
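Since the sequences are now ordered by length, the longest one is simply the first element of XTrain; the line below (an optional check using only the variables defined above) reports its length.

In [ ]:
maximum( sequenceLength )   # equals size( XTrain[1], 1 ) after sorting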

Neural network architecture

We will train two recurrent neural networks and compare the results:

  • RNN neural network (a basic recurrent layer followed by several fully connected layers)
  • LSTM neural network (a slightly more complex LSTM layer instead of the basic recurrent one; the other layers are the same).

Until the Flux library is updated to the latest version, the following code may produce a harmless error message Error during loading of extension DiffEqBaseZygoteExt..., which does not interfere with operation. Just right-click on it and select Удалить выбранное (Delete selected).

In [ ]:
Pkg.add( "Flux" ); # Установка библиотеки на случай ее отсутствия
In [ ]:
using Flux

numResponses = size( YTrain[1], 2 )
numHiddenUnits = 100
numHiddenUnits2 = 40

rnn = Chain( RNN(numFeatures => numHiddenUnits), 
             Dense(numHiddenUnits => numHiddenUnits2),
             Dropout( 0.5 ),
             Dense(numHiddenUnits2 => numResponses) ) |> f32;

lstm = Chain(
    LSTM(numFeatures => numHiddenUnits),
    Dense( numHiddenUnits => 50 ),
    Dropout( 0.5 ),
    Dense( 50 => numResponses )
) |> f32;

Let's save copies of these two neural networks in their untrained state so that we can later compare them with the trained versions.

In [ ]:
rnn_empty = deepcopy( rnn );
lstm_empty = deepcopy( lstm );

The loss function (an RMSE-style metric) is defined as follows:

In [ ]:
function loss(x, y)
    s = 0.0f0
    for (xi,yi) in zip(x,y)
        Flux.reset!( model );   # reset the state of the global `model` before each sequence
        xi_pred = [ model(xi[i,:])[end] for i in 1:size(xi,1) ];
        s = s + 2 * Flux.mse(xi_pred, yi)
    end
    return sqrt( s )
end
Out[0]:
loss (generic function with 1 method)

It returns a single number that accumulates the prediction errors over every sequence in the sample.
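As a quick sanity check (an optional snippet, not part of the original training procedure), we can evaluate this loss on a couple of sequences with an untrained network; the value should be large, since before training the model predicts values close to zero.

In [ ]:
model = rnn_empty                 # the loss function reads the global variable `model`
loss( XTrain[1:2], YTrain[1:2] )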

Training cycles

First, let's train the RNN neural network.

In [ ]:
epochs = 100;

Training this network for 100 epochs takes about 5-7 minutes.

In [ ]:
model = rnn

opt = Flux.Optimiser(ClipValue(1), Adam(0.01))
θ = Flux.params( model )

for epoch in 1:epochs
    print( epoch, " ")
    ∇ = gradient(θ) do
        loss( XTrain, YTrain )
    end
    Flux.update!(opt, θ, ∇)
end

# Save the trained network to a file
using JLD2
if !isfile( "rnn.jld2" ) jldsave("rnn.jld2"; rnn); end

For comparison, let us train an LSTM neural network (careful: training for 100 epochs may take more than 15 minutes). Note also that the number of neurons in the hidden layers does affect the training time, but not as much as the number of epochs does.

In [ ]:
model = lstm

opt = Flux.Optimiser(ClipValue(1), Adam(0.01))
θ = Flux.params( model )

for epoch in 1:epochs
    print( epoch, " ")
    ∇ = gradient(θ) do
        loss( XTrain, YTrain )
    end
    Flux.update!(opt, θ, ∇)
end

# Save the network
if !isfile( "lstm.jld2" ) jldsave("lstm.jld2"; lstm); end

LSTM neural networks can "learn" longer-term dependencies than RNNs. In an ordinary recurrent network, the influence of recent measurements on the output is always higher than that of earlier ones, so RNNs tend to "forget the distant past".

But because each cell contains more equations, the LSTM network trained 2-3 times slower than the RNN.
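For reference (an addition for context, not part of the original example), the standard LSTM cell, which is what Flux's LSTM layer implements, combines the input $x_t$ and the previous hidden state $h_{t-1}$ through several gates, whereas a basic RNN cell has a single update equation:

$$h_t = \tanh(W x_t + U h_{t-1} + b)$$

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), &
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), &
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), &
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
$$

The forget gate $f_t$ allows the cell state $c_t$ to carry information across many time steps, which is what makes longer-term dependencies easier to learn.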

Checking the results

Neural networks are pretty good at predicting training data:

In [ ]:
test_id = 6

xi = XTrain[test_id]
yi = YTrain[test_id]

# Reset the recurrent state accumulated during training / previous runs
foreach( Flux.reset!, (rnn_empty, lstm_empty, rnn, lstm) );

# Predictions before training
y_pred_rnn_empty = [ rnn_empty(xi[i,:])[end] for i in 1:size(xi,1) ];
y_pred_lstm_empty = [ lstm_empty(xi[i,:])[end] for i in 1:size(xi,1) ];
# Predictions after training
y_pred_rnn = [ rnn(xi[i,:])[end] for i in 1:size(xi,1) ];
y_pred_lstm = [ lstm(xi[i,:])[end] for i in 1:size(xi,1) ];

plot(
    plot( [y_pred_rnn_empty y_pred_lstm_empty], label=["rnn" "lstm"], title="Predictions before training", lw=2 ),
    plot( [y_pred_rnn y_pred_lstm yi], label=["rnn" "lstm" "ground truth"], title="Predictions of the trained networks", lw=2 ),
    size=(800,300), titlefont=font(11)
)
Out[0]:

As we can see, before training the networks always produced predictions close to zero. After training, they learnt to map a 16-dimensional feature vector to a scalar prediction of the remaining life. We can assume that the model takes the history of the process into account, because the forecast has a consistently decreasing trend despite the rather chaotic behaviour of the signal.

Let us study the forecasts for the training sample.

In [ ]:
function get_plot(id)
    
    xi = XTrain[id]
    yi = YTrain[id]
    
    Flux.reset!( rnn )  # Reset the network state so that information accumulated
    Flux.reset!( lstm ) # from previous runs does not affect future predictions
    
    y_pred_rnn = [ rnn(xi[i,:])[end] for i in 1:size(xi,1) ];
    y_pred_lstm = [ lstm(xi[i,:])[end] for i in 1:size(xi,1) ];
    
    return plot( [y_pred_rnn y_pred_lstm yi], legend=false, lw=2, title="Predictions for sequence #$id" )
end

plot( get_plot.(1:16)..., titlefont=font(7), size=(800,600) )
Out[0]:

It can be seen that, on the training data, the LSTM network has learnt to detect the flat initial part of the life curve and to track the dynamics all the way to failure. The non-linearity of the learnt dependence shows up more often in the RNN results (the hump in the middle of the curve).

Let's prepare test data:

In [ ]:
dataTest = Float32.( readdlm( "CMAPSSData/test_FD001.txt" ));
XTest, YTest = dataPreparation( dataTest );
XTest = [ xtest[:, .!constant_features] for xtest in XTest  ];
mu = vec(mean( vcat([xtest for xtest in XTest]...), dims=1 ));
sig = vec(std( vcat([xtest for xtest in XTest]...), dims=1 ));
XTest = [ mapslices(row -> (row .- mu) ./ sig, xtest; dims=2) for xtest in XTest ];

# Addition: we know the true remaining life of the engines in the test sample
YTest = Float32.( readdlm( "CMAPSSData/RUL_FD001.txt" ));
YTest = [ collect((size(XTest[i],1)+(YTest[i])-1):-1:YTest[i]) for i in 1:length(XTest) ];
YTest = [ min.(YTest[i], thr) for i in 1:length(YTest) ];

Let's see how the models predict the remaining life for individual measurement series from the test sample.

In [ ]:
function get_plot( id )
    
    xi = XTest[id]
    yi = YTest[id]
    
    Flux.reset!( rnn )  # Reset the network state so that information accumulated
    Flux.reset!( lstm ) # from previous runs does not affect future predictions
    
    y_pred_rnn = [ rnn(xi[i,:])[end] for i in 1:size(xi,1) ];
    y_pred_lstm = [ lstm(xi[i,:])[end] for i in 1:size(xi,1) ];
    
    return plot( [y_pred_rnn y_pred_lstm yi], legend=false, lw=2, title="Predictions for sequence #$id", c=[4 6 3] )
end

plot( get_plot.(1:16)..., titlefont=font(7), size=(800,600), legendfont=font(8) )
Out[0]:

The networks perform somewhat worse on the test data, but this information helps us choose a direction for further refinement of the architecture and the training process. As one would expect, the prediction quality of both networks drops when we move to the test data (by a factor of about 2.1 for LSTM and 2.7 for RNN).
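The ratio quoted above can be reproduced, for example, by evaluating the loss function defined earlier on the training and test samples (a sketch; like the training loop, it relies on the global variable `model`, and the exact numbers depend on the training run).

In [ ]:
model = rnn
rnn_train_loss, rnn_test_loss = loss( XTrain, YTrain ), loss( XTest, YTest )

model = lstm
lstm_train_loss, lstm_test_loss = loss( XTrain, YTrain ), loss( XTest, YTest )

rnn_test_loss / rnn_train_loss, lstm_test_loss / lstm_train_loss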

Finally, let us look at the histogram of the remaining-life prediction errors at the end of each of the studied life-cycle segments:

In [ ]:
dataset_final_rul = [ ytest[end] for ytest in YTest ]

# Reset the state once per sequence, run through all time steps and keep the last prediction
rnn_final_prediction  = [ (Flux.reset!( rnn );  [ rnn(xi[i,:])[end]  for i in 1:size(xi,1) ][end]) for xi in XTest ]
lstm_final_prediction = [ (Flux.reset!( lstm ); [ lstm(xi[i,:])[end] for i in 1:size(xi,1) ][end]) for xi in XTest ]

plot(
    histogram( dataset_final_rul .- rnn_final_prediction, leg=:none, title="RNN prediction error" ),
    histogram( dataset_final_rul .- lstm_final_prediction, leg=:none, title="LSTM prediction error" ),
    size=(900,300), titlefont=font(9)
)
Out[0]:

It can be argued that the average error of the LSTM network is closer to 0 than that of the RNN.
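This impression can be backed up with a couple of summary statistics over the same error vectors (a short optional check using the arrays computed above).

In [ ]:
using Statistics
println( "RNN mean error:  ", mean( dataset_final_rul .- rnn_final_prediction ) )
println( "LSTM mean error: ", mean( dataset_final_rul .- lstm_final_prediction ) )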

We have saved the neural networks to files for further use.

To load them back into memory, you will first need to load the Flux library.
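For example, a saved network could later be restored like this (a minimal sketch; it assumes the files created above are available in the working directory).

In [ ]:
using Flux, JLD2
rnn_loaded  = JLD2.load( "rnn.jld2",  "rnn" )
lstm_loaded = JLD2.load( "lstm.jld2", "lstm" )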

Conclusion

We have trained neural networks to predict the remaining useful life. We can now build an indirect (virtual) measurement sensor around them, test it in the Engee modelling environment, generate code for an embedded platform and carry out real-world testing.

Note also that, for teaching purposes and to speed up the learning process, we implemented very small neural networks and trained them for a fairly small number of epochs. Training each model took about 10 minutes on 4 threads across two CPUs (without a GPU).