Identification of model parameters in Engee¶

Example description¶

Let's consider the implementation of the task of parametric identification of a model based on data. We have to identify several time-invariant parameters of some model, the structure of which we know in advance.

In this example we will analyse the signal generated by the dynamic model Engee. It is located in the folder start/examples/data_analysis/parametric_identification.

The model consists of two subsystems: one generates the analysed signal, the other is a reference model (without noise).

In the subsystem ModelSystem, five signals are combined by means of summing:

constant component: the number 5,
time component: a line with a slope of 0.01,
first harmonic: a sinusoid with amplitude 1 and angular frequency $2\pi \cdot 0.02$ rad/s,
less prominent second harmonic: sinusoid with amplitude 0.6 and angular frequency $2\pi \cdot 0.008$ rad/s,
white noise with zero offset and dispersion 0.3.

In the subsystem ModelBlock we will be able to substitute the identified parameters. As long as reference parameters are set in it and it generates a reference signal. In it the model is described as a code:

function (c::ModelBlock)(t::Real)
    T0 = 5
    T1 = 0.01
    A1, f1, phi1 = 1, 0.02, 0
    A2, f2, phi2 = 0.6, 0.008, 0
    
    signal = T0 + T1 * t +
             A1 * sin( 2*pi * f1 * t + phi1 ) .+
             A2 * sin( 2*pi * f2 * t + phi2 );
    return signal
end

Let's save the reference values of the parameters:

Pkg.add(["CSV", "HypothesisTests"])

# "Идеальные" параметры линейной части сигнала (смещение, наклон)
T0m = 5;
T1m = 0.01;
# "Идеальные" параметры первой синусоиды (амплитуда, частота, фаза)
A1m = 1;
f1m = 0.02;
phi1m = 0;
# "Идеальные" параметры второй синусоиды (амплитуда, частота, фаза)
A2m = 0.6;
f2m = 0.008;
phi2m = 0;

As a result of running the model, we get 500 seconds of signal. Here is a graph of the last 200 seconds:

The block ModelBlock is now initialised with exactly the same parameters as the dynamic model in the subsystem ModelSystem, so the graph of from_code (dark blue) completely replicates the systematic part of the signal from_model (orange).

What data the model stores¶

Both signals are saved in a CSV table param_id_data.csv with the following columns:

time - time column,
m - column of data generated by the dynamic model (reference parameters and error),
c - column of data generated by the reference model (executed as a code).

The order of actions in this example¶

We have the following steps to perform:

start the dynamic model to synthesise the model data
find the parameters of all the model links in the correct order (assuming that the signal allows us to guess the topology of the model, or that we already know it).
check that after subtracting systematic components from the input signal, the result is normally distributed noise.
fit the parameters into the Engee Function block code and check that the models match each other.

Synthesis of model data¶

Run the following cell to synthesise the data we are going to analyse:

modelName = "param_identification_model"
#engee.close( modelName, force=true ) # Для повторного запуска модели
model = modelName in [m.name for m in engee.get_all_models()] ? engee.open( modelName ) : engee.load( "$(@__DIR__)/$modelName.engee");

results = engee.run( modelName, verbose=false )

Dict{String, DataFrames.DataFrame} with 2 entries:
  "from_code"  => 5001×2 DataFrame…
  "from_model" => 5001×2 DataFrame…

Loading libraries¶

Connection of libraries for data loading, working with tables, analysing spectra and library for polynomial regression.

using CSV, DataFrames, FFTW, Polynomials

Downloading and connecting the library for hypothesis testing (we use normality testing):

#import Pkg; Pkg.add( "HypothesisTests", io=devnull );
using HypothesisTests

Let's load the data¶

From here on we will only work with the signal generated from the dynamic model, but you can see that both methods are interchangeable, and if necessary, this example can be accommodated in a single script.

We will visualise the model signal ModelSystem, which consists of a vector of time and calculated values:

t = results["from_model"].time;
signal = results["from_model"].value;

We will also keep the sampling rate of the time signal (a value inverse to the distance in time between two neighbouring measurements):

sample_rate = 1/(t[2] - t[1])

10.0

What we will identify¶

The reference model ModelBlock is initialised with reference parameter values (the same as in the model ModelSystem).

The model is defined by 8 parameters:

linear trend parameters: $T_0 + t \cdot T_1$
amplitude, frequency and phase of the first sinusoid: $A_1 sin (2 \pi f_1 t + \varphi_1)$
amplitude, frequency and phase of the second sinusoid: $A_2 sin (2 \pi f_2 t + \varphi_2)$

We will not be interested in noise parameters, although they are easily identified by the functions mean and var.

Let's identify the linear component¶

The first thing we will do is to identify and eliminate the linear trend from the signal.

trend = Polynomials.fit( t, signal, 1 ) # Последний аргумент: степень полинома

Let's keep the polynomial parameters as separate variables

T0, T1 = trend[0], trend[1]

(5.267818747854857, 0.00894393330796865)

Let's display the found trend on the graph and show how our data looks like minus this trend:

plot(
    plot( t, [signal trend.(t)], label=["Данные" "Тренд"] ),
    plot(t, signal - trend.(t), lc=:darkblue, label="Без тренда"),
    layout=(2,1)
)

Great, the signal has started to look like a combination of sinusoids and noise.

Identify the first sinusoid¶

To determine the parameters of the sinusoids included in the signal, we plot the spectrum of the signal using the FFTW library.

Its discretisation will be proportional to the number of points in the original signal. That is, if we leave the original discretisation, we can get low accuracy of the spectrum. To get rid of the high-frequency region, we reduce the number of points in the original signal by lowering the sampling frequency of the original signal. To do this, we set the variable downsample_factor. The value 40 means that we will take every 40th point from the original signal.

downsample_factor = 40;

# Дальше мы будем работать с сигналом пониженной дискретизации (signal_downsampled)
idx_downsample = 1:downsample_factor:length(t)
t_downsampled = t[idx_downsample]

# Дискретизация сигнала, из которого мы вычли линейный тренд
signal_downsampled = (signal - trend.(t))[idx_downsample];

Since we are only interested in the positive part of the spectrum, we use the function rfft (we also don't have to do fftshift):

# Найдем спектр сигнала
F1 = rfft(signal_downsampled)

# Рассчитаем вектор частот найденного спектра
freqs = rfftfreq(length(t_downsampled), sample_rate/downsample_factor)

# Выведем график
plot( freqs, abs.(F1) )

Let's find the maximum on this spectrum and determine which frequency this value corresponds to.

_,idx = findmax( abs.(F1) )
print( "Частота наиболее заметной компоненты: ", freqs[idx] )

Частота наиболее заметной компоненты: 0.01984126984126984

When we have found the maximum point on the spectrum, we can identify the amplitude frequency and phase of the most prominent harmonic component. Now we need to do a recombination - generate a harmonic signal from a complex number that describes its parameters.

# Находим параметры синусоиды
A1 = 2 * abs( F1[idx] ) ./ length(t_downsampled )
# Умножение на 2 происходит из-за того, что мощность сигнала в спектре
# делится между двумя пиками: положительным и отрицательным

f1 = abs( freqs[idx] ) # Частота синусоиды
phi1 = angle( F1[idx] ) + pi/2 # Фаза синусоиды

print("A1 = ", A1, "\nf1 = ", f1, "\nphi1 = ", phi1)

A1 = 0.9630993921284998
f1 = 0.01984126984126984
phi1 = 0.3848888013993568

Let's define the first sinusoid as a function of time and plot it:

# Функция, задающая синусоиду с найденными параметрами
sinusoid1(tt) = A1 * sin.( 2*pi*f1*tt .+ phi1 );

# Выводим графики 
plot(
    plot( t, [signal .- trend.(t), sinusoid1.(t)], label=["Данные без лин.тренда" "Первая синусоида"]  ), 
    plot( t, signal .- trend.(t) .- sinusoid1(t), lc=:darkblue, label="Остаточный сигнал" ),
    layout=(2,1)
)

OK, now the signal is a combination of a sine wave and noise.

Let's identify the second sine wave¶

Repeat the same operations as for finding the first sinusoid:

# Сигнал пониженной дискретизации (без тренда и первой синусоиды)
signal_downsampled2 = (signal .- trend.(t) .- sinusoid1(t))[idx_downsample]

# Находим спектр сигнала после вычета первых двух компонент
F2 = rfft(signal_downsampled2)

# Рассчитаем вектор частот найденного спектра
freqs = rfftfreq(length(t_downsampled), sample_rate/downsample_factor)

# Выведем график
plot( freqs, abs.(F2) )

Let's find the parameters of the second sinusoid

# Находим максимум на спектре
_,idx2 = findmax( abs.(F2) )

# Определяем параметры синусоиды, соответствующие этому максимуму
A2 = 2 * abs( F2[idx2] ) ./ length( t_downsampled ) # Амплитуда
f2 = abs( freqs[idx2] ) # Частота
phi2 = angle( F2[idx2] ) + pi/2 # Фаза

print( "A2 = ", A2, "\nf2 = ", f2, "\nphi2 = ", phi2 )

A2 = 0.6193504244923119
f2 = 0.007936507936507936
phi2 = 0.13537312956162761

Let's define the second sinusoid as a function of time and plot it:

# Функция, задающая синусоиду с найденными параметрами
sinusoid2(tt) = A2 * sin.( 2*pi*f2*tt .+ phi2 );

# Выводим графики
plot(
    plot(t, [signal.-trend.(t).-sinusoid1.(t) sinusoid2.(t)], 
        label = ["Сигнал минус две компоненты" "Вторая синусоида"]),
    plot(t, signal.-trend.(t).-sinusoid1(t).-sinusoid2(t), lc=:darkblue,
        label = "Остаточный сигнал", legend=:outertopright),
    layout=(2,1)
)

How do we know if there are significant systematic components left in this signal? One way is to imagine that this signal is a realisation of a random variable and test it for normality of distribution.

Analysing the residual signal¶

Let's test the hypothesis of normal distribution of the residual signal component using Harker-Bera test (JB-test):

h = JarqueBeraTest( signal .- trend.(t) .- sinusoid1(t) .- sinusoid2(t) )

Jarque-Bera normality test
--------------------------
Population details:
    parameter of interest:   skewness and kurtosis
    value under h_0:         "0 and 3"
    point estimate:          "-0.010841427456395828 and 2.903158191134578"

Test summary:
    outcome with 95% confidence: fail to reject h_0
    one-sided p-value:           0.3584

Details:
    number of observations:         5001
    JB statistic:                   2.05218

Note the result (outcome) of the test:

fail to reject h_0 means that there is no good reason to reject the hypothesis $H_0$: with 95% confidence we assert that the observed signal is generated by a random process with a normal distribution
reject h_0 means that the confidence that the observed signal originates from a normal distribution does not exceed 5%

Try to exclude some component from the formula - most likely such a signal will fail Harkke-Bera test and you will get the result reject h_0.

Analysing the results¶

The result obtained serves as a basis for completing the search for additional systematic components.

Let us derive the final report and the parameter estimation error.

df = DataFrame( 
    ["T0"    T0    T0m    string(round(100*abs(T0 - T0m)/T0m, digits=2))*"%";
     "T1"    T1    T1m    string(round(100*abs(T1 - T1m)/T1m, digits=2))*"%";
     "A1"    A1    A1m    string(round(100*abs(A1 - A1m)/A1m, digits=2))*"%";
     "f1"    f1    f1m    string(round(100*abs(f1 - f1m)/f1m, digits=2))*"%";
     "phi1"  phi1  phi1m  string(round(abs(phi1 - phi1m),digits=2))*" рад";
     "A2"    A2    A2m    string(round(100*abs(A2 - A2m)/A2m, digits=2))*"%";
     "f2"    f2    f2m    string(round(100*abs(f2 - f2m)/f2m, digits=2))*"%";
     "phi2"  phi2  phi2m  string(round(abs(phi2 - phi2m),digits=2))*" рад"
    ],
    ["Параметр", "Идентифицировано", "Модельное значение", "Погрешность"]
 )

Create a function that calculates the model data with identified parameters:

function ModelBlock(t)
    global T0, T1, A1, f1, phi1, A2, f2, phi2;

    signal = T0 .+ T1 .* t .+
          A1 .* sin.( 2*pi .* f1 .* t + phi1 ) .+
          A2 .* sin.( 2*pi .* f2 .* t + phi2);
    
    return signal
end;

Let's compare the result of running the identified model and the result of running ModelBlock:

plot( results["from_model"].time,
          [results["from_model"].value, results["from_code"].value, ModelBlock.(t) ],
      label=["Исходный сигнал" "Эталонная модель" "Идентифицированная модель"],
      linecolor=[:lightblue :red :green])

In this example we:

implemented a fairly simple parametric identification procedure,
determined the identification error of all parameters,
verified that the graph of the model with identified parameters is quite similar to the graph of the reference model.

	Параметр	Идентифицировано	Модельное значение	Погрешность
	Any	Any	Any	Any
1	T0	5.26782	5	5.36%
2	T1	0.00894393	0.01	10.56%
3	A1	0.963099	1	3.69%
4	f1	0.0198413	0.02	0.79%
5	phi1	0.384889	0	0.38 рад
6	A2	0.61935	0.6	3.23%
7	f2	0.00793651	0.008	0.79%
8	phi2	0.135373	0	0.14 рад