Engee documentation
Notebook

Trend and harmonic components (Fourier decomposition)

Time series analysis is an important tool in the study of various processes, be it environmental measurements, technical characteristics of equipment or vital signs of a biological organism.

One of the key stages of time series processing is to identify patterns, trends and cyclical components hidden in the measured data. These components make it possible to better understand the dynamics of the phenomena under study and make informed forecasts.

This example studies how the concentration of carbon monoxide (CO) in car exhaust changes over time, using spectral analysis. The algorithm consists of the following consecutive steps.

  1. Data parsing: reading the raw data from a file and converting it into a format convenient for further processing.

  2. Adding a time scale: computing the time of each observation in seconds relative to the start of the measurements.

  3. Trend extraction: estimating the long-term trend in the CO level, first with a moving average over a sliding window and then with a third-degree polynomial fit.

  4. Spectral analysis: applying the Fast Fourier Transform (FFT) to find the dominant oscillation frequencies.

  5. Signal reconstruction: rebuilding the main harmonic components that remain after the trend is removed.

  6. Final visualisation: comparing the reconstructed signal with the original data to assess the quality of the method.

The example uses real experimental data to show how this approach reveals the characteristic features of a time series.

Description of the algorithm implementation

Main stages of the algorithm

1. Data preparation

The data are prepared with the function parse_line, which splits a line into separate values and converts each one to the appropriate data type. After parsing, the data are loaded into a DataFrame.

Then a time column in seconds is added, calculated relative to the first measurement.
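The conversion to seconds can be sketched in isolation as follows. The timestamps are made up for illustration; in the notebook they come from the Time column of the table.

```julia
using Dates

# Hypothetical timestamps standing in for the Time column
times = [Time("15:56:25"), Time("15:56:26"), Time("15:56:28")]

# Subtracting two Time values yields a Nanosecond period;
# Dates.value extracts the integer number of nanoseconds
seconds = [Dates.value(t - times[1]) / 1e9 for t in times]
```

Dividing by 1e9 gives the offsets as Float64 seconds, which is convenient for the fitting and FFT steps later on.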

2. Trend detection

Two methods of trend detection are used:

  • the average value in a fixed-size sliding window,
  • approximation by a third-degree polynomial.

The first method is simple and gives a quick visual estimate of the general trend, while the second yields a compact parametric model of it.
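Both approaches can be sketched using only the standard library. This is an illustrative stand-in, not the notebook's code: the helper names moving_average, polyfit_ls and polyval are hypothetical, and the least-squares fit through a Vandermonde matrix replaces the call to Polynomials.fit used later.

```julia
using Statistics, LinearAlgebra

# Centered moving average; `half` is the number of samples taken on each side,
# and the window is truncated at the edges of the series
moving_average(y, half) =
    [mean(y[max(1, i - half):min(length(y), i + half)]) for i in eachindex(y)]

# Least-squares polynomial fit; returns coefficients in ascending order of power
function polyfit_ls(t, y, degree)
    A = [ti^p for ti in t, p in 0:degree]   # Vandermonde matrix
    return A \ y                            # least-squares solution
end

# Evaluate a polynomial given its ascending coefficients
polyval(c, t) = sum(c[k] * t^(k - 1) for k in eachindex(c))

# Synthetic, noise-free cubic used only to check the fit
t_demo = collect(0.0:1.0:9.0)
y_demo = 1 .+ 0.5 .* t_demo .- 0.02 .* t_demo .^ 3
c = polyfit_ls(t_demo, y_demo, 3)
smooth = moving_average(y_demo, 1)
```

The moving average needs no model but lags behind sharp changes and is truncated at the edges; the polynomial gives four coefficients that describe the whole series at once.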

3. Spectral analysis

After the trend is removed, the Fast Fourier Transform (FFT) is applied to identify the dominant frequencies of the time series. The three components (harmonics) with the largest amplitudes are then selected from the resulting spectrum.
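To show what the spectrum contains, here is a minimal sketch on a synthetic signal. It uses a naive discrete Fourier transform written out by hand (the function naive_dft is hypothetical); an FFT library computes the same values much faster. The sampling rate and the test tone are assumptions chosen for illustration.

```julia
# Naive O(n^2) discrete Fourier transform, for illustration only
function naive_dft(x)
    n = length(x)
    return [sum(x[j + 1] * cis(-2π * j * k / n) for j in 0:n-1) for k in 0:n-1]
end

fs = 10.0                                   # assumed sampling rate, Hz
n_demo = 100                                # number of samples
ts = (0:n_demo-1) ./ fs                     # uniform time axis, s
x = 2.0 .* cos.(2π * 1.0 .* ts .+ 0.3)      # 1 Hz tone, amplitude 2, phase 0.3 rad

X = naive_dft(x)
amp = 2 .* abs.(X) ./ n_demo                # single-sided amplitude
k_max = argmax(amp[1:n_demo ÷ 2])           # strongest bin in the positive half
f_max = (k_max - 1) * fs / n_demo           # bin index -> frequency in Hz
phase_max = angle(X[k_max])                 # phase of that component
```

Because the tone falls exactly on a frequency bin, the strongest bin recovers its frequency, amplitude and phase. With real data the energy usually leaks into neighbouring bins, so the recovered values are approximate.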

4. Signal reconstruction

An approximate signal is reconstructed from the selected harmonics and plotted over the detrended data. It is then added to the trend and compared with the original curve to assess the accuracy of the reconstruction.

In [ ]:
using Dates  # needed for Time

# Parse a single line of the data file
function parse_line(line)
    # Replace decimal commas with dots so the numbers parse correctly
    line = replace(line, "," => ".")
    # Split on spaces/tabs and drop empty elements
    elements = filter(!isempty, split(line, r"\s+"))
    # Convert the elements to the required types
    co = parse(Float64, elements[1])
    ch = parse(Float64, elements[2])
    co2 = parse(Float64, elements[3])
    o2 = parse(Float64, elements[4])
    lamb = parse(Float64, elements[5])
    n = parse(Int, elements[6])
    nox = parse(Int, elements[7])
    tmas = parse(Float64, elements[8])
    time = Time(elements[9])
    return (CO=co, CH=ch, CO2=co2, O2=o2, lamb=lamb, n=n, NOx=nox, Tmas=tmas, Time=time)
end
Out[0]:
parse_line (generic function with 1 method)
In [ ]:
using DataFrames, Dates, FFTW, Statistics
# Read the data
data_lines = readlines("$(@__DIR__)/data.txt")[2:end]  # Skip the header line
parsed_data = [parse_line(line) for line in data_lines] # Parse every line
df = DataFrame(parsed_data) # Convert to a DataFrame
println(first(df, 5)) # Print the first 5 rows as a check
println()
# Add a column with the time in seconds
start_time = df.Time[1]
df[!, :Seconds] = [Dates.value(t - start_time) / 1e9 for t in df.Time]  # convert nanoseconds to seconds
println(first(df, 5)) # Print the first 5 rows as a check
5×9 DataFrame
 Row │ CO       CH       CO2      O2       lamb     n      NOx    Tmas     Time
     │ Float64  Float64  Float64  Float64  Float64  Int64  Int64  Float64  Time
─────┼──────────────────────────────────────────────────────────────────────────────
   1 │    0.64    113.0    13.83     0.75    1.012    820     13      0.0  00:00:00
   2 │    0.64    112.0    13.83     0.75    1.012    770    185      0.0  15:56:25
   3 │    0.64    112.0    13.83     0.75    1.012    790      0      0.0  15:56:26
   4 │    0.64    112.0    13.83     0.75    1.012    800      0      0.0  15:56:27
   5 │    0.64    112.0    13.84     0.74    1.012    990      0      0.0  15:56:28

5×10 DataFrame
 Row │ CO       CH       CO2      O2       lamb     n      NOx    Tmas     Time      Seconds
     │ Float64  Float64  Float64  Float64  Float64  Int64  Int64  Float64  Time      Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────
   1 │    0.64    113.0    13.83     0.75    1.012    820     13      0.0  00:00:00      0.0
   2 │    0.64    112.0    13.83     0.75    1.012    770    185      0.0  15:56:25  57385.0
   3 │    0.64    112.0    13.83     0.75    1.012    790      0      0.0  15:56:26  57386.0
   4 │    0.64    112.0    13.83     0.75    1.012    800      0      0.0  15:56:27  57387.0
   5 │    0.64    112.0    13.84     0.74    1.012    990      0      0.0  15:56:28  57388.0

Next, we move on to the selection of the variable to be analysed. Let's take the variable CO (it can be replaced by any other variable):

In [ ]:
i = 1 # Index of the column to analyse
y = df[:, i]
column_names = names(df) # Get the column names
println("Column selected for analysis: $(column_names[i])")
# The first row is stamped 00:00:00, so shift the time axis to start at 15:56:00 and reset the first sample to zero
t = df.Seconds .- (15*60*60 + 56*60); t[1] = 0;
println("t: $(t[1:5])") # Print the first 5 values as a check
Column selected for analysis: CO
t: [0.0, 25.0, 26.0, 27.0, 28.0]

Next, we plot the trend component. We use a moving average to highlight the trend:

In [ ]:
using Plots
window_size = 15  # half-width of the smoothing window: each average spans up to 2*window_size + 1 samples
trend = [mean(y[max(1, i-window_size):min(end, i+window_size)]) for i in 1:length(y)]

# Plot the original data and the trend
plot(t, y, label="Original data", xlabel="Time (s)", ylabel="CO", legend=:topleft)
plot!(t, trend, label="Trend", linewidth=2)
Out[0]:

For a smooth parametric trend, a polynomial approximation can be used instead of the moving average. This fit overwrites the moving-average trend and is the one used in the steps that follow:

In [ ]:
using Polynomials
trend_coef = fit(t, y, 3)  # third-degree polynomial
trend = trend_coef.(t)

# Plot the original data and the trend
plot(t, y, label="Original data", xlabel="Time (s)", ylabel="CO", legend=:topleft)
plot!(t, trend, label="Trend", linewidth=2)
Out[0]:

Let us perform a Fourier decomposition to extract the harmonic component.

In [ ]:
# Remove the trend to analyse the periodic components
detrended = y .- trend

# Compute the FFT (it assumes uniformly spaced samples; the mean time step sets the sampling rate)
n = length(detrended)
freq = fftfreq(n, 1/mean(diff(t)))  # frequencies in Hz
fft_vals = fft(detrended)

# Amplitudes and phases
amplitude = abs.(fft_vals) ./ n * 2
phase = angle.(fft_vals)

# Keep only the non-negative frequencies (the spectrum of a real signal is symmetric);
# abs. makes the Nyquist bin positive, since fftfreq reports it as negative for even n
pos_freq = abs.(freq[1:div(n,2)+1])
pos_ampl = amplitude[1:div(n,2)+1]

# Plot the amplitude spectrum
plot(pos_freq, pos_ampl, xlabel="Frequency (Hz)", ylabel="Amplitude",
     title="Amplitude spectrum", legend=false)
Out[0]:

Let us reconstruct the main harmonics by selecting the most significant frequencies and restoring their contribution:

In [ ]:
# Find the 3 most significant frequencies
top_n = 3
sorted_idx = sortperm(pos_ampl, rev=true)
top_freq = pos_freq[sorted_idx[1:top_n]]
top_ampl = pos_ampl[sorted_idx[1:top_n]]
top_phase = phase[sorted_idx[1:top_n]]

# Reconstruct the harmonics as a sum of cosines
harmonics = zeros(n)
for (f, a, ϕ) in zip(top_freq, top_ampl, top_phase)
    harmonics .+= a .* cos.(2π * f * t .+ ϕ)
end

# Final plot
plot(t, detrended, label="Detrended data", xlabel="Time (s)", ylabel="CO")
plot!(t, harmonics, label="Main harmonics", linewidth=2)
Out[0]:

Full decomposition: trend and harmonics.

In [ ]:
plot(t, y, label="Original data", xlabel="Time (s)", ylabel="CO")
plot!(t, trend, label="Trend", linewidth=2)
plot!(t, trend .+ harmonics, label="Trend + harmonics", linewidth=2, linestyle=:dash)
Out[0]:
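
The agreement seen in the plots can also be expressed numerically. The following is a minimal sketch, assuming the Pearson correlation and the root-mean-square error are acceptable quality measures; the helper fit_quality and the sample vectors are hypothetical and are not part of the notebook.

```julia
using Statistics

# Hypothetical helper: compares an observed series with its reconstruction
function fit_quality(observed, reconstructed)
    r = cor(observed, reconstructed)                      # Pearson correlation
    rmse = sqrt(mean((observed .- reconstructed) .^ 2))   # root-mean-square error
    return (r = r, rmse = rmse)
end

# Made-up example: the reconstruction has the right shape but half the amplitude
obs = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
rec = 0.5 .* obs
q = fit_quality(obs, rec)
```

In the notebook the same function could be called as fit_quality(detrended, harmonics). The example shows why both numbers are useful: the correlation is insensitive to an amplitude mismatch, while the RMSE exposes it.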

Conclusion

The analysis revealed the following aspects of the behaviour of the time series under study.

  • The long-term variation of the CO level was captured as a clear trend, presumably governed by vehicle movement and external environmental factors.
  • The periodic structure of the CO concentration was characterised by its three dominant oscillation frequencies.
  • The quality of the reconstruction was assessed by comparing the main harmonics with the detrended data, which showed good agreement.

Thus, the approach proved effective for analysing signals such as the time dependence of substance concentrations, extracting useful information even from noisy experimental data.

The algorithm can easily be adapted to similar time series in other fields of research where the data must be separated into a trend and higher-frequency fluctuations.