Trend and harmonic components (Fourier decomposition)

Time series analysis is an important tool for studying many kinds of processes, whether they are environmental measurements, technical characteristics of equipment, or vital signs of a biological organism.

One of the key stages of time series processing is to identify patterns, trends, and cyclical components hidden in the measured data. These components make it possible to better understand the dynamics of the phenomena under study and make informed predictions.

This example uses spectral analysis to study how the concentration of carbon monoxide (CO) in a car's exhaust changes over time. The algorithm developed for this task consists of several sequential steps:

  1. Data parsing: reading raw data from a file and converting it into a convenient format for further processing.

  2. Adding a timeline: Calculating the second mark of each observation relative to the beginning of measurements.

  3. Trend identification: estimating the long-term trend in the CO level, first by averaging values in a sliding window and then by fitting a third-degree polynomial.

  4. Spectral analysis: application of fast Fourier transform (FFT) to identify significant oscillation frequencies.

  5. Signal reconstruction: restoration of the main harmonic components that form the structure of the process after removing the trend.

  6. Final visualization: comparison of the reconstructed signal with the initial data and evaluation of the quality of the proposed method.

This example demonstrates an analytical approach to identifying the characteristic features of a time series, using real experimental data.

Description of the algorithm implementation

The main stages of the algorithm

1. Data preparation

The data is prepared with the parse_line function, which splits a line into separate values and converts them to the appropriate data types. After parsing, the data is loaded into a DataFrame table.

Then a column with the time in seconds, measured relative to the first observation, is added.
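The two preparation steps can be sketched on a single line of input. This is a minimal illustration, not the notebook's actual data: the sample line and the start time are assumed values chosen to match the layout that parse_line expects (eight numeric fields with decimal commas followed by a HH:MM:SS timestamp).

```julia
using Dates

# Hypothetical sample line: eight numeric fields, then a time of day.
sample = "0,64\t112\t13,83\t0,75\t1,012\t770\t185\t0\t15:56:25"

# Normalize decimal commas, then split on any run of whitespace.
fields = split(replace(sample, "," => "."))

co = parse(Float64, fields[1])   # CO concentration
stamp = Time(fields[9])          # time of day of the observation

# Subtracting two Time values yields a Nanosecond period;
# Dates.value extracts the raw integer count of nanoseconds.
start = Time("15:56:20")         # assumed start of the measurement run
elapsed_s = Dates.value(stamp - start) / 1e9

println((co, stamp, elapsed_s))
```

Dividing by 1e9 is needed because Time has nanosecond resolution, so the difference of two Time values is expressed in nanoseconds.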

2. Trend identification

There are two ways to determine the trend:

  • the average value in a sliding window of a fixed size,
  • approximation by a third-degree polynomial.

The first method is simple and gives a quick visual impression of the overall trend, while the second yields a parametric model of the trend.
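Both trend estimators can be compared on synthetic data where the true trend is known. The sketch below is an illustration under stated assumptions: the helper names moving_average and polyfit_coeffs are hypothetical, the data is synthetic, and the cubic fit is done with a plain least-squares solve on a Vandermonde matrix rather than with the Polynomials package used later in the notebook.

```julia
using Statistics, LinearAlgebra

# Centered moving average. `half` is the half-width, so a full window
# covers up to 2*half + 1 samples and shrinks near the edges.
function moving_average(y, half)
    n = length(y)
    return [mean(@view y[max(1, i - half):min(n, i + half)]) for i in 1:n]
end

# Least-squares polynomial fit via a Vandermonde matrix.
# Returns coefficients in ascending order: c[1] + c[2]*t + c[3]*t^2 + ...
function polyfit_coeffs(t, y, degree)
    V = [ti^p for ti in t, p in 0:degree]
    return V \ y
end

# Synthetic series: a known cubic trend plus a small oscillation.
t = range(0, 10, length=201)
y_true = @. 0.5 + 0.1t - 0.02t^2 + 0.001t^3
y = y_true .+ 0.05 .* sin.(2π .* t)

trend_ma = moving_average(y, 10)       # non-parametric estimate
c = polyfit_coeffs(t, y, 3)            # parametric estimate
trend_poly = evalpoly.(t, Ref(c))      # evaluate the fitted cubic at every t

println("max deviation of the cubic fit from the true trend: ",
        maximum(abs.(trend_poly .- y_true)))
```

The moving average needs no model but leaves a window-length choice to the user; the polynomial gives a handful of coefficients that describe the whole trend.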

3. Spectral analysis

After the trend is removed, the fast Fourier transform (FFT) is applied to identify the dominant frequencies of the time series. The three components (harmonics) with the largest amplitudes are then selected from the resulting spectrum.
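The idea of picking dominant frequencies from an amplitude spectrum can be checked on a signal whose content is known in advance. In this sketch the signal, the sampling rate, and the helper dft are all assumptions made for illustration. A direct O(n²) discrete Fourier transform is written out so that the example needs no packages; it computes the same coefficients as fft from FFTW, only more slowly.

```julia
# Direct discrete Fourier transform with the same sign convention as `fft`.
function dft(x)
    n = length(x)
    return [sum(x[j + 1] * cis(-2π * j * k / n) for j in 0:n-1) for k in 0:n-1]
end

fs = 1.0                  # assumed sampling rate in Hz (one sample per second)
n = 200
t = (0:n-1) ./ fs

# Synthetic detrended signal: 0.05 Hz (amplitude 1.0) and 0.125 Hz (amplitude 0.4).
x = 1.0 .* cos.(2π * 0.05 .* t .+ 0.3) .+ 0.4 .* cos.(2π * 0.125 .* t .- 1.0)

X = dft(x)
keep = 1:div(n, 2)                    # bins from DC up to just below Nyquist
freqs = (keep .- 1) .* (fs / n)       # frequency of each kept bin in Hz
ampl = 2 .* abs.(X[keep]) ./ n        # single-sided amplitude

order = sortperm(ampl, rev=true)      # bins sorted by decreasing amplitude
println("dominant frequencies (Hz): ", freqs[order[1:2]])
println("their amplitudes: ", ampl[order[1:2]])
```

Both test frequencies fall exactly on FFT bins here, so the amplitudes are recovered without leakage; with real data the peaks are usually spread over neighboring bins.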

4. Signal reconstruction

An approximate signal is reconstructed from the selected harmonics and superimposed on the detrended curve. Adding the trend back to the reconstructed signal and comparing the sum with the original data shows how accurate the reconstruction is.
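A small experiment shows what keeping only the largest harmonics does to a signal. The sketch assumes a synthetic signal made of four cosines and a hypothetical helper dft (a direct O(n²) transform standing in for fft); the three strongest components are kept, so the residual should be exactly the weakest one.

```julia
# Direct discrete Fourier transform with the same sign convention as `fft`.
function dft(x)
    n = length(x)
    return [sum(x[j + 1] * cis(-2π * j * k / n) for j in 0:n-1) for k in 0:n-1]
end

fs = 1.0                  # assumed sampling rate in Hz
n = 240
t = (0:n-1) ./ fs

# (frequency in Hz, amplitude, phase in rad) of each synthetic component.
components = [(0.025, 1.0, 0.2), (0.05, 0.6, -0.7), (0.1, 0.3, 1.1), (0.2, 0.05, 0.0)]
signal = sum(a .* cos.(2π * f .* t .+ ϕ) for (f, a, ϕ) in components)

X = dft(signal)
keep = 1:div(n, 2)
freqs = (keep .- 1) .* (fs / n)
ampl = 2 .* abs.(X[keep]) ./ n
phase = angle.(X[keep])

# Keep the three largest harmonics and sum their cosines.
top = sortperm(ampl, rev=true)[1:3]
recon = zeros(n)
for k in top
    recon .+= ampl[k] .* cos.(2π * freqs[k] .* t .+ phase[k])
end

residual = signal .- recon
println("largest residual: ", maximum(abs.(residual)))
```

The phases returned by the transform refer to uniformly spaced samples that start at t = 0, which is why the reconstruction uses the same time grid as the input.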

In [ ]:
using Dates  # provides the Time type used below

# Parse a single line of the data file
function parse_line(line)
    # Replace decimal commas with points so that the numbers parse correctly
    line = replace(line, "," => ".")
    # Split on spaces/tabs and drop empty elements
    elements = filter(!isempty, split(line, r"\s+"))
    # Convert the elements to the required types
    co = parse(Float64, elements[1])
    ch = parse(Float64, elements[2])
    co2 = parse(Float64, elements[3])
    o2 = parse(Float64, elements[4])
    lamb = parse(Float64, elements[5])
    n = parse(Int, elements[6])
    nox = parse(Int, elements[7])
    tmas = parse(Float64, elements[8])
    time = Time(elements[9])
    return (CO=co, CH=ch, CO2=co2, O2=o2, lamb=lamb, n=n, NOx=nox, Tmas=tmas, Time=time)
end
Out[0]:
parse_line (generic function with 1 method)
In [ ]:
using DataFrames
# Read the data
data_lines = readlines("$(@__DIR__)/data.txt")[2:end]  # Skip the header line
parsed_data = [parse_line(line) for line in data_lines] # Parse every line
df = DataFrame(parsed_data) # Convert to a DataFrame
println(first(df, 5)) # Print the first 5 rows as a check
println()
using Dates, FFTW, Statistics
# Add a column with the time in seconds
start_time = df.Time[1]
df[!, :Seconds] = [Dates.value(t - start_time) / 1e9 for t in df.Time]  # convert nanoseconds to seconds
println(first(df, 5)) # Print the first 5 rows as a check
5×9 DataFrame
 Row │ CO       CH       CO2      O2       lamb     n      NOx    Tmas     Time
     │ Float64  Float64  Float64  Float64  Float64  Int64  Int64  Float64  Time
─────┼──────────────────────────────────────────────────────────────────────────────
   1 │    0.64    113.0    13.83     0.75    1.012    820     13      0.0  00:00:00
   2 │    0.64    112.0    13.83     0.75    1.012    770    185      0.0  15:56:25
   3 │    0.64    112.0    13.83     0.75    1.012    790      0      0.0  15:56:26
   4 │    0.64    112.0    13.83     0.75    1.012    800      0      0.0  15:56:27
   5 │    0.64    112.0    13.84     0.74    1.012    990      0      0.0  15:56:28

5×10 DataFrame
 Row │ CO       CH       CO2      O2       lamb     n      NOx    Tmas     Time      Seconds
     │ Float64  Float64  Float64  Float64  Float64  Int64  Int64  Float64  Time      Float64
─────┼───────────────────────────────────────────────────────────────────────────────────────
   1 │    0.64    113.0    13.83     0.75    1.012    820     13      0.0  00:00:00      0.0
   2 │    0.64    112.0    13.83     0.75    1.012    770    185      0.0  15:56:25  57385.0
   3 │    0.64    112.0    13.83     0.75    1.012    790      0      0.0  15:56:26  57386.0
   4 │    0.64    112.0    13.83     0.75    1.012    800      0      0.0  15:56:27  57387.0
   5 │    0.64    112.0    13.84     0.74    1.012    990      0      0.0  15:56:28  57388.0

Next, we select a variable for analysis. We take CO here; any other column can be used instead:

In [ ]:
i = 1 # Index of the column to analyze
y = df[:,i]
column_names = names(df) # Get the column names
println("Column selected for analysis: $(column_names[i])")
# Shift the time axis to start at 15:56:00; the first row is stamped 00:00:00, so it is reset to 0
t = df.Seconds.-(15*60*60+56*60); t[1] = 0;
println("t: $(t[1:5])") # Print the first 5 values as a check
Column selected for analysis: CO
t: [0.0, 25.0, 26.0, 27.0, 28.0]

Next, we build the trend component, using a moving average to extract the trend:

In [ ]:
using Plots  # plotting functions; may already be loaded in the Engee environment
window_size = 15  # half-width of the smoothing window (up to 2*window_size + 1 samples)
trend = [mean(y[max(1, i-window_size):min(end, i+window_size)]) for i in 1:length(y)]

# Plot the original data and the trend
plot(t, y, label="Original data", xlabel="Time (s)", ylabel="CO", legend=:topleft)
plot!(t, trend, label="Trend", linewidth=2)
Out[0]:

For a more accurate trend, a polynomial approximation can be used instead of the moving average:

In [ ]:
using Polynomials
trend_coef = fit(t, y, 3)  # third-degree polynomial
trend = trend_coef.(t)     # replaces the moving-average trend computed above

# Plot the original data and the trend
plot(t, y, label="Original data", xlabel="Time (s)", ylabel="CO", legend=:topleft)
plot!(t, trend, label="Trend", linewidth=2)
Out[0]:

Let's perform the Fourier decomposition for the harmonic component.

In [ ]:
# Remove the trend to analyze the periodic components
detrended = y .- trend

# Compute the FFT
n = length(detrended)
freq = fftfreq(n, 1/mean(diff(t)))  # frequencies in Hz, assuming uniform sampling at the mean time step
fft_vals = fft(detrended)

# Amplitudes and phases
amplitude = abs.(fft_vals) ./ n * 2
phase = angle.(fft_vals)

# Keep only the non-negative frequencies (the FFT of a real signal is symmetric)
pos_freq = abs.(freq[1:div(n,2)+1])  # abs: for even n, fftfreq reports the Nyquist bin as negative
pos_ampl = amplitude[1:div(n,2)+1]

# Plot the amplitude spectrum
plot(pos_freq, pos_ampl, xlabel="Frequency (Hz)", ylabel="Amplitude", 
     title="Amplitude spectrum", legend=false)
Out[0]:

Let's reconstruct the main harmonics: we select a few of the most significant frequencies and restore their contribution:

In [ ]:
# Find the 3 most significant frequencies
top_n = 3
sorted_idx = sortperm(pos_ampl, rev=true)
top_freq = pos_freq[sorted_idx[1:top_n]]
top_ampl = pos_ampl[sorted_idx[1:top_n]]
top_phase = phase[sorted_idx[1:top_n]]

# Reconstruct the harmonics
# (the FFT phases refer to uniformly spaced samples starting at t = 0)
harmonics = zeros(n)
for (f, a, ϕ) in zip(top_freq, top_ampl, top_phase)
    harmonics .+= a .* cos.(2π * f * t .+ ϕ)
end

# Final plot
plot(t, detrended, label="Detrended data", xlabel="Time (s)", ylabel="CO")
plot!(t, harmonics, label="Main harmonics", linewidth=2)
Out[0]:

Complete decomposition: trend and harmonics.

In [ ]:
plot(t, y, label="Original data", xlabel="Time (s)", ylabel="CO")
plot!(t, trend, label="Trend", linewidth=2)
plot!(t, trend .+ harmonics, label="Trend + harmonics", linewidth=2, linestyle=:dash)
Out[0]:
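
The visual comparison can be complemented with a few numbers. The sketch below is only an illustration: the two vectors are synthetic stand-ins for the notebook's detrended and harmonics variables, and the choice of metrics (RMSE, Pearson correlation, share of explained variance) is an assumption rather than part of the original analysis.

```julia
using Statistics

# Synthetic stand-ins for the notebook's `detrended` and `harmonics` vectors.
t = 0:199
harmonics = cos.(2π .* t ./ 50)
detrended = harmonics .+ 0.2 .* sin.(2π .* t ./ 4)   # deterministic "noise"

residual = detrended .- harmonics

rmse = sqrt(mean(abs2, residual))                 # root-mean-square error
r = cor(detrended, harmonics)                     # Pearson correlation
explained = 1 - var(residual) / var(detrended)    # share of variance explained

println("RMSE = ", rmse)
println("correlation = ", r)
println("explained variance = ", explained)
```

With the real data, the same three lines can be applied directly to detrended and harmonics once they have been computed.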

Conclusion

The analysis revealed the following aspects of the behavior of the time series under study.

  • A long-term change in the CO level was isolated as a clear trend, attributed to vehicle movement and external environmental factors.
  • The structure of the periodic changes in CO concentration was described by three main oscillation frequencies.
  • The quality of the reconstruction was assessed from the plots, which showed a good correlation between the detrended data and the main harmonics.

Thus, the approach proved effective for analyzing signals such as the time dependence of substance concentrations, and it allows useful information to be extracted even from noisy data from a real experiment.

The proposed algorithm can be easily adapted to similar time series in other fields of research where it is important to separate the data into a trend and periodic fluctuations.