Trend and harmonic components (Fourier decomposition)
Time series analysis is an important tool for studying various processes, whether it's environmental measurements, technical characteristics of equipment, or vital signs of a biological organism.
One of the key stages of time series processing is to identify patterns, trends, and cyclical components hidden in the measured data. These components make it possible to better understand the dynamics of the phenomena under study and make informed predictions.
This work is devoted to the study of the time dependence of the concentration of carbon monoxide (CO) contained in the exhaust of a car by spectral analysis. To solve this problem, a special algorithm has been developed that includes several sequential steps. Let's list them.
-
Data Parsing: Reading raw data from a file and converting it into a convenient format for further processing.
-
Adding a timeline: Calculating the second mark of each observation relative to the beginning of measurements.
-
Trend identification: identification of a long-term trend in the CO level by averaging values in a sliding window and then approximating with a third-degree polynomial.
-
Spectral analysis: application of fast Fourier transform (FFT) to identify significant oscillation frequencies.
-
Signal reconstruction: restoration of the main harmonic components that form the structure of the process after removing the trend.
-
Final visualization: comparison of the reconstructed signal with the initial data and evaluation of the quality of the proposed method.
In this paper, we will demonstrate the possibilities of an analytical approach to identify the characteristic features of time series using the example of real experimental data.
Description of the algorithm implementation
The main stages of the algorithm
1. Data preparation
The function is used to prepare the data parse_line, which allows splitting a string into separate numeric values and converting them to the appropriate data types. After parsing, the data is loaded into a table like DataFrame.
Then a timestamp is added in seconds format, calculated relative to the first measurement.
2. Trend identification
There are two ways to determine the trend:
- the average value in a sliding window of a fixed size,
- approximation by a third degree polynomial.
The first method is convenient in its simplicity and allows you to visually assess the overall trend, while the second allows for an accurate parametric model.
3. Spectral analysis
After the trend is eliminated, the fast Fourier transform (FFT) is applied. Thus, it is possible to identify the dominant frequencies characteristic of this time series. During the analysis of the obtained spectrum, the three largest components (harmonics) are distinguished.
4. Signal reconstruction
Based on the selected harmonics, an approximate signal is reconstructed, which is superimposed on the original curve, and a trend is used to evaluate the accuracy of the restoration.
# A function for parsing a single string
function parse_line(line)
# We replace commas with dots for correct parsing of numbers.
line = replace(line, "," => ".")
# We divide by spaces/tabs and filter the empty elements.
elements = filter(!isempty, split(line, r"\s+"))
# Converting the elements to the desired types
co = parse(Float64, elements[1])
ch = parse(Float64, elements[2])
co2 = parse(Float64, elements[3])
o2 = parse(Float64, elements[4])
lamb = parse(Float64, elements[5])
n = parse(Int, elements[6])
nox = parse(Int, elements[7])
tmas = parse(Float64, elements[8])
time = Time(elements[9])
return (CO=co, CH=ch, CO2=co2, O2=o2, lamb=lamb, n=n, NOx=nox, Tmas=tmas, Time=time)
end
using DataFrames
# Reading data
data_lines = readlines("$(@__DIR__)/data.txt")[2:end] # Skipping the title
parsed_data = [parse_line(line) for line in data_lines] # Parsing all lines
df = DataFrame(parsed_data) # Converting it to a DataFrame
println(first(df, 5)) # We output the first 5 lines for verification
println()
using Dates, FFTW, Statistics
# Add a column with the time in seconds
start_time = df.Time[1]
df[!, :Seconds] = [Dates.value(t - start_time) / 1e9 for t in df.Time] # converting nanoseconds to seconds
println(first(df, 5)) # We output the first 5 lines for verification
Next, we proceed to selecting a variable for analysis. Let's take the variable CO (you can replace it with any other one):
i = 1 # The number of the column to analyze
y = df[:,i]
column_names = names(df) # Getting the column names
println("Column name to analyze: $(column_names[i])")
t = df.Seconds.-(15*60*60+56*60); t[1] = 0;
println("t: $(t[1:5])") # We output the first 5 lines for verification
Next, we will build the trend component. We use a moving average to highlight the trend.:
window_size = 15 # window size for smoothing
trend = [mean(y[max(1, i-window_size):min(end, i+window_size)]) for i in 1:length(y)]
# Graph of initial data and trend
plot(t, y, label="Initial data", xlabel="Time (s)", ylabel="CO", legend=:topleft)
plot!(t, trend, label="The trend", linewidth=2)
For a more accurate trend, you can use a polynomial approximation instead of a moving average.:
using Polynomials
trend_coef = fit(t, y, 3) # a polynomial of the third degree
trend = trend_coef.(t)
# Graph of initial data and trend
plot(t, y, label="Initial data", xlabel="Time (s)", ylabel="CO", legend=:topleft)
plot!(t, trend, label="The trend", linewidth=2)
Let's perform the Fourier decomposition for the harmonic component.
# Removing the trend to analyze the periodic components
detrended = y .- trend
# Calculating the FFT
n = length(detrended)
freq = fftfreq(n, 1/mean(diff(t))) # frequencies in Hz
fft_vals = fft(detrended)
# Amplitudes and phases
amplitude = abs.(fft_vals) ./ n * 2
phase = angle.(fft_vals)
# We leave only positive frequencies (FFT symmetry)
pos_freq = freq[1:div(n,2)+1]
pos_ampl = amplitude[1:div(n,2)+1]
# Graph of the amplitude spectrum
plot(pos_freq, pos_ampl, xlabel="Frequency (Hz)", ylabel="The amplitude",
title="The amplitude spectrum", legend=false)
Let's restore the main harmonics. Let's select a few of the most significant frequencies and restore their contribution.:
# We find the 3 most significant frequencies
top_n = 3
sorted_idx = sortperm(pos_ampl, rev=true)
top_freq = pos_freq[sorted_idx[1:top_n]]
top_ampl = pos_ampl[sorted_idx[1:top_n]]
top_phase = phase[sorted_idx[1:top_n]]
# Restoring harmonics
harmonics = zeros(n)
for (f, a, ϕ) in zip(top_freq, top_ampl, top_phase)
harmonics .+= a .* cos.(2π * f * t .+ ϕ)
end
# Final schedule
plot(t, detrended, label="Detrended data", xlabel="Time (s)", ylabel="CO")
plot!(t, harmonics, label="Basic harmonics", linewidth=2)
Complete decomposition: trend and harmonics.
plot(t, y, label="Initial data", xlabel="Time (s)", ylabel="CO")
plot!(t, trend, label="The trend", linewidth=2)
plot!(t, trend .+ harmonics, label="Trend + harmonics", linewidth=2, linestyle=:dash)
Conclusion
The analysis revealed the following important aspects of the behavior of the time series under study.
- Long-term fluctuations in CO levels are highlighted, represented by a clear trend determined by vehicle movement and external environmental factors.
- The structure of periodic changes in CO concentration is determined, expressed through the allocation of three main oscillation frequencies.
- The quality of the reconstruction of the original signal was evaluated, which showed a good correlation between the detrended data and the fundamental harmonics.
Thus, the developed approach proved to be effective for analyzing signals such as the time dependence of concentrations of substances, allowing researchers to extract useful information even from noisy data from a real experiment.
The proposed algorithm can be easily adapted to analyze similar time series in other fields of research, where it is important to separate data into trend and high-frequency fluctuations.