Filtering 1-D data
Data filtering is an important step before training a neural network. Without a clean signal, the model can learn from noise and outliers, which will greatly degrade the quality of predictions. After filtering, the data will become cleaner, and the neural network will be able to focus on the main thing — those patterns that are really important.
In this example, we will upload a mat file with an ideal signal and a noisy version, and then sequentially apply several popular smoothing and decomposition methods: moving average, median filter, Savitsky-Golay filter, Hampel filter, LOESS smoothing, TV denoise, singular spectral analysis and empirical decomposition (EMD).
Preparation for work
Needs: visualization, reading .mat, different processing and decomposition algorithms. To do this, import the necessary packages
include("$(@__DIR__)/InstallPackages.jl")
using Plots
using MAT
using RollingFunctions
using SavitzkyGolay
using HampelOutliers
using Loess
using TVDenoise
using SingularSpectrumAnalysis
using EMD
We read the file, take out the matrix result, from it — the standard Y and a noisy measured signal X. Let's compare them right away
# Чтение данных
path = "$(@__DIR__)/file.mat"
mat = matopen(path)     
data = read(mat)     
data = data["result"];      
Y = data["thetTgt"]
X = data["thetTgt_MFit"];
# Визуализация данных
plot(X', label="Шумные данные")
plot!(Y', label="Эталонные данные")
It can be seen that the noisy signal fluctuates very strongly around the reference signal. It would be good to smooth out RF noise fluctuations.
For the filtering algorithms to work correctly, it is necessary that there are no gaps in the data. Gaps at the edges of the signal are often found in the measured data. If they are present in the signal under study, just take a slice of the data.
nan_idxs = findall(isnan, X)
println("Количесвто пропусков в массиве Y = ", sum(findall(isnan, Y))[1],  " Количесвто пропусков в массиве X = ", sum(findall(isnan, X))[1])
ff = findfirst(x -> !isnan(x), X)
ll = findlast(x -> !isnan(x), X)
X = X[ff:ll]
Y = Y[ff:ll]
The moving average algorithm
The moving average algorithm takes a fixed-size window, runs it sequentially through the entire data series, and calculates the average value inside this window at each position.
w = 10
x = vec(X)
raw = rollmean(x, w)       
pad = fill(NaN, w-1)          
meanX = vcat(pad, raw);   
plot([Y' vec(X) meanX], label=["Исходный сигнал" "Шумовой сигнал" "Отфильтрованный сигнал"], ylabel="", xlabel="t", legend=:topleft)
It can be seen that the noise is becoming smoothed out. The big emissions, of course, haven't gone away. This method is the simplest.
The Savitsky–Goley Filter
The Savitsky–Goley filter is a smoothing method in which a least squares polynomial of a given degree is constructed for each position of a window of a fixed odd length and its coefficients are calculated. Then the value at the center point is replaced by the prediction of this polynomial.
y2_sg = savitzky_golay(vec(X), 101, 5);
plot([Y' vec(X) y2_sg.y], label=["Исходный сигнал" "Шумовой сигнал" "Отфильтрованный сигнал"], ylabel="", xlabel="t", legend=:topleft)
Hampel filter
The hampel filter is similar to the median filter. For each fixed-size window, it first finds the median and evaluates the spread through the median absolute deviation (MAD), and then checks each point: if its distance to the median is greater than the specified threshold, then put the median of the window instead, otherwise leave unchanged. This way the filter reliably eliminates emissions.
yHampel = Hampel.filter(vec(X), 30; boundary=:reflect, threshold=0);
plot([Y' vec(X) yHampel], label=["Исходный сигнал" "Шумовой сигнал" "Отфильтрованный сигналl"], ylabel="", xlabel="t", legend=:topleft)
LOESS-smoothing
At each step, LOESS smoothing builds a local regression for the neighborhood of a point, which predicts a new value at the central point.
N = length(vec(X))
t = collect(1.0:N)
model = loess(reshape(t, N, 1), vec(X); span=0.2, degree=1);
X_loess = predict(model, t);
plot([Y' vec(X) X_loess], label=["Исходный сигнал" "Шумовой сигнал" "Отфильтрованный сигналl"], ylabel="", xlabel="t", legend=:topleft)
Total Variation Denoising
Total Variation Denoising minimizes discontinuities and noise by solving the optimization problem of finding a new signal that is close to the original L2 norm, but with minimal overall variation.
kw = Dict(:isotropic=>false, :maxit=>100, :tol=>1e-4, :verbose=>false)
λ = 30.0
r = 3.0
denoised1d, hist1d = tvd(vec(X), λ, r;
                         kw...);
plot([Y' vec(X) denoised1d], label=["Исходный сигнал" "Шумовой сигнал" "Отфильтрованный сигналll"], ylabel="", xlabel="t", legend=:topleft)
Singular Value Spectral Analysis (SSA)
Singular spectral is a time series analysis method that uses linear algebra and signal theory to extract information from data. The essence of the method is that the time series is divided into singular value components and then these components are used for analysis and forecasting.
Visualize the extracted trend
L = 40
trend, seasonal = analyze(vec(X), L)
plot(vec(X),         label="Шумный сигнал", alpha=0.5)
plot!(trend,     label="Тренд",       lw=2)
plot!(Y', label="Эталон", lw=2)
Empirical decomposition
Empirical modal decomposition divides a signal into a set of internal modal functions — components with oscillations of different frequencies.
imf = EMD.emd(vec(X))
t = 1:length(vec(X))
plot(t, imf[7,:], label="IMF 7")
plot!(Y', label="Эталон", lw=2)
Conclusions
In this example, various signal preprocessing (filtering) methods were analyzed.