Nonlinear data approximation
Introduction
In this example, we will look at how to fit a nonlinear function to data using several Julia packages.
Adding the necessary libraries:
using Plots, LsqFit, Optim
Setting the task
The task is to fit a function to the following data:
xdata = [
0.0000 5.8955
0.1000 3.5639
0.2000 2.5173
0.3000 1.9790
0.4000 1.8990
0.5000 1.3938
0.6000 1.1359
0.7000 1.0096
0.8000 1.0343
0.9000 0.8435
1.0000 0.6856
1.1000 0.6100
1.2000 0.5392
1.3000 0.3946
1.4000 0.3903
1.5000 0.5474
1.6000 0.3459
1.7000 0.1370
1.8000 0.2211
1.9000 0.1704
2.0000 0.2636
];
Let's plot the data points:
t = xdata[:, 1];
y = xdata[:, 2];
graph1 = plot();
scatter!(t, y, title = "Data points", markerstrokecolor = :blue, markercolor = :white, markerstrokewidth = 1, markersize = 4, legend = false);
display(graph1)
Let's now fit a function to the data shown on the plot.
Solution using the curve_fit function
To fit the data with curve_fit, we collect the model parameters into a vector x. The model is a sum of two exponentials,

y = x[1]·exp(−x[2]·t) + x[3]·exp(−x[4]·t),

so x[1] and x[3] are the amplitudes of the two exponential terms, and x[2] and x[4] are their decay rates.
Let's define a function for approximation:
F(xdata, x) = x[1] * exp.(-x[2] .* xdata) + x[3] * exp.(-x[4] .* xdata);
We arbitrarily set the starting point x0 as follows:
x0 = [1.0, 1.0, 1.0, 0.0];
Let's run the curve_fit function and plot the resulting approximation:
start_time = time()
fit = curve_fit(F, t, y, x0);
x = fit.param;
resnorm = sum(fit.resid.^2);
elapsed_time = time() - start_time;
println(x)
println("Calculation time: $elapsed_time seconds")
graph2 = scatter(t, y, markerstrokecolor=:blue, markercolor=:white, markerstrokewidth=1, markersize=4, label="Data points")
plot!(t, F(t, x), linewidth=2, color=:blue, label="The approximation")
title!("The approximation")
display(graph2)
Solution using the optimize function
To solve the problem using optimize, we define the objective function as the sum of the squares of the residuals:
Fsumsquares(x) = sum((F(t, x) - y).^2);
start_time = time()
result = optimize(Fsumsquares, x0, LBFGS());
xunc = Optim.minimizer(result);
ressquared = Optim.minimum(result);
eflag = Optim.converged(result) ? 1 : 0;
elapsed_time = time() - start_time;
println(xunc)
println("Calculation time: $elapsed_time seconds")
Note that optimize finds the same solution as curve_fit but takes noticeably longer. Because the ordering of the two exponential terms is arbitrary, the fitted parameters may come out in a different order.
Separating the linear and nonlinear parts of the problem
Note that the approximation problem is linear in the amplitudes c₁ = x[1] and c₂ = x[3]. This means that for any values of the rates λ₁ = x[2] and λ₂ = x[4], we can use the left-division operator (\) to find the values of c₁ and c₂ that solve the least-squares problem.
We therefore reformulate the problem as a two-dimensional one, searching only for the optimal values of λ₁ and λ₂; the values of c₁ and c₂ are computed at each step with the left-division operator.
function fitvector(lam, xdata, ydata)
    A = [exp(-λ * x) for x in xdata, λ in lam]  # design matrix for the fixed rates lam
    c = A \ ydata                               # amplitudes via linear least squares
    yEst = A * c                                # model values at xdata
    return yEst
end
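Before plugging fitvector into curve_fit, the left-division step can be sanity-checked on its own. The sketch below uses only the standard library and synthetic, noise-free data with hypothetical rates and amplitudes (λtrue, ctrue are illustrative values, not part of the original problem): for fixed rates, left division should recover the amplitudes exactly.

```julia
using LinearAlgebra

# Same idea as fitvector: for fixed rates lam, the best amplitudes c
# solve a purely *linear* least-squares problem.
function fitamplitudes(lam, xdata, ydata)
    A = [exp(-λ * x) for x in xdata, λ in lam]  # design matrix
    return A \ ydata                            # linear least-squares solution
end

# Synthetic, noise-free data from known (hypothetical) parameters
xs = range(0.0, 2.0; length = 21)
λtrue = [1.0, 3.0]
ctrue = [2.0, 4.0]
ys = [ctrue[1] * exp(-λtrue[1] * x) + ctrue[2] * exp(-λtrue[2] * x) for x in xs]

c = fitamplitudes(λtrue, xs, ys)
println(c)  # recovers ctrue up to round-off
```

Because the data are noise-free and the rates are exact, the recovered amplitudes match ctrue to machine precision; on real data with estimated rates the residual would of course be nonzero.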
Let's solve the problem using curve_fit, starting from a two-dimensional starting point [1, 0]:
x02 = [1.0, 0.0];
F2(t_data, x) = fitvector(x, t_data, y)
start_time = time()
fit2 = curve_fit(F2, t, y, x02);
x2 = fit2.param;
resnorm2 = sum(fit2.resid.^2);
exitflag2 = fit2.converged ? 1 : 0;
elapsed_time = time() - start_time;
println(x2)
println("Calculation time: $elapsed_time seconds")
The split problem is more robust to the initial guess
Choosing a poor starting point for the original four-parameter problem leads to a local, rather than global, solution. Choosing a starting point with the same poor values of λ₁ and λ₂ for the split two-parameter problem, however, still leads to the global solution. To demonstrate this, let's rerun the original problem from a starting point that yields a poor local solution and compare the resulting fit with the global one.
x0bad = [10.0, 1.0, 1.0, 0.0]
start_time = time()
fitbad = curve_fit(F, t, y, x0bad);
xbad = fitbad.param;
resnormbad = sum(fitbad.resid.^2);
exitflagbad = fitbad.converged ? 1 : 0;
elapsed_time = time() - start_time;
println(xbad)
println("Calculation time: $elapsed_time seconds")
graph3 = scatter(t, y, markerstrokecolor=:blue, markercolor=:white, markerstrokewidth=1, markersize=4, label="Data")
plot!(t, F(t, x), linewidth=2, color=:blue, label="Successful parameters")
plot!(t, F(t, xbad), linewidth=2, color=:red, label="Failed parameters")
title!("The approximation")
display(graph3)
println("Residual norm at the successful endpoint: $(resnorm)")
println("Residual norm at the failed endpoint: $(resnormbad)")
Conclusion
This example demonstrates the key aspects of nonlinear data fitting with optimization techniques:
- Efficiency of specialized algorithms: curve_fit, designed specifically for curve-fitting problems, was noticeably more efficient than the general-purpose optimize, needing fewer objective-function evaluations to reach the same result.
- The importance of parameter separation: extracting the linear parameters reduces the original four-parameter problem to a two-dimensional one, which both preserves computational efficiency and improves the stability of the method.
- Robustness to initial guesses: the split formulation proved more reliable, avoiding local minima and finding the global solution even from a poor choice of starting values.
- Practical significance: the presented approaches apply broadly wherever experimental data must be fit accurately by complex nonlinear models, including chemical kinetics, biological processes, and physics experiments.
The separation of linear and nonlinear parameters is particularly valuable because it combines computational efficiency with more reliable convergence to the global solution, making it the preferred choice for practical nonlinear fitting problems.
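The robustness argument can be illustrated in miniature without any fitting package. In the sketch below (standard-library Julia only; the synthetic data, true rates, and grid are all hypothetical choices for this illustration), a coarse grid search over the two rates stands in for curve_fit: at every candidate pair of rates, the amplitudes are recovered by left division, exactly as in fitvector.

```julia
using LinearAlgebra

# Residual of the projected (two-parameter) problem: for candidate
# rates lam, the amplitudes are eliminated by linear least squares.
function proj_residual(lam, xs, ys)
    A = [exp(-λ * x) for x in xs, λ in lam]
    c = A \ ys
    return sum(abs2, A * c .- ys)
end

# Coarse grid search over (λ1, λ2); stands in for curve_fit here.
function grid_search(xs, ys, grid)
    best_lam, best_res = [NaN, NaN], Inf
    for a in grid, b in grid
        a < b || continue                 # rates are interchangeable; fix an order
        r = proj_residual([a, b], xs, ys)
        if r < best_res
            best_res, best_lam = r, [a, b]
        end
    end
    return best_lam, best_res
end

# Synthetic data from known (hypothetical) parameters
xs = range(0.0, 2.0; length = 41)
ys = [3.0 * exp(-1.0 * x) + 1.0 * exp(-5.0 * x) for x in xs]

best_lam, best_res = grid_search(xs, ys, 0.5:0.5:6.0)
println(best_lam)
```

Since the search touches only the two nonlinear rates, every candidate is evaluated with its globally optimal amplitudes, which is exactly why the split formulation cannot get trapped by a bad guess for c₁ and c₂.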