
Least squares curve approximation

This example demonstrates how to perform a nonlinear least squares curve approximation using the [problem-oriented optimization workflow](https://engee.com/community/catalogs/projects/algoritm-resheniia-zadach-optimizatsii).

We will use the functions of the [JuMP.jl](https://engee.com/helpcenter/stable/ru/julia/JuMP/index.html) library to formulate the optimization problem, the nonlinear solver library Ipopt.jl, the random number generation library Random.jl, the probability distributions library Distributions.jl, and the Plots.jl library to visualize the results.

Installing libraries

If the latest version of the JuMP package is not installed in your environment, uncomment and run the cell below:

In [ ]:
# import Pkg
# Pkg.add(["Ipopt", "Distributions", "JuMP"])

To load the new version of the library after the installation is complete, click the "Personal Account" button:

Screenshot_20240710_134852.png

Then click the "Stop" button:

Screenshot_20240710_2.png

Restart the session by clicking the "Start Engee" button:

Screenshot_20240710_135431.png

Task description

The optimization task is to perform a nonlinear least squares curve approximation.

The model equation for this task is:

$$y = A_1 e^{r_1 t} + A_2 e^{r_2 t},$$

where $A_1$, $A_2$, $r_1$, and $r_2$ are unknown parameters, $y$ is the response, and $t$ is the time.

The purpose of the task is to find the values of $A$ and $r$ that minimize the objective function:

$$\sum_{i} \left( A_1 e^{r_1 t_i} + A_2 e^{r_2 t_i} - y_i \right)^2$$

Connecting libraries

Connect the JuMP library:

In [ ]:
using JuMP; 

Connect the nonlinear solver library Ipopt:

In [ ]:
using Ipopt;

Connect the random number generation library Random:

In [ ]:
using Random;

Connect the probability distributions library Distributions:

In [ ]:
using Distributions;

Connect the charting library Plots:

In [ ]:
using Plots;

Data generation

For this task, we will generate artificial noisy data.

Set an arbitrary seed for the random number generator:

In [ ]:
Random.seed!(0);

Create the variables A_true and r_true containing the true values of the model parameters. These variables are used to generate the artificial data ydata.

For the true values, use $A_1 = 1$, $A_2 = 2$, $r_1 = -1$, and $r_2 = -3$:

In [ ]:
A_true = [1, 2];
r_true = [-1, -3];

The variables A_true and r_true are the benchmark that the optimization algorithm should try to recover. Comparing the optimization results with the true values allows us to evaluate the accuracy and efficiency of the curve approximation process.

Generate 200 random values tdata in the range from 0 to 3 as time data. Also create a variable noisedata containing the noise that we will add to the final data ydata.

In [ ]:
tdata = sort(3 * rand(200));
noisedata = 0.05 * randn(length(tdata)); 
ydata = A_true[1] * exp.(r_true[1] * tdata) + A_true[2] * exp.(r_true[2] * tdata) + noisedata;

Display the resulting points on a graph:

In [ ]:
plot(tdata, ydata, seriestype=:scatter, color=:red, legend=:right, label="Original data")
xlabel!("t")
ylabel!("Response")
Out[0]:

As you can see from the graph, the data contains noise, so the solution of the problem will probably not exactly match the true parameters A_true and r_true.

Creating an optimization task

Create an optimization model using the Model() function, specifying the solver name in parentheses:

In [ ]:
model = Model(Ipopt.Optimizer)
Out[0]:
A JuMP Model
├ solver: Ipopt
├ objective_sense: FEASIBILITY_SENSE
├ num_variables: 0
├ num_constraints: 0
└ Names registered in the model: none
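
By default, Ipopt prints a detailed iteration log when the model is solved. If you want a quieter run, JuMP provides the set_silent function; the optional cell below (not part of the original workflow) is left commented out so that the solver log shown later is still produced:

In [ ]:
# Optional: suppress the Ipopt iteration log during optimize!
# set_silent(model)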

Create the variables A and r, each containing two values: $A_1$, $A_2$, $r_1$, and $r_2$:

In [ ]:
@variable(model, A[1:2]);
@variable(model, r[1:2]);

Create a non-linear objective function that minimizes the sum of squares:

In [ ]:
@NLobjective(model, Min, sum((A[1]*exp(r[1]*t) + A[2]*exp(r[2]*t) - y)^2 for (t,y) in zip(tdata, ydata)));
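
Note that @NLobjective belongs to JuMP's legacy nonlinear interface. In recent versions of JuMP (1.15 and later) the same objective can be written with the regular @objective macro; a commented equivalent is sketched below in case your installed version supports it:

In [ ]:
# Equivalent formulation with the modern nonlinear interface (JuMP >= 1.15):
# @objective(model, Min, sum((A[1]*exp(r[1]*t) + A[2]*exp(r[2]*t) - y)^2 for (t, y) in zip(tdata, ydata)));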

Setting appropriate initial values is an important step for solving optimization problems efficiently and accurately: it significantly affects the speed, quality, and reliability of the optimization process.

Specify the initial values $A_1 = 1/2$, $A_2 = 3/2$, $r_1 = -1/2$, and $r_2 = -3/2$:

In [ ]:
set_start_value.(A, [1/2, 3/2]);
set_start_value.(r, [-1/2, -3/2]);

Print the initial values of A and r:

In [ ]:
println("Начальные значения A: ", start_value.(A))
println("Начальные значения r: ", start_value.(r))
Начальные значения A: [0.5, 1.5]
Начальные значения r: [-0.5, -1.5]

Problem solving

Solve the optimization problem:

In [ ]:
optimize!(model)
This is Ipopt version 3.14.13, running with linear solver MUMPS 5.6.1.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       10

Total number of variables............................:        4
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  7.7600291e+00 0.00e+00 1.27e+01  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  5.9773864e+00 0.00e+00 8.83e+00  -1.0 8.14e-02   2.0 1.00e+00 1.00e+00f  1
   2  3.5889330e+00 0.00e+00 5.31e+00  -1.0 1.50e-01   1.5 1.00e+00 1.00e+00f  1
   3  1.6781193e+00 0.00e+00 2.57e+00  -1.0 1.98e-01   1.0 1.00e+00 1.00e+00f  1
   4  8.3132157e-01 0.00e+00 1.17e+00  -1.0 2.34e-01   0.6 1.00e+00 1.00e+00f  1
   5  5.4494882e-01 0.00e+00 4.52e-01  -1.0 2.47e-01   0.1 1.00e+00 1.00e+00f  1
   6  4.9173214e-01 0.00e+00 5.15e-01  -1.7 2.48e-01    -  1.00e+00 1.00e+00f  1
   7  4.8601288e-01 0.00e+00 3.37e-02  -1.7 5.88e-02  -0.4 1.00e+00 1.00e+00f  1
   8  4.8080205e-01 0.00e+00 3.94e-01  -2.5 2.66e-01    -  1.00e+00 1.00e+00f  1
   9  4.7632633e-01 0.00e+00 8.84e-02  -2.5 1.17e-01    -  1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  4.7615210e-01 0.00e+00 3.43e-03  -2.5 3.40e-02    -  1.00e+00 1.00e+00f  1
  11  4.7615108e-01 0.00e+00 1.79e-05  -3.8 2.09e-03    -  1.00e+00 1.00e+00f  1
  12  4.7615108e-01 0.00e+00 2.24e-09  -8.6 3.05e-05    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 12

                                   (scaled)                 (unscaled)
Objective...............:   4.7615108059062977e-01    4.7615108059062977e-01
Dual infeasibility......:   2.2379096544650201e-09    2.2379096544650201e-09
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Variable bound violation:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   2.2379096544650201e-09    2.2379096544650201e-09


Number of objective function evaluations             = 13
Number of objective gradient evaluations             = 13
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 12
Total seconds in IPOPT                               = 0.010

EXIT: Optimal Solution Found.
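
Before extracting the solution, it is good practice to confirm programmatically that the solver converged; for Ipopt, a local optimum is reported as LOCALLY_SOLVED. A minimal check using JuMP's standard status queries:

In [ ]:
# Verify the solver status and the final value of the objective function
println(termination_status(model))                 # expected: LOCALLY_SOLVED
println("Objective value: ", objective_value(model))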

Save the values of A and r in variables:

In [ ]:
A_sol = value.(A);
r_sol = value.(r);

Output the optimization results:

In [ ]:
println("Оптимизированные значения A: ", A_sol)
println("Оптимизированные значения r: ", r_sol)
Оптимизированные значения A: [1.0151611001028849, 2.0255646656856654]
Оптимизированные значения r: [-1.0125976166336879, -3.0589072699017805]

Visualization and analysis of results

Calculate the result of the curve approximation:

In [ ]:
fitted_response = A_sol[1] * exp.(r_sol[1] * tdata) + A_sol[2] * exp.(r_sol[2] * tdata);
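
As an additional quality check (not part of the original task), you can compute the root-mean-square error of the fit; it should be close to the noise level of 0.05 used when generating the data:

In [ ]:
# RMSE between the fitted curve and the noisy observations
residuals = ydata .- fitted_response;
rmse = sqrt(mean(residuals .^ 2));
println("RMSE: ", round(rmse, digits=4))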

Display the results on a graph:

In [ ]:
plot!(tdata, fitted_response, color=:blue, label="Fitted curve")
title!("Fitted response")
Out[0]:
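
For a visual sanity check, you can also overlay the true noise-free curve built from A_true and r_true; this optional cell reuses the previous plot:

In [ ]:
# Overlay the true (noise-free) model response for comparison
true_response = A_true[1] * exp.(r_true[1] * tdata) + A_true[2] * exp.(r_true[2] * tdata);
plot!(tdata, true_response, color=:green, linestyle=:dash, label="True curve")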

You can compare the obtained results with the originally set values $A_1 = 1$, $A_2 = 2$, $r_1 = -1$, and $r_2 = -3$. The round function allows you to round the overly precise values.

Compare the values of A with the true values:

In [ ]:
percent_diffs = ((A_sol .- A_true) ./ A_true) .* 100
avg_percent_diff = mean(percent_diffs)
println("Процентная разница: ",  round.(percent_diffs, digits=2))
println("Средняя процентная разница: ", round.(avg_percent_diff, digits=2))
Процентная разница: [1.52, 1.28]
Средняя процентная разница: 1.4

Compare the values of r with the true values:

In [ ]:
percent_diffs = ((r_sol .- r_true) ./ r_true) .* 100
avg_percent_diff = mean(percent_diffs)
println("Процентная разница: ", round.(percent_diffs, digits=2))
println("Средняя процентная разница: ", round.(avg_percent_diff, digits=2))
Процентная разница: [1.26, 1.96]
Средняя процентная разница: 1.61

Thus, the values of A differ from the true values by 1.4% on average, and the values of r by 1.61%.

Conclusion

In this example, we solved a nonlinear least squares curve approximation problem using the problem-oriented approach. We also visualized the results and compared them with the true parameters of the model.