
Curve approximation by least squares method

This example demonstrates how to solve a non-linear curve-fitting (approximation) problem using the problem-oriented optimisation workflow.

We will use the JuMP.jl library to formulate the optimisation problem, the nonlinear optimisation library Ipopt.jl, the random number generation library Random.jl, the probability distributions library Distributions.jl, and the Plots.jl library to visualise the results.

Installing the libraries

If your environment does not have the latest version of the JuMP package installed, uncomment and run the cell below:

In [ ]:
# import Pkg
# Pkg.add(["Ipopt", "Distributions", "JuMP"])

To load the new version of the library after the installation is complete, click on the "My Account" button:

screenshot_20240710_134852.png

Then click on the "Stop" button:

screenshot_20240710_2.png

Restart the session by pressing the "Start Engee" button:

screenshot_20240710_135431.png

Task description

The optimisation problem is to perform a non-linear least squares approximation of a curve.

The model equation for this task is:

$$ y(t) = A_1 \exp(r_1 t) + A_2 \exp(r_2 t) $$

where $A_1$, $A_2$, $r_1$ and $r_2$ are the unknown parameters, $y$ is the response, and $t$ is time.

The goal is to find the values of $A$ and $r$ that minimise the objective function:

$$ \sum_{t \in \text{tdata}} \left( y(t) - \text{ydata}(t) \right)^2 $$
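
In plain Julia this objective is simply the sum of squared residuals between the model prediction and the observations. A minimal sketch of such a function (assuming the vectors tdata and ydata, which we generate below):

In [ ]:
# Sum of squared residuals for candidate parameters A and r
sse(A, r, tdata, ydata) = sum(
    (A[1] * exp(r[1] * t) + A[2] * exp(r[2] * t) - y)^2
    for (t, y) in zip(tdata, ydata)
)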

Connecting libraries

Connect the JuMP library:

In [ ]:
using JuMP; 

Connect the nonlinear solver library Ipopt:

In [ ]:
using Ipopt;

Connect the random number generation library Random:

In [ ]:
using Random;

Connect the Distributions library for working with probability distributions:

In [ ]:
using Distributions;

Connect the graphing library Plots:

In [ ]:
using Plots;

Data generation

For this task, we will generate artificial noisy data.

Set an arbitrary seed for the random number generator so that the results are reproducible:

In [ ]:
Random.seed!(0);

Create the variables A_true and r_true containing the true values of the model parameters. These variables will be used to generate the artificial data ydata.

Use $A_1=1$, $A_2=2$, $r_1=-1$ and $r_2=-3$ as true values:

In [ ]:
A_true = [1, 2];
r_true = [-1, -3];

The variables A_true and r_true are the benchmark that the optimisation algorithm should try to recreate. Comparing the optimisation results with the true values allows us to assess the accuracy and efficiency of the curve approximation process.

Generate 200 random time values tdata in the range from 0 to 3. Also create a variable noisedata containing the noise that we will add to the final data ydata.

In [ ]:
tdata = sort(3 * rand(200));              # 200 time points, uniform on [0, 3]
noisedata = 0.05 * randn(length(tdata));  # Gaussian noise with standard deviation 0.05
ydata = A_true[1] * exp.(r_true[1] * tdata) + A_true[2] * exp.(r_true[2] * tdata) + noisedata;
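
As a side note, the Distributions library loaded above lets you draw the same samples from explicit distribution objects. A sketch equivalent to the Base rand/randn calls above (the names tdata_alt and noise_alt are purely illustrative):

In [ ]:
tdata_alt = sort(rand(Uniform(0.0, 3.0), 200));  # 200 time points, uniform on [0, 3]
noise_alt = rand(Normal(0.0, 0.05), 200);        # Gaussian noise with σ = 0.05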

Plot the resulting points on a graph:

In [ ]:
plot(tdata, ydata, seriestype=:scatter, color=:red, legend=:right, label="Original data")
xlabel!("t")
ylabel!("Response")
Out[0]:

As you can see from the graph, the data contains noise. Therefore, the solution of the problem will probably not exactly match the true parameters $A$ and $r$.

Creating an optimisation problem

Create an optimisation problem using the Model() function, passing the solver name as its argument:

In [ ]:
model = Model(Ipopt.Optimizer)
Out[0]:
A JuMP Model
├ solver: Ipopt
├ objective_sense: FEASIBILITY_SENSE
├ num_variables: 0
├ num_constraints: 0
└ Names registered in the model: none

Create the variables $A$ and $r$, each containing two values: $A_1$, $A_2$ and $r_1$, $r_2$:

In [ ]:
@variable(model, A[1:2]);
@variable(model, r[1:2]);

Create a non-linear objective function minimising the sum of squares:

In [ ]:
@NLobjective(model, Min, sum((A[1]*exp(r[1]*t) + A[2]*exp(r[2]*t) - y)^2 for (t,y) in zip(tdata, ydata)));
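
If you are using a recent version of JuMP (1.15 or later), the same objective can also be written with the standard @objective macro, which supports nonlinear expressions natively. A sketch of the alternative formulation (use it instead of, not in addition to, the cell above):

In [ ]:
# Equivalent formulation with the modern nonlinear interface (JuMP ≥ 1.15)
@objective(model, Min, sum((A[1]*exp(r[1]*t) + A[2]*exp(r[2]*t) - y)^2 for (t, y) in zip(tdata, ydata)));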

Setting appropriate initial values is an important step in solving optimisation problems efficiently and accurately: it significantly affects the speed, quality and reliability of the optimisation process.

Specify the initial values $A_1=\frac{1}{2}$, $A_2=\frac{3}{2}$, $r_1=-\frac{1}{2}$ and $r_2=-\frac{3}{2}$:

In [ ]:
set_start_value.(A, [1/2, 3/2]);
set_start_value.(r, [-1/2, -3/2]);

Output the initial values of $A$ and $r$:

In [ ]:
println("Начальные значения A: ", start_value.(A))
println("Начальные значения r: ", start_value.(r))
Начальные значения A: [0.5, 1.5]
Начальные значения r: [-0.5, -1.5]

Solution of the problem

Solve the optimisation problem:

In [ ]:
optimize!(model)
This is Ipopt version 3.14.13, running with linear solver MUMPS 5.6.1.

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:       10

Total number of variables............................:        4
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  7.7600291e+00 0.00e+00 1.27e+01  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  5.9773864e+00 0.00e+00 8.83e+00  -1.0 8.14e-02   2.0 1.00e+00 1.00e+00f  1
   2  3.5889330e+00 0.00e+00 5.31e+00  -1.0 1.50e-01   1.5 1.00e+00 1.00e+00f  1
   3  1.6781193e+00 0.00e+00 2.57e+00  -1.0 1.98e-01   1.0 1.00e+00 1.00e+00f  1
   4  8.3132157e-01 0.00e+00 1.17e+00  -1.0 2.34e-01   0.6 1.00e+00 1.00e+00f  1
   5  5.4494882e-01 0.00e+00 4.52e-01  -1.0 2.47e-01   0.1 1.00e+00 1.00e+00f  1
   6  4.9173214e-01 0.00e+00 5.15e-01  -1.7 2.48e-01    -  1.00e+00 1.00e+00f  1
   7  4.8601288e-01 0.00e+00 3.37e-02  -1.7 5.88e-02  -0.4 1.00e+00 1.00e+00f  1
   8  4.8080205e-01 0.00e+00 3.94e-01  -2.5 2.66e-01    -  1.00e+00 1.00e+00f  1
   9  4.7632633e-01 0.00e+00 8.84e-02  -2.5 1.17e-01    -  1.00e+00 1.00e+00f  1
iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
  10  4.7615210e-01 0.00e+00 3.43e-03  -2.5 3.40e-02    -  1.00e+00 1.00e+00f  1
  11  4.7615108e-01 0.00e+00 1.79e-05  -3.8 2.09e-03    -  1.00e+00 1.00e+00f  1
  12  4.7615108e-01 0.00e+00 2.24e-09  -8.6 3.05e-05    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 12

                                   (scaled)                 (unscaled)
Objective...............:   4.7615108059062977e-01    4.7615108059062977e-01
Dual infeasibility......:   2.2379096544650201e-09    2.2379096544650201e-09
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Variable bound violation:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   0.0000000000000000e+00    0.0000000000000000e+00
Overall NLP error.......:   2.2379096544650201e-09    2.2379096544650201e-09


Number of objective function evaluations             = 13
Number of objective gradient evaluations             = 13
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 12
Total seconds in IPOPT                               = 0.010

EXIT: Optimal Solution Found.
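
Before reading off the solution, it is good practice to check the solver outcome programmatically. A minimal sketch using standard JuMP status queries:

In [ ]:
println("Termination status: ", termination_status(model))  # expected: LOCALLY_SOLVED
println("Objective value: ", objective_value(model))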

Store the optimal values of $A$ and $r$ in variables:

In [ ]:
A_sol = value.(A);
r_sol = value.(r);

Output the results of the optimisation:

In [ ]:
println("Оптимизированные значения A: ", A_sol)
println("Оптимизированные значения r: ", r_sol)
Оптимизированные значения A: [1.0151611001028849, 2.0255646656856654]
Оптимизированные значения r: [-1.0125976166336879, -3.0589072699017805]

Visualising and analysing the results

Calculate the result of the curve approximation:

In [ ]:
fitted_response = A_sol[1] * exp.(r_sol[1] * tdata) + A_sol[2] * exp.(r_sol[2] * tdata);

Plot the results on a graph:

In [ ]:
plot!(tdata, fitted_response, color=:blue, label="Approximated curve")
title!("Approximated response")
Out[0]:
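
To quantify the fit numerically, you can also compute the root-mean-square error of the residuals. A quick sketch (not part of the original workflow; mean is available after using Distributions):

In [ ]:
residuals = ydata .- fitted_response  # pointwise fit errors
rmse = sqrt(mean(residuals .^ 2))     # root-mean-square error
println("RMSE: ", round(rmse, digits=4))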

You can compare the obtained results with the originally set values $A_1=1$, $A_2=2$, $r_1=-1$ and $r_2=-3$. The round function lets you round the overly precise values for easier reading.

Compare the results of $A$ with the original values:

In [ ]:
percent_diffs = ((A_sol .- A_true) ./ A_true) .* 100
avg_percent_diff = mean(percent_diffs)
println("Процентная разница: ",  round.(percent_diffs, digits=2))
println("Средняя процентная разница: ", round.(avg_percent_diff, digits=2))
Процентная разница: [1.52, 1.28]
Средняя процентная разница: 1.4

Compare the results of $r$ with the original values:

In [ ]:
percent_diffs = ((r_sol .- r_true) ./ r_true) .* 100
avg_percent_diff = mean(percent_diffs)
println("Процентная разница: ", round.(percent_diffs, digits=2))
println("Средняя процентная разница: ", round.(avg_percent_diff, digits=2))
Процентная разница: [1.26, 1.96]
Средняя процентная разница: 1.61

Thus, the obtained values of $A$ differ from the true values by $1.4\%$ on average, and the values of $r$ by $1.61\%$.

Conclusion

In this example, we solved a non-linear curve approximation problem by the least squares method using a problem-oriented approach. We also visualised the results and compared them with the true model parameters.