Computing Hessians
Страница в процессе перевода. |
This tutorial was generated using Literate.jl. Download the source as a .jl
file.
The purpose of this tutorial is to demonstrate how to compute the Hessian of the Lagrangian of a nonlinear program.
This is an advanced tutorial that interacts with the low-level nonlinear interface of MathOptInterface. By default, JuMP exports the
|
Given a nonlinear program:
the Hessian of the Lagrangian is computed as:
where is a primal point, is a scalar (typically ), and is a vector of weights corresponding to the Lagrangian dual of the constraints.
This tutorial uses the following packages:
using JuMP
import Ipopt
import LinearAlgebra
import Random
import SparseArrays
The basic model
To demonstrate how to interact with the lower-level nonlinear interface, we need an example model. The exact model isn’t important; we use the model from The Rosenbrock function tutorial, with some additional constraints to demonstrate various features of the lower-level interface.
model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x[i = 1:2], start = -i)
@constraint(model, g_1, x[1]^2 <= 1)
@constraint(model, g_2, (x[1] + x[2])^2 <= 2)
@objective(model, Min, (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2)
optimize!(model)
@assert is_solved_and_feasible(model)
The analytic solution
With a little work, it is possible to analytically derive the correct hessian:
function analytic_hessian(x, σ, μ)
g_1_H = [2.0 0.0; 0.0 0.0]
g_2_H = [2.0 2.0; 2.0 2.0]
f_H = zeros(2, 2)
f_H[1, 1] = 2.0 + 1200.0 * x[1]^2 - 400.0 * x[2]
f_H[1, 2] = f_H[2, 1] = -400.0 * x[1]
f_H[2, 2] = 200.0
return σ * f_H + μ' * [g_1_H, g_2_H]
end
analytic_hessian (generic function with 1 method)
Here are various points:
analytic_hessian([1, 1], 0, [0, 0])
2×2 Matrix{Float64}:
0.0 0.0
0.0 0.0
analytic_hessian([1, 1], 0, [1, 0])
2×2 Matrix{Float64}:
2.0 0.0
0.0 0.0
analytic_hessian([1, 1], 0, [0, 1])
2×2 Matrix{Float64}:
2.0 2.0
2.0 2.0
analytic_hessian([1, 1], 1, [0, 0])
2×2 Matrix{Float64}:
802.0 -400.0
-400.0 200.0
Create a nonlinear model
JuMP delegates automatic differentiation to the MOI.Nonlinear
submodule. Therefore, to compute the Hessian of the Lagrangian, we need to create a MOI.Nonlinear.Model
object:
rows = Any[]
nlp = MOI.Nonlinear.Model()
for (F, S) in list_of_constraint_types(model)
if F <: VariableRef
continue # Skip variable bounds
end
for ci in all_constraints(model, F, S)
push!(rows, ci)
object = constraint_object(ci)
MOI.Nonlinear.add_constraint(nlp, object.func, object.set)
end
end
MOI.Nonlinear.set_objective(nlp, objective_function(model))
nlp
A Nonlinear.Model with:
1 objective
0 parameters
0 expressions
2 constraints
It is important that we save the constraint indices in a vector rows
, so that we know the order of the constraints in the nonlinear model.
Next, we need to convert our model into an MOI.Nonlinear.Evaluator
, specifying an automatic differentiation backend. In this case, we use MOI.Nonlinear.SparseReverseMode
:
evaluator = MOI.Nonlinear.Evaluator(
nlp,
MOI.Nonlinear.SparseReverseMode(),
index.(all_variables(model)),
)
Nonlinear.Evaluator with available features:
* :Grad
* :Jac
* :JacVec
* :Hess
* :HessVec
* :ExprGraph
Before computing anything with the evaluator, we need to initialize it. Use MOI.features_available
to see what we can query:
MOI.features_available(evaluator)
6-element Vector{Symbol}:
:Grad
:Jac
:JacVec
:Hess
:HessVec
:ExprGraph
Consult the MOI documentation for specifics, but to obtain the Hessian matrix, we need to initialize :Hess
:
MOI.initialize(evaluator, [:Hess])
MOI represents the Hessian as a sparse matrix. Get the sparsity pattern as follows:
hessian_sparsity = MOI.hessian_lagrangian_structure(evaluator)
7-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
(2, 1)
(1, 1)
(1, 1)
(2, 2)
(2, 1)
The sparsity pattern has a few properties of interest:
-
Each element
(i, j)
indicates a structural non-zero in rowi
and columnj
-
The list may contain duplicates, in which case we should add the values together
-
The list does not need to be sorted
-
The list may contain any mix of lower- or upper-triangular indices
This format matches Julia’s sparse-triplet form of a SparseArray, so we can convert from the sparse Hessian representation to a Julia SparseArray as follows:
I = [i for (i, _) in hessian_sparsity]
J = [j for (_, j) in hessian_sparsity]
V = zeros(length(hessian_sparsity))
n = num_variables(model)
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
0.0 ⋅
0.0 0.0
Of course, knowing where the zeros are isn’t very interesting. We really want to compute the value of the Hessian matrix at a point.
MOI.eval_hessian_lagrangian(evaluator, V, ones(n), 1.0, ones(length(rows)))
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
806.0 ⋅
-398.0 202.0
In practice, we often want to compute the value of the hessian at the optimal solution.
First, we compute the primal solution. To do so, we need a vector of the variables in the order that they were passed to the solver:
x = all_variables(model)
2-element Vector{VariableRef}:
x[1]
x[2]
Here x[1]
is the variable that corresponds to column 1, and so on. Here’s the optimal primal solution:
x_optimal = value.(x)
2-element Vector{Float64}:
0.7903587565231842
0.6238546272155127
Next, we need the optimal dual solution associated with the nonlinear constraints (this is where it is important to record the order of the constraints as we added them to nlp
):
y_optimal = dual.(rows)
2-element Vector{Float64}:
-8.038451738599348e-8
-0.05744089305771262
Now we can compute the Hessian at the optimal primal-dual point:
MOI.eval_hessian_lagrangian(evaluator, V, x_optimal, 1.0, y_optimal)
H = SparseArrays.sparse(I, J, V, n, n)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
501.944 ⋅
-316.258 199.885
However, this Hessian isn’t quite right because it isn’t symmetric. We can fix this by filling in the appropriate off-diagonal terms:
function fill_off_diagonal(H)
ret = H + H'
row_vals = SparseArrays.rowvals(ret)
non_zeros = SparseArrays.nonzeros(ret)
for col in 1:size(ret, 2)
for i in SparseArrays.nzrange(ret, col)
if col == row_vals[i]
non_zeros[i] /= 2
end
end
end
return ret
end
fill_off_diagonal(H)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 4 stored entries:
501.944 -316.258
-316.258 199.885
Putting everything together:
function compute_optimal_hessian(model::Model)
rows = Any[]
nlp = MOI.Nonlinear.Model()
for (F, S) in list_of_constraint_types(model)
for ci in all_constraints(model, F, S)
push!(rows, ci)
object = constraint_object(ci)
MOI.Nonlinear.add_constraint(nlp, object.func, object.set)
end
end
MOI.Nonlinear.set_objective(nlp, objective_function(model))
x = all_variables(model)
backend = MOI.Nonlinear.SparseReverseMode()
evaluator = MOI.Nonlinear.Evaluator(nlp, backend, index.(x))
MOI.initialize(evaluator, [:Hess])
hessian_sparsity = MOI.hessian_lagrangian_structure(evaluator)
I = [i for (i, _) in hessian_sparsity]
J = [j for (_, j) in hessian_sparsity]
V = zeros(length(hessian_sparsity))
MOI.eval_hessian_lagrangian(evaluator, V, value.(x), 1.0, dual.(rows))
H = SparseArrays.sparse(I, J, V, length(x), length(x))
return Matrix(fill_off_diagonal(H))
end
H_star = compute_optimal_hessian(model)
2×2 Matrix{Float64}:
501.944 -316.258
-316.258 199.885
If we compare our solution against the analytical solution:
analytic_hessian(value.(x), 1.0, dual.([g_1, g_2]))
2×2 Matrix{Float64}:
501.944 -316.258
-316.258 199.885
If we look at the eigenvalues of the Hessian:
LinearAlgebra.eigvals(H_star)
2-element Vector{Float64}:
0.4443995924983142
701.3843426037456
we see that they are all positive. Therefore, the Hessian is positive definite, and so the solution found by Ipopt is a local minimizer.
Jacobians
In addition to the Hessian, it is also possible to query other parts of the nonlinear model. For example, the Jacobian of the constraints can be queried using MOI.jacobian_structure
and MOI.eval_constraint_jacobian
.
function compute_optimal_jacobian(model::Model)
rows = Any[]
nlp = MOI.Nonlinear.Model()
for (F, S) in list_of_constraint_types(model)
for ci in all_constraints(model, F, S)
if !(F <: VariableRef)
push!(rows, ci)
object = constraint_object(ci)
MOI.Nonlinear.add_constraint(nlp, object.func, object.set)
end
end
end
MOI.Nonlinear.set_objective(nlp, objective_function(model))
x = all_variables(model)
backend = MOI.Nonlinear.SparseReverseMode()
evaluator = MOI.Nonlinear.Evaluator(nlp, backend, index.(x))
# Initialize the Jacobian
MOI.initialize(evaluator, [:Jac])
# Query the Jacobian structure
sparsity = MOI.jacobian_structure(evaluator)
I, J, V = first.(sparsity), last.(sparsity), zeros(length(sparsity))
# Query the Jacobian values
MOI.eval_constraint_jacobian(evaluator, V, value.(x))
return SparseArrays.sparse(I, J, V, length(rows), length(x))
end
compute_optimal_jacobian(model)
2×2 SparseArrays.SparseMatrixCSC{Float64, Int64} with 3 stored entries:
1.58072 ⋅
2.82843 2.82843
Compare that to the analytic solution:
y = value.(x)
[2y[1] 0; 2y[1]+2y[2] 2y[1]+2y[2]]
2×2 Matrix{Float64}:
1.58072 0.0
2.82843 2.82843