Acceleration methods: N-GMRES and O-ACCEL
Constructors
NGMRES(;
alphaguess = LineSearches.InitialStatic(),
linesearch = LineSearches.HagerZhang(),
manifold = Flat(),
wmax::Int = 10,
ϵ0 = 1e-12,
nlprecon = GradientDescent(
alphaguess = LineSearches.InitialStatic(alpha=1e-4,scaled=true),
linesearch = LineSearches.Static(),
manifold = manifold),
nlpreconopts = Options(iterations = 1, allow_f_increases = true),
)
OACCEL(;manifold::Manifold = Flat(),
alphaguess = LineSearches.InitialStatic(),
linesearch = LineSearches.HagerZhang(),
nlprecon = GradientDescent(
alphaguess = LineSearches.InitialStatic(alpha=1e-4,scaled=true),
linesearch = LineSearches.Static(),
manifold = manifold),
nlpreconopts = Options(iterations = 1, allow_f_increases = true),
ϵ0 = 1e-12,
wmax::Int = 10)
Description
These algorithms take a step given by the nonlinear preconditioner nlprecon
and proposes an accelerated step on a subspace spanned by the previous wmax
iterates.
-
N-GMRES accelerates based on a minimization of an approximation to the norm of the
gradient.
-
O-ACCEL accelerates based on a minimization of a n approximation to the objective.
N-GMRES was originally developed for solving nonlinear systems [1], and reduces to GMRES for linear problems. Application of the algorithm to optimization is covered, for example, in [2]. A description of O-ACCEL and its connection to N-GMRES can be found in [3].
We recommend trying LBFGS on your problem before N-GMRES or O-ACCEL. All three algorithms have similar computational cost and memory requirements, however, L-BFGS is more efficient for many problems.
Example
This example shows how to accelerate GradientDescent
on the Extended Rosenbrock problem. First, we try to optimize using GradientDescent
.
using Optim, OptimTestProblems
UP = UnconstrainedProblems
prob = UP.examples["Extended Rosenbrock"]
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, GradientDescent())
The algorithm does not converge within 1000 iterations.
Results of Optimization Algorithm * Algorithm: Gradient Descent * Starting Point: [-1.2,1.0, ...] * Minimizer: [0.8923389282461412,0.7961268644300445, ...] * Minimum: 2.898230e-01 * Iterations: 1000 * Convergence: false * |x - x'| ≤ 0.0e+00: false |x - x'| = 4.02e-04 * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false |f(x) - f(x')| = 2.38e-03 |f(x)| * |g(x)| ≤ 1.0e-08: false |g(x)| = 8.23e-02 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: true * Objective Calls: 2525 * Gradient Calls: 2525
Now, we use OACCEL
to accelerate GradientDescent
.
# Default nonlinear procenditioner for `OACCEL`
nlprecon = GradientDescent(alphaguess=LineSearches.InitialStatic(alpha=1e-4,scaled=true),
linesearch=LineSearches.Static())
# Default size of subspace that OACCEL accelerates over is `wmax = 10`
oacc10 = OACCEL(nlprecon=nlprecon, wmax=10)
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, oacc10)
This drastically improves the GradientDescent
algorithm, converging in 87 iterations.
Results of Optimization Algorithm * Algorithm: O-ACCEL preconditioned with Gradient Descent * Starting Point: [-1.2,1.0, ...] * Minimizer: [1.0000000011361219,1.0000000022828495, ...] * Minimum: 3.255053e-17 * Iterations: 87 * Convergence: true * |x - x'| ≤ 0.0e+00: false |x - x'| = 6.51e-08 * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false |f(x) - f(x')| = 7.56e+02 |f(x)| * |g(x)| ≤ 1.0e-08: true |g(x)| = 1.06e-09 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 285 * Gradient Calls: 285
We can improve the acceleration further by changing the acceleration subspace size wmax
.
oacc5 = OACCEL(nlprecon=nlprecon, wmax=5)
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, oacc5)
Now, the O-ACCEL algorithm has accelerated GradientDescent
to converge in 50 iterations.
Results of Optimization Algorithm * Algorithm: O-ACCEL preconditioned with Gradient Descent * Starting Point: [-1.2,1.0, ...] * Minimizer: [0.9999999999392858,0.9999999998784691, ...] * Minimum: 9.218164e-20 * Iterations: 50 * Convergence: true * |x - x'| ≤ 0.0e+00: false |x - x'| = 2.76e-07 * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false |f(x) - f(x')| = 5.18e+06 |f(x)| * |g(x)| ≤ 1.0e-08: true |g(x)| = 4.02e-11 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 181 * Gradient Calls: 181
As a final comparison, we can do the same with N-GMRES.
ngmres5 = NGMRES(nlprecon=nlprecon, wmax=5)
optimize(UP.objective(prob), UP.gradient(prob), prob.initial_x, ngmres5)
Again, this significantly improves the GradientDescent
algorithm, and converges in 63 iterations.
Results of Optimization Algorithm * Algorithm: Nonlinear GMRES preconditioned with Gradient Descent * Starting Point: [-1.2,1.0, ...] * Minimizer: [0.9999999998534468,0.9999999997063993, ...] * Minimum: 5.375569e-19 * Iterations: 63 * Convergence: true * |x - x'| ≤ 0.0e+00: false |x - x'| = 9.94e-09 * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false |f(x) - f(x')| = 1.29e+03 |f(x)| * |g(x)| ≤ 1.0e-08: true |g(x)| = 4.94e-11 * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 222 * Gradient Calls: 222