# Optimisers.jl
## Installation: OptimizationOptimisers.jl

To use this package, install the OptimizationOptimisers package:

```julia
import Pkg
Pkg.add("OptimizationOptimisers")
```
In addition to the optimisation algorithms provided by the Optimisers.jl package, this subpackage also provides the Sophia optimisation algorithm.
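All of the optimisers below are used through the common `solve(problem, optimizer)` interface. As a minimal sketch of that workflow (the Rosenbrock objective, the `Adam(0.05)` step size, and the `maxiters` budget are illustrative choices, not recommendations; `AutoZygote` assumes Zygote is available as the gradient backend):

```julia
using Optimization, OptimizationOptimisers

# Rosenbrock function; its minimum is at (1.0, 1.0).
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0 = zeros(2)
p = [1.0, 100.0]

# Optimisers.jl methods are first-order, so an AD backend (here
# AutoZygote) must be supplied when building the OptimizationFunction.
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, u0, p)

# These stochastic optimisers need an iteration budget via `maxiters`.
sol = solve(prob, Adam(0.05); maxiters = 1000)
```

Any optimizer in the list below can be substituted for `Adam` in the final call.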
## List of optimizers
- `Optimisers.Descent`: Classic gradient descent optimizer with learning rate
  - `solve(problem, Descent(η))`
  - `η` is the learning rate
  - Defaults:
    - `η = 0.1`
- `Optimisers.Momentum`: Classic gradient descent optimizer with learning rate and momentum
  - `solve(problem, Momentum(η, ρ))`
  - `η` is the learning rate
  - `ρ` is the momentum
  - Defaults:
    - `η = 0.01`
    - `ρ = 0.9`
- `Optimisers.Nesterov`: Gradient descent optimizer with learning rate and Nesterov momentum
  - `solve(problem, Nesterov(η, ρ))`
  - `η` is the learning rate
  - `ρ` is the Nesterov momentum
  - Defaults:
    - `η = 0.01`
    - `ρ = 0.9`
- `Optimisers.RMSProp`: RMSProp optimizer
  - `solve(problem, RMSProp(η, ρ))`
  - `η` is the learning rate
  - `ρ` is the momentum
  - Defaults:
    - `η = 0.001`
    - `ρ = 0.9`
- `Optimisers.Adam`: Adam optimizer
  - `solve(problem, Adam(η, β::Tuple))`
  - `η` is the learning rate
  - `β::Tuple` is the decay of momentums
  - Defaults:
    - `η = 0.001`
    - `β::Tuple = (0.9, 0.999)`
- `Optimisers.RAdam`: Rectified Adam optimizer
  - `solve(problem, RAdam(η, β::Tuple))`
  - `η` is the learning rate
  - `β::Tuple` is the decay of momentums
  - Defaults:
    - `η = 0.001`
    - `β::Tuple = (0.9, 0.999)`
- `Optimisers.OAdam`: Optimistic Adam optimizer
  - `solve(problem, OAdam(η, β::Tuple))`
  - `η` is the learning rate
  - `β::Tuple` is the decay of momentums
  - Defaults:
    - `η = 0.001`
    - `β::Tuple = (0.5, 0.999)`
- `Optimisers.AdaMax`: AdaMax optimizer
  - `solve(problem, AdaMax(η, β::Tuple))`
  - `η` is the learning rate
  - `β::Tuple` is the decay of momentums
  - Defaults:
    - `η = 0.001`
    - `β::Tuple = (0.9, 0.999)`
- `Optimisers.ADAGrad`: ADAGrad optimizer
  - `solve(problem, ADAGrad(η))`
  - `η` is the learning rate
  - Defaults:
    - `η = 0.1`
- `Optimisers.ADADelta`: ADADelta optimizer
  - `solve(problem, ADADelta(ρ))`
  - `ρ` is the gradient decay factor
  - Defaults:
    - `ρ = 0.9`
- `Optimisers.AMSGrad`: AMSGrad optimizer
  - `solve(problem, AMSGrad(η, β::Tuple))`
  - `η` is the learning rate
  - `β::Tuple` is the decay of momentums
  - Defaults:
    - `η = 0.001`
    - `β::Tuple = (0.9, 0.999)`
- `Optimisers.NAdam`: Nesterov variant of the Adam optimizer
  - `solve(problem, NAdam(η, β::Tuple))`
  - `η` is the learning rate
  - `β::Tuple` is the decay of momentums
  - Defaults:
    - `η = 0.001`
    - `β::Tuple = (0.9, 0.999)`
- `Optimisers.AdamW`: AdamW optimizer
  - `solve(problem, AdamW(η, β::Tuple, decay))`
  - `η` is the learning rate
  - `β::Tuple` is the decay of momentums
  - `decay` is the decay applied to weights
  - Defaults:
    - `η = 0.001`
    - `β::Tuple = (0.9, 0.999)`
    - `decay = 0`
- `Optimisers.ADABelief`: ADABelief variant of Adam
  - `solve(problem, ADABelief(η, β::Tuple))`
  - `η` is the learning rate
  - `β::Tuple` is the decay of momentums
  - Defaults:
    - `η = 0.001`
    - `β::Tuple = (0.9, 0.999)`
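The Sophia optimiser mentioned above follows the same `solve` interface. A hedged sketch is below; note that the keyword name `η` and the value `1e-3` shown here, as well as qualifying the constructor as `OptimizationOptimisers.Sophia`, are assumptions about the current package version, so check the package's own docstring before relying on them:

```julia
using Optimization, OptimizationOptimisers

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
# Sophia uses curvature (Hessian) information, so it needs an AD
# backend that supports second-order quantities, such as AutoZygote.
optf = OptimizationFunction(rosenbrock, Optimization.AutoZygote())
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])

# Keyword name/default here (η = 1e-3) is an assumption; see the docstring.
sol = solve(prob, OptimizationOptimisers.Sophia(; η = 1e-3); maxiters = 500)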