Line search and step size settings
The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how far the next iterate moves towards the new vertex:
\[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]
$\gamma_t = 1$ means that the next iterate is exactly the vertex, while $\gamma_t = 0$ means that the iterate does not move.
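As a small illustration in plain Julia (nothing package-specific), the update is simply a convex combination of the current iterate and the vertex:

```julia
# One Frank-Wolfe update: move from x towards the vertex v by gamma
x = [0.5, 0.5]       # current iterate x_t
v = [1.0, 0.0]       # vertex v_t returned by the linear minimization oracle
gamma = 0.25

x_next = x .- gamma .* (x .- v)
# equivalently, a convex combination of x and v
@assert x_next ≈ (1 - gamma) .* x .+ gamma .* v
```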
The following are step size selection rules for Frank-Wolfe algorithms. Some methodologies (e.g. `FixedStep` and `Agnostic`) depend only on the iteration number and induce a series $\gamma_t$ that is independent of the problem data, while others (e.g. `Goldenratio` and `Adaptive`) adapt to local information about the function; the adaptive methods often require extra function and/or gradient evaluations. The typical choices for convex optimization are `Agnostic` or `Adaptive`.
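As a minimal sketch of selecting a strategy (assuming the package's `frank_wolfe` entry point, the `ProbabilitySimplexOracle` LMO, and `compute_extreme_point`; adapt to your own setup):

```julia
using FrankWolfe
using LinearAlgebra

# Minimize ||x - b||^2 over the probability simplex.
b = [0.2, 0.3, 0.5]
f(x) = norm(x - b)^2
grad!(storage, x) = (@. storage = 2 * (x - b))

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
x0 = FrankWolfe.compute_extreme_point(lmo, zeros(3))

# data-independent open-loop rule gamma_t = 2 / (t + 2)
x_agnostic, _ = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    line_search=FrankWolfe.Agnostic(), max_iteration=100,
)

# adaptive rule estimating the local Lipschitz constant
x_adaptive, _ = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    line_search=FrankWolfe.Adaptive(), max_iteration=100,
)
```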
All step size computation strategies are subtypes of `FrankWolfe.LineSearchMethod`. The key method they have to implement is `FrankWolfe.perform_line_search`, which is called at every iteration to compute the step size `gamma`.
FrankWolfe.LineSearchMethod — Type

Line search method to apply once the direction is computed. A `LineSearchMethod` must implement

```julia
perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)
```

with `d = x - v`. It may also implement `build_linesearch_workspace(x, gradient)`, which creates a workspace structure that is passed as the last argument to `perform_line_search`.
FrankWolfe.perform_line_search — Function

```julia
perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)
```

Returns the step size `gamma` for the step size strategy `ls`.
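Following the signature above, a custom strategy can be written as a subtype of `LineSearchMethod`. Below is a minimal sketch with a hypothetical inverse-time rule; `InverseTimeStep` is illustrative, not part of the package:

```julia
using FrankWolfe

# Hypothetical rule gamma_t = 1 / (t + 1), written against the documented interface.
struct InverseTimeStep <: FrankWolfe.LineSearchMethod end

function FrankWolfe.perform_line_search(
    ::InverseTimeStep, t, f, grad!, gradient, x, d, gamma_max, workspace,
)
    # d = x - v; clip to gamma_max like the built-in strategies
    return min(1 / (t + 1), gamma_max)
end
```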
FrankWolfe.Adaptive — Type

Modified adaptive line search test from:

S. Pokutta, "The Frank-Wolfe algorithm: a short introduction" (2023), preprint, https://arxiv.org/abs/2311.05313

It replaces the original test implemented in the `AdaptiveZerothOrder` line search, based on:

Pedregosa, F., Negiar, G., Askari, A., and Jaggi, M. (2020). "Linearly convergent Frank–Wolfe with backtracking line-search", Proceedings of AISTATS.
FrankWolfe.AdaptiveZerothOrder — Type

Slight modification of the adaptive step size strategy from Pedregosa, Negiar, Askari, Jaggi (2018):

\[ f(x_t - \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~.\]

The parameter `alpha ∈ (0,1]` relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is `0.5`. The `Adaptive` struct keeps track of the Lipschitz constant estimate `L_est`. The keyword argument `relaxed_smoothness` allows testing with an alternative smoothness condition,

\[ \langle \nabla f(x_t) - \nabla f(x_t - \gamma_t (x_t - v_t)), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

This condition yields potentially smaller and more stable estimates of the Lipschitz constant, at the cost of an additional gradient computation. It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. `perform_line_search` also has a `should_upgrade` keyword argument controlling whether to temporarily upgrade to `BigFloat` for extended precision.
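In practice one only constructs the strategy; for instance, using the `relaxed_smoothness` keyword argument described above (constructor defaults assumed):

```julia
using FrankWolfe

# default adaptive test (Pokutta 2023 variant)
ls = FrankWolfe.Adaptive()

# alternative gradient-based smoothness test; costs one extra
# gradient evaluation each time the estimate is tested
ls_relaxed = FrankWolfe.Adaptive(relaxed_smoothness=true)
```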
FrankWolfe.Agnostic — Type

Computes the step size `l / (l + t)` at iteration `t`, given `l > 0`.

Using `l > 2` leads to faster convergence rates than `l = 2` over strongly convex and some uniformly convex sets:

Accelerated Affine-Invariant Convergence Rates of the Frank-Wolfe Algorithm with Open-Loop Step-Sizes, Wirth, Peña, Pokutta (2023), https://arxiv.org/abs/2310.04096

See also the paper that introduced the study of open-loop step sizes with `l > 2`:

Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta (2023), https://arxiv.org/abs/2205.12838

Fixing `l = -1` results in the step size `gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))`:

S. Pokutta, "The Frank-Wolfe algorithm: a short introduction" (2023), https://arxiv.org/abs/2311.05313
FrankWolfe.Backtracking — Type

```julia
Backtracking(limit_num_steps, tol, tau)
```

Backtracking line search strategy, see Pedregosa, Negiar, Askari, Jaggi (2018).
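A minimal backtracking sketch with an Armijo-style sufficient-decrease test (illustrative only, not the package implementation, which uses the test shown in the `AdaptiveZerothOrder` docstring above; the constants mirror the docstring's `tau` and `limit_num_steps`):

```julia
using LinearAlgebra

# Shrink gamma by tau until f decreases enough along d = x - v.
function backtrack(f, gradient, x, d; gamma0=1.0, tau=0.5, limit_num_steps=40)
    gamma = gamma0
    descent = dot(gradient, d)  # <grad f(x), x - v>, nonnegative at FW vertices
    for _ in 1:limit_num_steps
        if f(x - gamma * d) <= f(x) - 0.5 * gamma * descent
            return gamma        # sufficient decrease reached
        end
        gamma *= tau            # otherwise shrink the step
    end
    return gamma
end
```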
FrankWolfe.FixedStep — Type

Fixed step size strategy. The step size can still be truncated by the `gamma_max` argument.
FrankWolfe.GeneralizedAgnostic — Type

Computes the step size `g(t) / (t + g(t))` at iteration `t`, given `g: R_{>= 0} -> R_{>= 0}`.

Defaults to the best open-loop step size `gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))`:

S. Pokutta, "The Frank-Wolfe algorithm: a short introduction" (2023), https://arxiv.org/abs/2311.05313

This step size is as fast as the step size `gamma_t = 2 / (t + 2)` up to polylogarithmic factors. Further, over strongly convex and some uniformly convex sets, it is faster than any traditional step size `gamma_t = l / (t + l)` for any `l` in `N`.
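In plain Julia, with the default choice of `g` (a sketch of the formula, not package code):

```julia
# gamma_t = g(t) / (t + g(t)); this g recovers the log-variant above (l = -1)
g(t) = 2 + log(t + 1)
gamma_gen(t) = g(t) / (t + g(t))

gamma_gen.(0:4)  # first five step sizes, starting at gamma_0 = 1
```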
FrankWolfe.Goldenratio — Type

Simple golden-ratio-based line search (golden-section search), adapted from code by Combettes, Pokutta (2020).
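For reference, a self-contained golden-section sketch over `gamma in [lo, hi]` (illustrative, not the package code); apply it to `phi(gamma) = f(x - gamma * d)`:

```julia
# Golden-section search for a univariate minimum on [lo, hi].
function golden_section(phi, lo, hi; tol=1e-7)
    invphi = (sqrt(5) - 1) / 2      # 1 / golden ratio, about 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol
        if phi(c) < phi(d)          # minimum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else                        # minimum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
        end
    end
    return (a + b) / 2
end
```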
FrankWolfe.MonotonicNonConvexStepSize — Type

```julia
MonotonicNonConvexStepSize{F}
```

Represents a monotonic open-loop non-convex step size. Contains a halving factor `N` that is increased at each iteration until there is primal progress: `gamma = 1 / sqrt(t + 1) * 2^(-N)`.
FrankWolfe.MonotonicStepSize — Type

```julia
MonotonicStepSize{F}
```

Represents a monotonic open-loop step size. Contains a halving factor `N` that is increased at each iteration until there is primal progress: `gamma = 2 / (t + 2) * 2^(-N)`.
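Both monotonic variants share the same halving logic; a sketch for the convex variant (plain Julia, not the package code):

```julia
# Halve the open-loop step (factor 2^-N) until f decreases; N is never
# reset, so the step size sequence remains monotone across iterations.
function monotonic_gamma(f, x, d, t, N)
    gamma = 2 / (t + 2) * 2.0^(-N)
    while N < 64 && f(x - gamma * d) > f(x)  # no primal progress: halve again
        N += 1
        gamma = 2 / (t + 2) * 2.0^(-N)
    end
    return gamma, N   # carry N over to iteration t + 1
end
```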
FrankWolfe.Nonconvex — Type

Computes the step size `1 / sqrt(t + 1)`.
FrankWolfe.Secant — Type

```julia
Secant(limit_num_steps, tol, domain_oracle)
```

Secant line search strategy, which iteratively refines the step size using the secant method. This method is geared towards problems with self-concordant functions (but might require extra structure) and is potentially faster than the backtracking line search. The order of convergence is superlinear with exponent 1.618 (the golden ratio) but not quite quadratic. Convergence is not guaranteed in general.

Arguments

- `inner_ls::LSM`: a fallback line search used when the last `gamma` from the secant method does not satisfy the tolerance. Only used if `safe == true`. (default `Backtracking`)
- `safe::Bool`: flag indicating whether the fallback line search should be used. If `false`, the best `gamma` from the secant method is used. (default `true`)
- `limit_num_steps::Int`: maximum number of iterations for the secant method. (default 40)
- `tol::Float64`: tolerance for convergence. (default `1e-8`)
- `domain_oracle::Function`: returns `true` if the argument `x` is in the domain of the objective function `f`.
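A sketch of the underlying idea (not the package code): apply the secant method to the directional derivative `h(gamma) = <grad f(x - gamma * d), d>`, whose root is a stationary point along the segment. Here `grad` is assumed to be an out-of-place gradient function:

```julia
using LinearAlgebra

function secant_gamma(grad, x, d; gamma0=0.0, gamma1=1.0, limit_num_steps=40, tol=1e-8)
    h(g) = dot(grad(x - g * d), d)
    g0, g1 = gamma0, gamma1
    h0 = h(g0)
    for _ in 1:limit_num_steps
        h1 = h(g1)
        (abs(h1) < tol || h1 == h0) && return g1
        g0, g1 = g1, g1 - h1 * (g1 - g0) / (h1 - h0)  # secant update
        h0 = h1
    end
    return g1  # tolerance not met: a safeguarded fallback would kick in here
end
```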
FrankWolfe.Shortstep — Type

Computes the 'short step' step size: `dual_gap / (L * norm(x - v)^2)`, where `L` is the Lipschitz constant of the gradient, `x` is the current iterate, and `v` is the current Frank-Wolfe vertex.

See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size, and Carderera, Besançon, Pokutta (2021) for the monotonic step size.
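The formula translates directly to code (a sketch assuming a known gradient Lipschitz constant `L`):

```julia
using LinearAlgebra

# gamma = dual_gap / (L * ||x - v||^2), clipped to the feasible range
function short_step(gradient, x, v, L, gamma_max)
    d = x - v
    dual_gap = dot(gradient, d)  # <grad f(x), x - v>
    return clamp(dual_gap / (L * norm(d)^2), 0, gamma_max)
end
```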
Index
FrankWolfe.Adaptive
FrankWolfe.AdaptiveZerothOrder
FrankWolfe.Agnostic
FrankWolfe.Backtracking
FrankWolfe.FixedStep
FrankWolfe.GeneralizedAgnostic
FrankWolfe.Goldenratio
FrankWolfe.LineSearchMethod
FrankWolfe.MonotonicNonConvexStepSize
FrankWolfe.MonotonicStepSize
FrankWolfe.Nonconvex
FrankWolfe.Secant
FrankWolfe.Shortstep
FrankWolfe.perform_line_search