Line search and step size settings

The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how much the next iterate moves towards the new vertex:

\[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]

A step size $\gamma_t = 1$ means that the next iterate is exactly the vertex $v_t$, while $\gamma_t = 0$ means that the iterate does not move; equivalently, the update is the convex combination $x_{t+1} = (1 - \gamma_t) x_t + \gamma_t v_t$.

The following are the step size selection rules for Frank-Wolfe algorithms. Some strategies (e.g. FixedStep and Agnostic) depend only on the iteration number and yield step size sequences $\gamma_t$ that are independent of the problem data, while others (e.g. GoldenSearch and Adaptive) adapt to local information about the function; the adaptive methods often require additional function and/or gradient evaluations. The typical choices for convex optimization are Agnostic or Adaptive.
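For illustration, here is a hedged usage sketch showing how a step size strategy is selected through the line_search keyword of frank_wolfe; the quadratic objective, oracle, and starting point are chosen only for this example.

```julia
using FrankWolfe, LinearAlgebra

# Example problem (illustrative only): minimize ||x - p||^2 over the
# probability simplex.
n = 100
p = rand(n)
f(x) = norm(x - p)^2
grad!(storage, x) = (storage .= 2 .* (x .- p))

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
x0 = FrankWolfe.compute_extreme_point(lmo, zeros(n))

# The step size strategy is passed via the line_search keyword.
result = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    line_search=FrankWolfe.Adaptive(),  # or FrankWolfe.Agnostic(), etc.
    max_iteration=1_000,
)
```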

All step size computation strategies are subtypes of FrankWolfe.LineSearchMethod. The key method they have to implement is FrankWolfe.perform_line_search, which is called at every iteration to compute the step size gamma.
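As a hedged sketch of this interface (the HarmonicStep name is hypothetical, not part of the library), a custom strategy only needs to subtype LineSearchMethod and implement perform_line_search with the signature documented below:

```julia
using FrankWolfe

# Hypothetical custom strategy: the classic 2/(t+2) open-loop rule,
# clipped to the maximum feasible step gamma_max.
struct HarmonicStep <: FrankWolfe.LineSearchMethod end

function FrankWolfe.perform_line_search(
    ::HarmonicStep, t, f, grad!, gradient, x, d, gamma_max, workspace,
)
    return min(2 / (t + 2), gamma_max)
end
```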

FrankWolfe.LineSearchMethod - Type

Line search method to apply once the direction is computed. A LineSearchMethod must implement

perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

with d = x - v. It may also implement build_linesearch_workspace(x, gradient), which creates a workspace structure passed as the last argument to perform_line_search.

FrankWolfe.perform_line_search - Function
perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)

Returns the step size gamma for step size strategy ls.

FrankWolfe.Adaptive - Type

Modified adaptive line search test from:

S. Pokutta "The Frank-Wolfe algorith: a short introduction" (2023), preprint, https://arxiv.org/abs/2311.05313

It replaces the original test implemented in the AdaptiveZerothOrder line search based on:

Pedregosa, F., Negiar, G., Askari, A., and Jaggi, M. (2020). "Linearly convergent Frank–Wolfe with backtracking line-search", Proceedings of AISTATS.

FrankWolfe.AdaptiveZerothOrder - Type

Slight modification of the adaptive step size strategy from Pedregosa, Negiar, Askari, Jaggi (2018):

\[ f(x_t - \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~,\]

consistent with the update rule $x_{t+1} = x_t - \gamma_t (x_t - v_t)$ above.

The parameter alpha ∈ (0,1] relaxes the original smoothness condition to mitigate issues with numerical errors. Its default value is 0.5. The Adaptive struct keeps track of the Lipschitz constant estimate L_est. The keyword argument relaxed_smoothness allows testing with an alternative smoothness condition,

\[ \langle \nabla f(x_t) - \nabla f(x_t - \gamma_t (x_t - v_t)), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

This condition yields potentially smaller and more stable estimates of the Lipschitz constant, at a higher computational cost due to the additional gradient evaluation.

It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. perform_line_search also has a should_upgrade keyword argument controlling whether the computation is temporarily upgraded to BigFloat for extended precision.
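As a minimal sketch of the backtracking mechanism (assuming the zeroth-order test above; the function and variable names are illustrative, not the library's internals), one doubles the estimate M until the sufficient-decrease test holds, then takes the corresponding short step:

```julia
using LinearAlgebra

# Illustrative sketch, not the library implementation: double the
# Lipschitz estimate M until the sufficient-decrease test above holds,
# then return the clipped short step.
function adaptive_gamma(f, grad_x, x, d, M, gamma_max; alpha=0.5)
    dual_gap = dot(grad_x, d)   # ⟨∇f(x_t), x_t - v_t⟩
    ndsq = dot(d, d)            # ‖x_t - v_t‖²
    gamma = min(dual_gap / (M * ndsq), gamma_max)
    while f(x - gamma * d) - f(x) >
          -alpha * gamma * dual_gap + alpha^2 * gamma^2 * ndsq * M / 2
        M *= 2                  # estimate too small: double it
        gamma = min(dual_gap / (M * ndsq), gamma_max)
    end
    return gamma, M
end
```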

FrankWolfe.Agnostic - Type

Computes the step size gamma_t = l / (t + l) at iteration t, for a given l > 0.

Using l > 2 leads to faster convergence rates than l = 2 over strongly convex and some uniformly convex sets.

Accelerated Affine-Invariant Convergence Rates of the Frank-Wolfe Algorithm with Open-Loop Step-Sizes, Wirth, Peña, Pokutta (2023), https://arxiv.org/abs/2310.04096

See also the paper that introduced the study of open-loop step-sizes with l > 2:

Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta (2023), https://arxiv.org/abs/2205.12838

Fixing l = -1 results in the step size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1)) studied in:

S. Pokutta "The Frank-Wolfe algorith: a short introduction" (2023), https://arxiv.org/abs/2311.05313

FrankWolfe.GeneralizedAgnostic - Type

Computes the step size gamma_t = g(t) / (t + g(t)) at iteration t, for a given function g: R_{>=0} -> R_{>=0}.

Defaults to the best open-loop step-size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))

S. Pokutta "The Frank-Wolfe algorith: a short introduction" (2023), https://arxiv.org/abs/2311.05313

This step-size is as fast as the step-size gamma_t = 2 / (t + 2) up to polylogarithmic factors. Furthermore, over strongly convex and some uniformly convex sets, it is faster than any traditional step-size gamma_t = l / (t + l) for fixed l in N.
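A hedged illustration of the generalized rule with the default g stated above (plain Julia, not the library constructor):

```julia
# Illustrative only: generalized open-loop rule with the default g.
g(t) = 2 + log(t + 1)
gamma_generalized(t) = g(t) / (t + g(t))

gamma_generalized(0)  # == 1.0, matching the log step size above at t = 0
```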

FrankWolfe.MonotonicNonConvexStepSize - Type
MonotonicNonConvexStepSize{F}

Represents a monotonic open-loop step size for non-convex objectives. Contains a halving factor N that is increased at each iteration until there is primal progress, resulting in gamma = 2^(-N) / sqrt(t + 1).

FrankWolfe.MonotonicStepSize - Type
MonotonicStepSize{F}

Represents a monotonic open-loop step size. Contains a halving factor N that is increased at each iteration until there is primal progress, resulting in gamma = 2^(-N) * 2 / (t + 2).
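A minimal sketch of the halving mechanism shared by both monotonic rules above (illustrative names, assuming the 2/(t+2) base sequence; not the library's internals):

```julia
# Illustrative only: halve the base step until the objective decreases.
# N persists across iterations, which makes the step size monotonic.
function monotonic_gamma(f, x, d, t, N; max_halvings=64)
    gamma = 2.0^(-N) * 2 / (t + 2)
    while f(x - gamma * d) > f(x) && N < max_halvings
        N += 1
        gamma /= 2
    end
    return gamma, N
end
```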

FrankWolfe.Shortstep - Type

Computes the 'Short step' step size: dual_gap / (L * norm(x - v)^2), where L is the Lipschitz constant of the gradient, x is the current iterate, and v is the current Frank-Wolfe vertex.
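A one-line hedged sketch of this rule (illustrative name; the clipping to [0, gamma_max] is an assumption consistent with the feasible step range):

```julia
using LinearAlgebra

# Illustrative only: short step, clipped to the feasible range.
shortstep_gamma(dual_gap, d, L, gamma_max) =
    clamp(dual_gap / (L * dot(d, d)), 0.0, gamma_max)
```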


See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size and Carderera, Besançon, Pokutta (2021) for the monotonic step size.
