# Line search and step size settings

The step size dictates how far one traverses along a local descent direction. More specifically, the step size $\gamma_t$ is used at each iteration to determine how much the next iterate moves towards the new vertex:

\[x_{t+1} = x_t - \gamma_t (x_t - v_t).\]

$\gamma_t = 1$ implies that the next iterate is exactly the vertex, a zero $\gamma_t$ implies that the iterate is not moving.

The following are step size selection rules for Frank Wolfe algorithms. Some methodologies (e.g. `FixedStep`

and `Agnostic`

) depend only on the iteration number and induce series $\gamma_t$ that are independent of the problem data, while others (e.g. `GoldenSearch`

and `Adaptive`

) change according to local information about the function; the adaptive methods often require extra function and/or gradient computations. The typical options for convex optimization are `Agnostic`

or `Adaptive`

.

All step size computation strategies are subtypes of `FrankWolfe.LineSearchMethod`

. The key method they have to implement is `FrankWolfe.perform_line_search`

which is called at every iteration to compute the step size gamma.

`FrankWolfe.LineSearchMethod`

— TypeLine search method to apply once the direction is computed. A `LineSearchMethod`

must implement

`perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)`

with `d = x - v`

. It may also implement `build_linesearch_workspace(x, gradient)`

which creates a workspace structure that is passed as last argument to `perform_line_search`

.

`FrankWolfe.perform_line_search`

— Function`perform_line_search(ls::LineSearchMethod, t, f, grad!, gradient, x, d, gamma_max, workspace)`

Returns the step size `gamma`

for step size strategy `ls`

.

`FrankWolfe.Adaptive`

— TypeModified adaptive line search test from:

S. Pokutta "The Frank-Wolfe algorith: a short introduction" (2023), preprint, https://arxiv.org/abs/2311.05313

It replaces the original test implemented in the AdaptiveZerothOrder line search based on:

Pedregosa, F., Negiar, G., Askari, A., and Jaggi, M. (2020). "Linearly convergent Frank–Wolfe with backtracking line-search", Proceedings of AISTATS.

`FrankWolfe.AdaptiveZerothOrder`

— TypeSlight modification of the Adaptive Step Size strategy from Pedregosa, Negiar, Askari, Jaggi (2018)

\[ f(x_t + \gamma_t (x_t - v_t)) - f(x_t) \leq - \alpha \gamma_t \langle \nabla f(x_t), x_t - v_t \rangle + \alpha^2 \frac{\gamma_t^2 \|x_t - v_t\|^2}{2} M ~.\]

The parameter `alpha ∈ (0,1]`

relaxes the original smoothness condition to mitigate issues with nummerical errors. Its default value is `0.5`

. The `Adaptive`

struct keeps track of the Lipschitz constant estimate `L_est`

. The keyword argument `relaxed_smoothness`

allows testing with an alternative smoothness condition,

\[ \langle \nabla f(x_t + \gamma_t (x_t - v_t) ) - \nabla f(x_t), x_t - v_t \rangle \leq \gamma_t M \|x_t - v_t\|^2 ~.\]

This condition yields potentially smaller and more stable estimations of the Lipschitz constant while being more computationally expensive due to the additional gradient computation.

It is also the fallback when the Lipschitz constant estimation fails due to numerical errors. `perform_line_search`

also has a `should_upgrade`

keyword argument on whether there should be a temporary upgrade to `BigFloat`

for extended precision.

`FrankWolfe.Agnostic`

— TypeComputes step size: `l/(l + t)`

at iteration `t`

, given `l > 0`

.

Using `l > 2`

leads to faster convergence rates than l = 2 over strongly and some uniformly convex set.

Accelerated Affine-Invariant Convergence Rates of the Frank-Wolfe Algorithm with Open-Loop Step-Sizes, Wirth, Peña, Pokutta (2023), https://arxiv.org/abs/2310.04096

See also the paper that introduced the study of open-loop step-sizes with l > 2:

Acceleration of Frank-Wolfe Algorithms with Open-Loop Step-Sizes, Wirth, Kerdreux, Pokutta, (2023), https://arxiv.org/abs/2205.12838

Fixing l = -1, results in the step size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))

**S. Pokutta "The Frank-Wolfe algorith: a short introduction" (2023), https://arxiv.org/abs/2311.05313**

`FrankWolfe.Backtracking`

— Type`Backtracking(limit_num_steps, tol, tau)`

Backtracking line search strategy, see Pedregosa, Negiar, Askari, Jaggi (2018).

`FrankWolfe.FixedStep`

— TypeFixed step size strategy. The step size can still be truncated by the `gamma_max`

argument.

`FrankWolfe.GeneralizedAgnostic`

— TypeComputes step size: `g(t)/(t + g(t))`

at iteration `t`

, given `g: R_{>= 0} -> R_{>= 0}`

.

Defaults to the best open-loop step-size gamma_t = (2 + log(t+1)) / (t + 2 + log(t+1))

S. Pokutta "The Frank-Wolfe algorith: a short introduction" (2023), https://arxiv.org/abs/2311.05313

This step-size is as fast as the step-size gamma*t = 2 / (t + 2) up to polylogarithmic factors. Further, over strongly convex and some uniformly convex sets, it is faster than any traditional step-size gamma*t = l / (t + l) for any l in N.

`FrankWolfe.Goldenratio`

— Type`Goldenratio`

Simple golden-ratio based line search Golden Section Search, based on Combettes, Pokutta (2020) code and adapted.

`FrankWolfe.MonotonicNonConvexStepSize`

— Type`MonotonicNonConvexStepSize{F}`

Represents a monotonic open-loop non-convex step size. Contains a halving factor `N`

increased at each iteration until there is primal progress `gamma = 1 / sqrt(t + 1) * 2^(-N)`

.

`FrankWolfe.MonotonicStepSize`

— Type`MonotonicStepSize{F}`

Represents a monotonic open-loop step size. Contains a halving factor `N`

increased at each iteration until there is primal progress `gamma = 2 / (t + 2) * 2^(-N)`

.

`FrankWolfe.Nonconvex`

— TypeComputes step size: `1/sqrt(t + 1)`

.

`FrankWolfe.Secant`

— Type`Secant(limit_num_steps, tol, domain_oracle)`

Secant line search strategy, which iteratively refines the step size using the secant method. This method is geared towards problems with self-concordant functions (but might require extra structure) and potentially faster than the backtracking line search. The order of convergence is superlinear with exponent 1.618 (Golden Ratio) but not quite quadratic. Convergence is not guaranteed in general.

**Arguments**

`limit_num_steps::Int`

: Maximum number of iterations for the secant method. (default 40)`tol::Float64`

: Tolerance for convergence. (default 1e-8)`domain_oracle::Function`

, returns true if the argument x is in the domain of the objective function f.

**References**

`FrankWolfe.Shortstep`

— TypeComputes the 'Short step' step size: `dual_gap / (L * norm(x - v)^2)`

, where `L`

is the Lipschitz constant of the gradient, `x`

is the current iterate, and `v`

is the current Frank-Wolfe vertex.

See Pedregosa, Negiar, Askari, Jaggi (2020) for the adaptive step size, Carderera, Besançon, Pokutta (2021) for the monotonic step size.

## Index

`FrankWolfe.Adaptive`

`FrankWolfe.AdaptiveZerothOrder`

`FrankWolfe.Agnostic`

`FrankWolfe.Backtracking`

`FrankWolfe.FixedStep`

`FrankWolfe.GeneralizedAgnostic`

`FrankWolfe.Goldenratio`

`FrankWolfe.LineSearchMethod`

`FrankWolfe.MonotonicNonConvexStepSize`

`FrankWolfe.MonotonicStepSize`

`FrankWolfe.Nonconvex`

`FrankWolfe.Secant`

`FrankWolfe.Shortstep`

`FrankWolfe.perform_line_search`