How does it work?
FrankWolfe.jl
contains generic routines to solve optimization problems of the form
\[\min_{x \in \mathcal{C}} f(x)\]
where $\mathcal{C}$ is a compact convex set and $f$ is a differentiable function. These routines work by solving a sequence of linear subproblems:
\[\min_{x \in \mathcal{C}} \langle d_k, x \rangle \quad \text{where} \quad d_k = \nabla f(x_k)\]
Linear Minimization Oracles
The Linear Minimization Oracle (LMO) is a key component, which is called at each iteration of the FW algorithm. Given a direction $d$, it returns an optimal vertex of the feasible set:
\[v \in \arg \min_{x\in \mathcal{C}} \langle d,x \rangle.\]
Custom LMOs
To be used by the algorithms provided here, an LMO must be a subtype of FrankWolfe.LinearMinimizationOracle
and implement the following method:
compute_extreme_point(lmo::LMO, direction; kwargs...) -> v
This method should minimize $v \mapsto \langle d, v \rangle$ over the set $\mathcal{C}$ defined by the LMO. Note that this means the set $\mathcal{C}$ doesn't have to be represented explicitly: all we need is to be able to minimize a linear function over it, even if the minimization procedure is a black box.
Pre-defined LMOs
If you don't want to define your LMO manually, several common implementations are available out-of-the-box:
- Simplices: unit simplex, probability simplex
- Balls in various norms
- Polytopes: K-sparse, Birkhoff
You can use an oracle defined via a Linear Programming solver (e.g. SCIP
or HiGHS
) with MathOptInferface
: see FrankWolfe.MathOptLMO
.
Finally, we provide wrappers to combine oracles easily, for example in a product.
See Combettes, Pokutta (2021) for references on most LMOs implemented in the package and their comparison with projection operators.
Optimization algorithms
The package features several variants of Frank-Wolfe that share the same basic API.
Most of the algorithms listed below also have a lazified version: see Braun, Pokutta, Zink (2016).
Standard Frank-Wolfe (FW)
It is implemented in the frank_wolfe
function.
See Jaggi (2013) for an overview.
This algorithm works both for convex and non-convex functions (use step size rule FrankWolfe.Nonconvex()
in the second case).
Away-step Frank-Wolfe (AFW)
It is implemented in the away_frank_wolfe
function.
See Lacoste-Julien, Jaggi (2015) for an overview.
Stochastic Frank-Wolfe (SFW)
It is implemented in the FrankWolfe.stochastic_frank_wolfe
function.
Blended Conditional Gradients (BCG)
It is implemented in the blended_conditional_gradient
function, with a built-in stability feature that temporarily increases accuracy.
See Braun, Pokutta, Tu, Wright (2018).
Pairwise Frank-Wolfe (PFW)
It is implemented in the pairwise_frank_wolfe
function. See Lacoste-Julien, Jaggi (2015) for an overview.
Blended Pairwise Conditional Gradients (BPCG)
It is implemented in the FrankWolfe.blended_pairwise_conditional_gradient
function, with a minor modification to improve sparsity.
See Tsuji, Tanaka, Pokutta (2021)
Comparison
The following table compares the characteristics of the algorithms presented in the package:
Algorithm | Progress/Iteration | Time/Iteration | Sparsity | Numerical Stability | Active Set | Lazifiable |
---|---|---|---|---|---|---|
FW | Low | Low | Low | High | No | Yes |
AFW | Medium | Medium-High | Medium | Medium-High | Yes | Yes |
B(P)CG | High | Medium-High | High | Medium | Yes | By design |
SFW | Low | Low | Low | High | No | No |
While the standard Frank-Wolfe algorithm can only move towards extreme points of the compact convex set $\mathcal{C}$, Away-step Frank-Wolfe can move away from them. The following figure from our paper illustrates this behaviour:
.
Both algorithms minimize a quadratic function (whose contour lines are depicted) over a simple polytope (the black square). When the minimizer lies on a face, the standard Frank-Wolfe algorithm zig-zags towards the solution, while its Away-step variant converges more quickly.
Block-Coordinate Frank-Wolfe (BCFW)
It is implemented in the FrankWolfe.block_coordinate_frank_wolfe
function.
See Lacoste-Julien, Jaggi, Schmidt, Pletscher (2013) and Beck, Pauwels, Sabach (2015) for more details about different variants of Block-Coordinate Frank-Wolfe.
Alternating Linear Minimization (ALM)
It is implemented in the FrankWolfe.alternating_linear_minimization
function.