Algorithms
This section documents the main algorithms of the package, the ones typical users will call.
The typical signature for these algorithms is:
my_algorithm(f, grad!, lmo, x0)
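For example, using FrankWolfe.frank_wolfe (documented below), a small least-squares problem over the probability simplex can be set up and solved as follows. This is a minimal sketch: the problem data is arbitrary and the keyword values are only illustrative.

```julia
using FrankWolfe
using LinearAlgebra

n = 100
b = randn(n)
f(x) = 0.5 * norm(x - b)^2              # objective to minimize
grad!(storage, x) = storage .= x .- b   # in-place gradient

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)                 # feasible set: unit probability simplex
x0 = collect(FrankWolfe.compute_extreme_point(lmo, zeros(n)))  # any vertex serves as a starting point

x, v, primal, dual_gap, traj_data = FrankWolfe.frank_wolfe(
    f, grad!, lmo, x0;
    max_iteration=1_000, epsilon=1e-7, verbose=true,
)
```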
Standard algorithms
FrankWolfe.frank_wolfe — Method
frank_wolfe(f, grad!, lmo, x0; ...)
Simplest form of the Frank-Wolfe algorithm. Returns a tuple (x, v, primal, dual_gap, traj_data) with:
- x: final iterate
- v: last vertex from the LMO
- primal: primal value f(x)
- dual_gap: final Frank-Wolfe gap
- traj_data: vector of trajectory information.
FrankWolfe.lazified_conditional_gradient — Method
lazified_conditional_gradient(f, grad!, lmo_base, x0; ...)
Similar to FrankWolfe.frank_wolfe, but lazifies the LMO: each call is stored in a cache, which is looked up first for a good-enough direction. The cache used is a FrankWolfe.MultiCacheLMO or a FrankWolfe.VectorCacheLMO, depending on whether the provided cache_size option is finite.
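Switching the sketch above to the lazified variant only changes the call site; the cache_size keyword from the docstring bounds the cache:

```julia
# Reusing f, grad!, lmo, x0 from the frank_wolfe sketch above.
x, v, primal, dual_gap, traj_data = FrankWolfe.lazified_conditional_gradient(
    f, grad!, lmo, x0;
    cache_size=500,       # finite cache, so a FrankWolfe.MultiCacheLMO is used
    max_iteration=1_000, verbose=true,
)
```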
FrankWolfe.stochastic_frank_wolfe — Method
stochastic_frank_wolfe(f::StochasticObjective, lmo, x0; ...)
Stochastic version of Frank-Wolfe: evaluates the objective and gradient stochastically, implemented through the FrankWolfe.StochasticObjective interface.
Keyword arguments include batch_size to pass a fixed batch size, or a batch_iterator implementing batch_size = FrankWolfe.batchsize_iterate(batch_iterator), for algorithms like Variance-Reduced and Projection-Free Stochastic Optimization (E. Hazan, H. Luo, 2016).
Similarly, a constant momentum can be passed, or replaced by a momentum_iterator implementing momentum = FrankWolfe.momentum_iterate(momentum_iterator).
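A minimal sketch of the stochastic interface, assuming StochasticObjective wraps a per-sample objective f(x, data_point), a per-sample in-place gradient grad!(storage, x, data_point), the dataset, and a gradient buffer; check the package documentation for the exact constructor:

```julia
using FrankWolfe
using LinearAlgebra

n = 50
xs = [randn(n) for _ in 1:1_000]          # data points ξᵢ
f(x, ξ) = 0.5 * norm(x - ξ)^2             # per-sample objective
grad!(storage, x, ξ) = storage .= x .- ξ  # per-sample in-place gradient

lmo = FrankWolfe.ProbabilitySimplexOracle(1.0)
x0 = collect(FrankWolfe.compute_extreme_point(lmo, zeros(n)))

# Constructor layout (f, grad!, data, buffer) is an assumption; see the package docs.
fstoch = FrankWolfe.StochasticObjective(f, grad!, xs, similar(x0))

x, v, primal, dual_gap, traj_data = FrankWolfe.stochastic_frank_wolfe(
    fstoch, lmo, x0;
    batch_size=64, momentum=0.9, max_iteration=10_000,
)
```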
FrankWolfe.block_coordinate_frank_wolfe — Function
block_coordinate_frank_wolfe(f, grad!, lmo::ProductLMO{N}, x0; ...) where {N}
Block-coordinate version of the Frank-Wolfe algorithm. Minimizes the objective f over the product of feasible domains specified by the lmo. The optional argument update_order is of type FrankWolfe.BlockCoordinateUpdateOrder and controls the order in which the blocks are updated. The argument update_step is a single instance or a tuple of FrankWolfe.UpdateStep and defines which FW algorithm is used to update the iterates in the different blocks.
The method returns a tuple (x, v, primal, dual_gap, traj_data) with:
- x: cartesian product of final iterates
- v: cartesian product of last vertices of the LMOs
- primal: primal value f(x)
- dual_gap: final Frank-Wolfe gap
- traj_data: vector of trajectory information.
See S. Lacoste-Julien, M. Jaggi, M. Schmidt, and P. Pletscher 2013 and A. Beck, E. Pauwels, and S. Sabach 2015 for more details about block-coordinate Frank-Wolfe.
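A hedged sketch of a two-block problem; the BlockVector iterate type and the CyclicUpdate order are assumptions about the package API, so consult the package examples for the exact types:

```julia
using FrankWolfe
using LinearAlgebra

n = 20
lmo = FrankWolfe.ProductLMO(
    FrankWolfe.ProbabilitySimplexOracle(1.0),
    FrankWolfe.ProbabilitySimplexOracle(1.0),
)

# The objective couples the two blocks by pulling them toward each other.
f(x) = 0.5 * norm(x.blocks[1] - x.blocks[2])^2
function grad!(storage, x)
    storage.blocks[1] .= x.blocks[1] .- x.blocks[2]
    storage.blocks[2] .= x.blocks[2] .- x.blocks[1]
    return storage
end

# BlockVector as the block-structured iterate is an assumption about the API.
x0 = FrankWolfe.BlockVector([fill(1/n, n), fill(1/n, n)])

x, v, primal, dual_gap, traj_data = FrankWolfe.block_coordinate_frank_wolfe(
    f, grad!, lmo, x0;
    update_order=FrankWolfe.CyclicUpdate(),  # assumed name of a cyclic update order
    max_iteration=1_000,
)
```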
Active-set based methods
The following algorithms maintain the representation of the iterates as a convex combination of vertices.
Away-step
FrankWolfe.away_frank_wolfe — Method
away_frank_wolfe(f, grad!, lmo, x0; ...)
Frank-Wolfe with away steps. The algorithm maintains the current iterate as a convex combination of vertices in the FrankWolfe.ActiveSet data structure. See M. Besançon, A. Carderera and S. Pokutta 2021 for illustrations of away steps.
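Away-step Frank-Wolfe is a drop-in replacement on the same problem data as the first sketch; the lazy keyword shown here (an assumption) activates the lazified variant:

```julia
# Reusing f, grad!, lmo, x0 from the first sketch above.
x, v, primal, dual_gap, traj_data = FrankWolfe.away_frank_wolfe(
    f, grad!, lmo, x0;
    lazy=true,            # assumed keyword enabling the lazified away-step variant
    max_iteration=1_000, verbose=true,
)
```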
Pairwise Frank-Wolfe
FrankWolfe.blended_pairwise_conditional_gradient — Method
blended_pairwise_conditional_gradient(f, grad!, lmo, x0; kwargs...)
Implements the BPCG algorithm from Tsuji, Tanaka, Pokutta (2021). The method uses an active set of current vertices. Unlike away-step, it transfers weight from an away vertex to another vertex of the active set.
FrankWolfe.blended_pairwise_conditional_gradient — Method
blended_pairwise_conditional_gradient(f, grad!, lmo, active_set::AbstractActiveSet; kwargs...)
Warm-starts BPCG with a pre-defined active_set.
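A sketch of the warm start, assuming FrankWolfe.ActiveSet can be built from (weight, atom) pairs as described under the active-set methods above:

```julia
# Reusing f, grad!, lmo, n from the first sketch; start from a single-vertex active set.
v0 = collect(FrankWolfe.compute_extreme_point(lmo, zeros(n)))
active_set = FrankWolfe.ActiveSet([(1.0, v0)])

x, v, primal, dual_gap, traj_data = FrankWolfe.blended_pairwise_conditional_gradient(
    f, grad!, lmo, active_set;
    max_iteration=1_000,
)
```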
FrankWolfe.pairwise_frank_wolfe — Method
pairwise_frank_wolfe(f, grad!, lmo, x0; ...)
Frank-Wolfe with pairwise steps. The algorithm maintains the current iterate as a convex combination of vertices in the FrankWolfe.ActiveSet data structure. See M. Besançon, A. Carderera and S. Pokutta 2021 for illustrations of away steps. Unlike away-step, it transfers weight from an away vertex to another vertex.
Blended Conditional Gradient
FrankWolfe.accelerated_simplex_gradient_descent_over_probability_simplex — Method
accelerated_simplex_gradient_descent_over_probability_simplex
Minimizes an objective function over the unit probability simplex until the strong-Wolfe gap is below the tolerance, using Nesterov's accelerated gradient descent.
FrankWolfe.blended_conditional_gradient — Method
blended_conditional_gradient(f, grad!, lmo, x0)
Entry point for the Blended Conditional Gradient algorithm. See Braun, Gábor, et al., "Blended Conditional Gradients", ICML 2019. The method works on an active set like FrankWolfe.away_frank_wolfe, performing gradient descent over the convex hull of active vertices, removing vertices when their weight drops to 0, and adding new vertices by calling the linear oracle in a lazy fashion.
FrankWolfe.build_reduced_problem — Method
build_reduced_problem(atoms::AbstractVector{<:AbstractVector}, hessian, weights, gradient, tolerance)
Given an active set formed by vectors, a (constant) Hessian, and a gradient, constructs a quadratic problem over the unit probability simplex that is equivalent to minimizing the original function over the convex hull of the active set. If λ are the barycentric coordinates, of dimension equal to the cardinality of the active set, the objective function is:
\[f(\lambda) = \texttt{reduced\_linear}^\top \lambda + \frac{1}{2} \lambda^\top \texttt{reduced\_hessian}\, \lambda\]
If the current iterate's strong-Wolfe gap over the convex hull of the active set is already below the tolerance, the method returns nothing (as there is nothing to do).
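To see where the reduced data comes from, here is an illustrative derivation under the assumption of a quadratic model with constant Hessian; names and dummy data are ours, not the package's internals:

```julia
using LinearAlgebra

# Illustrative derivation with dummy data, assuming the quadratic model
# f(x) = f(x₀) + g' (x - x₀) + ½ (x - x₀)' H (x - x₀); not the package's internals.
atoms = [rand(10) for _ in 1:4]       # active vertices a₁, …, a₄
weights = fill(0.25, 4)               # barycentric coordinates of the current iterate
hessian = Matrix{Float64}(I, 10, 10)  # constant Hessian H
x = sum(w * a for (w, a) in zip(weights, atoms))  # current iterate x₀ = A * weights
gradient = x .- 1.0                   # dummy gradient g at x₀

A = reduce(hcat, atoms)               # columns are the atoms, so x = A * λ
reduced_hessian = A' * hessian * A    # quadratic term in barycentric coordinates
reduced_linear = A' * gradient - reduced_hessian * weights  # linear term after substitution
```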
FrankWolfe.lp_separation_oracle — Method
Returns either a tuple (y, val) with y an atom from the active set satisfying the progress criterion and val the corresponding gap dot(y, direction), or the same tuple with y from the LMO.
inplace_loop controls whether the iterate type allows in-place writes. kwargs are passed on to the LMO oracle.
FrankWolfe.minimize_over_convex_hull! — Method
minimize_over_convex_hull!
Given a function f with gradient grad! and an active set active_set, this function minimizes f over the convex hull of the active set until the strong-Wolfe gap over the active set is below the tolerance.
It either minimizes over the convex hull directly using simplex gradient descent, or transforms the problem to barycentric coordinates and minimizes over the unit probability simplex using gradient descent or Nesterov's accelerated gradient descent.
FrankWolfe.projection_simplex_sort — Method
projection_simplex_sort(x; s=1.0)
Perform a projection onto the probability simplex of radius s using a sorting algorithm.
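For intuition, the classic sorting-based projection (Held, Wolfe, Crowder 1974; Duchi et al. 2008) that such a method typically relies on can be sketched as follows; this is an illustrative reimplementation, not the package's code:

```julia
# Illustrative reimplementation of the sorting-based simplex projection,
# not the package's code. Projects x onto {p : p ≥ 0, sum(p) = s}.
function simplex_projection_sketch(x::AbstractVector; s=1.0)
    u = sort(x, rev=true)            # entries in decreasing order
    cssv = cumsum(u) .- s            # shifted cumulative sums
    ρ = findlast(i -> u[i] > cssv[i] / i, eachindex(u))  # active support size
    θ = cssv[ρ] / ρ                  # optimal shift
    return max.(x .- θ, 0)
end

simplex_projection_sketch([0.3, 1.2, -0.4])  # returns [0.05, 0.95, 0.0]
```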
FrankWolfe.simplex_gradient_descent_over_convex_hull — Method
simplex_gradient_descent_over_convex_hull(f, grad!, gradient, active_set, tolerance, t, time_start, non_simplex_iter)
Minimizes an objective function over the convex hull of the active set until the strong-Wolfe gap is below the tolerance, using simplex gradient descent.
FrankWolfe.simplex_gradient_descent_over_probability_simplex — Method
simplex_gradient_descent_over_probability_simplex
Minimizes an objective function over the unit probability simplex until the strong-Wolfe gap is below the tolerance, using gradient descent.
FrankWolfe.strong_frankwolfe_gap — Method
Checks the strong Frank-Wolfe gap for the reduced problem.
FrankWolfe.strong_frankwolfe_gap_probability_simplex — Method
strong_frankwolfe_gap_probability_simplex
Computes the strong Frank-Wolfe gap over the unit probability simplex given a gradient.
Alternating Methods
Problems over intersections of convex sets, i.e.
\[\min_{x \in \bigcap_{i=1}^n P_i} f(x),\]
pose a challenge as one has to combine the information of two or more LMOs.
FrankWolfe.alternating_linear_minimization
converts the problem into a series of subproblems over single sets. To find a point within the intersection, one minimizes both the distance to the iterates of the other subproblems and the original objective function.
FrankWolfe.alternating_projections
solves feasibility problems over intersections of feasible regions.
FrankWolfe.alternating_linear_minimization — Method
alternating_linear_minimization(bc_algo::BlockCoordinateMethod, f, grad!, lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}
Alternating Linear Minimization minimizes the objective f over the intersection of the feasible domains specified by lmos. The tuple x0 defines the initial points for each domain. Returns a tuple (x, v, primal, dual_gap, dist2, traj_data) with:
- x: cartesian product of final iterates
- v: cartesian product of last vertices of the LMOs
- primal: primal value f(x)
- dual_gap: final Frank-Wolfe gap
- dist2: 1/2 of the sum of squared, pairwise distances between iterates
- traj_data: vector of trajectory information.
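A hedged sketch of alternating linear minimization over two sets. Passing block_coordinate_frank_wolfe as bc_algo and the lambda keyword for the distance penalty are assumptions about the API:

```julia
using FrankWolfe
using LinearAlgebra

n = 50
b = randn(n)
f(x) = 0.5 * norm(x - b)^2
grad!(storage, x) = storage .= x .- b

# Intersection of the probability simplex and a K-sparse polytope.
lmos = (
    FrankWolfe.ProbabilitySimplexOracle(1.0),
    FrankWolfe.KSparseLMO(5, 1.0),
)
x0 = (fill(1/n, n), zeros(n))   # per the docstring, one initial point per domain

x, v, primal, dual_gap, dist2, traj_data = FrankWolfe.alternating_linear_minimization(
    FrankWolfe.block_coordinate_frank_wolfe,  # assumed admissible as bc_algo
    f, grad!, lmos, x0;
    lambda=1.0,        # assumed name of the penalty weight on the pairwise distances
    max_iteration=1_000,
)
```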
FrankWolfe.alternating_projections — Method
alternating_projections(lmos::NTuple{N,LinearMinimizationOracle}, x0; ...) where {N}
Computes a point in the intersection of the feasible domains specified by lmos. Returns a tuple (x, v, dual_gap, dist2, traj_data) with:
- x: cartesian product of final iterates
- v: cartesian product of last vertices of the LMOs
- dual_gap: final Frank-Wolfe gap
- dist2: 1/2 of the sum of squared, pairwise distances between iterates
- traj_data: vector of trajectory information.
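A minimal feasibility sketch: find a point in the intersection of the Euclidean unit ball and the probability simplex. Constructor details (LpNormLMO type parameters) are assumptions:

```julia
using FrankWolfe

n = 50
lmos = (
    FrankWolfe.LpNormLMO{Float64,2}(1.0),      # Euclidean unit ball
    FrankWolfe.ProbabilitySimplexOracle(1.0),  # unit probability simplex
)
x0 = rand(n)

x, v, dual_gap, dist2, traj_data = FrankWolfe.alternating_projections(
    lmos, x0;
    max_iteration=5_000, epsilon=1e-8,
)
# dist2 close to zero certifies that the per-set iterates (nearly) coincide.
```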