TY - JOUR

T1 - Optimization in high dimensions via accelerated, parallel, and proximal coordinate descent

AU - Fercoq, Olivier

AU - Richtarik, Peter

N1 - Funding Information:
This work was supported by EPSRC grant EP/I017127/1 (Mathematics for Vast Digital Resources) and by the Centre for Numerical Algorithms and Intelligent Software (funded by EPSRC grant EP/G036136/1 and the Scottish Funding Council).
Publisher Copyright:
© 2016 Society for Industrial and Applied Mathematics.

PY - 2016

Y1 - 2016

N2 - We propose a new randomized coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel, and PROXimal; this is the first time such a method has been proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate 2ω̄L̄R²/(k + 1)², where k is the iteration counter, ω̄ is a data-weighted average degree of separability of the loss function, L̄ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and R is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent methods, rendering them impractical. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms such as simple and accelerated proximal gradient descent, as well as serial, parallel, and distributed versions of randomized block coordinate descent. Due to this flexibility, APPROX has been used successfully by the authors in a graduate class setting as a modern introduction to deterministic and randomized proximal gradient methods. Our bounds match or improve upon the best known bounds for each of the methods APPROX specializes to. Our method has applications in a number of areas, including machine learning, submodular optimization, and linear and semidefinite programming.

AB - We propose a new randomized coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel, and PROXimal; this is the first time such a method has been proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate 2ω̄L̄R²/(k + 1)², where k is the iteration counter, ω̄ is a data-weighted average degree of separability of the loss function, L̄ is the average of Lipschitz constants associated with the coordinates and individual functions in the sum, and R is the distance of the initial point from the minimizer. We show that the method can be implemented without the need to perform full-dimensional vector operations, which is the major bottleneck of accelerated coordinate descent methods, rendering them impractical. The fact that the method depends on the average degree of separability, and not on the maximum degree, can be attributed to the use of new safe large stepsizes, leading to improved expected separable overapproximation (ESO). These are of independent interest and can be utilized in all existing parallel randomized coordinate descent algorithms based on the concept of ESO. In special cases, our method recovers several classical and recent algorithms such as simple and accelerated proximal gradient descent, as well as serial, parallel, and distributed versions of randomized block coordinate descent. Due to this flexibility, APPROX has been used successfully by the authors in a graduate class setting as a modern introduction to deterministic and randomized proximal gradient methods. Our bounds match or improve upon the best known bounds for each of the methods APPROX specializes to. Our method has applications in a number of areas, including machine learning, submodular optimization, and linear and semidefinite programming.

KW - Acceleration

KW - Big data

KW - Complexity

KW - Convex optimization

KW - Parallel methods

KW - Partial separability

KW - Proximal methods

KW - Randomized coordinate descent

UR - http://www.scopus.com/inward/record.url?scp=84996538672&partnerID=8YFLogxK

U2 - 10.1137/16M1085905

DO - 10.1137/16M1085905

M3 - Article

AN - SCOPUS:84996538672

VL - 58

SP - 739

EP - 771

JO - SIAM Review

JF - SIAM Review

SN - 0036-1445

IS - 4

ER -