Accelerated coordinate descent with arbitrary sampling and best rates for minibatches

Filip Hanzely, Peter Richtárik

Research output: Contribution to conferencePaperpeer-review

15 Scopus citations

Abstract

Accelerated coordinate descent is a widely popular optimization algorithm due to its efficiency on large-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (ACD) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. While minibatch variants of ACD are more popular and relevant in practice, there is no importance sampling for ACD that outperforms the standard uniform minibatch sampling. Through insights enabled by our general analysis, we design new importance sampling for minibatch ACD which significantly outperforms previous state-of-the-art minibatch ACD in practice. We prove a rate that is at most O(pτ) times worse than the rate of minibatch ACD with uniform sampling, but can be O(n/τ) times better, where τ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose τ ≪ n, and often τ = O(1), our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch nonaccelerated CD as well, achieving improvements on previous best rates.

Original languageEnglish (US)
StatePublished - 2020
Event22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019 - Naha, Japan
Duration: Apr 16 2019Apr 18 2019

Conference

Conference22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019
Country/TerritoryJapan
CityNaha
Period04/16/1904/18/19

Bibliographical note

Publisher Copyright:
© 2019 by the author(s).

ASJC Scopus subject areas

  • Artificial Intelligence
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Accelerated coordinate descent with arbitrary sampling and best rates for minibatches'. Together they form a unique fingerprint.

Cite this