Variance-Reduced Methods for Machine Learning

Robert M. Gower, Mark Schmidt, Francis Bach, Peter Richtarik

Research output: Contribution to journal › Article › peer-review

66 Scopus citations

Abstract

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight years have seen an exciting new development: variance reduction for stochastic optimization methods. These variance-reduced (VR) methods excel in settings where more than one pass through the training data is allowed, achieving faster convergence than SGD in both theory and practice. These speedups underlie the surge of interest in VR methods and the fast-growing body of work on this topic. This review covers the key principles and main developments behind VR methods for optimization with finite data sets and is aimed at nonexpert readers. We focus mainly on the convex setting and leave pointers for readers interested in extensions to minimizing nonconvex functions.
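To make the variance-reduction idea mentioned in the abstract concrete, the following is a minimal sketch of one representative VR method (an SVRG-style update) on a toy least-squares problem. The problem setup, step size, and variable names are illustrative assumptions for this sketch and are not taken from the article.

```python
import numpy as np

# Minimal SVRG-style sketch (illustrative; toy quadratic problem, not from the article).
# Objective: f(w) = (1/n) * sum_i 0.5 * (a_i . w - b_i)^2

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)

def grad_i(w, i):
    # Gradient of the i-th loss term only.
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    # Full gradient over all n terms (computed once per epoch).
    return A.T @ (A @ w - b) / n

w = np.zeros(d)
step = 0.1 / np.linalg.norm(A, 2) ** 2 * n  # heuristic step size ~ 0.1 / L
for epoch in range(30):
    w_snap = w.copy()           # snapshot point
    g_snap = full_grad(w_snap)  # full gradient stored at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Variance-reduced gradient estimate: unbiased, and its variance
        # shrinks as w and w_snap both approach the minimizer, which is
        # what allows a constant step size and faster convergence than SGD.
        g = grad_i(w, i) - grad_i(w_snap, i) + g_snap
        w -= step * g

print("final objective:", 0.5 * np.mean((A @ w - b) ** 2))
```

Each inner iteration touches only two component gradients plus the stored snapshot gradient, so the per-step cost is comparable to SGD while the estimator's variance vanishes at the solution.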
Original language: English (US)
Pages (from-to): 1968-1983
Number of pages: 16
Journal: Proceedings of the IEEE
Volume: 108
Issue number: 11
DOIs
State: Published - Oct 16 2020

Bibliographical note

Acknowledgements: The authors would like to thank Quanquan Gu, Julien Mairal, Tong Zhang, and Lin Xiao for valuable suggestions and comments on an earlier draft of this article. In particular, Quanquan’s recommendations for the nonconvex section improved the organization of our Section IV-G.
