New Convergence Aspects of Stochastic Gradient Algorithms

Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik, Katya Scheinberg, Martin Takáč, Marten van Dijk

Research output: Contribution to journal › Article › peer-review

40 Scopus citations

Abstract

The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is violated for cases where the objective function is strongly convex. In Bottou et al. (2018), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. We show that for stochastic problems arising in machine learning such a bound always holds; we also propose an alternative convergence analysis of SGD in the diminishing learning rate regime. We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same diminishing learning rate regime. It is well known that SGD converges if the sequence of learning rates $\{\eta_t\}$ satisfies $\sum_{t=0}^{\infty} \eta_t \to \infty$ and $\sum_{t=0}^{\infty} \eta_t^2 < \infty$. We show the convergence of SGD for a strongly convex objective function without the bounded gradient assumption when $\{\eta_t\}$ is a diminishing sequence and $\sum_{t=0}^{\infty} \eta_t \to \infty$. In other words, we extend the current state-of-the-art class of learning rates for which SGD is guaranteed to converge.
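As a toy illustration (not taken from the paper), the sketch below runs SGD with the diminishing schedule $\eta_t = \eta_0 / \sqrt{1+t}$. This schedule is diminishing with $\sum_t \eta_t \to \infty$, so it falls in the extended class the abstract describes, even though $\sum_t \eta_t^2 = \infty$ fails the classical second condition. The least-squares objective, synthetic data, and step-size constant are illustrative assumptions.

```python
import numpy as np

def sgd_diminishing(X, y, eta0=0.1, n_steps=10_000, seed=0):
    """SGD on the strongly convex least-squares objective
    f(w) = (1/2n) * ||Xw - y||^2 with eta_t = eta0 / sqrt(1 + t).
    Illustrative sketch only; eta0 is an assumed constant."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(n_steps):
        i = rng.integers(n)                    # sample one component f_i
        grad_i = (X[i] @ w - y[i]) * X[i]      # stochastic gradient of f_i
        eta_t = eta0 / np.sqrt(1 + t)          # diminishing, divergent sum,
        w -= eta_t * grad_i                    # but sum of squares diverges too
    return w

# Usage: noiseless synthetic problem, so the iterates should approach w_true.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true
w_hat = sgd_diminishing(X, y)
print(np.linalg.norm(w_hat - w_true))  # small residual error
```

Hogwild! would run the same per-sample update asynchronously from multiple threads without locking the iterate; the abstract's claim is that convergence carries over to that setting under the same learning rate regime.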
Original language: English (US)
Journal: Journal of Machine Learning Research
Volume: 20
State: Published - 2019

Bibliographical note

KAUST Repository Item: Exported on 2021-07-13
Acknowledgements: Phuong Ha Nguyen and Marten van Dijk were supported in part by AFOSR MURI under award number FA9550-14-1-0351. Katya Scheinberg was partially supported by NSF Grants CCF-1618717 and CCF-1740796. Martin Takáč was partially supported by NSF Grants CCF-1618717, CMMI-1663256, and CCF-1740796.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Statistics and Probability
  • Control and Systems Engineering
