A distributed flexible delay-tolerant proximal gradient algorithm

Konstantin Mishchenko, Franck Iutzeler, Jérôme Malick

Research output: Contribution to journalArticlepeer-review

13 Scopus citations


We develop and analyze an asynchronous algorithm for distributed convex optimization when the objective can be written as a sum of smooth functions, local to each worker, and a nonsmooth function. Unlike many existing methods, our distributed algorithm is adjustable to various levels of communication cost, delays, machines' computational power, and functions' smoothness. A unique feature is that the step sizes do not depend on communication delays nor number of machines, which is highly desirable for scalability. We prove that the algorithm converges linearly in the strongly convex case, and provide guarantees of convergence for the non-strongly convex case. The obtained rates are the same as the vanilla proximal gradient algorithm over some introduced epoch sequence that subsumes the delays of the system. We provide numerical results on large-scale machine learning problems to demonstrate the merits of the proposed method.
Original languageEnglish (US)
Pages (from-to)933-959
Number of pages27
JournalSIAM Journal on Optimization
Issue number1
StatePublished - Mar 17 2020

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: We thank Robert Gower for valuable comments on the first versions of this paper.


Dive into the research topics of 'A distributed flexible delay-tolerant proximal gradient algorithm'. Together they form a unique fingerprint.

Cite this