We develop and analyze an asynchronous algorithm for distributed convex optimization when the objective can be written as a sum of smooth functions, local to each worker, and a nonsmooth function. Unlike many existing methods, our distributed algorithm is adjustable to various levels of communication cost, delays, machines' computational power, and functions' smoothness. A unique feature is that the step sizes do not depend on communication delays nor number of machines, which is highly desirable for scalability. We prove that the algorithm converges linearly in the strongly convex case, and provide guarantees of convergence for the non-strongly convex case. The obtained rates are the same as the vanilla proximal gradient algorithm over some introduced epoch sequence that subsumes the delays of the system. We provide numerical results on large-scale machine learning problems to demonstrate the merits of the proposed method.
|Original language||English (US)|
|Number of pages||27|
|Journal||SIAM Journal on Optimization|
|State||Published - Mar 17 2020|
Bibliographical noteKAUST Repository Item: Exported on 2020-10-01
Acknowledgements: We thank Robert Gower for valuable comments on the first versions of this paper.