EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression

Kaja Gruntkowska, Alexander Tyurin*, Peter Richtárik

*Corresponding author for this work

Research output: Contribution to conferencePaperpeer-review

2 Scopus citations

Abstract

In this work we focus our attention on distributed optimization problems in the context where the communication time between the server and the workers is non-negligible. We obtain novel methods supporting bidirectional compression (both from the server to the workers and vice versa) that enjoy new state-of-the-art theoretical communication complexity for convex and nonconvex problems. Our bounds are the first that manage to decouple the variance/error coming from the workers-to-server and server-to-workers compression, transforming a multiplicative dependence to an additive one. Moreover, in the convex regime, we obtain the first bounds that match the theoretical communication complexity of gradient descent. Even in this convex regime, our algorithms work with biased gradient estimators, which is non-standard and requires new proof techniques that may be of independent interest. Finally, our theoretical results are corroborated through suitable experiments.

Original languageEnglish (US)
Pages11761-11807
Number of pages47
StatePublished - 2023
Event40th International Conference on Machine Learning, ICML 2023 - Honolulu, United States
Duration: Jul 23 2023Jul 29 2023

Conference

Conference40th International Conference on Machine Learning, ICML 2023
Country/TerritoryUnited States
CityHonolulu
Period07/23/2307/29/23

Bibliographical note

Publisher Copyright:
© 2023 Proceedings of Machine Learning Research. All rights reserved.

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression'. Together they form a unique fingerprint.

Cite this