Abstract
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems. When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE, DASHA-MVR and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA by Gorbunov et al. (2021). In particular, to achieve an ε-stationary point, and taking the random sparsifier RandK as an example, our methods compute the optimal number of gradients, O(√m/(ε√n)) and O(σ/(ε^{3/2}n)) in the finite-sum and expectation cases, respectively, while maintaining the state-of-the-art communication complexity O(d/(ε√n)). Furthermore, unlike MARINA, the new methods DASHA, DASHA-PAGE and DASHA-MVR send compressed vectors only, which makes them more practical for federated learning. We extend our results to the case when the functions satisfy the Polyak-Łojasiewicz condition. Finally, our theory is corroborated in practice: we observe a significant improvement in experiments with nonconvex classification and the training of deep learning models.
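For context, the RandK compressor mentioned in the abstract is the standard unbiased random sparsifier: it keeps K coordinates chosen uniformly at random and rescales them by d/K, so the compressed vector equals the original in expectation and only K values (plus their indices) need to be communicated. Below is a minimal NumPy sketch of this compressor; the function name randk_compress and the toy dimensions are our own illustration and do not come from the paper.

```python
import numpy as np

def randk_compress(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """RandK sparsifier: keep k coordinates chosen uniformly at random and
    rescale them by d/k, so that E[C(x)] = x (unbiased compression)."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)  # random support of size k
    out[idx] = (d / k) * x[idx]                 # rescale the kept entries
    return out

# Toy usage: compress a gradient-like vector before communicating it.
rng = np.random.default_rng(0)
g = rng.standard_normal(10_000)
c = randk_compress(g, k=100, rng=rng)
print(np.count_nonzero(c))  # only 100 entries (plus indices) would be sent
```

Transmitting only the K nonzero values and their indices is what reduces the per-round communication cost from d coordinates to O(K).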
Original language | English (US)
---|---
State | Published - 2023
Event | 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda (May 1, 2023 → May 5, 2023)
Conference
Conference | 11th International Conference on Learning Representations, ICLR 2023
---|---
Country/Territory | Rwanda
City | Kigali
Period | 05/01/23 → 05/05/23
Bibliographical note
Publisher Copyright: © 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language