Better Methods and Theory for Federated Learning: Compression, Client Selection and Heterogeneity

  • Samuel Horvath

Student thesis: Doctoral Thesis


Federated learning (FL) is an emerging machine learning paradigm involving multiple clients, e.g., mobile phone devices, with an incentive to collaborate in solving a machine learning problem coordinated by a central server. FL was proposed in 2016 by Konecny et al. and McMahan et al. as a viable privacy-preserving alternative to traditional centralized machine learning since, by construction, the training data points are decentralized and never transferred by the clients to a central server. Therefore, to a certain degree, FL mitigates the privacy risks associated with centralized data collection. Unfortunately, optimization for FL faces several specific issues that centralized optimization usually does not need to handle. In this thesis, we identify several of these challenges and propose new methods and algorithms to address them, with the ultimate goal of enabling practical FL solutions supported by mathematically rigorous guarantees. In particular, in the first four chapters after the introduction, we focus on the communication bottleneck and devise novel compression mechanisms and tools that can provably accelerate the training process. In the sixth chapter, we address another significant challenge of FL: partial participation of clients in each round of the training process. More concretely, we propose the first importance client sampling strategy that is compatible with two core privacy requirements of FL: secure aggregation and statelessness of clients. The seventh chapter is dedicated to another challenge in the cross-device FL setting, namely system heterogeneity, i.e., the diversity in clients’ processing capabilities and network bandwidth, as well as the communication overhead caused by slow connections. To tackle this, we introduce the ordered dropout (OD) mechanism. 
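To make the idea of a compression mechanism concrete, the sketch below implements random-k sparsification. This is a standard unbiased compressor from the FL literature, used here only as an illustration and not necessarily one of the mechanisms proposed in the thesis. Each client keeps k of the d coordinates of its update uniformly at random and rescales them by d/k, so the compressed vector equals the original in expectation while only k values (plus their indices) need to be sent. The function name `rand_k` and the plain-list representation are assumptions made for this example.

```python
import random
from typing import List


def rand_k(x: List[float], k: int, rng: random.Random) -> List[float]:
    """Unbiased random-k sparsification (illustrative sketch).

    Keeps k coordinates of x chosen uniformly at random without replacement,
    scales them by d / k and zeroes the rest, so that E[rand_k(x)] = x.
    """
    d = len(x)
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= len(x)")
    kept = rng.sample(range(d), k)  # k distinct indices, uniformly at random
    scale = d / k                   # rescaling makes the compressor unbiased
    out = [0.0] * d
    for i in kept:
        out[i] = scale * x[i]
    return out
```

Averaging many independent compressions of the same vector approximately recovers it. The d/k rescaling buys unbiasedness at the price of a larger variance, which is the kind of trade-off a convergence analysis of compressed methods has to account for.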
The OD mechanism promotes an ordered, nested representation of knowledge in neural networks and enables the extraction of lower-footprint sub-models without retraining, which offers fair and accurate learning in this challenging FL setting. Lastly, in the eighth chapter, we study several key algorithmic ingredients behind some of the most popular methods for cross-device FL aimed at tackling heterogeneity and the communication bottleneck. In particular, we propose a general framework for analyzing methods that employ all these techniques simultaneously, which helps us better understand their combined effect. Our approach identifies several inconsistencies and enables better utilization of these components, including the popular practice of running multiple local training steps before aggregation.
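The nested sub-model extraction that OD enables can be illustrated with a small sketch. It assumes that a sub-model of width fraction p keeps the first ⌈p·K⌉ of the K hidden units, so extraction reduces to slicing weight matrices. The one-hidden-layer MLP, the function names `od_width`, `extract_submodel` and `forward`, and the list-based weights are all hypothetical choices for this example. The training procedure that makes such slices accurate is not shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def od_width(p: float, width: int) -> int:
    """Number of hidden units kept at width fraction p: the first ceil(p * width)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.ceil(p * width)


def extract_submodel(
    w1: Matrix, b1: List[float], w2: Matrix, p: float
) -> Tuple[Matrix, List[float], Matrix]:
    """Slice a one-hidden-layer MLP down to its left-most hidden units.

    The model computes out = w2 @ relu(w1 @ x + b1); no retraining is involved.
    """
    k = od_width(p, len(w1))
    sub_w1 = [row[:] for row in w1[:k]]  # first k hidden units = first k rows of w1
    sub_b1 = b1[:k]                      # biases of those units
    sub_w2 = [row[:k] for row in w2]     # matching first k columns of w2
    return sub_w1, sub_b1, sub_w2


def forward(w1: Matrix, b1: List[float], w2: Matrix, x: List[float]) -> List[float]:
    """Evaluate out = w2 @ relu(w1 @ x + b1) with plain Python lists."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]
```

Because every smaller slice is a prefix of every larger one, a client with limited compute or bandwidth could, under this assumption, be served a p-width slice of the global model directly, and p = 1 recovers the full model.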
Date of Award: Jun 27, 2022
Original language: English (US)
Awarding Institution: Computer, Electrical and Mathematical Sciences and Engineering
Supervisor: Peter Richtarik
