TY - GEN
T1 - DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate
AU - Soori, Saeed
AU - Mishchenko, Konstantin
AU - Mokhtari, Aryan
AU - Dehnavi, Maryam Mehri
AU - Gurbuzbalaban, Mert
PY - 2020
Y1 - 2020
AB - In this paper, we consider distributed algorithms for solving the empirical risk minimization problem under the master/worker communication model. We develop a distributed asynchronous quasi-Newton algorithm that can achieve superlinear convergence. To our knowledge, this is the first distributed asynchronous algorithm with superlinear convergence guarantees. Our algorithm is communication-efficient in the sense that at every iteration the master node and workers communicate vectors of size O(p), where p is the dimension of the decision variable. The proposed method is based on a distributed asynchronous averaging scheme for decision vectors and gradients that effectively captures the local Hessian information of the objective function. Our convergence theory supports asynchronous computations subject to both bounded delays and unbounded delays with a bounded time-average. Unlike in the majority of the asynchronous optimization literature, we do not require choosing a smaller stepsize when delays are large. We provide numerical experiments that match our theoretical results and showcase significant improvements compared to state-of-the-art distributed algorithms.
UR - http://hdl.handle.net/10754/670893
UR - https://par.nsf.gov/servlets/purl/10256971
M3 - Conference contribution
BT - Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, Palermo, Italy
ER -