TY - JOUR
T1 - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization
AU - Horváth, Samuel
AU - Lei, Lihua
AU - Richtárik, Peter
AU - Jordan, Michael I.
PY - 2022/5/12
Y1 - 2022/5/12
AB - Adaptivity is an important yet under-studied property in modern optimization theory. The gap between state-of-the-art theory and current practice is striking: algorithms with desirable theoretical guarantees typically involve drastically different settings of hyperparameters, such as step size schemes and batch sizes, in different regimes. Despite the appealing theoretical results, such divisive strategies provide little, if any, insight to practitioners seeking algorithms that work broadly without hyperparameter tweaking. In this work, blending the “geometrization” technique introduced by [L. Lei and M. I. Jordan, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017, pp. 148–156] and the SARAH algorithm of [L. M. Nguyen, J. Liu, K. Scheinberg, and M. Takáč, Proceedings of the 34th International Conference on Machine Learning, 2017, pp. 2613–2621], we propose the geometrized SARAH algorithm for nonconvex finite-sum and stochastic optimization. Our algorithm is proved to achieve adaptivity to both the magnitude of the target accuracy and the Polyak–Łojasiewicz (PL) constant, if present. In addition, it simultaneously achieves the best-available convergence rate for non-PL objectives while outperforming existing algorithms for PL objectives.
UR - http://hdl.handle.net/10754/661763
UR - https://epubs.siam.org/doi/10.1137/21M1394308
DO - 10.1137/21M1394308
M3 - Article
SN - 2577-0187
VL - 4
SP - 634
EP - 648
JO - SIAM Journal on Mathematics of Data Science
JF - SIAM Journal on Mathematics of Data Science
IS - 2
ER -