TY - GEN

T1 - SSRGD: Simple stochastic recursive gradient descent for escaping saddle points

AU - Li, Zhize

N1 - KAUST Repository Item: Exported on 2020-10-09

PY - 2019/1/1

Y1 - 2019/1/1

N2 - We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points. We show that a simple perturbed version of stochastic recursive gradient descent algorithm (called SSRGD) can find an (?, d)-second-order stationary point with Oe(vn/?2 + vn/d4 + n/d3) stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an ?-first-order stationary point with O(n + vn/?2) stochastic gradients. These results are almost optimal since Fang et al. [11] provided a lower bound ?(vn/?2) for finding even just an ?-first-order stationary point. We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 [4]). Moreover, the simple SSRGD algorithm gets a simpler analysis. Besides, we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results.

AB - We analyze stochastic gradient algorithms for optimizing nonconvex problems. In particular, our goal is to find local minima (second-order stationary points) instead of just finding first-order stationary points which may be some bad unstable saddle points. We show that a simple perturbed version of stochastic recursive gradient descent algorithm (called SSRGD) can find an (?, d)-second-order stationary point with Oe(vn/?2 + vn/d4 + n/d3) stochastic gradient complexity for nonconvex finite-sum problems. As a by-product, SSRGD finds an ?-first-order stationary point with O(n + vn/?2) stochastic gradients. These results are almost optimal since Fang et al. [11] provided a lower bound ?(vn/?2) for finding even just an ?-first-order stationary point. We emphasize that SSRGD algorithm for finding second-order stationary points is as simple as for finding first-order stationary points just by adding a uniform perturbation sometimes, while all other algorithms for finding second-order stationary points with similar gradient complexity need to combine with a negative-curvature search subroutine (e.g., Neon2 [4]). Moreover, the simple SSRGD algorithm gets a simpler analysis. Besides, we also extend our results from nonconvex finite-sum problems to nonconvex online (expectation) problems, and prove the corresponding convergence results.

UR - http://hdl.handle.net/10754/665498

UR - http://papers.neurips.cc/paper/8431-ssrgd-simple-stochastic-recursive-gradient-descent-for-escaping-saddle-points.pdf

UR - http://www.scopus.com/inward/record.url?scp=85090177315&partnerID=8YFLogxK

M3 - Conference contribution

BT - 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019

PB - Neural information processing systems foundation

ER -