TY - JOUR

T1 - Semi-Stochastic Gradient Descent Methods

AU - Konečný, Jakub

AU - Richtárik, Peter

N1 - Generated from Scopus record by KAUST IRTS on 2023-09-25

PY - 2017/5/23

Y1 - 2017/5/23

N2 - In this paper we study the problem of minimizing the average of a large number of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. For strongly convex objectives, the method converges linearly. The total work needed for the method to output an epsilon-accurate solution in expectation, measured in the number of passes over data, is proportional to the condition number of the problem and inversely proportional to the number of functions forming the average. This is achieved by running the method with number of stochastic gradient evaluations per epoch proportional to conditioning of the problem. The SVRG method of Johnson and Zhang arises as a special case. To illustrate our theoretical results, S2GD only needs the workload equivalent to about 2.1 full gradient evaluations to find a 10^-6-accurate solution for a problem with 10^9 functions and a condition number of 10^3.

AB - In this paper we study the problem of minimizing the average of a large number of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. For strongly convex objectives, the method converges linearly. The total work needed for the method to output an epsilon-accurate solution in expectation, measured in the number of passes over data, is proportional to the condition number of the problem and inversely proportional to the number of functions forming the average. This is achieved by running the method with number of stochastic gradient evaluations per epoch proportional to conditioning of the problem. The SVRG method of Johnson and Zhang arises as a special case. To illustrate our theoretical results, S2GD only needs the workload equivalent to about 2.1 full gradient evaluations to find a 10^-6-accurate solution for a problem with 10^9 functions and a condition number of 10^3.

UR - http://journal.frontiersin.org/article/10.3389/fams.2017.00009/full

UR - http://www.scopus.com/inward/record.url?scp=85097311075&partnerID=8YFLogxK

U2 - 10.3389/fams.2017.00009

DO - 10.3389/fams.2017.00009

M3 - Article

SN - 2297-4687

VL - 3

JO - Frontiers in Applied Mathematics and Statistics

JF - Frontiers in Applied Mathematics and Statistics

ER -