Scholars@Duke publication: Stochastic nested variance reduction for nonconvex optimization

Stochastic nested variance reduction for nonconvex optimization

Publication, Journal Article

Zhou, D; Xu, P; Gu, Q

Published in: Journal of Machine Learning Research

May 1, 2020

We study nonconvex optimization problems in which the objective function is either an average of $n$ nonconvex functions or the expectation of a stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, namely Stochastic Nested Variance-Reduced Gradient descent (SNVRG). The conventional stochastic variance-reduced gradient (SVRG) algorithm uses two reference points to construct, in each iteration, a semi-stochastic gradient with diminishing variance; our algorithm instead uses $K + 1$ nested reference points to build a semi-stochastic gradient that further reduces this variance in each iteration. For smooth nonconvex functions, SNVRG converges to an $\epsilon$-approximate first-order stationary point within $\tilde{O}(n \wedge \epsilon^{-2} + \epsilon^{-3} \wedge n^{1/2}\epsilon^{-2})$ stochastic gradient evaluations, where $a \wedge b = \min\{a, b\}$ and $\tilde{O}(\cdot)$ hides logarithmic factors. This improves on the best known gradient complexities of SVRG, $O(n + n^{2/3}\epsilon^{-2})$, and of SCSG, $O(n \wedge \epsilon^{-2} + \epsilon^{-10/3} \wedge n^{2/3}\epsilon^{-2})$. For gradient-dominated functions, SNVRG also achieves better gradient complexity than state-of-the-art algorithms.

Based on SNVRG, we further propose two algorithms that find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. For finite-sum problems, the proposed SNVRG + Neon2$^{\text{finite}}$ algorithm converges to an $(\epsilon, \epsilon_H)$-second-order stationary point with $\tilde{O}(n^{1/2}\epsilon^{-2} + n\epsilon_H^{-3} + n^{3/4}\epsilon_H^{-7/2})$ gradient complexity, which outperforms the best existing algorithm, SVRG + Neon2$^{\text{finite}}$ (Allen-Zhu and Li, 2018), in a wide range of regimes. For general stochastic problems, the proposed SNVRG + Neon2$^{\text{online}}$ achieves $\tilde{O}(\epsilon^{-3} + \epsilon_H^{-5} + \epsilon^{-2}\epsilon_H^{-3})$ gradient complexity, which is better than both SVRG + Neon2$^{\text{online}}$ (Allen-Zhu and Li, 2018) and Natasha2 (Allen-Zhu, 2018a) in certain regimes. Thorough experiments on a range of nonconvex optimization problems support our theory.
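To make the idea of $K + 1$ nested reference points concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the reference points are combined as a telescoping sum: the semi-stochastic gradient is a level-0 mini-batch gradient at $x^{(0)}$ plus $K$ correction terms $\nabla f_{I_l}(x^{(l)}) - \nabla f_{I_l}(x^{(l-1)})$, where reference point $l$ is refreshed every $T_{l+1} \cdots T_K$ iterations and each correction is recomputed only when $x^{(l)}$ moves. The Cauchy-type test loss, the helper names (`make_problem`, `batch_grad`, `objective`, `snvrg_sketch`), and the step size, batch sizes $B_0 \ge \dots \ge B_K$ and loop lengths $T_1, \dots, T_K$ are hypothetical choices for illustration. The paper's parameter settings and output rule are not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def make_problem(n: int, d: int, seed: int = 0) -> Tuple[List[Vector], Vector]:
    """Hypothetical finite-sum problem: F(x) = (1/n) * sum_i log(1 + (a_i . x - b_i)^2).
    The Cauchy-type loss log(1 + r^2) is smooth but nonconvex for |r| > 1."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    x_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = [sum(a_j * x_j for a_j, x_j in zip(a, x_star)) for a in A]
    return A, b


def residual(a: Vector, b_i: float, x: Vector) -> float:
    return sum(a_j * x_j for a_j, x_j in zip(a, x)) - b_i


def objective(A: List[Vector], b: Vector, x: Vector) -> float:
    return sum(math.log1p(residual(a, b_i, x) ** 2) for a, b_i in zip(A, b)) / len(A)


def batch_grad(A: List[Vector], b: Vector, idx: Sequence[int], x: Vector) -> Vector:
    """Average gradient of the components indexed by idx."""
    g = [0.0] * len(x)
    for i in idx:
        r = residual(A[i], b[i], x)
        scale = 2.0 * r / (1.0 + r * r)  # derivative of log(1 + r^2) w.r.t. r
        for j, a_ij in enumerate(A[i]):
            g[j] += scale * a_ij
    return [g_j / len(idx) for g_j in g]


def snvrg_sketch(A, b, x0, eta, batch_sizes, loop_lengths, epochs=1, seed=0):
    """Hypothetical sketch, not the authors' code.
    batch_sizes = [B_0, ..., B_K]; loop_lengths = [T_1, ..., T_K].
    Reference point l is refreshed every T_{l+1} * ... * T_K steps (level K tracks x_t)."""
    rng = random.Random(seed)
    n, K, d = len(A), len(loop_lengths), len(x0)
    period = [1] * (K + 1)
    for l in range(K - 1, -1, -1):
        period[l] = period[l + 1] * loop_lengths[l]
    x = list(x0)
    refs = [list(x0) for _ in range(K + 1)]
    terms = [[0.0] * d for _ in range(K + 1)]
    for t in range(epochs * period[0]):
        # Refresh outer levels first so level l sees the new reference point l - 1.
        for l in range(K + 1):
            if t % period[l] != 0:
                continue
            refs[l] = list(x)
            idx = rng.sample(range(n), batch_sizes[l])
            if l == 0:
                terms[0] = batch_grad(A, b, idx, refs[0])
            else:
                g_new = batch_grad(A, b, idx, refs[l])
                g_old = batch_grad(A, b, idx, refs[l - 1])
                terms[l] = [p - q for p, q in zip(g_new, g_old)]
        # Semi-stochastic gradient: level-0 gradient plus K correction terms.
        v = [sum(term[j] for term in terms) for j in range(d)]
        x = [x_j - eta * v_j for x_j, v_j in zip(x, v)]
    return x
```

If every batch size is set to $n$, the correction terms telescope to the exact gradient at the current iterate, so each step reduces to plain gradient descent. This makes a convenient sanity check. Smaller inner batches make each iteration cheaper, and because the reference points stay close to the iterate, the correction terms keep the added variance small.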

Duke Scholars

Author Pan Xu Biostatistics & Bioinformatics, Division of Integrative Geno ...

Published In

Journal of Machine Learning Research

EISSN

1533-7928

ISSN

1532-4435

Publication Date

May 1, 2020

Volume

21

Related Subject Headings

Artificial Intelligence & Image Processing
4905 Statistics
4611 Machine learning
17 Psychology and Cognitive Sciences
08 Information and Computing Sciences

Citation

Zhou, D., Xu, P., & Gu, Q. (2020). Stochastic nested variance reduction for nonconvex optimization. Journal of Machine Learning Research, 21.