Randomized block-coordinate adaptive algorithms for nonconvex optimization problems
Nonconvex optimization problems are a central focus in deep learning, where many fast adaptive algorithms based on momentum are applied. However, computing the full gradient of the high-dimensional parameter vector in these tasks becomes prohibitive. To reduce the computational cost of optimizers for the nonconvex problems typically seen in deep learning, this work proposes a randomized block-coordinate adaptive optimization algorithm, named RAda, which randomly picks a block from the full coordinates of the parameter vector and then sparsely computes its gradient. We prove that RAda converges to a δ-accurate solution with a stochastic first-order complexity of O(1/δ²), where δ is the upper bound on the squared norm of the gradient, in nonconvex settings. Experiments on public datasets, including CIFAR-10, CIFAR-100, and Penn TreeBank, verify that RAda outperforms the compared algorithms in terms of computational cost.
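To make the block-coordinate idea in the abstract concrete, below is a minimal NumPy sketch of one plausible realization: at each step, one block of coordinates is sampled uniformly, the stochastic gradient is evaluated only on that block, and an Adam-style adaptive update is applied to that block alone. The names rada_sketch and noisy_quad_grad, the fixed block partition, the toy quadratic objective, and all hyperparameters are illustrative assumptions, not the paper's exact algorithm.

import numpy as np

def rada_sketch(grad_fn, x0, n_blocks=4, lr=1e-2, beta1=0.9, beta2=0.999,
                eps=1e-8, steps=1000, seed=0):
    # Hypothetical sketch of a randomized block-coordinate adaptive update;
    # the block scheme and Adam-style moments are assumptions for illustration.
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    m = np.zeros_like(x)   # first-moment (momentum) estimate
    v = np.zeros_like(x)   # second-moment (adaptive scaling) estimate
    blocks = np.array_split(np.arange(x.size), n_blocks)
    counts = np.zeros(n_blocks, dtype=int)   # per-block update counters
    for _ in range(steps):
        j = rng.integers(n_blocks)            # randomly pick one block
        b = blocks[j]
        counts[j] += 1
        g = grad_fn(x, b)                     # gradient on this block only
        m[b] = beta1 * m[b] + (1 - beta1) * g
        v[b] = beta2 * v[b] + (1 - beta2) * g ** 2
        m_hat = m[b] / (1 - beta1 ** counts[j])   # per-block bias correction
        v_hat = v[b] / (1 - beta2 ** counts[j])
        x[b] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy usage: minimize ||x||^2 with a noisy, block-restricted gradient oracle.
def noisy_quad_grad(x, idx, noise=0.01):
    rng = np.random.default_rng()
    return 2 * x[idx] + noise * rng.standard_normal(idx.size)

x_final = rada_sketch(noisy_quad_grad, x0=np.ones(64))
print(np.linalg.norm(x_final))   # the norm should shrink toward zero

Since only one block's gradient is computed and only that block's moment estimates are touched per step, each iteration costs roughly 1/n_blocks of a full adaptive update, which is the source of the computational savings the abstract claims.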