Scholars@Duke publication: How does Gradient Descent Learn Features - A Local Analysis for Regularized Two-Layer Neural Networks

Publication, Conference

Zhou, M; Ge, R

Published in: Advances in Neural Information Processing Systems

January 1, 2024

The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work has highlighted the feature-learning capabilities of the early stages of gradient-based training. In this paper we consider another mechanism for feature learning via gradient descent, through a local convergence analysis. We show that once the loss is below a certain threshold, gradient descent with a carefully regularized objective will capture the ground-truth directions. We further strengthen this local convergence analysis by incorporating an early-stage feature learning analysis. Our results demonstrate that feature learning not only happens in the initial gradient steps but can also occur toward the end of training.
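The idea of gradient descent "capturing ground-truth directions" can be made concrete with a small teacher-student experiment. The sketch below is illustrative only and is not the authors' construction: it assumes a hypothetical ReLU teacher built from two unit ground-truth directions, uses plain L2 weight decay in place of the paper's "carefully regularized" objective, and picks the width `m`, sample size `n`, step size `lr`, and penalty strength `lam` arbitrarily. It runs full-batch gradient descent on the regularized objective and then reports, for each ground-truth direction, the largest cosine similarity between that direction and any student neuron, which is one simple way to check whether a direction has been captured.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def teacher(x, directions):
    # Hypothetical ground truth (an assumption): a sum of ReLUs along unit directions.
    return sum(relu(dot(w, x)) for w in directions)


def predict(a, W, x):
    # Student two-layer network: sum_i a_i * relu(<w_i, x>).
    return sum(ai * relu(dot(wi, x)) for ai, wi in zip(a, W))


def objective(a, W, X, Y, lam):
    # Half mean squared error plus an assumed L2 (weight-decay) penalty,
    # standing in for the paper's regularizer, which is not reproduced here.
    mse = sum((predict(a, W, x) - y) ** 2 for x, y in zip(X, Y)) / (2 * len(X))
    reg = 0.5 * lam * (sum(ai * ai for ai in a) + sum(dot(wi, wi) for wi in W))
    return mse + reg


def gd_step(a, W, X, Y, lam, lr):
    # One full-batch gradient-descent step on `objective`.
    n, m, d = len(X), len(a), len(W[0])
    grad_a = [lam * ai for ai in a]
    grad_W = [[lam * wij for wij in wi] for wi in W]
    for x, y in zip(X, Y):
        pre = [dot(wi, x) for wi in W]
        residual = sum(ai * relu(p) for ai, p in zip(a, pre)) - y
        for i in range(m):
            grad_a[i] += residual * relu(pre[i]) / n
            if pre[i] > 0.0:  # ReLU derivative is 1 only on the active side
                for j in range(d):
                    grad_W[i][j] += residual * a[i] * x[j] / n
    new_a = [ai - lr * g for ai, g in zip(a, grad_a)]
    new_W = [[wij - lr * g for wij, g in zip(wi, gi)] for wi, gi in zip(W, grad_W)]
    return new_a, new_W


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + 1e-12)


def best_alignment(W, directions):
    # For each ground-truth direction, the largest cosine with any student neuron.
    return [max(cosine(wi, w_star) for wi in W) for w_star in directions]


def run(seed=0, d=3, m=6, n=200, steps=300, lr=0.05, lam=1e-3):
    rng = random.Random(seed)
    # Hypothetical ground-truth directions: the first two standard basis vectors.
    directions = [[1.0 if j == k else 0.0 for j in range(d)] for k in range(2)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    Y = [teacher(x, directions) for x in X]
    a = [rng.gauss(0.0, 0.5) for _ in range(m)]
    W = [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(m)]
    start = objective(a, W, X, Y, lam)
    for _ in range(steps):
        a, W = gd_step(a, W, X, Y, lam, lr)
    return start, objective(a, W, X, Y, lam), best_alignment(W, directions)


if __name__ == "__main__":
    start, end, alignment = run()
    print(f"objective: {start:.4f} -> {end:.4f}")
    print("best cosine per ground-truth direction:", alignment)
```

Calling `run()` returns the starting objective, the final objective, and the per-direction alignment scores. Increasing `steps` or varying `lam` shows how alignment evolves in this toy setting, but the sketch makes no claim about the loss threshold or guarantees established in the paper.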

Duke Scholars

Author

Rong Ge, Computer Science

Published In

Advances in Neural Information Processing Systems

ISSN

1049-5258

Publication Date

January 1, 2024

Volume

37

Related Subject Headings

4611 Machine learning
1702 Cognitive Sciences
1701 Psychology

Citation

APA

Zhou, M., & Ge, R. (2024). How does Gradient Descent Learn Features - A Local Analysis for Regularized Two-Layer Neural Networks. In Advances in Neural Information Processing Systems (Vol. 37).

Chicago

Zhou, M., and R. Ge. “How does Gradient Descent Learn Features - A Local Analysis for Regularized Two-Layer Neural Networks.” In Advances in Neural Information Processing Systems, Vol. 37, 2024.

ICMJE

Zhou M, Ge R. How does Gradient Descent Learn Features - A Local Analysis for Regularized Two-Layer Neural Networks. In: Advances in Neural Information Processing Systems. 2024.

MLA

Zhou, M., and R. Ge. “How does Gradient Descent Learn Features - A Local Analysis for Regularized Two-Layer Neural Networks.” Advances in Neural Information Processing Systems, vol. 37, 2024.

NLM

Zhou M, Ge R. How does Gradient Descent Learn Features - A Local Analysis for Regularized Two-Layer Neural Networks. Advances in Neural Information Processing Systems. 2024.
