Optimal approximation rate of ReLU networks in terms of width and depth
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved by construction that ReLU networks with width $O(\max\{d\lfloor N^{1/d}\rfloor, N+2\})$ and depth $O(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $O(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d})$, where $\alpha\in(0,1]$ and $\lambda>0$ are the Hölder order and constant, respectively. This rate is optimal up to a constant in terms of width and depth separately, whereas existing results are only nearly optimal because they lack the logarithmic factor in the approximation rate. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$, the approximation rate becomes $O(\sqrt{d}\,\omega_f((N^2L^2\ln N)^{-1/d}))$, where $\omega_f(\cdot)$ is the modulus of continuity of $f$. We also extend our analysis to any continuous function $f$ on a bounded set. In particular, if ReLU networks with depth 31 and width $O(N)$ are used to approximate one-dimensional Lipschitz continuous functions on $[0,1]$ with Lipschitz constant $\lambda>0$, the approximation rate in terms of the total number of parameters, $W=O(N^2)$, becomes [Formula presented], which has not previously been reported in the literature for fixed-depth ReLU networks.
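As a consistency check, the Hölder rate can be specialized to the one-dimensional Lipschitz case: take $d=1$ and $\alpha=1$ (Lipschitz means Hölder of order 1), and hold the depth $L$ fixed. This assumes that the depth-31 result is the $L=O(1)$ instance of the general bound. The derivation below is our own specialization of the stated general rate, not a transcription of the omitted formula, and the constants hidden by $O(\cdot)$ are not tracked.

```latex
\begin{aligned}
&\text{H\"older rate with } d=1,\ \alpha=1,\ L=O(1):\quad
  O\!\left(\lambda\,(N^{2}L^{2}\ln N)^{-1}\right) = O\!\left(\frac{\lambda}{N^{2}\ln N}\right),\\
&W \le C N^{2}\ (C \ge 1) \;\Longrightarrow\; W\ln W \le C N^{2}\,(\ln C + 2\ln N) \le C' N^{2}\ln N \quad (N \ge 2),\\
&\Longrightarrow\; \frac{\lambda}{N^{2}\ln N} \le \frac{C'\lambda}{W\ln W},
  \quad\text{so the error is } O\!\left(\frac{\lambda}{W\ln W}\right).
\end{aligned}
```

Only the upper bound $W=O(N^2)$ is needed here, because $x\ln x$ is increasing for $x\ge 1$.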
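The sketch below makes "approximating a one-dimensional Lipschitz function with a ReLU network" concrete. It builds the classical one-hidden-layer baseline, in which the piecewise-linear interpolant of $f$ on the uniform grid $i/n$ is written exactly as a sum of $n$ ReLU units. This textbook construction is shown only for comparison. It is not the depth-31 construction from the paper, which is not reproduced here. The names `build_interpolant_network`, `evaluate` and `count_parameters` are hypothetical. For a $\lambda$-Lipschitz $f$, the uniform error is at most $\lambda/(2n)$ using $W = 3n+1$ parameters, which is a rate of $O(\lambda/W)$.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical representation: (hidden-unit biases, output weights, output bias).
Network = Tuple[List[float], List[float], float]


def relu(z: float) -> float:
    return max(0.0, z)


def build_interpolant_network(f: Callable[[float], float], n: int) -> Network:
    """Hypothetical helper: a one-hidden-layer ReLU network that equals, on
    [0, 1], the piecewise-linear interpolant of f at the knots i/n, i = 0..n."""
    knots = [i / n for i in range(n + 1)]
    values = [f(x) for x in knots]
    # Slope of the interpolant on each cell [knots[i], knots[i+1]].
    slopes = [(values[i + 1] - values[i]) * n for i in range(n)]
    # Hidden unit i computes ReLU(x - knots[i]); its output weight is the
    # change of slope at knots[i] (the first unit carries the initial slope).
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n)]
    biases = [-knots[i] for i in range(n)]
    return biases, out_weights, values[0]


def evaluate(net: Network, x: float) -> float:
    biases, out_weights, out_bias = net
    return out_bias + sum(w * relu(x + b) for w, b in zip(out_weights, biases))


def count_parameters(n: int) -> int:
    # n input weights (all equal to 1) + n hidden biases + n output weights + 1 output bias
    return 3 * n + 1


if __name__ == "__main__":
    lam = 5.0
    f = lambda x: math.sin(lam * x)  # lam-Lipschitz on [0, 1]
    grid = [j / 10_000 for j in range(10_001)]
    for n in (4, 16, 64):
        net = build_interpolant_network(f, n)
        err = max(abs(f(x) - evaluate(net, x)) for x in grid)
        print(f"n={n:3d}  W={count_parameters(n):4d}  max error={err:.2e}  bound={lam / (2 * n):.2e}")
```

Take a cell $[x_i, x_i+h]$ with $h=1/n$ and $x = x_i + th$. There, the interpolation error is at most $(1-t)\lambda t h + t\lambda(1-t)h = 2\lambda h\,t(1-t) \le \lambda h/2$, which gives the printed bound. Because this baseline needs $W=O(n)$ parameters to reach error $O(\lambda/n)$, it serves as a reference point for the fixed-depth parameter-count rate discussed in the abstract.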
Related Subject Headings
- General Mathematics
- 4901 Applied mathematics
- 0102 Applied Mathematics
- 0101 Pure Mathematics