Abstract
We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results showing that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions. We derive the corresponding eigenvalues for each frequency after introducing a bias term into the model. This bias term had been omitted from the linear network model without significantly affecting previous theoretical results. However, we show theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low-frequency functions with odd frequencies. Our results lead to specific predictions of the time it will take a network to learn functions of varying frequency. These predictions match the empirical behavior of both shallow and deep networks.
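The abstract's central prediction, that higher-frequency targets take longer for gradient descent to fit, can be illustrated with a small experiment. The sketch below is not from the paper; the network width, learning rate, loss threshold, and data size are illustrative assumptions. It trains a shallow ReLU network with a bias term on single-frequency targets over the unit circle and reports how many full-batch gradient-descent steps are needed to reach a fixed loss.

```python
import numpy as np

# Minimal sketch (not the paper's code): measure steps-to-fit for a
# frequency-k target learned by a shallow ReLU network with bias.
# Width, learning rate, and threshold below are illustrative assumptions.

rng = np.random.default_rng(0)

def steps_to_fit(k, width=2048, lr=0.5, threshold=1e-3, max_steps=20000):
    # Training data: n points on the unit circle S^1, embedded in R^2.
    n = 256
    theta = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    X = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # (n, 2)
    y = np.sin(k * theta)                                  # frequency-k target

    # Shallow network f(x) = sum_i a_i * relu(w_i . x + b_i).
    # Only W and b are trained (a fixed), as in lazy-training analyses.
    W = rng.normal(size=(width, 2))
    b = rng.normal(size=width)
    a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

    for step in range(1, max_steps + 1):
        pre = X @ W.T + b                # (n, width) pre-activations
        h = np.maximum(pre, 0.0)         # ReLU features
        err = h @ a - y                  # (n,) residual
        loss = 0.5 * np.mean(err ** 2)
        if loss < threshold:
            return step
        # Full-batch gradient descent on W and b.
        g = (err[:, None] * (pre > 0)) * a[None, :]  # (n, width)
        W -= lr * (g.T @ X) / n
        b -= lr * g.sum(axis=0) / n
    return max_steps

for k in [1, 2, 4, 8]:
    print(f"frequency k={k}: steps to fit = {steps_to_fit(k)}")
```

With settings like these, the step count should grow with k, mirroring the paper's eigenvalue-based predictions; exact counts depend on the assumed hyperparameters. Dropping the bias `b` from the model is a quick way to probe the paper's claim that a bias-free shallow network cannot fit odd frequencies.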
| Original language | English |
| --- | --- |
| Title of host publication | Advances in Neural Information Processing Systems |
| Editors | H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett |
| Pages | 4763-4772 |
| Number of pages | 11 |
| Volume | 32 |
| State | Published - 2019 |
| Event | 33rd Conference on Neural Information Processing Systems, Vancouver, Canada, 8 Dec 2019 → 14 Dec 2019 (conference number: 33rd) |
Conference

| Conference | 33rd Conference on Neural Information Processing Systems |
| --- | --- |
| Abbreviated title | NeurIPS 2019 |
| Country/Territory | Canada |
| City | Vancouver |
| Period | 8/12/19 → 14/12/19 |