Abstract
The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an ℓ2-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent.
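The setting described above can be illustrated with a short, self-contained sketch (not the authors' code): it trains a two-layer leaky ReLU network on high-dimensional Gaussian data, whose samples are nearly orthogonal with high probability, using a small random initialization, and tracks the effective rank of the first-layer weight matrix. The dimensions, initialization scale, leaky slope, learning rate, and the entropy-based effective-rank measure are all illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch: small-initialization gradient descent on a two-layer
# leaky ReLU network, monitoring the effective rank of the first-layer weights.
import torch

torch.manual_seed(0)
n, d, m = 20, 1000, 50           # few samples, high dimension -> nearly-orthogonal inputs
X = torch.randn(n, d) / d**0.5   # rows are approximately orthogonal, roughly unit norm
y = torch.sign(torch.randn(n))   # random +/-1 labels

init_scale = 1e-3                # small initialization variance (assumed value)
W = (init_scale * torch.randn(m, d)).requires_grad_()  # first-layer weights
a = (init_scale * torch.randn(m)).requires_grad_()     # second-layer weights

def effective_rank(M: torch.Tensor) -> float:
    """Entropy-based effective rank of the singular value spectrum."""
    s = torch.linalg.svdvals(M)
    p = s / s.sum()
    return torch.exp(-(p * torch.log(p + 1e-12)).sum()).item()

opt = torch.optim.SGD([W, a], lr=0.1)
for step in range(2000):
    margins = y * (torch.nn.functional.leaky_relu(X @ W.T, negative_slope=0.1) @ a)
    loss = torch.nn.functional.softplus(-margins).mean()  # logistic loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step in (0, 1, 100, 1999):
        print(f"step {step:5d}  loss {loss.item():.4f}  "
              f"eff. rank of W {effective_rank(W.detach()):.2f}")
```

Under these assumptions, the printout after the first step versus at initialization gives a rough, empirical view of the rank-reduction effect the abstract refers to; increasing `init_scale` is a natural way to probe the claim that a small initialization scale matters.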
| Original language | English |
|---|---|
| State | Published - 2023 |
| Externally published | Yes |
| Event | 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda. Duration: 1 May 2023 → 5 May 2023. https://iclr.cc/Conferences/2023 |
Conference
| Conference | 11th International Conference on Learning Representations, ICLR 2023 |
|---|---|
| Country/Territory | Rwanda |
| City | Kigali |
| Period | 1/05/23 → 5/05/23 |
| Internet address | https://iclr.cc/Conferences/2023 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language