Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

Research output: Contribution to conference › Paper › peer-review

Abstract

The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an ℓ2-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent.
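As a rough illustration of the low-rank phenomenon the abstract describes (not the authors' experimental setup), the following NumPy sketch trains a two-layer leaky ReLU network with a fixed second layer on nearly-orthogonal Gaussian data from a small random initialization, and tracks the stable rank (‖W‖²_F / ‖W‖²₂) of the first-layer weights before and after gradient descent. All hyperparameters (width, leaky slope, step size, initialization scale) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: hyperparameters are assumptions, not the paper's settings.
rng = np.random.default_rng(0)

n, d, m = 20, 2000, 50        # samples, input dimension (d >> n), hidden width
alpha = 0.1                   # leaky ReLU negative slope (assumed)
init_scale = 1e-4             # small initialization scale
lr, steps = 1.0, 200

# Independent Gaussian samples in high dimension are nearly orthogonal.
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = rng.choice([-1.0, 1.0], size=n)

W = rng.normal(size=(m, d)) * init_scale          # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)  # fixed second layer

def forward(W):
    pre = X @ W.T                                  # (n, m) pre-activations
    return np.where(pre > 0, pre, alpha * pre) @ a

def logistic_loss(f):
    # mean of log(1 + exp(-y f)), computed stably
    return float(np.mean(np.logaddexp(0.0, -y * f)))

def grad_W(W):
    pre = X @ W.T
    f = np.where(pre > 0, pre, alpha * pre) @ a
    # d/df log(1 + exp(-y f)) = -y * sigmoid(-y f), written stably via tanh
    g = -y * 0.5 * (1.0 - np.tanh(0.5 * y * f))
    slope = np.where(pre > 0, 1.0, alpha)          # leaky ReLU derivative
    # dL/dW row j = (1/n) * sum_i g_i * slope_ij * a_j * x_i
    return ((g[:, None] * slope) * a[None, :]).T @ X / n

def stable_rank(W):
    s = np.linalg.svd(W, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

rank0, loss0 = stable_rank(W), logistic_loss(forward(W))
for _ in range(steps):
    W -= lr * grad_W(W)
rank_T, loss_T = stable_rank(W), logistic_loss(forward(W))
print(f"stable rank {rank0:.1f} -> {rank_T:.2f}, loss {loss0:.3f} -> {loss_T:.3f}")
```

At this small initialization scale the first gradient step is dominated by a near-rank-one signed combination of the training points, so the stable rank collapses from that of a random Gaussian matrix to a small constant, consistent with the rank-reduction effect described above.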

Original language: English
State: Published - 2023
Externally published: Yes
Event: 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda
Duration: 1 May 2023 – 5 May 2023
Internet address: https://iclr.cc/Conferences/2023


