Implicit Regularization Towards Rank Minimization in ReLU Networks

Nadav Timor, Gal Vardi, Ohad Shamir

Research output: Chapter in Book/Report/Conference proceeding · Conference contribution · peer-review

Abstract

We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth 2 and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for “most” datasets of size 2). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions in several reasonable settings.
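The abstract's central object is gradient descent on a depth-2 ReLU network trained with the square loss, with the rank of the learned weight matrices as the quantity of interest. The sketch below is not from the paper; it is a minimal, hypothetical experiment (assumed dataset, architecture sizes, learning rate, and rank threshold are all illustrative choices) showing how one might empirically probe such a rank bias: train a small ReLU network by full-batch gradient descent on a size-2 dataset and inspect the singular values of the first-layer weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dataset of size 2 (the paper's negative result concerns
# "most" datasets of size 2; this particular data is an assumption).
X = rng.standard_normal((2, 3))
y = rng.standard_normal((2, 1))

# Depth-2 ReLU network f(x) = W2 @ relu(W1 @ x), small-scale initialization.
W1 = 0.3 * rng.standard_normal((8, 3))
W2 = 0.3 * rng.standard_normal((1, 8))

baseline = float(np.mean(y ** 2))  # loss of the zero predictor

lr = 0.05
for _ in range(20000):
    Z = X @ W1.T                   # pre-activations, shape (2, 8)
    H = np.maximum(Z, 0.0)         # ReLU
    pred = H @ W2.T                # network outputs, shape (2, 1)
    err = pred - y                 # square-loss residual
    # Backpropagation (using the subgradient 0 at the ReLU kink).
    dW2 = err.T @ H / len(X)
    dH = err @ W2
    dZ = dH * (Z > 0)
    dW1 = dZ.T @ X / len(X)
    W1 -= lr * dW1
    W2 -= lr * dW2

loss = float(np.mean((np.maximum(X @ W1.T, 0.0) @ W2.T - y) ** 2))
svals = np.linalg.svd(W1, compute_uv=False)
# Effective rank: singular values above a (heuristic) relative threshold.
eff_rank = int(np.sum(svals > 1e-3 * svals[0]))
print(f"final loss = {loss:.2e}, effective rank of W1 = {eff_rank}")
```

Comparing `eff_rank` against `min(W1.shape)` across random datasets is one crude way to observe whether gradient descent drifts toward (or away from) low-rank first-layer weights in this nonlinear setting; it is only a numerical probe, not a substitute for the paper's analysis of gradient flow.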

Original language: English
Title of host publication: Proceedings of the 34th International Conference on Algorithmic Learning Theory
Editors: Shipra Agrawal, Francesco Orabona
Pages: 1429-1459
Number of pages: 31
Volume: 201
State: Published - 2023
Event: 34th International Conference on Algorithmic Learning Theory, ALT 2023 - Singapore, Singapore
Duration: 20 Feb 2023 - 23 Feb 2023

Publication series

Name: Proceedings of Machine Learning Research
ISSN (Print): 2640-3498

Conference

Conference: 34th International Conference on Algorithmic Learning Theory, ALT 2023
Country/Territory: Singapore
City: Singapore
Period: 20/02/23 - 23/02/23

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

