Implicit Bias of the Step Size in Linear Diagonal Neural Networks

Mor Shpigel Nacson, Kavya Ravichandran, Nathan Srebro, Daniel Soudry

Research output: Contribution to journal › Conference article › peer-review

Abstract

Focusing on diagonal linear networks as a model for understanding the implicit bias in underdetermined models, we show how the gradient descent step size can have a large qualitative effect on the implicit bias, and thus on generalization ability. In particular, we show how using a large step size for non-centered data can change the implicit bias from a "kernel"-type behavior to a "rich" (sparsity-inducing) regime, even when gradient flow, studied in previous works, would not escape the "kernel" regime. We do so using dynamic stability, proving that convergence to dynamically stable global minima entails a bound on some weighted $\ell_1$-norm of the linear predictor, i.e., a "rich" regime. We prove this leads to good generalization in a sparse regression setting.
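
To make the setup concrete, below is a minimal sketch (not the authors' code) of the kind of experiment the abstract describes: a diagonal linear network that parameterizes the predictor as $\beta = u \odot v$ (a common diagonal-network parameterization, assumed here rather than taken from the paper), trained by full-batch gradient descent on an underdetermined sparse-regression problem with non-centered data. The `train` helper and all constants (problem sizes, initialization scale, step sizes, iteration budget) are illustrative assumptions; the sketch only shows how one could compare the $\ell_1$-norm of solutions reached at small versus large step sizes.

```python
# Minimal sketch, assuming the diagonal-network parameterization beta = u * v
# (elementwise); not the authors' code. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 40, 3                    # n samples, d features, k-sparse target
X = rng.standard_normal((n, d)) + 1.0  # non-centered data: features have mean ~1
beta_star = np.zeros(d)
beta_star[:k] = 1.0                    # sparse ground-truth predictor
y = X @ beta_star

def train(step_size, iters=100_000, init=0.3):
    """Full-batch GD on L(u, v) = ||X (u*v) - y||^2 / (2n)."""
    u, v = np.full(d, init), np.zeros(d)   # v = 0 lets beta take either sign
    for _ in range(iters):
        g = X.T @ (X @ (u * v) - y) / n    # dL/d(beta), evaluated at beta = u*v
        u, v = u - step_size * g * v, v - step_size * g * u  # simultaneous update
        if np.abs(u).max() > 1e6:          # past the stability threshold: diverged
            return None
    return u * v

for eta in (1e-3, 2e-2, 5e-2):             # small -> large step size (illustrative)
    beta = train(eta)
    if beta is None:
        print(f"eta={eta:g}: diverged")
    else:
        print(f"eta={eta:g}: ||beta||_1 = {np.abs(beta).sum():.3f}, "
              f"error = {np.linalg.norm(beta - beta_star):.3f}")
```

Under the abstract's claim, step sizes close to (but below) the stability threshold should be biased toward solutions with smaller weighted $\ell_1$-norm, i.e., sparser predictors, than near-gradient-flow small-step runs, while step sizes beyond the threshold simply diverge; whether the qualitative gap is visible on a given random instance depends on the constants chosen.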
Original language: English
Pages (from-to): 16270-16295
Number of pages: 26
Journal: Proceedings of Machine Learning Research
Volume: 162
State: Published - 1 May 2022
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 - 23 Jul 2022
https://proceedings.mlr.press/v162/

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
