
Implicit Regularization in ReLU Networks with the Square Loss

Research output: Contribution to journal › Conference article › peer-review

Abstract

Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although on the positive side, we show it can be characterized approximately). For one-hidden-layer networks, we prove a similar result: in general it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. (2018). Our results suggest that a more general framework than the one considered so far may be needed to understand implicit regularization for nonlinear predictors, and provide some clues on what this framework should be.
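
As an illustration of the "balancedness" property mentioned above (our reading of Du et al. (2018), not code from this paper): for a one-hidden-layer ReLU network f(x) = Σ_j v_j · relu(⟨w_j, x⟩) trained by gradient flow on the square loss, the quantity ‖w_j‖² − v_j² is conserved for every hidden unit j, because d/dt ‖w_j‖² and d/dt v_j² both equal −(2/n) Σ_i r_i v_j relu(⟨w_j, x_i⟩), where r_i is the residual. The sketch below runs plain gradient descent with a small step size, which only conserves the quantity up to an O(lr²) error per step. The toy data, the teacher targets, the initialization scale, the step size and all function names are hypothetical choices made for this example.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict(W, v, x):
    # One-hidden-layer ReLU network: f(x) = sum_j v_j * relu(<w_j, x>)
    return sum(vj * relu(dot(wj, x)) for wj, vj in zip(W, v))


def square_loss(W, v, X, Y):
    # L = (1 / 2n) * sum_i (f(x_i) - y_i)^2
    return sum((predict(W, v, x) - y) ** 2 for x, y in zip(X, Y)) / (2 * len(X))


def gd_step(W, v, X, Y, lr):
    # One full-batch gradient descent step on the square loss.
    n = len(X)
    gW = [[0.0] * len(wj) for wj in W]
    gv = [0.0] * len(v)
    for x, y in zip(X, Y):
        r = predict(W, v, x) - y  # residual r_i
        for j, (wj, vj) in enumerate(zip(W, v)):
            pre = dot(wj, x)
            if pre > 0.0:  # ReLU derivative taken as 0 at the kink
                gv[j] += r * pre / n
                for k, xk in enumerate(x):
                    gW[j][k] += r * vj * xk / n
    new_W = [[wk - lr * gk for wk, gk in zip(wj, gj)] for wj, gj in zip(W, gW)]
    new_v = [vj - lr * gj for vj, gj in zip(v, gv)]
    return new_W, new_v


def balancedness(W, v):
    # Per hidden unit: ||w_j||^2 - v_j^2 (incoming minus outgoing weight norm)
    return [dot(wj, wj) - vj * vj for wj, vj in zip(W, v)]


def run(seed=0, n=10, d=3, m=4, lr=2e-4, steps=2500):
    rng = random.Random(seed)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
    Y = [relu(x[0]) - 0.5 * relu(x[1]) for x in X]  # hypothetical teacher targets
    W = [[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(m)]
    v = [rng.gauss(0.0, 0.5) for _ in range(m)]
    b0, loss0 = balancedness(W, v), square_loss(W, v, X, Y)
    for _ in range(steps):
        W, v = gd_step(W, v, X, Y, lr)
    b1, loss1 = balancedness(W, v), square_loss(W, v, X, Y)
    drift = max(abs(a - b) for a, b in zip(b0, b1))
    return loss0, loss1, drift


if __name__ == "__main__":
    loss0, loss1, drift = run()
    print(f"loss {loss0:.4f} -> {loss1:.4f}, max balancedness drift {drift:.2e}")
```

The loss decreases while each unit's ‖w_j‖² − v_j² stays nearly fixed; shrinking `lr` (with `lr * steps` held constant) brings the discrete dynamics closer to gradient flow and reduces the drift further.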

Original language: English
Pages (from-to): 4224-4258
Number of pages: 35
Journal: Proceedings of Machine Learning Research
Volume: 134
State: Published - 2021
Event: 34th Conference on Learning Theory, COLT 2021 - Boulder, United States
Duration: 15 Aug 2021 to 19 Aug 2021

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

