SURPRISES IN HIGH-DIMENSIONAL RIDGELESS LEAST SQUARES INTERPOLATION

Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani

Research output: Contribution to journal › Article › Peer-review

Abstract

Interpolators, estimators that achieve zero training error, have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ2 norm ("ridgeless") interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters p is of the same order as the number of samples n. We consider two different models for the feature distribution: a linear model, where the feature vectors xi ∈ Rp are obtained by applying a linear transform to a vector of i.i.d. entries, xi = Σ1/2 zi (with zi ∈ Rp); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, xi = φ(Wzi) (with zi ∈ Rd, W ∈ Rp×d a matrix of i.i.d. entries, and φ an activation function acting componentwise on Wzi). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
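
Below is a minimal numerical sketch (not the authors' code) of the estimator studied in the abstract: the minimum-ℓ2-norm ("ridgeless") interpolator, computed via the pseudoinverse, under an isotropic instance of the linear feature model. All function names, sample sizes, and parameter values are illustrative assumptions. Sweeping the aspect ratio γ = p/n across 1 shows the risk spike at the interpolation threshold and its subsequent decrease, i.e., the qualitative double-descent shape.

# Minimal sketch: ridgeless least squares under an isotropic linear feature
# model (Sigma = I). Illustrative only; parameter choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def ridgeless_risk(n, p, sigma=1.0, n_test=2000):
    # True coefficients, normalized so the signal strength ||beta||^2 = 1.
    beta = rng.standard_normal(p)
    beta /= np.linalg.norm(beta)

    # Features x_i = Sigma^{1/2} z_i with Sigma = I, i.e. i.i.d. Gaussian entries.
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)

    # Minimum-l2-norm solution beta_hat = X^+ y: ordinary least squares when
    # p <= n, and the interpolator of minimum norm when p > n.
    beta_hat = np.linalg.pinv(X) @ y

    # Estimate the out-of-sample prediction risk E[(x^T beta_hat - x^T beta)^2]
    # on a fresh test set (equals ||beta_hat - beta||^2 for isotropic features).
    X_test = rng.standard_normal((n_test, p))
    return np.mean((X_test @ (beta_hat - beta)) ** 2)

if __name__ == "__main__":
    n = 200
    for gamma in [0.2, 0.5, 0.8, 0.95, 1.05, 1.5, 2.0, 5.0]:
        p = int(gamma * n)
        risks = [ridgeless_risk(n, p) for _ in range(20)]
        print(f"gamma = p/n = {gamma:4.2f}: estimated risk ~ {np.mean(risks):.3f}")

Averaging over a few repetitions, the printed risk rises sharply as γ approaches 1 and falls again in the overparametrized regime γ > 1, mirroring the double-descent behavior the paper characterizes precisely.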

Original language: English
Pages (from-to): 949-986
Number of pages: 38
Journal: Annals of Statistics
Volume: 50
Issue number: 2
DOIs
State: Published - Apr 2022

Keywords

  • Regression
  • interpolation
  • overparametrization
  • random matrix theory
  • ridge regression

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
