The Implicit Bias of Gradient Descent on Separable Data

Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

Research output: Contribution to journal › Article › peer-review


We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
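The result in the abstract can be illustrated numerically. Below is a minimal sketch (not the authors' code) that runs plain gradient descent on the unregularized logistic loss over a synthetic linearly separable dataset: the loss drives toward zero, the weight norm keeps growing, and the normalized predictor direction stabilizes, consistent with convergence toward the max-margin direction. The dataset, learning rate, and iteration counts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic separable data: labels set exactly by a ground-truth direction,
# with points near the decision boundary dropped to guarantee a positive margin.
w_star = np.array([1.0, 2.0])
X = rng.normal(size=(200, 2))
X = X[np.abs(X @ w_star) > 0.5]
y = np.sign(X @ w_star)

def logistic_loss(w):
    # Mean logistic loss: (1/n) * sum log(1 + exp(-y_i <x_i, w>))
    return np.mean(np.log1p(np.exp(np.clip(-y * (X @ w), -50, 50))))

def grad(w):
    # Gradient of the mean logistic loss; clip exponents for stability.
    s = 1.0 / (1.0 + np.exp(np.clip(y * (X @ w), -50, 50)))
    return -(y * s) @ X / len(y)

w = np.zeros(2)
lr = 0.5
directions = {}
for t in range(1, 50001):
    w -= lr * grad(w)
    if t in (5000, 50000):
        directions[t] = w / np.linalg.norm(w)

# The direction barely changes late in training (it is converging),
# while the loss is tiny and the normalized margin is strictly positive.
cosine = directions[5000] @ directions[50000]
margin = np.min(y * (X @ directions[50000]))
final_loss = logistic_loss(w)
```

Note that the direction converges only logarithmically, which is why a large iteration gap (5,000 vs. 50,000 steps) is needed before the cosine similarity gets very close to 1, even though the training loss is already extremely small far earlier.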

Original language: English
Pages (from-to): 1-57
Journal: Journal of Machine Learning Research
State: Published - 1 Nov 2018


Keywords

  • generalization
  • gradient descent
  • implicit regularization
  • logistic regression
  • margin

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence


