
Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study the dynamics of gradient descent on objective functions of the form f(∏_{i=1}^{k} w_i) (with respect to scalar parameters w_1, …, w_k), which arise in the context of training depth-k linear neural networks. We prove that for standard random initializations, and under mild assumptions on f, the number of iterations required for convergence scales exponentially with the depth k. We also show empirically that this phenomenon can occur in higher dimensions, where each w_i is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where k is large.
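The setup described in the abstract can be illustrated with a small numerical sketch. The snippet below is not from the paper: the choice f(x) = (x - 1)^2, the learning rate, tolerance, and iteration cap are illustrative assumptions. It runs gradient descent on f(∏_{i=1}^{k} w_i) with scalar parameters w_1, …, w_k drawn from a standard Gaussian, and reports how many iterations are needed before the objective drops below a fixed tolerance as the depth k grows.

```python
# Minimal sketch (assumed setup, not the authors' code): gradient descent on
# f(prod_i w_i) with f(x) = (x - 1)^2 and standard Gaussian initialization.
import numpy as np

def iterations_to_converge(k, lr=1e-2, tol=1e-3, max_iters=10**6, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=k)                 # standard random initialization
    for t in range(max_iters):
        p = np.prod(w)                     # end-to-end product of the scalars
        if (p - 1.0) ** 2 < tol:           # f(p) = (p - 1)^2 below tolerance
            return t
        # d/dw_j f(prod_i w_i) = f'(p) * prod_{i != j} w_i = 2 (p - 1) * p / w_j
        # (division assumes w_j != 0, which holds almost surely here)
        grad = 2.0 * (p - 1.0) * p / w
        w -= lr * grad
    return max_iters                       # did not converge within the cap

for k in (1, 2, 4, 6, 8):
    print(f"depth k = {k}: {iterations_to_converge(k)} iterations")
```

With a standard Gaussian initialization the product ∏ w_i, and hence the gradient, is typically tiny for large k, so the iteration counts printed by this sketch grow rapidly with depth, consistent with the exponential-in-k behavior the paper proves.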

Original language: English
Pages (from-to): 2691-2713
Number of pages: 23
Journal: Proceedings of Machine Learning Research
Volume: 99
State: Published - 2019
Event: 32nd Conference on Learning Theory, COLT 2019 - Phoenix, United States
Duration: 25 Jun 2019 - 28 Jun 2019
https://proceedings.mlr.press/v99

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
