On the quality of the initial basin in overspecified neural networks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years across a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function and is known to be computationally hard in the worst case. In this work, we study the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) a high probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) a high probability of initializing in a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger ("overspecified") networks, which accords with some recent empirical and theoretical observations.
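
The following is a minimal sketch, not taken from the paper, of the overspecification effect the abstract describes: it trains one-hidden-layer ReLU networks of increasing width on a toy regression task, each from several random initializations, and reports the best loss reached. All data, widths, and hyperparameters are illustrative assumptions rather than the authors' experimental setup.

    # Illustrative sketch (assumed setup, not the paper's): compare how well
    # one-hidden-layer ReLU networks of different widths fit the same toy data
    # when trained by plain gradient descent from random initializations.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 1-D regression target.
    X = np.linspace(-2, 2, 200).reshape(-1, 1)
    y = np.sin(2 * X) + 0.1 * rng.standard_normal(X.shape)

    def train_relu_net(width, steps=2000, lr=0.05):
        """Full-batch gradient descent on a one-hidden-layer ReLU network."""
        W1 = rng.standard_normal((1, width)) / np.sqrt(width)
        b1 = np.zeros(width)
        W2 = rng.standard_normal((width, 1)) / np.sqrt(width)
        b2 = np.zeros(1)
        n = X.shape[0]
        for _ in range(steps):
            h = np.maximum(X @ W1 + b1, 0.0)      # ReLU hidden layer
            pred = h @ W2 + b2
            err = pred - y
            # Backpropagation for the mean squared loss (1/n) * ||pred - y||^2.
            grad_pred = 2 * err / n
            gW2 = h.T @ grad_pred
            gb2 = grad_pred.sum(axis=0)
            grad_h = (grad_pred @ W2.T) * (h > 0)
            gW1 = X.T @ grad_h
            gb1 = grad_h.sum(axis=0)
            W1 -= lr * gW1; b1 -= lr * gb1
            W2 -= lr * gW2; b2 -= lr * gb2
        final_pred = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2
        return float(np.mean((final_pred - y) ** 2))

    for width in (2, 8, 64):
        losses = [train_relu_net(width) for _ in range(5)]
        print(f"width={width:3d}  best loss over 5 random inits: {min(losses):.4f}")

On this toy problem the wider ("overspecified") networks typically reach a markedly lower loss from random starts, which is the qualitative behavior the abstract's results aim to explain theoretically.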

Original language: American English
Title of host publication: 33rd International Conference on Machine Learning, ICML 2016
Editors: Maria Florina Balcan, Kilian Q. Weinberger
Pages: 1191-1219
Number of pages: 29
ISBN (Electronic): 9781510829008
State: Published - 1 Jan 2016
Event: 33rd International Conference on Machine Learning, ICML 2016 - New York City, United States
Duration: 19 Jun 2016 - 24 Jun 2016

Publication series

Name: 33rd International Conference on Machine Learning, ICML 2016
Volume: 2

Conference

Conference: 33rd International Conference on Machine Learning, ICML 2016
Country/Territory: United States
City: New York City
Period: 19/06/16 - 24/06/16

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Computer Networks and Communications