TY - JOUR
T1 - Symmetry & critical points for a model shallow neural network
AU - Arjevani, Yossi
AU - Field, Michael
N1 - Funding Information: Part of this work was completed while YA was visiting the Simons Institute in 2019 for the Foundations of Deep Learning program. We thank Haggai Maron, Segol Nimrod, Ohad Shamir, Michal Shavit, and Daniel Soudry for helpful and insightful discussions. Special thanks also to Christian Bick for commenting on early versions of the manuscript, and to the referees for helpful comments, questions and suggestions. Publisher Copyright: © 2021
PY - 2021/12
Y1 - 2021/12
N2 - Using methods based on the analysis of real analytic functions, symmetry, and equivariant bifurcation theory, we obtain sharp results on families of critical points of spurious minima that occur in optimization problems associated with fitting two-layer ReLU networks with k hidden neurons. The main mathematical result is a power series representation of these families of critical points in terms of 1/k (with coefficients independent of k). We also give a path-based formulation that naturally connects the critical points with critical points of an associated linear, but highly singular, optimization problem; these critical points closely approximate the critical points of the original problem. The mathematical theory is used to derive results on the original neural network problem, for example, precise estimates for several quantities showing that not all spurious minima are alike. In particular, we show that while the loss function at certain types of spurious minima decays to zero like k^{-1}, in other cases the loss converges to a strictly positive constant.
AB - Using methods based on the analysis of real analytic functions, symmetry, and equivariant bifurcation theory, we obtain sharp results on families of critical points of spurious minima that occur in optimization problems associated with fitting two-layer ReLU networks with k hidden neurons. The main mathematical result is a power series representation of these families of critical points in terms of 1/k (with coefficients independent of k). We also give a path-based formulation that naturally connects the critical points with critical points of an associated linear, but highly singular, optimization problem; these critical points closely approximate the critical points of the original problem. The mathematical theory is used to derive results on the original neural network problem, for example, precise estimates for several quantities showing that not all spurious minima are alike. In particular, we show that while the loss function at certain types of spurious minima decays to zero like k^{-1}, in other cases the loss converges to a strictly positive constant.
KW - Critical points
KW - Power series representation
KW - ReLU activation
KW - Spurious minima
KW - Student–teacher network
KW - Symmetry breaking
UR - http://www.scopus.com/inward/record.url?scp=85114329465&partnerID=8YFLogxK
U2 - 10.1016/j.physd.2021.133014
DO - 10.1016/j.physd.2021.133014
M3 - Article
SN - 0167-2789
VL - 427
JO - Physica D: Nonlinear Phenomena
JF - Physica D: Nonlinear Phenomena
M1 - 133014
ER -