TY - GEN
T1 - Implicit Regularization Towards Rank Minimization in ReLU Networks
AU - Timor, Nadav
AU - Vardi, Gal
AU - Shamir, Ohad
N1 - Publisher Copyright: © 2023 N. Timor, G. Vardi & O. Shamir.
PY - 2023
Y1 - 2023
N2 - We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth 2 and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for “most” datasets of size 2). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions in several reasonable settings.
AB - We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth 2 and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for “most” datasets of size 2). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions in several reasonable settings.
UR - http://www.scopus.com/inward/record.url?scp=85164572871&partnerID=8YFLogxK
M3 - Conference contribution
VL - 201
T3 - Proceedings of Machine Learning Research
SP - 1429
EP - 1459
BT - Proceedings of the 34th International Conference on Algorithmic Learning Theory
A2 - Agrawal, Shipra
A2 - Orabona, Francesco
T2 - 34th International Conference on Algorithmic Learning Theory, ALT 2023
Y2 - 20 February 2023 through 23 February 2023
ER -