Abstract
In this paper we study lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data. We derive explicit generalization lower bounds for general biased estimators in the case of two-layer networks. For a linear activation function, the bound is asymptotically tight. In the nonlinear case, we compare our bounds with an empirical study of the stochastic gradient descent algorithm. In addition, we derive bounds for unbiased estimators, which show that the latter have unacceptable performance for truly nonlinear networks. The analysis uses elements from the theory of large random matrices.
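The abstract refers to an empirical study of stochastic gradient descent on a two-layer network whose hidden-layer width is comparable to the sample size. The sketch below is only an illustration of such an experiment, not the authors' setup: the teacher model, activation, layer sizes, learning rate, and noise level are all assumptions chosen for the example.

```python
# Minimal sketch (assumed setup, not the paper's): measure the empirical
# generalization error of a two-layer network trained by SGD on data from a
# planted two-layer teacher, with hidden width commensurate with n_train.
import numpy as np

rng = np.random.default_rng(0)
d, width, n_train, n_test = 50, 100, 200, 2000   # width ~ n_train (assumption)
act = np.tanh                                     # nonlinear activation (assumption)

# Planted teacher generating the data: y = v*^T act(W* x) + noise
W_star = rng.normal(size=(width, d)) / np.sqrt(d)
v_star = rng.normal(size=width) / np.sqrt(width)

def sample(n):
    X = rng.normal(size=(n, d))
    y = act(X @ W_star.T) @ v_star + 0.1 * rng.normal(size=n)
    return X, y

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

# Student two-layer network trained with plain SGD on the squared loss
W = rng.normal(size=(width, d)) / np.sqrt(d)
v = rng.normal(size=width) / np.sqrt(width)
lr, epochs = 0.05, 200
for _ in range(epochs):
    for i in rng.permutation(n_train):
        x, y = X_tr[i], y_tr[i]
        h = act(W @ x)
        err = h @ v - y
        # Gradients of 0.5 * err**2; tanh'(z) = 1 - tanh(z)**2
        grad_v = err * h
        grad_W = err * np.outer(v * (1.0 - h**2), x)
        v -= lr * grad_v
        W -= lr * grad_W

test_mse = np.mean((act(X_te @ W.T) @ v - y_te) ** 2)
print(f"empirical generalization error (test MSE): {test_mse:.4f}")
```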
Original language | English |
---|---|
Pages (from-to) | 7956-7970 |
Number of pages | 15 |
Journal | IEEE Transactions on Information Theory |
Volume | 68 |
Issue number | 12 |
Early online date | 11 Jul 2022 |
DOIs | |
State | Published - 1 Dec 2022 |
Keywords
- Cramér-Rao bound
- generalization error
- learning
- random matrices
All Science Journal Classification (ASJC) codes
- Information Systems
- Computer Science Applications
- Library and Information Sciences