Abstract
Over-parameterized residual networks are among the most successful convolutional neural architectures for image processing. Here we study their properties through their Gaussian Process and Neural Tangent kernels. We derive explicit formulas for these kernels, analyze their spectra, and provide bounds on their implied condition numbers. Our results indicate that (1) with ReLU activation, the eigenvalues of these residual kernels decay polynomially at a similar rate to those of the same kernels when skip connections are not used, thus maintaining a similar frequency bias; (2) however, residual kernels are more locally biased. Our analysis further shows that the matrices obtained by these residual kernels yield more favorable condition numbers at finite depths than those obtained without the skip connections, therefore enabling faster convergence of training with gradient descent.
| Original language | English |
|---|---|
| Number of pages | 38 |
| State | Published - 2023 |
| Event | 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda. Duration: 1 May 2023 → 5 May 2023. https://iclr.cc/Conferences/2023 |
Conference
| Conference | 11th International Conference on Learning Representations, ICLR 2023 |
|---|---|
| Country/Territory | Rwanda |
| City | Kigali |
| Period | 1/05/23 → 5/05/23 |
| Internet address | https://iclr.cc/Conferences/2023 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language