Abstract
Deep learning architectures comprising tens or even hundreds of convolutional and fully-connected hidden layers differ greatly from the shallow architecture of the brain. Here, we demonstrate that by increasing the relative number of filters per layer of a generalized shallow architecture, the error rates decay as a power law to zero. Additionally, a quantitative method for measuring the performance of a single filter shows that each filter identifies small clusters of possible output labels, with additional noise appearing as labels selected outside these clusters. The average noise per filter also decays as a power law with an increasing number of filters per layer for a given generalized architecture, forming the underlying mechanism of efficient shallow learning. The results are supported by training generalized LeNet-3, VGG-5, and VGG-16 architectures on CIFAR-100 and suggest an increase in the noise power-law exponent for deeper architectures. The presented shallow learning mechanism calls for further quantitative examination using various databases and shallow architectures.
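The sketch below is a minimal illustration, not the authors' code, of what a "generalized" shallow LeNet-style architecture could look like: a width factor $d$ scales the number of filters per layer, and the abstract's claim can be read as the test error decaying roughly as $\varepsilon(d) \propto d^{-\rho}$ for some exponent $\rho > 0$. The class name, baseline filter counts, and scaling scheme are assumptions made for illustration.

```python
# Minimal sketch (assumed, not from the paper) of a width-scaled shallow CNN
# in the spirit of a generalized LeNet-3, sized for CIFAR-100 (3x32x32, 100 classes).
import torch
import torch.nn as nn


class GeneralizedShallowCNN(nn.Module):
    """LeNet-style network whose per-layer filter counts scale with `width`."""

    def __init__(self, width: int = 1, num_classes: int = 100):
        super().__init__()
        c1, c2 = 6 * width, 16 * width  # baseline LeNet filter counts, scaled by `width`
        self.features = nn.Sequential(
            nn.Conv2d(3, c1, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(c1, c2, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        # CIFAR-100 input 32x32 -> feature maps of shape (c2, 5, 5)
        self.classifier = nn.Linear(c2 * 5 * 5, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.flatten(self.features(x), 1))


if __name__ == "__main__":
    # Sanity check: push a CIFAR-100-sized batch through increasing widths.
    # The paper's claim would correspond to the trained test error falling as
    # a power law in `width`; this snippet only verifies shapes.
    for width in (1, 2, 4, 8):
        model = GeneralizedShallowCNN(width=width)
        out = model(torch.randn(8, 3, 32, 32))
        print(width, out.shape)  # torch.Size([8, 100]) for every width
```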
| Original language | English |
|---|---|
| Article number | 129513 |
| Journal | Physica A: Statistical Mechanics and its Applications |
| Volume | 635 |
| DOIs | |
| State | Published - 1 Feb 2024 |
Keywords
- Deep learning
- Machine learning
- Shallow learning
All Science Journal Classification (ASJC) codes
- Statistical and Nonlinear Physics
- Statistics and Probability