Scaling in Deep and Shallow Learning Architectures

Abstract
The realization of classification tasks using deep learning is a primary goal of artificial intelligence; however, its possible universal behavior remains unexplored. Herein, we demonstrate a scaling behavior for the test error, ϵ, as a function of the number of classified labels, K. For maximally deep architectures trained on CIFAR-100, ϵ(K) ∝ K^ρ with ρ ∼ 1, and for reduced deep architectures, ρ continuously decreases until a crossover to ϵ(K) ∝ log(K) is observed for shallow architectures. A similar crossover is observed for shallow architectures in which the number of filters in the convolutional layers is proportionally increased. This unifies the scaling behavior of deep and shallow architectures and yields a reduced-latency method. The dependence of Δϵ/ΔK on the trained architecture is expected to be crucial in learning scenarios involving a dynamic number of labels.
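The scaling exponent ρ described above can be estimated from (K, ϵ) measurements by a least-squares fit in log–log space, since ϵ(K) ∝ K^ρ implies log ϵ = ρ log K + const. A minimal sketch, using synthetic data in place of the paper's measured test errors (the function name and the constant 0.02 are illustrative assumptions, not from the article):

```python
import numpy as np

def fit_scaling_exponent(K, eps):
    """Fit eps ≈ c * K**rho by linear regression in log-log space;
    return (rho, c). Illustrative helper, not from the article."""
    logK = np.log(np.asarray(K, dtype=float))
    logE = np.log(np.asarray(eps, dtype=float))
    rho, logc = np.polyfit(logK, logE, 1)  # slope = rho, intercept = log(c)
    return rho, np.exp(logc)

# Synthetic example mimicking the deep-architecture regime (rho ~ 1):
K = np.array([2, 5, 10, 20, 50, 100])
eps = 0.02 * K ** 1.0          # hypothetical test errors
rho, c = fit_scaling_exponent(K, eps)
```

On real measurements, a fitted ρ near 1 would indicate the deep-architecture power-law regime, while systematic curvature in the log–log plot would suggest the logarithmic regime reported for shallow architectures.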
| Original language | English |
|---|---|
| Article number | 129909 |
| Number of pages | 10 |
| Journal | Physica A: Statistical Mechanics and its Applications |
| Volume | 646 |
| DOIs | |
| State | Published - 15 Jul 2024 |
Keywords
- Deep learning
- Machine learning
- Shallow learning
All Science Journal Classification (ASJC) codes
- Statistical and Nonlinear Physics
- Statistics and Probability