Abstract
Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, increasing both generalization and performance scaling for a fixed budget of optimization steps. We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of networks and datasets. Our results show that batch augmentation reduces the number of necessary SGD updates to achieve the same accuracy as the state of the art. Overall, this simple yet effective method enables faster training and better generalization by allowing more computational resources to be used concurrently.
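As a rough illustration of the idea described in the abstract, the sketch below repeats each sample in a mini-batch M times and applies an independent random augmentation to every copy, so a batch of B distinct samples becomes an effective batch of B·M augmented instances. This is a minimal sketch only: PyTorch/torchvision are assumed, and the value of M, the transform pipeline, and the helper `batch_augment` are hypothetical, not taken from the paper.

```python
import torch
from torchvision import transforms

# Number of augmented copies per sample (hypothetical setting for illustration).
M = 4

# An assumed ImageNet-style augmentation pipeline; the paper's exact transforms may differ.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def batch_augment(pil_images, labels, m=M):
    """Replicate each sample m times, applying a fresh random augmentation to every copy."""
    # Each copy of each image gets its own independently sampled transform.
    images = torch.stack([augment(img)
                          for img in pil_images
                          for _ in range(m)])
    # Repeat each label m times so it matches its augmented copies.
    labels = torch.as_tensor(labels).repeat_interleave(m)
    return images, labels
```

In this sketch the number of distinct samples loaded per step stays at B while the number of instances seen by the optimizer grows to B·M, which is what lets the extra computation be spent concurrently without changing the data-loading budget.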
Original language | English |
---|---|
Title of host publication | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 |
Pages | 8126-8135 |
Number of pages | 10 |
DOIs | |
State | Published - 2020 |
Event | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 - Virtual, Online, United States |
Duration | 14 Jun 2020 → 19 Jun 2020 |
Conference
Conference | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 14/06/20 → 19/06/20 |
All Science Journal Classification (ASJC) codes
- Software
- Computer Vision and Pattern Recognition