Abstract
Training neural networks on large distributed clusters has become a common practice due to the size and complexity of recent neural networks. These high-end clusters of advanced computational devices cooperate to reduce the neural network training duration. In practice, training at linear scalability with respect to the number of devices is difficult due to communication overheads. These overheads often cause long idle times for the computational devices. In this paper, we propose LAGA (Lagged AllReduce with Gradient Accumulation): a hybrid technique that combines the best of synchronous and asynchronous approaches and scales linearly. LAGA reduces device idle time by accumulating locally computed gradients and executing the communication in the background. We demonstrate the effectiveness of LAGA in both final accuracy and scalability on the ImageNet dataset, where LAGA achieves a speedup of up to 2.96x and up to 5.24x less idle time. Finally, we provide convergence guarantees for LAGA in the non-convex setting.
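The abstract describes the mechanism only at a high level: gradients are accumulated locally for several steps while the all-reduce of the previous round runs in the background, so devices spend less time waiting on communication. The snippet below is a minimal sketch of that general idea, assuming a PyTorch data-parallel setup; the function name, accumulation interval, and update rule are illustrative assumptions, not the paper's implementation.

```python
# Minimal illustrative sketch (an assumption, not the paper's released code): overlap
# gradient accumulation with a non-blocking all-reduce so devices keep computing while
# the previous communication round finishes in the background.
# Assumes torch.distributed is already initialized (e.g. with the NCCL backend) and that
# `model`, `optimizer`, `data_loader`, and `loss_fn` are supplied by the caller.
import torch
import torch.distributed as dist

ACCUM_STEPS = 4  # hypothetical number of local steps per communication round


def train_epoch(model, optimizer, data_loader, loss_fn):
    params = list(model.parameters())
    accumulated = [torch.zeros_like(p) for p in params]
    in_flight = None  # (work handles, buffers) of the round currently being reduced

    for step, (x, y) in enumerate(data_loader, start=1):
        loss_fn(model(x), y).backward()

        # Accumulate local gradients instead of synchronizing on every step.
        for buf, p in zip(accumulated, params):
            buf.add_(p.grad)
        optimizer.zero_grad()

        if step % ACCUM_STEPS == 0:
            if in_flight is not None:
                # Apply the lagged gradients from the previous round; by now the
                # background all-reduce has usually completed, so waiting is cheap.
                handles, bufs = in_flight
                for h in handles:
                    h.wait()
                with torch.no_grad():
                    for p, b in zip(params, bufs):
                        p.grad = b / (dist.get_world_size() * ACCUM_STEPS)
                optimizer.step()
                optimizer.zero_grad()

            # Launch the next all-reduce asynchronously and go back to computing.
            bufs = [b.clone() for b in accumulated]
            handles = [dist.all_reduce(b, op=dist.ReduceOp.SUM, async_op=True)
                       for b in bufs]
            in_flight = (handles, bufs)
            for b in accumulated:
                b.zero_()
```

In this sketch the parameter update at each communication boundary uses the previous round's averaged gradients, so the model lags by one round; that lag is the trade-off that lets communication overlap with computation instead of blocking it.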
Original language | English |
---|---|
Title of host publication | Proceedings - 21st IEEE International Conference on Data Mining, ICDM 2021 |
Editors | James Bailey, Pauli Miettinen, Yun Sing Koh, Dacheng Tao, Xindong Wu |
Pages | 171-180 |
Number of pages | 10 |
ISBN (Electronic) | 9781665423984 |
DOIs | |
State | Published - 2021 |
Event | 21st IEEE International Conference on Data Mining, ICDM 2021 - Virtual, Online, New Zealand; Duration: 7 Dec 2021 → 10 Dec 2021 |
Publication series
Name | Proceedings - IEEE International Conference on Data Mining, ICDM |
---|---|
Volume | 2021-December |
Conference
Conference | 21st IEEE International Conference on Data Mining, ICDM 2021 |
---|---|
Country/Territory | New Zealand |
City | Virtual, Online |
Period | 7/12/21 → 10/12/21 |
Keywords
- neural networks
- non-convex
- optimization
All Science Journal Classification (ASJC) codes
- General Engineering