LAGA: Lagged AllReduce with Gradient Accumulation for Minimal Idle Time

Ido Hakimi, Rotem Zamir Aviv, Kfir Y. Levy, Assaf Schuster

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Training neural networks on large distributed clusters has become common practice due to the size and complexity of recent neural networks. These high-end clusters of advanced computational devices cooperate to reduce the duration of neural network training. In practice, scaling training linearly with the number of devices is difficult due to communication overheads, which often leave the computational devices idle for long periods. In this paper, we propose LAGA (Lagged AllReduce with Gradient Accumulation): a hybrid technique that combines the best of the synchronous and asynchronous approaches and scales linearly. LAGA reduces device idle time by accumulating locally computed gradients and executing the communications in the background. We demonstrate the effectiveness of LAGA in both final accuracy and scalability on the ImageNet dataset, where LAGA achieves a speedup of up to 2.96x and up to 5.24x less idle time. Finally, we provide convergence guarantees for LAGA in the non-convex setting.
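The abstract describes the mechanism only at a high level. Below is a minimal PyTorch-style sketch of the lagged-update idea, assuming gradients are accumulated locally while the all-reduce of the previous accumulation runs in the background. It is a hypothetical illustration, not the paper's implementation: the function laga_train_loop, its parameters (accum_steps, num_updates), and the one-round lag are assumptions made for the example; only torch.distributed's asynchronous all_reduce is a real API.

```python
import torch
import torch.distributed as dist

def laga_train_loop(model, optimizer, data_iter, accum_steps, num_updates):
    # Hypothetical sketch of a LAGA-style loop. Assumes the process group
    # is already initialized, e.g. dist.init_process_group(backend="nccl").
    pending = None  # (work handles, gradient buffers) from the previous round
    world_size = dist.get_world_size()

    for _ in range(num_updates):
        # 1) Accumulate gradients locally over `accum_steps` mini-batches.
        optimizer.zero_grad()
        for _ in range(accum_steps):
            x, y = next(data_iter)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            loss.backward()  # sums gradients into p.grad

        # 2) Snapshot the accumulated gradients and launch a non-blocking
        #    all-reduce; the next round of computation overlaps with it.
        bufs = [p.grad.detach().clone() for p in model.parameters()]
        works = [dist.all_reduce(b, op=dist.ReduceOp.SUM, async_op=True)
                 for b in bufs]

        # 3) Apply the gradients from the *previous* round (lagged by one
        #    accumulation). Their all-reduce has had a full compute round
        #    to finish, so the wait below is usually a no-op and the
        #    device incurs little idle time.
        if pending is not None:
            prev_works, prev_bufs = pending
            for w in prev_works:
                w.wait()
            for p, g in zip(model.parameters(), prev_bufs):
                p.grad.copy_(g).div_(world_size * accum_steps)  # average
            optimizer.step()

        pending = (works, bufs)
```

The one-round lag is the design choice that trades a small amount of gradient staleness for overlap between communication and computation, which is what yields the reported reduction in idle time.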

Original language: English
Title of host publication: Proceedings - 21st IEEE International Conference on Data Mining, ICDM 2021
Editors: James Bailey, Pauli Miettinen, Yun Sing Koh, Dacheng Tao, Xindong Wu
Pages: 171-180
Number of pages: 10
ISBN (Electronic): 9781665423984
State: Published - 2021
Event: 21st IEEE International Conference on Data Mining, ICDM 2021 - Virtual, Online, New Zealand
Duration: 7 Dec 2021 - 10 Dec 2021

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2021-December

Conference

Conference: 21st IEEE International Conference on Data Mining, ICDM 2021
Country/Territory: New Zealand
City: Virtual, Online
Period: 7/12/21 - 10/12/21

Keywords

  • neural networks
  • non-convex
  • optimization

All Science Journal Classification (ASJC) codes

  • General Engineering
