Nona: A Stochastic Congestion-Aware Job Scheduler for Real-Time Inference Queries

Benoit Pit-Claudel, Derya Malak, Alejandro Cohen, Muriel Medard, Manya Ghobadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a novel queueing-theoretic approach to enable stochastic congestion-aware scheduling for distributed machine learning inference queries. Our proposed framework, called Nona, combines a stochastic scheduler with an offline optimization formulation rooted in queueing-theoretic principles to minimize the average completion time of heterogeneous inference queries. At its core, Nona incorporates the fundamental tradeoffs between compute and network resources to make efficient scheduling decisions. Nona's formulation uses the Pollaczek-Khinchine formula to estimate queueing latency and to predict system congestion. Building upon conventional Jackson networks, it captures the dependency between the computation and communication operations of interfering jobs. From this formulation, we derive an optimization problem and use its results as inputs for the scheduler. We introduce a novel graph contraction procedure to enable cloud providers to solve Nona's optimization formulation in practical settings. We evaluate Nona with real-world machine learning models (AlexNet, ResNet, DenseNet, VGG, and GPT2) and demonstrate that Nona outperforms state-of-the-art schedulers by up to 350×.
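For readers unfamiliar with the Pollaczek-Khinchine formula mentioned in the abstract, the sketch below computes its standard textbook form for a single M/G/1 queue: the mean time a job waits before service, given the arrival rate and the first two moments of the service time. It is illustrative only. The function name and parameters are assumptions made here, and the abstract does not specify how Nona applies the formula inside its optimization.

```python
def pk_mean_wait(arrival_rate: float, mean_service: float, second_moment_service: float) -> float:
    """Mean queueing delay W_q of an M/G/1 queue (Pollaczek-Khinchine).

    W_q = lambda * E[S^2] / (2 * (1 - rho)), with rho = lambda * E[S].
    Hypothetical helper for illustration; not taken from the Nona paper.
    """
    rho = arrival_rate * mean_service  # server utilization
    if not 0 <= rho < 1:
        raise ValueError("queue is unstable unless 0 <= rho < 1")
    return arrival_rate * second_moment_service / (2 * (1 - rho))


# Exponential service with rate 1: E[S] = 1, E[S^2] = 2 (the M/M/1 case).
wait_exponential = pk_mean_wait(0.5, 1.0, 2.0)

# Deterministic service of length 1: E[S] = 1, E[S^2] = 1 (the M/D/1 case).
wait_deterministic = pk_mean_wait(0.5, 1.0, 1.0)
```

At the same utilization, the deterministic queue waits half as long as the exponential one, which shows how service-time variability, not just load, drives the congestion estimate.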

Original language: American English
Title of host publication: 2024 IEEE 13th International Conference on Cloud Networking, CloudNet 2024
Editors: Diogo Menezes Ferrazani Mattos, Igor Monteiro Moraes, Thi Mai Trang Nguyen, Rodrigo de Souza Couto, Marcelo Goncalves Rubinstein
ISBN (Electronic): 9798350376562
State: Published - 1 Jan 2024
Event: 13th IEEE International Conference on Cloud Networking, CloudNet 2024 - Rio de Janeiro, Brazil
Duration: 27 Nov 2024 - 29 Nov 2024

Publication series

Name: 2024 IEEE 13th International Conference on Cloud Networking, CloudNet 2024

Conference

Conference: 13th IEEE International Conference on Cloud Networking, CloudNet 2024
Country/Territory: Brazil
City: Rio de Janeiro
Period: 27/11/24 - 29/11/24

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Computer Networks and Communications
