Achieving scalability in a k-NN multi-GPU network service with Centaur

Amir Watad, Alexander Libov, Ohad Shacham, Edward Bortnikov, Mark Silberstein

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Centaur is a GPU-centric architecture for building a low-latency approximate k-Nearest-Neighbors network server. We implement a multi-GPU distributed data flow runtime which enables efficient and scalable network request processing on GPUs. The runtime eliminates GPU management overheads from the CPU, making the server throughput and response time largely agnostic to the CPU load, the CPU speed, or the number of dedicated CPU cores. Our experiments show that our server achieves near-perfect scaling for 16 GPUs, beating the throughput of a highly-optimized CPU-driven server by 35% while maintaining about 2 ms average request latency. Furthermore, it requires only a single CPU core to run, achieving over an order of magnitude higher throughput than the standard CPU-driven server architecture in this setting.

Original language: English
Title of host publication: Proceedings - 2019 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019
Pages: 244-256
Number of pages: 13
ISBN (Electronic): 9781728136134
DOIs
State: Published - Sep 2019
Event: 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019 - Seattle, United States
Duration: 21 Sep 2019 to 25 Sep 2019

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
Volume: 2019-September

Conference

Conference: 28th International Conference on Parallel Architectures and Compilation Techniques, PACT 2019
Country/Territory: United States
City: Seattle
Period: 21/09/19 to 25/09/19

Keywords

  • GPU
  • Parallel Computing

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
