GPUrdma: GPU-side library for high performance networking from GPU kernels

Feras Daoud, Amir Watad, Mark Silberstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present GPUrdma, a GPU-side library for performing Remote Direct Memory Accesses (RDMA) across the network directly from GPU kernels. The library executes no code on the CPU, directly accessing the Host Channel Adapter (HCA) InfiniBand hardware for both control and data. Slow single-thread GPU performance and the intricacies of the GPU-to-network-adapter interaction pose a significant challenge. We describe several design options and analyze their performance implications in detail. We achieve 5 µsec one-way communication latency and up to 50 Gbit/sec transfer bandwidth for messages of 16KB and larger between K40c NVIDIA GPUs across the network. Moreover, GPUrdma outperforms CPU RDMA for smaller packets of 2 to 1024 bytes by a factor of 4.5×, thanks to the greater parallelism of transfer requests enabled by highly parallel GPU hardware. We use GPUrdma to implement a subset of the Global Address Space Programming Interface (GPI) for point-to-point asynchronous RDMA messaging. We demonstrate our preliminary results using two simple applications - ping-pong and a multi-matrix-vector product with a constant matrix and multiple vectors - each running on two different machines connected by InfiniBand. Our basic ping-pong implementation achieves 5% higher performance than the baseline using GPI-2. The improved ping-pong implementation with per-threadblock communication overlap enables a further 20% improvement. The multi-matrix-vector product is up to 4.5× faster, thanks to higher throughput for small messages and the ability to keep the matrix in fast GPU shared memory while receiving new inputs. The GPUrdma prototype is not yet suitable for production systems due to hardware constraints in the current generation of NVIDIA GPUs, which we discuss in detail. However, our results highlight the great potential of GPU-side native networking, and encourage further research toward scalable, high-performance, heterogeneous networking infrastructure.
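As a rough illustration of the mechanism the abstract describes, the CUDA sketch below shows how a GPU-side RDMA write might look: a GPU thread fills a work-queue entry (WQE) in a GPU-resident ring and rings the HCA doorbell itself, with no CPU on the critical path. Everything here is an assumption for illustration: gpu_qp, gpu_rdma_write, and post_writes are hypothetical names rather than the actual GPUrdma API, the WQE layout is drastically simplified compared to a real HCA's format, and host-side setup (creating the queue pair and mapping its ring and doorbell register into GPU address space, e.g. via GPUDirect BAR mappings) is presumed to have happened already.

    #include <cstdint>

    // Hypothetical device-side queue-pair handle. Assumes the WQE ring
    // and the HCA doorbell register were mapped GPU-visible by host code.
    struct gpu_qp {
        volatile uint64_t *wqe_ring;   // work-queue entries, GPU-resident
        volatile uint32_t *doorbell;   // HCA doorbell register, GPU-mapped
        uint32_t           wqe_head;   // next free WQE slot
        uint32_t           ring_size;  // slot count, power of two
    };

    // Post one RDMA-write work request entirely from the GPU: write a
    // (simplified) WQE, fence so the entry is visible over PCIe, then
    // ring the doorbell to hand the request to the HCA.
    __device__ void gpu_rdma_write(gpu_qp *qp, uint64_t laddr, uint64_t raddr,
                                   uint32_t rkey, uint32_t len)
    {
        uint32_t slot = (qp->wqe_head++) & (qp->ring_size - 1);
        volatile uint64_t *wqe = &qp->wqe_ring[slot * 4];
        wqe[0] = laddr;                        // local source address
        wqe[1] = raddr;                        // remote destination address
        wqe[2] = ((uint64_t)rkey << 32) | len; // remote key and byte count
        __threadfence_system();                // flush WQE before doorbell
        *(qp->doorbell) = qp->wqe_head;        // notify the HCA
    }

    // Per-threadblock communication: each block drives its own QP, so
    // many transfer requests are posted in parallel across blocks.
    __global__ void post_writes(gpu_qp *qps, const uint64_t *laddr,
                                const uint64_t *raddr, const uint32_t *rkey,
                                uint32_t len)
    {
        if (threadIdx.x == 0)  // one posting thread per block
            gpu_rdma_write(&qps[blockIdx.x], laddr[blockIdx.x],
                           raddr[blockIdx.x], rkey[blockIdx.x], len);
    }

The one-posting-thread-per-block pattern mirrors the per-threadblock communication overlap that the abstract credits for the further 20% ping-pong improvement: each block communicates on its own queue pair while other blocks compute or post their own transfers.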

Original language: English
Title of host publication: Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2016 - In conjunction with HPDC 2016
ISBN (Electronic): 9781450343879
State: Published - 1 Jun 2016
Event: 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2016 - Kyoto, Japan
Duration: 1 Jun 2016 → …

Publication series

Name: Proceedings of the 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2016 - In conjunction with HPDC 2016

Conference

Conference: 6th International Workshop on Runtime and Operating Systems for Supercomputers, ROSS 2016
Country/Territory: Japan
City: Kyoto
Period: 1/06/16 → …

Keywords

  • GPGPUs
  • Networking
  • Operating Systems Design
  • accelerators

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Software
