TY - GEN
T1 - Lynx
T2 - 25th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2020
AU - Tork, Maroun
AU - Maudlej, Lina
AU - Silberstein, Mark
N1 - Publisher Copyright: © 2020 Association for Computing Machinery.
PY - 2020/3/9
Y1 - 2020/3/9
N2 - This paper explores new opportunities afforded by the growing deployment of compute and I/O accelerators to improve the performance and efficiency of hardware-accelerated computing services in data centers. We propose Lynx, an accelerator-centric network server architecture that offloads the server data and control planes to the SmartNIC, and enables direct networking from accelerators via a lightweight hardware-friendly I/O mechanism. Lynx enables the design of hardware-accelerated network servers that run without CPU involvement, freeing CPU cores and improving performance isolation for accelerated services. It is portable across accelerator architectures and allows the management of both local and remote accelerators, seamlessly scaling beyond a single physical machine. We implement and evaluate Lynx on GPUs and the Intel Visual Compute Accelerator, as well as two SmartNIC architectures - one with an FPGA, and another with an 8-core ARM processor. Compared to a traditional host-centric approach, Lynx achieves over 4× higher throughput for a GPU-centric face verification server, where it is used for GPU communications with an external database, and 25% higher throughput for a GPU-accelerated neural network inference service. For this workload, we show that a single SmartNIC may drive 4 local and 8 remote GPUs while achieving linear performance scaling without using the host CPU.
AB - This paper explores new opportunities afforded by the growing deployment of compute and I/O accelerators to improve the performance and efficiency of hardware-accelerated computing services in data centers. We propose Lynx, an accelerator-centric network server architecture that offloads the server data and control planes to the SmartNIC, and enables direct networking from accelerators via a lightweight hardware-friendly I/O mechanism. Lynx enables the design of hardware-accelerated network servers that run without CPU involvement, freeing CPU cores and improving performance isolation for accelerated services. It is portable across accelerator architectures and allows the management of both local and remote accelerators, seamlessly scaling beyond a single physical machine. We implement and evaluate Lynx on GPUs and the Intel Visual Compute Accelerator, as well as two SmartNIC architectures - one with an FPGA, and another with an 8-core ARM processor. Compared to a traditional host-centric approach, Lynx achieves over 4× higher throughput for a GPU-centric face verification server, where it is used for GPU communications with an external database, and 25% higher throughput for a GPU-accelerated neural network inference service. For this workload, we show that a single SmartNIC may drive 4 local and 8 remote GPUs while achieving linear performance scaling without using the host CPU.
UR - http://www.scopus.com/inward/record.url?scp=85082391341&partnerID=8YFLogxK
U2 - 10.1145/3373376.3378528
DO - 10.1145/3373376.3378528
M3 - منشور من مؤتمر
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 117
EP - 131
BT - ASPLOS 2020 - 25th International Conference on Architectural Support for Programming Languages and Operating Systems
Y2 - 16 March 2020 through 20 March 2020
ER -