Abstract
In multicores, performance-critical synchronization is increasingly performed in a lock-free manner using atomic instructions such as CAS or LL/SC. However, when many processors synchronize on the same variable, performance can still degrade significantly. Contending writes get serialized, creating a non-scalable condition. Past proposals that build hardware queues of synchronizing processors do not fundamentally solve this problem - -at best, they help to efficiently serialize the contending writes. This paper proposes a novel architecture that breaks the serialization of hardware queues and enables the queued processors to perform lock-free synchronization in parallel. The architecture, called CASPAR, is able to (1) execute the CASes in the queued-up processors in parallel through eager forwarding of expected values, and (2) validate the CASes in parallel and dequeue groups of processors at a time. The result is highly-scalable synchronization. We evaluate CASPAR with simulations of a 64-core chip. Compared to existing proposals with hardware queues, CASPAR improves the throughput of kernels by 32% on average, and reduces the execution time of the sections considered in lock-free versions of applications by 47% on average. This makes these sections 2.5x faster than in the original applications.
| Original language | English |
|---|---|
| Pages (from-to) | 789-804 |
| Number of pages | 16 |
| Journal | ACM SIGPLAN Notices |
| Volume | 51 |
| Issue number | 4 |
| DOIs | |
| State | Published - Apr 2016 |
Keywords
- Lock-free synchronization
- Serialization
All Science Journal Classification (ASJC) codes
- General Computer Science
Fingerprint
Dive into the research topics of 'CASPAR: Breaking serialization in lock-free multicore synchronization'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver