TY - GEN
T1 - DiDi
T2 - 20th International Conference on Parallel Architectures and Compilation Techniques, PACT 2011
AU - Villavieja, Carlos
AU - Karakostas, Vasileios
AU - Vilanova, Lluis
AU - Etsion, Yoav
AU - Ramirez, Alex
AU - Mendelson, Avi
AU - Navarro, Nacho
AU - Cristal, Adrián
AU - Unsal, Osman S.
PY - 2011
Y1 - 2011
AB - Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to cache virtual-to-physical mappings and, as they are looked up on every memory access, are paramount to performance scalability. The emergence of chip multiprocessors (CMPs) with per-core TLBs has brought the problem of TLB coherence to the forefront. TLBs are kept coherent at the software level by the operating system (OS). Whenever the OS modifies page permissions in a page table, it must initiate a coherency transaction among TLBs, a process known as a TLB shootdown. Current CMPs rely on the OS to approximate the set of TLBs caching a mapping and synchronize TLBs using costly Inter-Processor Interrupts (IPIs) and software handlers. In this paper, we characterize the impact of TLB shootdowns on multiprocessor performance and scalability, and present the design of a scalable TLB coherency mechanism. First, we show that both TLB shootdown cost and frequency increase with the number of processors and project that software-based TLB shootdowns would thwart the performance of large multiprocessors. We then present a scalable architectural mechanism that couples a shared TLB directory with load/store queue support for lightweight TLB invalidation, and thereby eliminates the need for costly IPIs. Finally, we show that the proposed mechanism reduces the fraction of machine cycles wasted on TLB shootdowns by an order of magnitude.
UR - http://www.scopus.com/inward/record.url?scp=84856515634&partnerID=8YFLogxK
DO - https://doi.org/10.1109/PACT.2011.65
M3 - Conference contribution
SN - 9780769545660
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 340
EP - 349
BT - Proceedings - 2011 International Conference on Parallel Architectures and Compilation Techniques, PACT 2011
Y2 - 10 October 2011 through 14 October 2011
ER -