TY - GEN
T1 - Trading fences with RMRs and separating memory models
AU - Attiya, Hagit
AU - Hendler, Danny
AU - Woelfel, Philipp
N1 - Publisher Copyright: © Copyright 2015 ACM.
PY - 2015/7/21
Y1 - 2015/7/21
AB - Out-of-order execution of instructions is a common optimization technique for multicores and multiprocessors, which is governed by the memory model of the architecture. Relatively strong memory models, like TSO (supported by x86 and AMD), only allow reads to bypass earlier writes, while other models, like RMO (supported by ARM, POWER and Alpha) and PSO (supported by older SPARC), also allow the reordering of writes to different locations. These reorderings can be prevented by the use of costly fence instructions. In this paper we prove that when writes can be reordered (e.g., in RMO or even PSO), there is a tradeoff between the number of fences, f, and the number of remote memory references (RMRs), r, for a large class of objects, including locks, counters and queues: f(log(r/f) + 1) ∈ Ω(log n). For example, when one of these objects is implemented using a constant number of fences (e.g., in the Bakery lock), the tradeoff implies that a linear number of RMRs is required (as indeed is the case with the Bakery lock). This gives a complexity separation between the memory models that allow write reordering and those that prohibit it, since a recent paper shows that a lock can be implemented in the stronger TSO memory model, with a small, constant number of fences, and a logarithmic number of RMRs. The lower bound uses an information-theoretic argument, relating the encoding of n! distinguishable executions to the number of fences and RMRs performed in the course of these executions. We also present a family of algorithms matching the lower bound, which explicitly enforce the required ordering, and hence, are correct even with weak memory models. This shows that the tradeoff is tight, and indicates that for many important objects, fences are mostly needed for avoiding reordering of writes.
KW - Fences
KW - Shared memory
KW - Total store ordering
UR - http://www.scopus.com/inward/record.url?scp=84957689910&partnerID=8YFLogxK
U2 - https://doi.org/10.1145/2767386.2767427
DO - https://doi.org/10.1145/2767386.2767427
M3 - Conference contribution
VL - 2015-July
T3 - Proceedings of the Annual ACM Symposium on Principles of Distributed Computing
SP - 173
EP - 182
BT - PODC 2015 - Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing
T2 - ACM Symposium on Principles of Distributed Computing, PODC 2015
Y2 - 21 July 2015 through 23 July 2015
ER -