TY - GEN
T1 - Remote memory references at block granularity
AU - Attiya, Hagit
AU - Yavneh, Gili
N1 - Publisher Copyright: © 2017 Hagit Attiya and Gili Yavneh.
PY - 2018/3/1
Y1 - 2018/3/1
N2 - The cost of accessing shared objects that are stored in remote memory, while neglecting accesses to shared objects that are cached in the local memory, can be evaluated by the number of remote memory references (RMRs) in an execution. Two flavours of this measure - cache-coherent (CC) and distributed shared memory (DSM) - model two popular shared-memory architectures. The number of RMRs, however, does not take into account the granularity of memory accesses, namely, the fact that accesses to the shared memory are performed in blocks. This paper proposes a new measure, called block RMRs, counting the number of remote memory references while taking into account the fact that shared objects can be grouped into blocks. On the one hand, this measure reflects the fact that the RMR incurred for bringing a shared object to the local memory might save another RMR for bringing another object placed at the same block. On the other hand, this measure accounts for false sharing: the fact that an RMR may be incurred when accessing an object due to a concurrent access to another object in the same block. This paper proves that in both the CC and the DSM models, finding an optimal placement is NP-hard when objects have different sizes, even for two processes. In the CC model, finding an optimal placement, i.e., grouping of objects into blocks, is NP-hard when a block can store three objects or more; the result holds even if the sequence of accesses is known in advance. In the DSM model, the answer depends on whether there is an efficient mechanism to inform processes that the data in their local memory is no longer valid, i.e., cache coherence is supported. If coherence is supported with cheap invalidation, then finding an optimal solution is NP-hard. If coherence is not supported, an optimal placement can be achieved by placing each object in the memory of the process that accesses it most often, if the sequence of accesses is known in advance.
AB - The cost of accessing shared objects that are stored in remote memory, while neglecting accesses to shared objects that are cached in the local memory, can be evaluated by the number of remote memory references (RMRs) in an execution. Two flavours of this measure - cache-coherent (CC) and distributed shared memory (DSM) - model two popular shared-memory architectures. The number of RMRs, however, does not take into account the granularity of memory accesses, namely, the fact that accesses to the shared memory are performed in blocks. This paper proposes a new measure, called block RMRs, counting the number of remote memory references while taking into account the fact that shared objects can be grouped into blocks. On the one hand, this measure reflects the fact that the RMR incurred for bringing a shared object to the local memory might save another RMR for bringing another object placed at the same block. On the other hand, this measure accounts for false sharing: the fact that an RMR may be incurred when accessing an object due to a concurrent access to another object in the same block. This paper proves that in both the CC and the DSM models, finding an optimal placement is NP-hard when objects have different sizes, even for two processes. In the CC model, finding an optimal placement, i.e., grouping of objects into blocks, is NP-hard when a block can store three objects or more; the result holds even if the sequence of accesses is known in advance. In the DSM model, the answer depends on whether there is an efficient mechanism to inform processes that the data in their local memory is no longer valid, i.e., cache coherence is supported. If coherence is supported with cheap invalidation, then finding an optimal solution is NP-hard. If coherence is not supported, an optimal placement can be achieved by placing each object in the memory of the process that accesses it most often, if the sequence of accesses is known in advance.
KW - Cache coherence
KW - Distributed shared memory
KW - False sharing
KW - NP-hardness
UR - http://www.scopus.com/inward/record.url?scp=85045644878&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.OPODIS.2017.18
DO - 10.4230/LIPIcs.OPODIS.2017.18
M3 - منشور من مؤتمر
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 21st International Conference on Principles of Distributed Systems, OPODIS 2017
A2 - Aspnes, James
A2 - Leitao, Joao
A2 - Bessani, Alysson
A2 - Felber, Pascal
T2 - 21st International Conference on Principles of Distributed Systems, OPODIS 2017
Y2 - 18 December 2017 through 20 December 2017
ER -