TY - GEN
T1 - On fault tolerance, locality, and optimality in locally repairable codes
AU - Kolosov, Oleg
AU - Yadgar, Gala
AU - Liram, Matan
AU - Tamo, Itzhak
AU - Barg, Alexander
N1 - Publisher Copyright: © Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Erasure codes are used in large-scale storage systems to allow recovery of data from a failed node. A recently developed class of erasure codes, termed locally repairable codes (LRCs), offers tradeoffs between storage overhead and repair cost. LRCs facilitate more efficient recovery scenarios by storing additional parity blocks in the system, but these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing codes differ in their use of the additional parity blocks, but also in their locality semantics and in the parameters for which they are defined. As a result, existing theoretical models cannot be used to directly compare different LRCs to determine which code will offer the best recovery performance, and at what cost. In this study, we perform the first systematic comparison of existing LRC approaches. We analyze Xorbas, Azure's LRCs, and the recently proposed Optimal-LRCs in light of two new metrics: the average degraded read cost, and the normalized repair cost. We show the tradeoff between these costs and the code's fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster deployed on Amazon EC2 further demonstrates the different effects of realistic network and storage bottlenecks on the benefit from each examined LRC approach. Despite these differences, the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each setup.
AB - Erasure codes are used in large-scale storage systems to allow recovery of data from a failed node. A recently developed class of erasure codes, termed locally repairable codes (LRCs), offers tradeoffs between storage overhead and repair cost. LRCs facilitate more efficient recovery scenarios by storing additional parity blocks in the system, but these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing codes differ in their use of the additional parity blocks, but also in their locality semantics and in the parameters for which they are defined. As a result, existing theoretical models cannot be used to directly compare different LRCs to determine which code will offer the best recovery performance, and at what cost. In this study, we perform the first systematic comparison of existing LRC approaches. We analyze Xorbas, Azure's LRCs, and the recently proposed Optimal-LRCs in light of two new metrics: the average degraded read cost, and the normalized repair cost. We show the tradeoff between these costs and the code's fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster deployed on Amazon EC2 further demonstrates the different effects of realistic network and storage bottlenecks on the benefit from each examined LRC approach. Despite these differences, the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each setup.
UR - http://www.scopus.com/inward/record.url?scp=85077021366&partnerID=8YFLogxK
M3 - منشور من مؤتمر
T3 - Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018
SP - 865
EP - 877
BT - Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018
T2 - 2018 USENIX Annual Technical Conference, USENIX ATC 2018
Y2 - 11 July 2018 through 13 July 2018
ER -