On fault tolerance, locality, and optimality in locally repairable codes

Oleg Kolosov, Gala Yadgar, Matan Liram, Itzhak Tamo, Alexander Barg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Erasure codes are used in large-scale storage systems to allow recovery of data from a failed node. A recently developed class of erasure codes, termed locally repairable codes (LRCs), offers tradeoffs between storage overhead and repair cost. LRCs facilitate more efficient recovery scenarios by storing additional parity blocks in the system, but these additional blocks may eventually increase the number of blocks that must be reconstructed. Existing codes differ in their use of the additional parity blocks, but also in their locality semantics and in the parameters for which they are defined. As a result, existing theoretical models cannot be used to directly compare different LRCs to determine which code will offer the best recovery performance, and at what cost. In this study, we perform the first systematic comparison of existing LRC approaches. We analyze Xorbas, Azure's LRCs, and the recently proposed Optimal-LRCs in light of two new metrics: the average degraded read cost, and the normalized repair cost. We show the tradeoff between these costs and the code's fault tolerance, and that different approaches offer different choices in this tradeoff. Our experimental evaluation on a Ceph cluster deployed on Amazon EC2 further demonstrates the different effects of realistic network and storage bottlenecks on the benefit from each examined LRC approach. Despite these differences, the normalized repair cost metric can reliably identify the LRC approach that would achieve the lowest repair cost in each setup.
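The Python sketch below shows how the average degraded read cost can be computed for a code described only by the sizes of its local groups. It relies on simplifying assumptions that are ours and may not match the paper's exact definitions. A lost data block is assumed to be rebuilt by reading the other data blocks in its local group plus one local parity. A Reed-Solomon code is assumed to read k surviving blocks. All function names are hypothetical. The two-group, 12-data-block layout mirrors Azure's (12,2,2) LRC, in which each local parity covers six data blocks and two global parities are added.

```python
from typing import Sequence


def avg_degraded_read_cost(group_sizes: Sequence[int]) -> float:
    """Average number of blocks read to rebuild one unavailable data block.

    Hypothetical model: a data block in a local group with r data blocks is
    rebuilt from the r - 1 other data blocks plus one local parity, i.e. r reads.
    """
    total_data = sum(group_sizes)
    # Each of the r data blocks in a group costs r reads, so a group contributes r * r.
    total_cost = sum(r * r for r in group_sizes)
    return total_cost / total_data


def rs_degraded_read_cost(k: int) -> float:
    """Reed-Solomon with k data blocks: any k surviving blocks must be read."""
    return float(k)


def storage_overhead(k: int, n: int) -> float:
    """Blocks stored per data block for a code with k data and n total blocks."""
    return n / k


if __name__ == "__main__":
    # Azure-like layout: 12 data blocks split into two local groups of six.
    print(avg_degraded_read_cost([6, 6]))  # 6.0
    # Reed-Solomon over the same 12 data blocks.
    print(rs_degraded_read_cost(12))  # 12.0
    # 12 data + 2 local + 2 global parities = 16 blocks in total.
    print(f"{storage_overhead(12, 16):.2f}")  # 1.33
```

This toy model halves the degraded read cost relative to Reed-Solomon, but the local parities increase storage from 14 to 16 blocks. This is the kind of tradeoff the abstract describes. It does not model multi-block failures or the normalized repair cost.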

Original language: English
Title of host publication: Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018
Pages: 865-877
Number of pages: 13
ISBN (Electronic): 9781939133021
State: Published - 2020
Event: 2018 USENIX Annual Technical Conference, USENIX ATC 2018 - Boston, United States
Duration: 11 Jul 2018 - 13 Jul 2018

Publication series

Name: Proceedings of the 2018 USENIX Annual Technical Conference, USENIX ATC 2018

Conference

Conference: 2018 USENIX Annual Technical Conference, USENIX ATC 2018
Country/Territory: United States
City: Boston
Period: 11/07/18 - 13/07/18

All Science Journal Classification (ASJC) codes

  • General Computer Science
