TY - GEN
T1 - A Combinatorial Perspective on Random Access Efficiency for DNA Storage
AU - Gruica, Anina
AU - Bar-Lev, Daniella
AU - Ravagnani, Alberto
AU - Yaakobi, Eitan
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - We investigate the fundamental limits of the recently proposed random access coverage depth problem for DNA data storage. Under this paradigm, it is assumed that the user information consists of k information strands, which are encoded into n strands via some generator matrix G. In the sequencing process, the strands are read uniformly at random, since each strand is available in a large number of copies. In this context, the random access coverage depth problem refers to the expected number of reads (i.e., sequenced strands) until it is possible to decode a specific information strand, which is requested by the user. The goal is to minimize the maximum expectation over all possible requested information strands, and this value is denoted by Tmax(G). This paper introduces new techniques to investigate the random access coverage depth problem, which capture its combinatorial nature. We establish two general formulas to find Tmax(G) for arbitrary matrices. We introduce the concept of recovery balanced codes and combine all these results and notions to compute Tmax(G) for MDS, simplex, and Hamming codes. We also study the performance of modified systematic MDS matrices and our results show that the best results for Tmax(G) are achieved with a specific mix of encoded strands and replication of the information strands.
UR - http://www.scopus.com/inward/record.url?scp=85202833170&partnerID=8YFLogxK
U2 - 10.1109/ISIT57864.2024.10619151
DO - 10.1109/ISIT57864.2024.10619151
M3 - Conference contribution
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 675
EP - 680
BT - 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Proceedings
T2 - 2024 IEEE International Symposium on Information Theory, ISIT 2024
Y2 - 7 July 2024 through 12 July 2024
ER -