TY - GEN
T1 - Error-Correcting Codes for Combinatorial Composite DNA
AU - Sabary, Omer
AU - Preuss, Inbal
AU - Gabrys, Ryan
AU - Yakhini, Zohar
AU - Anavy, Leon
AU - Yaakobi, Eitan
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Data storage in DNA is developing as a possible solution for archival digital data. Recently, to further increase the potential capacity of DNA-based data storage systems, the combinatorial composite DNA synthesis method was suggested. This approach extends the DNA alphabet by harnessing short DNA fragment reagents, known as shortmers. The shortmers are building blocks of the alphabet symbols, each consisting of a fixed number of shortmers. Thus, when information is read, it is possible that one of the shortmers that forms part of the composition of a symbol is missing and therefore the symbol cannot be determined. In this paper, we model this type of error as a type of asymmetric error and propose code constructions that can correct such errors in this setup. We also provide a lower bound on the redundancy of such error-correcting codes and give an explicit encoder and decoder for our construction. Our suggested error model is also supported by an analysis of data from actual experiments that produced DNA according to the combinatorial scheme. Lastly, we also provide a statistical evaluation of the probability of observing such error events, as a function of read depth.
AB - Data storage in DNA is developing as a possible solution for archival digital data. Recently, to further increase the potential capacity of DNA-based data storage systems, the combinatorial composite DNA synthesis method was suggested. This approach extends the DNA alphabet by harnessing short DNA fragment reagents, known as shortmers. The shortmers are building blocks of the alphabet symbols, each consisting of a fixed number of shortmers. Thus, when information is read, it is possible that one of the shortmers that forms part of the composition of a symbol is missing and therefore the symbol cannot be determined. In this paper, we model this type of error as a type of asymmetric error and propose code constructions that can correct such errors in this setup. We also provide a lower bound on the redundancy of such error-correcting codes and give an explicit encoder and decoder for our construction. Our suggested error model is also supported by an analysis of data from actual experiments that produced DNA according to the combinatorial scheme. Lastly, we also provide a statistical evaluation of the probability of observing such error events, as a function of read depth.
UR - http://www.scopus.com/inward/record.url?scp=85197088245&partnerID=8YFLogxK
U2 - 10.1109/ISIT57864.2024.10619334
DO - 10.1109/ISIT57864.2024.10619334
M3 - منشور من مؤتمر
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 109
EP - 114
BT - 2024 IEEE International Symposium on Information Theory, ISIT 2024 - Proceedings
T2 - 2024 IEEE International Symposium on Information Theory, ISIT 2024
Y2 - 7 July 2024 through 12 July 2024
ER -