TY - GEN
T1 - Noise and uncertainty in string-duplication systems
AU - Jain, Siddharth
AU - Farnoud, Farzad
AU - Schwartz, Moshe
AU - Bruck, Jehoshua
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/8/9
Y1 - 2017/8/9
N2 - Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations of a constant length, the generation capacity of these mutations is zero. However, if the number of point mutations increases to a constant fraction of the number of duplications, then the capacity is nonzero. Lower and upper bounds for this capacity are also presented. Another problem that we study is concerned with the mismatch between code design and channel in data storage in the DNA of living organisms with respect to duplication mutations. In this context, we consider the uncertainty of such a mismatched coding scheme measured as the maximum number of input codewords that can lead to the same output.
AB - Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into account the presence of other types of mutations. In this work, we consider the capacity of duplication mutations in the presence of point-mutation noise, and so quantify the generation power of these mutations. We show that if the number of point mutations is vanishingly small compared to the number of duplication mutations of a constant length, the generation capacity of these mutations is zero. However, if the number of point mutations increases to a constant fraction of the number of duplications, then the capacity is nonzero. Lower and upper bounds for this capacity are also presented. Another problem that we study is concerned with the mismatch between code design and channel in data storage in the DNA of living organisms with respect to duplication mutations. In this context, we consider the uncertainty of such a mismatched coding scheme measured as the maximum number of input codewords that can lead to the same output.
UR - http://www.scopus.com/inward/record.url?scp=85034043384&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/ISIT.2017.8007104
DO - https://doi.org/10.1109/ISIT.2017.8007104
M3 - منشور من مؤتمر
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 3120
EP - 3124
BT - 2017 IEEE International Symposium on Information Theory, ISIT 2017
T2 - 2017 IEEE International Symposium on Information Theory, ISIT 2017
Y2 - 25 June 2017 through 30 June 2017
ER -