Data-Driven Bee Identification for DNA Strands

Shubhransh Singhvi, Avital Boruchovsky, Han Mao Kiah, Eitan Yaakobi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We study a data-driven approach to the bee-identification problem for DNA strands. The bee-identification problem, introduced by Tandon et al. (2019), requires one to identify $M$ bees, each tagged by a unique barcode, via a set of $M$ noisy measurements. Later, Chrisnata et al. (2022) extended the model to the case where one observes $N$ noisy measurements of each bee, and applied the model to address the unordered nature of DNA storage systems. In such systems, a unique address is typically prepended to each DNA data block to form a DNA strand, but the address may be corrupted. While clustering is usually used to identify the address of a DNA strand, this requires $M^2$ data comparisons (where $M$ is the number of reads). In contrast, the approach of Chrisnata et al. (2022) avoids data comparisons completely. In this work, we study an intermediate, data-driven approach to this identification task. For the binary erasure channel, we first show that we can almost surely correctly identify all DNA strands under certain mild assumptions. Then we propose a data-driven pruning procedure and demonstrate that on average the procedure uses only a fraction of the $M^2$ data comparisons. Specifically, for $M = 2^n$ and erasure probability $p$, the expected number of data comparisons performed by the procedure is $\kappa M^2$, where $\kappa$ lies between $\left(\frac{1+2p-p^2}{2}\right)^n$ and $\left(\frac{1+p}{2}\right)^n$.
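
To make the erasure-channel setting concrete, the following is a minimal Python sketch of address compatibility under a binary erasure channel. It is not the pruning procedure of the paper; the names `erase`, `compatible`, and `candidates`, the `?` erasure symbol, and the choice to use all $2^n$ binary strings as addresses are assumptions made purely for illustration.

```python
import random
from itertools import product

ERASED = "?"  # hypothetical symbol marking an erased position


def erase(bits, p, rng):
    """Binary erasure channel: each symbol is erased independently with probability p."""
    return "".join(ERASED if rng.random() < p else b for b in bits)


def compatible(address, read):
    """True if every non-erased symbol of the read agrees with the address."""
    return all(r == ERASED or r == a for a, r in zip(address, read))


def candidates(n, p, seed=0):
    """For each noisy address read, list the addresses it could have come from.

    Assumes the address set is all 2**n binary strings of length n
    (an illustrative choice, not taken from the paper).
    """
    rng = random.Random(seed)
    addresses = ["".join(bits) for bits in product("01", repeat=n)]
    reads = [erase(a, p, rng) for a in addresses]
    return [(read, [a for a in addresses if compatible(a, read)]) for read in reads]


if __name__ == "__main__":
    for read, cands in candidates(n=3, p=0.3, seed=1):
        print(read, len(cands))
```

Under these assumptions, a read with $e$ erased positions is compatible with exactly $2^e$ addresses, so an address read has $(1+p)^n$ candidate addresses on average, which is a $\left(\frac{1+p}{2}\right)^n$ fraction of the $M = 2^n$ addresses.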

Original language: English
Title of host publication: 2023 IEEE International Symposium on Information Theory, ISIT 2023
Pages: 797-802
Number of pages: 6
ISBN (Electronic): 9781665475549
State: Published - 2023
Event: 2023 IEEE International Symposium on Information Theory, ISIT 2023 - Taipei, Taiwan, Province of China
Duration: 25 Jun 2023 - 30 Jun 2023

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
Volume: 2023-June

Conference

Conference: 2023 IEEE International Symposium on Information Theory, ISIT 2023
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 25/06/23 - 30/06/23

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Information Systems
  • Modelling and Simulation
  • Applied Mathematics
