FiSSC: Finding smallest sequence covers to sets of degenerate reads with applications to RNA editing

Ido Tziony, Jonathan Mandl, Kobi Shapira, Eli Eisenberg, Ely Porat, Yaron Orenstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

High-throughput sequencing (HTS) is the most established technique to measure transcript abundance. HTS reads often contain uncertain or low-quality base calls that introduce ambiguity in determining the underlying sequence. In many applications, these unresolved nucleotides are handled by taking the consensus sequence of all HTS reads. However, this approach is not applicable where sequence heterogeneity is of biological relevance. To gauge the biological complexity of a set of HTS reads in the face of unresolved base calls, one may apply the parsimony principle, i.e., find a smallest set of sequences that covers all ambiguous reads. However, no method to date solves this problem optimally. Here, we present FiSSC, a new method to find a smallest sequence cover of a set of ambiguous reads. We first prove that the problem is NP-hard. We then present filtering steps that preserve the optimal solution size, and an integer-linear-programming formulation, which together form FiSSC. We tested FiSSC on A-to-I RNA editing datasets with binary ambiguities. FiSSC outperformed all baseline methods and achieved optimal results in all but one dataset. We expect FiSSC to advance the study of sequence variation and biological complexity of ambiguous reads in various biological domains.
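
To make the problem statement concrete, here is a minimal Python sketch of a smallest sequence cover on a toy input. It is an illustration under stated assumptions, not FiSSC itself: the abstract describes filtering steps plus an integer-linear-programming formulation, whereas this sketch uses plain exhaustive search, which is only practical for a handful of reads. It assumes equal-length reads over the alphabet `0`, `1`, `N` (two resolved states plus an unresolved call, mirroring the binary ambiguities mentioned above), and it assumes that a sequence covers a read when the two agree at every resolved position. The function names `compatible`, `merge`, and `smallest_cover` are hypothetical.

```python
def compatible(a, b):
    """Assumed rule: two reads agree wherever both positions are resolved."""
    return all(x == y or x == "N" or y == "N" for x, y in zip(a, b))


def merge(a, b):
    """Combine two compatible reads, keeping every resolved position."""
    return "".join(y if x == "N" else x for x, y in zip(a, b))


def smallest_cover(reads):
    """Exhaustive search for a smallest set of sequences covering all reads.

    Each group of mutually compatible reads is represented by its merged
    sequence. Exponential time in the worst case; toy inputs only.
    """
    best = None

    def place(i, groups):
        nonlocal best
        # Prune: this branch cannot beat the best cover found so far.
        if best is not None and len(groups) >= len(best):
            return
        if i == len(reads):
            best = list(groups)
            return
        # Option 1: add read i to an existing group it is compatible with.
        for g, merged in enumerate(groups):
            if compatible(merged, reads[i]):
                groups[g] = merge(merged, reads[i])
                place(i + 1, groups)
                groups[g] = merged  # undo before trying the next group
        # Option 2: open a new group for read i.
        groups.append(reads[i])
        place(i + 1, groups)
        groups.pop()

    place(0, [])
    return best


# Toy input: four reads of length 3, with N marking unresolved positions.
reads = ["0N1", "01N", "1NN", "N00"]
cover = smallest_cover(reads)
print(len(cover), cover)
```

A returned sequence may still contain `N` at positions that no read in its group resolves; any resolved value can be substituted there without affecting the cover size.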

Original language: American English
Title of host publication: ACM-BCB 2024 - 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
ISBN (Electronic): 9798400713026
State: Published - 16 Dec 2024
Event: 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2024 - Shenzhen, China
Duration: 22 Nov 2024 to 25 Nov 2024

Publication series

Name: ACM-BCB 2024 - 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics

Conference

Conference: 15th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2024
Country/Territory: China
City: Shenzhen
Period: 22/11/24 to 25/11/24

Keywords

  • ILP
  • NP-hard
  • independent set
  • sequence cover

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
  • Biomedical Engineering
  • Health Informatics
