Abstract
Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of the same-length strings, and a parameter d, find a string x that maximizes the number of "non- outliers" within Hamming distance d of x. We prove that this problem has no polynomial-time approximation scheme (PTAS) unless NP has randomized polynomial-time algorithms, correcting a decade-old erroneous proof made previously in the literature. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P=NP. We also observe Closest to k Strings has no efficient PTAS (EPTAS) unless a parameterized complexity hierarchy collapses. In sum, outliers help model problems associated with using biological data, but we show the problem of finding an approximate solution is computationally difficult.
| Original language | American English |
|---|---|
| Pages (from-to) | 107-114 |
| Number of pages | 8 |
| Journal | Theoretical Computer Science |
| Volume | 498 |
| DOIs | |
| State | Published - 5 Aug 2013 |
Keywords
- String algorithms
- String selection
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- General Computer Science
Fingerprint
Dive into the research topics of 'On approximating string selection problems with outliers'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver