Abstract
In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) $w = w_1 \dots w_k$, a sequence (text) $T = t_1 \dots t_n$ is said to contain $w$ if there exist indices $1 \le i_1 < \dots < i_k \le n$ such that $t_{i_j} = w_j$ for every $1 \le j \le k$. Otherwise, $T$ is $w$-free. Ron and Rosin (ACM Trans Comput Theory 14(4):1–31, 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is $\Theta(k/\epsilon)$. Denoting by $\Delta(T, w, p)$ the distance of $T$ to $w$-freeness under a distribution $p : [n] \to [0, 1]$, we are interested in obtaining an estimate $\hat{\Delta}$ such that $|\hat{\Delta} - \Delta(T, w, p)| \le \delta$ with probability at least $2/3$, for a given error parameter $\delta$. Our main result is a sample-based distribution-free algorithm whose sample complexity is $\tilde{O}(k^2/\delta^2)$. We first present an algorithm that works when the underlying distribution $p$ is uniform, and then show how it can be modified to work for any (unknown) distribution $p$. We also show that a quadratic dependence on $1/\delta$ is necessary.
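To make the definition of containment concrete, the following is a minimal sketch of a check for whether a text contains a word as a subsequence. It illustrates only the definition given above, with full access to the text; it is not the sample-based distance-approximation algorithm of the paper. The function names `contains_subsequence` and `is_w_free` are hypothetical and chosen for illustration.

```python
def contains_subsequence(text, word):
    """Return True if `word` occurs in `text` as a subsequence,
    i.e. there are indices i_1 < ... < i_k with text[i_j] == word[j]."""
    matched = 0  # number of leading symbols of `word` matched so far
    for symbol in text:
        # Greedily match the next needed symbol at its earliest occurrence.
        if matched < len(word) and symbol == word[matched]:
            matched += 1
    return matched == len(word)


def is_w_free(text, word):
    """Return True if `text` does not contain `word` as a subsequence."""
    return not contains_subsequence(text, word)
```

For example, `contains_subsequence("abcde", "ace")` holds because the indices 1, 3, 5 witness the word, whereas `"abcde"` is free of `"eca"` because the required order cannot be respected. Matching each symbol at its earliest possible position is sufficient here: any valid choice of indices can be shifted left to the greedy one without breaking the ordering.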
Original language | English |
---|---|
Pages (from-to) | 2519-2556 |
Number of pages | 38 |
Journal | Algorithmica |
Volume | 86 |
Issue number | 8 |
DOIs | |
State | Published - Aug 2024 |
Keywords
- Distance-approximation
- Property testing
- Sample-based
- Subsequence-freeness
All Science Journal Classification (ASJC) codes
- General Computer Science
- Computer Science Applications
- Applied Mathematics