Sample-Based Distance-Approximation for Subsequence-Freeness

Omer Cohen Sidon, Dana Ron

Research output: Contribution to journal › Article › peer-review

Abstract

In this work, we study the problem of approximating the distance to subsequence-freeness in the sample-based distribution-free model. For a given subsequence (word) w = w_1…w_k, a sequence (text) T = t_1…t_n is said to contain w if there exist indices 1 ≤ i_1 < ⋯ < i_k ≤ n such that t_{i_j} = w_j for every 1 ≤ j ≤ k. Otherwise, T is w-free. Ron and Rosin (ACM Trans Comput Theory 14(4):1–31, 2022) showed that the number of samples both necessary and sufficient for one-sided error testing of subsequence-freeness in the sample-based distribution-free model is Θ(k/ϵ). Denoting by Δ(T, w, p) the distance of T to w-freeness under a distribution p: [n] → [0,1], we are interested in obtaining an estimate Δ̂ such that |Δ̂ − Δ(T, w, p)| ≤ δ with probability at least 2/3, for a given error parameter δ. Our main result is a sample-based distribution-free algorithm whose sample complexity is Õ(k²/δ²). We first present an algorithm that works when the underlying distribution p is uniform, and then show how it can be modified to work for any (unknown) distribution p. We also show that a quadratic dependence on 1/δ is necessary.
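
The definition of containment above can be illustrated with a short sketch. It is only a full-access check of whether a text T contains a word w as a subsequence, using a standard greedy left-to-right scan. It is not the sample-based algorithm of the paper, and the function names `contains_subsequence` and `is_free` are hypothetical, chosen here for illustration.

```python
from typing import Hashable, Sequence


def contains_subsequence(text: Sequence[Hashable], word: Sequence[Hashable]) -> bool:
    """Return True if `word` occurs in `text` as a (not necessarily contiguous) subsequence.

    Greedy scan: match each symbol of `word` at its earliest possible
    position in `text`. This reads the whole text, unlike a sample-based tester.
    """
    j = 0  # number of symbols of `word` matched so far
    for symbol in text:
        if j < len(word) and symbol == word[j]:
            j += 1
    return j == len(word)


def is_free(text: Sequence[Hashable], word: Sequence[Hashable]) -> bool:
    """Return True if `text` is w-free, i.e. does not contain `word`."""
    return not contains_subsequence(text, word)
```

For example, the text "abcab" contains the word "aab" (via positions 1, 4, 5), while "abc" is free of "ca".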

Original language: English
Pages (from-to): 2519-2556
Number of pages: 38
Journal: Algorithmica
Volume: 86
Issue number: 8
State: Published - Aug 2024

Keywords

  • Distance-approximation
  • Property testing
  • Sample-based
  • Subsequence-freeness

All Science Journal Classification (ASJC) codes

  • General Computer Science
  • Computer Science Applications
  • Applied Mathematics
