Small-space and streaming pattern matching with k edits

Tomasz Kociumaka, Ely Porat, Tatiana Starikovskaya

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the problem in $\tilde{O}(k^5)$ space and $\tilde{O}(k^8)$ amortized time per character of the text, providing answers correct with high probability. (Hereafter, $\tilde{O}(\cdot)$ hides a $\mathrm{poly}(\log n)$ factor.) This answers a decade-old question. Porat and Porat [FOCS 2009] discovered a $\mathrm{poly}(k \log n)$-space streaming algorithm for pattern matching under Hamming distance, and since then it had remained open whether an analogous result exists for edit distance. Prior to this work, no $\mathrm{poly}(k \log n)$-space algorithm was known even in the simpler semi-streaming model, where $T$ arrives as a stream but $P$ is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity.

Our central technical contribution is a new space-efficient deterministic encoding of two strings, called the greedy encoding. It encodes the set of all alignments of cost at most $k$ that have a certain property (we call such alignments greedy). On strings of length at most $n$, the encoding occupies $\tilde{O}(k^2)$ space. We use the encoding to compress substrings of the text that are close to the pattern. To do so, we compute the encoding for substrings of the text and of the pattern, which requires read-only access to the latter.

To develop the fully streaming algorithm, we further introduce a new edit distance sketch parameterized by integers $n > k$. For any string of length at most $n$, the sketch has size $\tilde{O}(k^2)$, and it can be computed by an $\tilde{O}(k^2)$-space streaming algorithm. Given the sketches of two strings, we can compute their edit distance, or certify that it is larger than $k$, in $\tilde{O}(k^3)$ time. This result improves upon the $\tilde{O}(k^8)$-size sketches of Belazzougui and Zhang [FOCS 2016] and the very recent $\tilde{O}(k^3)$-size sketches of Jin, Nelson, and Wu [STACS 2021].
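The abstract gives no pseudocode, so for orientation here is a sketch of two textbook baselines that the results above improve on in space. This is not the authors' greedy encoding or sketch. The function names `approx_match_ends` and `bounded_edit_distance` are hypothetical. The first is Sellers' dynamic program for the problem statement itself. It keeps $O(m)$ words but needs full access to $P$ and takes $O(nm)$ time. The second is a Ukkonen-style banded DP with the same output contract as the sketch comparison (return the edit distance, or certify that it exceeds $k$). However, it works on the full strings rather than on small sketches.

```python
from typing import List, Optional, Tuple


def approx_match_ends(text: str, pattern: str, k: int) -> List[Tuple[int, int]]:
    """Sellers' algorithm (baseline, not the paper's method).

    For each end position r = 1..len(text), compute the minimum edit distance
    between `pattern` and any substring text[l:r], and report (r, dist)
    whenever dist <= k. Uses one DP column of length m + 1 (O(m) space).
    """
    m = len(pattern)
    col = list(range(m + 1))  # distances against the empty text prefix
    out: List[Tuple[int, int]] = []
    for r, c in enumerate(text, start=1):
        new = [0] * (m + 1)  # row 0 stays 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            new[i] = min(col[i - 1] + cost,  # match / substitution
                         col[i] + 1,         # extra text character
                         new[i - 1] + 1)     # missing pattern character
        col = new
        if col[m] <= k:
            out.append((r, col[m]))
    return out


def bounded_edit_distance(x: str, y: str, k: int) -> Optional[int]:
    """Return ED(x, y) if it is at most k, otherwise None.

    Only cells with |i - j| <= k are filled (a diagonal band), and values are
    capped at k + 1, so each stored value equals min(ED of prefixes, k + 1).
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None
    cap = k + 1  # stands for "more than k edits"
    prev = [j if j <= k else cap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [cap] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1, cap)
        prev = cur
    return prev[m] if prev[m] <= k else None
```

For example, `approx_match_ends("abcdefg", "bcd", 1)` reports the end positions 3, 4 and 5 with distances 1, 0 and 1. `bounded_edit_distance("kitten", "sitting", 2)` returns `None` because the true distance is 3. The band makes the bounded check run in $O(k \cdot \min(|x|, |y|))$ time. This is the kind of "distance or certify $> k$" answer that the sketches above provide from $\tilde{O}(k^2)$-size summaries instead of the strings themselves.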

Original language: English
Title of host publication: Proceedings - 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science, FOCS 2021
Publisher: IEEE Computer Society
Pages: 885-896
Number of pages: 12
ISBN (Electronic): 9781665420556
State: Published - 2022
Event: 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021 - Virtual, Online, United States
Duration: 7 Feb 2022 to 10 Feb 2022

Publication series

Name: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
Volume: 2022-February

Conference

Conference: 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021
Country/Territory: United States
City: Virtual, Online
Period: 7/02/22 to 10/02/22

Keywords

  • edit distance
  • pattern matching
  • streaming

All Science Journal Classification (ASJC) codes

  • General Computer Science
