TY - GEN
T1 - Gapped String Indexing in Subquadratic Space and Sublinear Query Time
AU - Bille, Philip
AU - Gørtz, Inge Li
AU - Lewenstein, Moshe
AU - Pissis, Solon P.
AU - Rotenberg, Eva
AU - Steiner, Teresa Anna
N1 - Publisher Copyright: © Philip Bille, Inge Li Gørtz, Moshe Lewenstein, Solon P. Pissis, Eva Rotenberg, and Teresa Anna Steiner; licensed under Creative Commons License CC-BY 4.0.
PY - 2024/3
Y1 - 2024/3
N2 - In Gapped String Indexing, the goal is to compactly represent a string S of length n such that, for any query consisting of two strings P_1 and P_2, called patterns, and an integer interval [α, β], called gap range, we can quickly find occurrences of P_1 and P_2 in S with distance in [α, β]. Gapped String Indexing is a central problem in computational biology and text mining and has thus received significant research interest, including parameterized and heuristic approaches. Despite this interest, the best-known time-space trade-offs for Gapped String Indexing are the straightforward O(n) space and O(n + occ) query time, or Ω(n^2) space and Õ(|P_1| + |P_2| + occ) query time. We break through this barrier, obtaining the first interesting trade-offs with polynomially subquadratic space and polynomially sublinear query time. In particular, we show that, for every 0 ≤ δ ≤ 1, there is a data structure for Gapped String Indexing with either Õ(n^(2−δ/3)) or Õ(n^(3−2δ)) space and Õ(|P_1| + |P_2| + n^δ · (occ + 1)) query time, where occ is the number of reported occurrences. As a new fundamental tool towards obtaining our main result, we introduce the Shifted Set Intersection problem: preprocess a collection of sets S_1, ..., S_k of integers such that, for any query consisting of three integers i, j, s, we can quickly output YES if and only if there exist a ∈ S_i and b ∈ S_j with a + s = b. We start by showing that the Shifted Set Intersection problem is equivalent to the indexing variant of 3SUM (3SUM Indexing) [Golovnev et al., STOC 2020]. We then give a data structure for Shifted Set Intersection with gaps, which entails a solution to the Gapped String Indexing problem. Furthermore, we enhance our data structure for deciding Shifted Set Intersection so that it also supports the reporting variant of the problem, i.e., outputting all certificates in the affirmative case.
Via the obtained equivalence to 3SUM Indexing, we thus obtain new, improved data structures for the reporting variant of 3SUM Indexing, and we show how this improves upon the state-of-the-art solution for Jumbled Indexing [Chan and Lewenstein, STOC 2015] for any alphabet of constant size σ > 5.
KW - data structures
KW - indexing with gaps
KW - string indexing
KW - two patterns
UR - http://www.scopus.com/inward/record.url?scp=85187796283&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.STACS.2024.16
DO - 10.4230/LIPIcs.STACS.2024.16
M3 - Conference contribution
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 41st International Symposium on Theoretical Aspects of Computer Science, STACS 2024
A2 - Beyersdorff, Olaf
A2 - Kanté, Mamadou Moustapha
A2 - Kupferman, Orna
A2 - Lokshtanov, Daniel
PB - Schloss Dagstuhl – Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 41st International Symposium on Theoretical Aspects of Computer Science, STACS 2024
Y2 - 12 March 2024 through 14 March 2024
ER -