TY - GEN
T1 - Locally consistent parsing for text indexing in small space
AU - Birenzwige, Or
AU - Golan, Shay
AU - Porat, Ely
N1 - Publisher Copyright: Copyright © 2020 by SIAM
PY - 2020
Y1 - 2020
AB - We consider two closely related problems of text indexing in sub-linear working space. The first problem is the Sparse Suffix Tree (SST) construction, where a text S is given in read-only memory, along with a set of suffixes B, and the goal is to construct the compressed trie of all these suffixes, ordered lexicographically, using only O(|B|) words of space. The second problem is the Longest Common Extension (LCE) problem, where again a text S of length n is given in read-only memory, along with some parameter 1 ≤ τ ≤ n, and the goal is to construct a data structure that uses O(n/τ) words of space and can compute, for any pair of suffixes, the length of their longest common prefix. We show how to use ideas based on the Locally Consistent Parsing technique, introduced by Sahinalp and Vishkin [44], in some nontrivial ways in order to improve the known results for both problems. We introduce new Las Vegas and deterministic algorithms for both problems. For the randomized algorithms, we introduce the first Las Vegas SST construction algorithm that takes O(n) time. This improves on the latest result of Gawrychowski and Kociumaka [22], who obtained O(n) time for a Monte Carlo algorithm and O(n√(log |B|)) time with high probability for a Las Vegas algorithm. In addition, we introduce a randomized Las Vegas construction for a data structure that uses O(n/τ) words of space, can be constructed in linear time with high probability, and answers LCE queries in O(τ) time. For the deterministic algorithms, we introduce an SST construction algorithm that takes O(n log(n/|B|)) time (for |B| = Ω(log n)). This is the first almost-linear-time, O(n · polylog n), deterministic SST construction algorithm; all previous algorithms take at least Ω(min{n|B|, n²/|B|}) time. For the LCE problem, we introduce a data structure that uses O(n/τ) words of space and answers LCE queries in O(τ√(log* n)) time, with O(n log τ) construction time (for τ = O(n/log n)).
This data structure improves both the query time and the construction time of the results of Tanimura et al. [47].
UR - http://www.scopus.com/inward/record.url?scp=85084038276&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 607
EP - 626
BT - 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020
A2 - Chawla, Shuchi
T2 - 31st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2020
Y2 - 5 January 2020 through 8 January 2020
ER -