TY - GEN
T1 - The Streaming k-Mismatch Problem
T2 - 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
AU - Golan, Shay
AU - Kociumaka, Tomasz
AU - Kopelowitz, Tsvi
AU - Porat, Ely
N1 - Publisher Copyright: © 2020 Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. All rights reserved.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - We revisit the k-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming k-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde{O}(k)$ space and $\tilde{O}(\sqrt{k})$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde{O}(n\sqrt{k})$, and the fastest known offline algorithm, which costs $\tilde{O}(n + \min(nk/\sqrt{m}, \sigma n))$ time. Moreover, it is not known whether improvements over the $\tilde{O}(n\sqrt{k})$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the k-mismatch problem that, given an integer parameter $k \le s \le m$, uses $\tilde{O}(s)$ space and costs $\tilde{O}(n + \min(nk^2/m, nk/\sqrt{s}, \sigma nm/s))$ total time. For $s = m$, the total runtime becomes $\tilde{O}(n + \min(nk/\sqrt{m}, \sigma n))$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde{O}(\sqrt{k})$. 2012 ACM Subject Classification: Theory of computation → Pattern matching.
AB - We revisit the k-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming k-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde{O}(k)$ space and $\tilde{O}(\sqrt{k})$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde{O}(n\sqrt{k})$, and the fastest known offline algorithm, which costs $\tilde{O}(n + \min(nk/\sqrt{m}, \sigma n))$ time. Moreover, it is not known whether improvements over the $\tilde{O}(n\sqrt{k})$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the k-mismatch problem that, given an integer parameter $k \le s \le m$, uses $\tilde{O}(s)$ space and costs $\tilde{O}(n + \min(nk^2/m, nk/\sqrt{s}, \sigma nm/s))$ total time. For $s = m$, the total runtime becomes $\tilde{O}(n + \min(nk/\sqrt{m}, \sigma n))$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde{O}(\sqrt{k})$. 2012 ACM Subject Classification: Theory of computation → Pattern matching.
KW - Hamming distance
KW - K-mismatch
KW - Streaming pattern matching
UR - http://www.scopus.com/inward/record.url?scp=85088379268&partnerID=8YFLogxK
U2 - https://doi.org/10.4230/LIPIcs.CPM.2020.15
DO - https://doi.org/10.4230/LIPIcs.CPM.2020.15
M3 - Conference contribution
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
A2 - Gørtz, Inge Li
A2 - Weimann, Oren
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Y2 - 17 June 2020 through 19 June 2020
ER -