TY - JOUR
T1 - LCS approximation via embedding into locally non-repetitive strings
AU - Landau, G. M.
AU - Levy, A.
AU - Newman, I.
N1 - Funding Information: A preliminary version appeared in the proceedings of CPM 2009. This work was partially supported by the Israel Science Foundation (Grant No. 1011/06). Corresponding author at: Department of Software Engineering, Shenkar College, 12 Anna Frank, Ramat-Gan, Israel. Footnote 1: Partially supported by the National Science Foundation Award 0904246, Israel Science Foundation Grant 347/09, Yahoo, Grant No. 2008217 from the United States–Israel Binational Science Foundation (BSF), and DFG. Footnote 2: Fax: +972 4 824 9331.
PY - 2011/4
Y1 - 2011/4
AB - A classical measure of similarity between strings is the length of the longest common subsequence (LCS) between the two given strings. The search for efficient algorithms for finding the LCS has been going on for more than three decades. To date, all known algorithms may take quadratic time (shaved by logarithmic factors) to find a large LCS. In this paper, the problem of approximating LCS is studied, focusing on the hard inputs for this problem, namely, approximating an LCS of near-linear size in strings over a relatively large alphabet (of size at least n^ε for some constant ε > 0, where n is the length of the string). We show that any given string over a relatively large alphabet can be embedded into a locally non-repetitive string. This embedding has a negligible additive distortion for strings that are not too dissimilar in terms of the edit distance. We also show that LCS can be efficiently approximated in locally non-repetitive strings. Our new method (the embedding together with the approximation algorithm) gives a strictly sub-quadratic time algorithm (i.e., of complexity O(n^(2-ε)) for some constant ε > 0) which can find common subsequences of linear (and near-linear) size that cannot be detected efficiently by the existing tools.
KW - Embedding
KW - LCS approximation
KW - String algorithms
UR - http://www.scopus.com/inward/record.url?scp=79851509357&partnerID=8YFLogxK
DO - https://doi.org/10.1016/j.ic.2010.12.006
M3 - Article
SN - 0890-5401
VL - 209
SP - 705
EP - 716
JO - Information and Computation
JF - Information and Computation
IS - 4
ER -