TY - JOUR
T1 - File Updates under Random/Arbitrary Insertions and Deletions
AU - Wang, Qiwen
AU - Jaggi, Sidharth
AU - Médard, Muriel
AU - Cadambe, Viveck R.
AU - Schwartz, Moshe
N1 - Funding Information: Manuscript received June 8, 2016; revised April 18, 2017; accepted April 30, 2017. Date of publication May 17, 2017; date of current version September 13, 2017. Q. Wang was supported by the Knut and Alice Wallenberg Foundation. This paper was presented at the 2015 IEEE Information Theory Workshop, and the 2016 IEEE Information Theory Workshop. Publisher Copyright: © 1963-2012 IEEE.
PY - 2017/10/1
Y1 - 2017/10/1
AB - The problem of one-way file synchronization, henceforth called 'file updates', is studied in this paper. Specifically, a client edits a file, where the edits are modeled by insertions and deletions (InDels). An old copy of the file is stored remotely at a data-centre and is also available to the client. We consider the problem of throughput- and computationally-efficient communication from the client to the data-centre that enables the data-centre to update its old copy to the newly edited file. Two models for the source files and edit patterns are studied: the random pre-edit sequence left-to-right random InDel (RPES-LtRRID) process, and the arbitrary pre-edit sequence arbitrary InDel (APES-AID) process. In both models, we consider the regime in which the number of insertions and deletions is a small (but constant) fraction of the length of the original file. For both models, information-theoretic lower bounds on the best possible compression rates that enable file updates are derived (up to first-order terms). Conversely, a simple compression algorithm using dynamic programming (DP) and entropy coding (EC), henceforth called the DP-EC algorithm, achieves rates that are within a constant additive gap (which diminishes as the alphabet size increases) of the information-theoretic lower bounds for both models. For the RPES-LtRRID model, a dynamic-programming-run-length-compression (DP-RLC) algorithm is proposed, which achieves a compression rate matching the information-theoretic lower bound up to first-order terms. Therefore, when the insertion and deletion probabilities are small (so that first-order terms dominate), the rate achievable by DP-RLC is nearly optimal for the RPES-LtRRID model.
KW - Synchronization
KW - deletions
KW - insertions
UR - http://www.scopus.com/inward/record.url?scp=85029944177&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/TIT.2017.2705100
DO - 10.1109/TIT.2017.2705100
M3 - Article
SN - 0018-9448
VL - 63
SP - 6487
EP - 6513
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 10
M1 - 7930436
ER -