TY - JOUR
T1 - On the number of genomic pacemakers
T2 - A geometric approach
AU - Snir, Sagi
N1 - Funding Information: Research was supported in part by the USA-Israel Binational Science Foundation. Part of this work was done while the author was visiting the National Center for Biotechnology Information (NCBI) at the National Institutes of Health (NIH), USA. We thank Eugene Koonin and Yuri Wolf for helpful discussions, in particular on the interpretation of the biological significance of the resulting clustering of the real data in Section ‘Results on real data’. Publisher Copyright: © 2014 Snir.
PY - 2014/12/31
Y1 - 2014/12/31
N2 - The universal pacemaker (UPM) model extends the classical molecular clock (MC) model by allowing each gene, in addition to its individual intrinsic rate as in the MC, to accelerate or decelerate according to the universal pacemaker. Under UPM, the relative evolutionary rates of all genes remain nearly constant whereas the absolute rates can change arbitrarily. It was shown on several taxa groups spanning the entire tree of life that the UPM model describes the evolutionary process better than the MC model. In this work we provide a natural generalization of the UPM model that we denote multiple pacemakers (MPM). Under the MPM model every gene is still affected by a single pacemaker; however, the number of pacemakers is not confined to one. Such a model induces a partition over the gene set where all the genes in one part are affected by the same pacemaker. The task is to identify the pacemaker partition, or in other words, to find for each gene its associated pacemaker. We devise a novel heuristic procedure, relying on statistical and geometrical tools, to solve the problem and demonstrate by simulation that this approach can cope satisfactorily with considerable noise and realistic problem sizes. We applied this procedure to a set of over 2000 genes in 100 prokaryotes and demonstrated the significant existence of two pacemakers.
AB - The universal pacemaker (UPM) model extends the classical molecular clock (MC) model by allowing each gene, in addition to its individual intrinsic rate as in the MC, to accelerate or decelerate according to the universal pacemaker. Under UPM, the relative evolutionary rates of all genes remain nearly constant whereas the absolute rates can change arbitrarily. It was shown on several taxa groups spanning the entire tree of life that the UPM model describes the evolutionary process better than the MC model. In this work we provide a natural generalization of the UPM model that we denote multiple pacemakers (MPM). Under the MPM model every gene is still affected by a single pacemaker; however, the number of pacemakers is not confined to one. Such a model induces a partition over the gene set where all the genes in one part are affected by the same pacemaker. The task is to identify the pacemaker partition, or in other words, to find for each gene its associated pacemaker. We devise a novel heuristic procedure, relying on statistical and geometrical tools, to solve the problem and demonstrate by simulation that this approach can cope satisfactorily with considerable noise and realistic problem sizes. We applied this procedure to a set of over 2000 genes in 100 prokaryotes and demonstrated the significant existence of two pacemakers.
KW - Deming regression
KW - Gap statistics
KW - Genome evolution pacemaker
KW - Molecular evolution
KW - Partition distance
UR - http://www.scopus.com/inward/record.url?scp=84928677415&partnerID=8YFLogxK
U2 - https://doi.org/10.1186/s13015-014-0026-0
DO - https://doi.org/10.1186/s13015-014-0026-0
M3 - Article
SN - 1748-7188
VL - 9
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
M1 - 26
ER -