Abstract
Several studies have pointed out that the tight correlation between the evolutionary rates of genes is better explained by a model denoted the Universal PaceMaker (UPM) than by the simple rate constancy of the classical molecular clock (MC) hypothesis. Under UPM, each gene is associated with a single pacemaker (PM) and varies its evolutionary rate according to that PM's ticks. Hence, the relative rates of all genes associated with the same PM remain nearly constant, whereas the absolute rates can change arbitrarily with the PM's ticks. A natural follow-up question is how to infer the gene-PM association from the gene sequence data alone. This, however, turns out to be a nontrivial task that is affected by the number of variables, their random noise, and the amount of available information. To this end, a clustering heuristic was previously devised that exploits the correlation between corresponding edge lengths across thousands of gene trees. Nevertheless, no theoretical study relating these parameters to one another has been carried out. Here we study this question by providing theoretical bounds, expressed in terms of the system parameters, on the probabilities of positive and negative results. We corroborate these results with a simulation study that reveals the critical role of the variances.
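To make the UPM idea concrete, the following is a minimal simulation sketch and not the method of the paper. It assumes a log-normal model in which each gene's edge length is its intrinsic rate times the tick of its pacemaker on that edge, times multiplicative noise. All names and values (`simulate_log_edge_lengths`, `pm_sigma`, `noise_sigma`, the four gene rates, 200 edges) are hypothetical choices for illustration only.

```python
import math
import random


def simulate_log_edge_lengths(gene_rates, gene_pm, n_pms, n_edges,
                              pm_sigma, noise_sigma, rng):
    """Return one list of log edge lengths per gene (assumed toy model)."""
    # Each pacemaker draws an independent log-scale tick for every tree edge.
    pm_ticks = [[rng.gauss(0.0, pm_sigma) for _ in range(n_edges)]
                for _ in range(n_pms)]
    log_lengths = []
    for rate, pm in zip(gene_rates, gene_pm):
        # log(length) = log(intrinsic rate) + PM tick + gene-specific noise
        log_lengths.append([
            math.log(rate) + pm_ticks[pm][e] + rng.gauss(0.0, noise_sigma)
            for e in range(n_edges)
        ])
    return log_lengths


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)            # fixed seed so the run is reproducible
gene_rates = [0.5, 2.0, 1.0, 3.0]  # hypothetical intrinsic rates
gene_pm = [0, 0, 1, 1]             # genes 0,1 follow PM 0; genes 2,3 follow PM 1
logs = simulate_log_edge_lengths(gene_rates, gene_pm, n_pms=2, n_edges=200,
                                 pm_sigma=0.5, noise_sigma=0.1, rng=rng)

same_pm = pearson(logs[0], logs[1])   # two genes sharing a pacemaker
cross_pm = pearson(logs[0], logs[2])  # two genes on different pacemakers
print(f"same-PM correlation:  {same_pm:.2f}")
print(f"cross-PM correlation: {cross_pm:.2f}")
```

In this toy setting, genes sharing a pacemaker show strongly correlated edge lengths despite different intrinsic rates, while genes on different pacemakers do not. Raising `noise_sigma` relative to `pm_sigma` weakens that separation, which is one way to see why the variances matter for recovering the gene-PM association.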
| Original language | American English |
|---|---|
| Pages (from-to) | 806-821 |
| Number of pages | 16 |
| Journal | Journal of Computational Biology |
| Volume | 26 |
| Issue number | 8 |
| State | Published - Aug 2019 |
Keywords
- Chernoff bounds
- DNA sequence evolution
- chi square distribution
- probabilistic geometrical clustering
All Science Journal Classification (ASJC) codes
- Computational Mathematics
- Genetics
- Molecular Biology
- Computational Theory and Mathematics
- Modelling and Simulation