
Bounds on identification of genome evolution pacemakers

Research output: Contribution to journal › Article › peer-review

Abstract

Several studies have pointed out that the tight correlation between genes' evolutionary rates is better explained by a model denoted the Universal PaceMaker (UPM) than by the simple rate constancy posited by the classical molecular clock (MC) hypothesis. Under UPM, each gene is associated with a single pacemaker (PM) and varies its evolutionary rate according to the ticks of this PM. Hence, the relative rates of all genes associated with the same PM remain nearly constant, whereas the absolute rates can change arbitrarily according to the PM ticks. A natural follow-up question is how to find the gene-PM association from the gene sequence data alone. This, however, turns out to be a nontrivial task, affected by the number of variables, their random noise, and the amount of available information. To this end, a clustering heuristic was devised that exploits the correlation between corresponding edge lengths across thousands of gene trees. Nevertheless, no theoretical study of the relationship between these parameters has been done. We here study this question by providing theoretical bounds, expressed in terms of the system parameters, on the probabilities of positive and negative results. We corroborate these results with a simulation study that reveals the critical role of the variances.
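
The intuition behind using edge-length correlation to recover gene-PM associations can be illustrated with a toy simulation. The sketch below is not the clustering heuristic or the bounds from the article. It assumes a simple multiplicative model (edge length = gene rate × PM tick × log-normal noise), and every name and parameter value in it (`simulate_edge_lengths`, `log_correlation`, `noise_sd`, the tick distribution) is hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_edge_lengths(gene_rates, pm_ticks, noise_sd, rng):
    """Return one list of edge lengths per gene.

    Assumed toy model (not taken from the article): the length of edge e
    in the tree of gene g is rate_g * tick_e * exp(noise), where noise is
    Gaussian with standard deviation noise_sd.
    """
    return [
        [rate * tick * math.exp(rng.gauss(0.0, noise_sd)) for tick in pm_ticks]
        for rate in gene_rates
    ]


def log_correlation(x, y):
    """Pearson correlation between the log edge lengths of two genes."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var_x = sum((a - mx) ** 2 for a in lx)
    var_y = sum((b - my) ** 2 for b in ly)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is reproducible
n_edges = 500           # hypothetical number of corresponding tree edges

# Two independent pacemakers, each with its own arbitrary tick per edge.
ticks_a = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n_edges)]
ticks_b = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(n_edges)]

# Two genes follow pacemaker A (with different intrinsic rates); one follows B.
genes_a = simulate_edge_lengths([0.5, 2.0], ticks_a, noise_sd=0.2, rng=rng)
genes_b = simulate_edge_lengths([1.0], ticks_b, noise_sd=0.2, rng=rng)

same_pm = log_correlation(genes_a[0], genes_a[1])
diff_pm = log_correlation(genes_a[0], genes_b[0])

print(f"same pacemaker:      r = {same_pm:.3f}")
print(f"different pacemaker: r = {diff_pm:.3f}")
```

Under these assumptions, genes sharing a PM show strongly correlated log edge lengths even though their intrinsic rates differ, while genes on different PMs do not. Raising `noise_sd` relative to the spread of the ticks weakens the same-PM correlation, which gives an informal feel for why the variances matter.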

Original language: American English
Pages (from-to): 806-821
Number of pages: 16
Journal: Journal of Computational Biology
Volume: 26
Issue number: 8
State: Published - Aug 2019

Keywords

  • Chernoff bounds
  • DNA sequence evolution
  • chi square distribution
  • probabilistic geometrical clustering

All Science Journal Classification (ASJC) codes

  • Computational Mathematics
  • Genetics
  • Molecular Biology
  • Computational Theory and Mathematics
  • Modelling and Simulation
