TY - GEN
T1 - Gene-Adjacency-Based Phylogenetics Under a Stochastic Gain-Loss Model
AU - Dvir, Yoav
AU - Brezner, Shelly
AU - Snir, Sagi
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - A key task in molecular systematics is to decipher the evolutionary history of strains of a species. Standard markers are often too crude in this fine systematic resolution to provide a phylogenetic signal. However, among prokaryotes, events in genome dynamics (GD) such as gene gain in horizontal gene transfer (HGT) between organisms and gene loss seem to provide a quite sensitive signal. The synteny index (SI) marker captures differences between a pair of genomes in terms of both gene order and gene content. Recently, it was shown to be consistent under the Jump model, a simple model of GD where the only operation is a gene jump. In this work, we extend the Jump model to a richer model, allowing for gene gain/loss events, the most prevalent GD events in prokaryotic evolution. Despite the increased model complexity, our new representation yields a significant reduction in the number of variables, leading to a simple equation to estimate the model parameter and, consequently, the consistency of the phylogenetic reconstruction. Additionally, with a more straightforward representation, we can easily calculate the asymptotic variance of the parameter estimation, allowing us to obtain a bound for the expected error. We tested the new model and its associated reconstruction approach on actual and simulated data, where the theoretical asymptotic assumptions do not hold. Our simulation results show a very high accuracy under short evolutionary distances. Applying the method to several families in the ATGC database resulted in relative agreement with other reconstruction approaches based on other signals. The code is on GitHub under the link: https://github.com/shellybre/indels_project.
KW - Birth-Death Theory
KW - Markovian Processes
KW - Phylogenetics
KW - Prokaryotic Genome Dynamics
KW - Statistical Consistency
UR - http://www.scopus.com/inward/record.url?scp=85192224057&partnerID=8YFLogxK
U2 - https://doi.org/10.1007/978-3-031-58072-7_4
DO - https://doi.org/10.1007/978-3-031-58072-7_4
M3 - Conference contribution
SN - 9783031580710
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 69
EP - 85
BT - Comparative Genomics - 21st International Conference, RECOMB-CG 2024, Proceedings
A2 - Scornavacca, Celine
A2 - Hernández-Rosales, Maribel
PB - Springer Science and Business Media Deutschland GmbH
T2 - 21st RECOMB International Workshop on Comparative Genomics, RECOMB-CG 2024
Y2 - 27 April 2024 through 28 April 2024
ER -