Untying Rates of Gene Gain and Loss Leads to a New Phylogenetic Approach

Yoav Dvir, Sagi Snir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The advent of the genomic era has produced an incredible wealth and resolution of molecular data, posing an unprecedented challenge for molecular systematics and necessitating novel techniques and paradigms. Consequently, whole genome approaches were developed to extract the evolutionary signal by taking advantage of a larger amount of data. In parallel, and in light of the understanding that in prokaryotes, genome dynamics (GD) events, primarily gene gain and loss, provide a significantly richer signal than point mutations in ubiquitous housekeeping genes, GD-based approaches were suggested. However, proper modeling of these data and the processes generating them has lagged behind the pace of their accumulation, both because of a lack of deep understanding and because of technical difficulties. Among the central hurdles to accurate modeling of real data is the relaxation of rate constancy, particularly the untying of gain and loss rates. This relaxation violates key assumptions such as constant genome size and gene set, and model reversibility, and has vast implications for implementation. This work presents a generic stochastic model, the two-ratio process (TRP), which encompasses and deals with these complications. As a special case, it contains the Poissonian process with different gene gain and loss rates as a form of the Birth-Death process with varying population sizes. The lack of reversibility invalidates traditional phylogenetic approaches, yielding a novel two-stage phylogenetic approach in which accurate, bidirectional parameters are first inferred for triplets and later combined by a special cherry-picking method into a complete tree. We show by algebraic techniques that this method is statistically consistent. The method, implemented in the software TDDR (Triplets Directed Distances Reconstruction), was applied to synthetic data, showing an advantage over other approaches that handle similar data but without the same model assumptions. We also applied it to the Alignable Tight Genomic Clusters (ATGC) Database, where it showed high adequacy to the observed data. The full text of this article appears on bioRxiv.org at https://www.biorxiv.org/content/10.1101/2025.01.27.634999v1. The TDDR code is available on GitHub: https://github.com/YoavDvir/TDDR.
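To make the idea of untied gain and loss rates concrete, the following is a minimal illustrative sketch and not the TRP model or the TDDR implementation. It assumes, purely for illustration, that genes are gained at a constant rate `gain_rate` regardless of genome size and that each existing gene is lost independently at rate `loss_rate`; the function names and parameter values are hypothetical. Under these assumptions the genome size is not constant over time, which is the kind of behavior the abstract says breaks the usual constant-size and reversibility assumptions.

```python
import math
import random


def simulate_genome_size(n0, gain_rate, loss_rate, t_end, rng):
    """Gillespie-style simulation of genome size over time t_end.

    Assumption (illustrative only): gains arrive at constant rate
    `gain_rate`; each of the n current genes is lost at rate `loss_rate`.
    """
    n, t = n0, 0.0
    while True:
        total_rate = gain_rate + loss_rate * n
        if total_rate <= 0:
            return n  # no events can occur
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            return n
        # Choose the event type in proportion to its rate.
        if rng.random() < gain_rate / total_rate:
            n += 1  # gene gain
        else:
            n -= 1  # gene loss


def expected_genome_size(n0, gain_rate, loss_rate, t_end):
    """Closed-form mean genome size under the same illustrative assumptions."""
    equilibrium = gain_rate / loss_rate
    return equilibrium + (n0 - equilibrium) * math.exp(-loss_rate * t_end)


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [simulate_genome_size(100, 5.0, 0.1, 10.0, rng) for _ in range(2000)]
    print("simulated mean:", sum(runs) / len(runs))
    print("expected mean: ", expected_genome_size(100, 5.0, 0.1, 10.0))
```

With a starting size of 100 genes and an equilibrium of `gain_rate / loss_rate = 50`, the simulated mean drifts toward the equilibrium rather than staying fixed, so a model that ties the two rates together to keep genome size constant would not describe such data.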

Original language: American English
Title of host publication: Research in Computational Molecular Biology - 29th International Conference, RECOMB 2025, Proceedings
Editors: Sriram Sankararaman
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 414-419
Number of pages: 6
ISBN (Print): 9783031902512
DOIs
State: Published - 2025
Event: 29th International Conference on Research in Computational Molecular Biology, RECOMB 2025 - Seoul, Korea, Republic of
Duration: 26 Apr 2025 to 29 Apr 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15647 LNBI

Conference

Conference: 29th International Conference on Research in Computational Molecular Biology, RECOMB 2025
Country/Territory: Korea, Republic of
City: Seoul
Period: 26/04/25 to 29/04/25

Keywords

  • Birth-Death Processes
  • Phylogenetics
  • Prokaryotic Genome Dynamics
  • Statistical Consistency

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
