Skip to main navigation Skip to search Skip to main content

Machine-learning of complex evolutionary signals improves classification of SNVs

Sapir Labes, Doron Stupp, Naama Wagner, Idit Bloch, Michal Lotem, Ephrat L. Lahad, Paz Polak, Tal Pupko, Yuval Tabach

Research output: Contribution to journalArticlepeer-review

Abstract

Conservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease-genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution since its accuracy highly depends on the analyzed set of species and genes. For example, pairwise comparisons between the human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable for conservation-based variant prediction, while others demonstrated species that optimize prediction. These insights led to developing EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning based methods and most ensemble tools in every prediction-task, highlighting the strength of optimizing conservation analysis per-species and per-gene. Overall, we suggest a new and a more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.

Original languageEnglish
Article numberlqac025
JournalNAR Genomics and Bioinformatics
Volume4
Issue number2
DOIs
StatePublished - 1 Jun 2022

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 15 - Life on Land
    SDG 15 Life on Land

All Science Journal Classification (ASJC) codes

  • Applied Mathematics
  • Genetics
  • Molecular Biology
  • Structural Biology
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Machine-learning of complex evolutionary signals improves classification of SNVs'. Together they form a unique fingerprint.

Cite this