Abstract
In this study, we present a novel simulation-based methodology to infer indel parameters from multiple sequence alignments (MSAs). Our algorithm searches for the set of evolutionary parameters describing indel dynamics that best fits a given input MSA. In each step of the search, we use parametric bootstraps and the Mahalanobis distance to estimate how well a proposed set of parameters fits the input data. Using simulations, we demonstrate that our methodology can accurately infer the indel parameters for a large variety of plausible settings. Moreover, using our methodology, we show that indel parameters vary substantially among three genomic data sets: mammals, bacteria, and retroviruses. Finally, we demonstrate how our methodology can be used to simulate MSAs based on indel parameters inferred from real data sets.
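The fit measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each MSA is reduced to a vector of summary statistics and that the parametric bootstrap yields one such vector per simulated MSA. The function name `mahalanobis_to_simulations`, the choice of statistics, and the toy numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def mahalanobis_to_simulations(observed, simulated):
    """Mahalanobis distance from an observed summary-statistic vector
    to the cloud of vectors obtained from simulated MSAs.

    observed  : shape (k,)    statistics of the input MSA (hypothetical choice)
    simulated : shape (n, k)  statistics of n MSAs simulated under one
                              proposed parameter set
    """
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    mean = simulated.mean(axis=0)
    # Sample covariance of the simulated statistics (columns are variables).
    cov = np.atleast_2d(np.cov(simulated, rowvar=False))
    # Assumption: a pseudo-inverse is used so that a singular covariance
    # matrix does not raise an error.
    cov_inv = np.linalg.pinv(cov)
    diff = observed - mean
    return float(np.sqrt(diff @ cov_inv @ diff))


# Toy example: two made-up statistics from four simulated MSAs.
simulated_stats = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
distance = mahalanobis_to_simulations([3.0, 1.0], simulated_stats)
```

In a search of the kind the abstract describes, a smaller distance would indicate that the proposed parameter set reproduces the input MSA's statistics more closely. How the search proposes parameters is not specified here and is left out of the sketch.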
| Original language | English |
|---|---|
| Pages (from-to) | 3226-3238 |
| Number of pages | 13 |
| Journal | Genome Biology and Evolution |
| Volume | 7 |
| Issue number | 12 |
| State | Published - Dec 2015 |
Keywords
- Alignments
- Indels
- Mahalanobis distance
- Phylogeny
- Simulations
All Science Journal Classification (ASJC) codes
- General Medicine