Abstract
Distance-based methods for phylogenetic reconstruction are based on a two-step approach: first, pairwise distances are computed from DNA sequences associated with a given set of taxa, and then these distances are used to reconstruct the phylogenetic relationships between taxa. Because the estimated distances are based on finite sequences, they are inherently noisy, and this noise may result in reconstruction errors. Previous attempts to improve reconstruction accuracy focused either on improving the robustness of reconstruction algorithms to this stochastic noise, or on improving the accuracy of the distance estimates. Here, we aim to further improve reconstruction accuracy by utilizing the basic observation that reconstruction algorithms are based on a series of comparisons between distances (or linear combinations of distances). We start by examining the relationship between the stochastic noise in the sequence data and the accuracy of the comparisons between pairwise distance estimates. This examination results in improved methods for distance comparison, which are shown to be as accurate as likelihood-based methods, while being much simpler and more efficient to compute. We then extend these methods to improve reconstruction accuracy of quartet trees, and examine some of the challenges moving forward.
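The sketch below illustrates the two-step pipeline in its simplest form, and shows what "a comparison between linear combinations of distances" looks like in practice. It assumes the Jukes-Cantor (JC69) substitution model for the distance step. For the reconstruction step it uses the classical four-point condition for a quartet, which picks the pairing whose two within-pair distances have the smallest sum. This is the naive baseline comparison that the abstract aims to improve, not the authors' noise-aware method. The function names `p_distance`, `jc69_distance` and `four_point_quartet` are hypothetical.

```python
import math
from itertools import combinations


def p_distance(s1, s2):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(a != b for a, b in zip(s1, s2))
    return diffs / len(s1)


def jc69_distance(s1, s2):
    """Jukes-Cantor (JC69) corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(s1, s2)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0:
        # Saturation (p >= 3/4): the JC69 estimate is undefined, report infinity.
        return math.inf
    return -0.75 * math.log(arg)


def four_point_quartet(seqs):
    """Resolve a quartet {a, b, c, d} with the four-point condition.

    seqs maps exactly four taxon names to aligned sequences. Each of the three
    possible splits (ab|cd, ac|bd, ad|bc) is scored by the sum of its two
    within-pair distances, and the split with the smallest sum is returned.
    """
    if len(seqs) != 4:
        raise ValueError("a quartet needs exactly four taxa")
    a, b, c, d = sorted(seqs)
    # Step 1: estimate all six pairwise distances from the sequence data.
    dist = {
        frozenset((x, y)): jc69_distance(seqs[x], seqs[y])
        for x, y in combinations((a, b, c, d), 2)
    }

    def dist_of(x, y):
        return dist[frozenset((x, y))]

    # Step 2: compare linear combinations (sums) of the estimated distances.
    sums = {
        ((a, b), (c, d)): dist_of(a, b) + dist_of(c, d),
        ((a, c), (b, d)): dist_of(a, c) + dist_of(b, d),
        ((a, d), (b, c)): dist_of(a, d) + dist_of(b, c),
    }
    return min(sums, key=sums.get), sums
```

With short sequences the three sums can lie close together. In that case sampling noise in the distance estimates can flip the winning comparison, which is the source of error that the abstract targets.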
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 88-99 |
Number of pages | 12 |
Journal | Journal of Theoretical Biology |
Volume | 440 |
State | Published - 7 Mar 2018 |
Keywords
- Adaptive distance estimation
- DNA substitution models
- Distance comparison
- Distance-based phylogenetic reconstruction
- Fisher's linear discriminant
ASJC Scopus subject areas
- Statistics and Probability
- Modeling and Simulation
- General Biochemistry, Genetics and Molecular Biology
- General Immunology and Microbiology
- General Agricultural and Biological Sciences
- Applied Mathematics