The software MetaPIGA 2 also implements:
- Simple dataset quality control (testing for the presence of identical sequences as well as for excessively ambiguous or excessively divergent sequences);
- Automated trimming of poorly aligned regions using the trimAl algorithm;
- The Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for the easy selection of nucleotide and amino-acid substitution models that best fit the data;
- Ancestral-state reconstruction of all nodes in the tree.
MetaPIGA 2 uses standard formats for data sets and trees, is platform independent, runs on 32- and 64-bit systems, and takes advantage of multiprocessor and/or multicore computers. A version for Grid computing is in development.
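The three model-selection criteria named in the feature list above have simple closed forms. The sketch below uses the textbook definitions and is not MetaPIGA's own code. The log-likelihoods, the alignment length, and the function names are hypothetical values chosen for illustration. Parameter counts cover only the substitution model, since branch lengths are shared by all candidates.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

def lrt_statistic(log_l_null, log_l_alt):
    """Likelihood ratio statistic 2 (ln L_alt - ln L_null) for nested models.

    Under the null model it is compared with a chi-square distribution whose
    degrees of freedom equal the difference in free parameters.
    """
    return 2 * (log_l_alt - log_l_null)

# Hypothetical fits: model -> (maximized log-likelihood, free parameters).
fits = {"JC69": (-5230.4, 0), "HKY85": (-5105.9, 4), "GTR": (-5101.2, 8)}
n_sites = 1200  # hypothetical alignment length

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(*fits[m], n_sites))
print("AIC prefers", best_aic, "and BIC prefers", best_bic)
print("LRT statistic, HKY85 vs GTR:", lrt_statistic(fits["HKY85"][0], fits["GTR"][0]))
```

With these made-up numbers the two criteria disagree, because BIC penalizes each extra parameter by ln n rather than by 2. This is why a tool may report several criteria side by side.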
Citing MetaPIGA 2
MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics
Raphaël Helaers & Michel C. Milinkovitch
BMC Bioinformatics 2010, 11:379
http://bioinformatics.oxfordjournals.org/content/25/2/197.full
Phylogenetic inference under recombination using Bayesian stochastic topology selection
Abstract
Motivation: Conventional phylogenetic analysis for characterizing the relatedness between taxa typically assumes that a single relationship exists between species at every site along the genome. This assumption fails to take into account recombination, which is a fundamental process for generating diversity and can lead to spurious results. Recombination induces a localized phylogenetic structure which may vary along the genome. Here, we generalize a hidden Markov model (HMM) to infer changes in phylogeny along multiple sequence alignments while accounting for rate heterogeneity; the hidden states refer to the unobserved phylogenetic topology underlying the relatedness at a genomic location. The number of hidden states (topologies) and their structure are random (not known a priori) and are sampled using Markov chain Monte Carlo algorithms. The HMM structure allows us to analytically integrate over all possible changepoints in topologies as well as all the unknown branch lengths.
Results: We demonstrate our approach on simulated data, on the genome of a suspected HIV recombinant strain, and on an investigation of recombination in the sequences of 15 laboratory mouse strains sequenced by Perlegen Sciences. Our findings indicate that our method allows us to distinguish between rate heterogeneity and variation in phylogeny caused by recombination without being restricted to 4-taxa data.
Availability: The method has been implemented in Java and is available, along with the data studied here, from http://www.stats.ox.ac.uk/~webb.
Contact: cholmes@stats.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
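The abstract's statement that the HMM structure lets one sum over all possible changepoints can be illustrated with the standard forward algorithm. This is a minimal sketch under simplifying assumptions and not the authors' implementation. The set of candidate topologies is fixed rather than sampled by MCMC. The per-site log-likelihoods are supplied as input instead of being obtained by integrating over branch lengths. The transition model, with one hypothetical `stay_prob` and uniform switching, is an assumption.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(site_loglik, stay_prob):
    """Log-likelihood of an alignment, summed over all topology paths.

    site_loglik[i][k] is the log-likelihood of alignment column i under
    candidate topology k. stay_prob (0 < stay_prob < 1) is the probability
    that adjacent sites share a topology; otherwise the chain switches
    uniformly to one of the other topologies.
    """
    n_states = len(site_loglik[0])
    if n_states == 1:
        return sum(col[0] for col in site_loglik)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))
    # Uniform prior over topologies at the first site.
    alpha = [math.log(1.0 / n_states) + ll for ll in site_loglik[0]]
    for col in site_loglik[1:]:
        alpha = [
            col[k] + logsumexp([
                alpha[j] + (log_stay if j == k else log_switch)
                for j in range(n_states)
            ])
            for k in range(n_states)
        ]
    return logsumexp(alpha)

# Two topologies, two sites, hypothetical per-site likelihoods.
example = [[math.log(0.5), math.log(0.1)], [math.log(0.2), math.log(0.4)]]
print(math.exp(forward_log_likelihood(example, stay_prob=0.9)))
```

The recursion costs O(sites × topologies²), whereas enumerating every placement of changepoints explicitly would grow exponentially with alignment length.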
http://www.stats.ox.ac.uk/__data/assets/pdf_file/0005/4010/large_pedigrees.pdf
http://www.cs.cmu.edu/~guestrin/Class/10701-S07/Handouts/recitations/HMM-inference.pdf
Probabilistic Phylogenetic Inference with Insertions and Deletions
Abstract
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth–death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program DNAML in PHYLIP. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program DNAMLε improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
Author Summary
We describe a computationally efficient method to use insertion and deletion events, in addition to substitutions, in phylogenetic inference. To date, many evolutionary models in probabilistic phylogenetic inference methods have only accounted for substitution events, not for insertions and deletions. As a result, not only do tree inference methods use less sequence information than they could, but also it has remained difficult to integrate phylogenetic modeling into sequence alignment methods (such as profiles and profile-hidden Markov models) that inherently require a model of insertion and deletion events. Therefore an important goal in the field has been to develop tractable evolutionary models of insertion/deletion events over time of sufficient accuracy to increase the resolution of phylogenetic inference methods and to increase the power of profile-based sequence homology searches. Our model offers a partial answer to this problem. We show that our model generally improves inference power in both simulated and real data and that it is easily implemented in the framework of standard inference packages with little effect on computational efficiency (we extended DNAML, in Felsenstein's popular PHYLIP package).
Citation: Rivas E, Eddy SR (2008) Probabilistic Phylogenetic Inference with Insertions and Deletions. PLoS Comput Biol 4(9): e1000172. doi:10.1371/journal.pcbi.1000172
Editor: David Haussler, University of California Santa Cruz, United States of America
Received: October 24, 2007; Accepted: July 31, 2008; Published: September 19, 2008
Copyright: © 2008 Rivas, Eddy. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was funded by the Howard Hughes Medical Institute.
Competing interests: The authors have declared that no competing interests exist.
* E-mail: rivase@janelia.hhmi.org
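The idea of running Felsenstein's peeling algorithm over an alphabet augmented by the gap character can be sketched for a single alignment column as follows. This is an illustration only and not the DNAMLε model: the paper's insertion/deletion process is non-reversible, whereas the code below swaps in a reversible equal-rates five-state model (a Jukes-Cantor analogue) because it has a closed-form transition probability. The tree encoding, the rate `mu`, and all function names are hypothetical.

```python
import math

ALPHABET = "ACGT-"  # the gap is treated as a fifth character state
K = len(ALPHABET)

def transition_prob(i, j, t, mu=1.0):
    """P(state j after time t | state i) under an equal-rates K-state model.

    Assumption for illustration: every off-diagonal rate equals mu.
    """
    decay = math.exp(-K * mu * t)
    if i == j:
        return 1.0 / K + (K - 1) / K * decay
    return (1.0 - decay) / K

def partial_likelihoods(node):
    """Peeling step: likelihood of the data below `node` for each state.

    A node is either a leaf character (str) or a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        return [1.0 if c == node else 0.0 for c in ALPHABET]
    left, t_left, right, t_right = node
    l_left = partial_likelihoods(left)
    l_right = partial_likelihoods(right)
    result = []
    for i in range(K):
        from_left = sum(transition_prob(i, j, t_left) * l_left[j] for j in range(K))
        from_right = sum(transition_prob(i, j, t_right) * l_right[j] for j in range(K))
        result.append(from_left * from_right)
    return result

def column_likelihood(tree):
    """Likelihood of one alignment column, with a uniform root distribution."""
    root = partial_likelihoods(tree)
    return sum(value / K for value in root)

# A three-taxon column "A", "-", "A" on a hypothetical rooted tree.
tree = (("A", 0.1, "-", 0.2), 0.05, "A", 0.3)
print(column_likelihood(tree))
```

Each internal node is visited once and costs O(K²), so adding the gap state changes the alphabet size from 4 to 5 but leaves the algorithm itself unchanged.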
Materials and Methods
The C source code for the modified PHYLIP 3.66 package [14] that contains the program DNAMLε, the C source code for evolving sequences with the generative model (εRATE), the modified ROSE package (version 1.3) [76], as well as all the Perl scripts and datasets used to generate the results presented in this paper are provided as a tarball in Dataset S1. The program DNAMLε uses the EASEL sequence analysis library (SRE, unpublished), which is also provided.
- PMCID: PMC3013127
- PMCID: PMC2746219
Title: A stochastic evolution model for residue Insertion-Deletion Independent from Substitution
Author(s): Lebre S, Michel CJ
Source: COMPUTATIONAL BIOLOGY AND CHEMISTRY Volume: 34 Issue: 5-6 Pages: 259-267 Published: DEC 2010
Times Cited: 0
Title: Genomes as documents of evolutionary history
Author(s): Boussau B, Daubin V
Source: TRENDS IN ECOLOGY & EVOLUTION Volume: 25 Issue: 4 Pages: 224-232 Published: APR 2010
Times Cited: 2
Phylogenetic inference under varying proportions of indel-induced alignment gaps