RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

Bioinformatics, Vol. 30 no. 9 2014, pages 1312–1313
APPLICATIONS NOTE, Phylogenetics
doi:10.1093/bioinformatics/btu033
Advance Access publication January 21, 2014

Alexandros Stamatakis (1,2)
(1) Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg and (2) Department of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
Associate Editor: Jonathan Wren

ABSTRACT
Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community.
Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available.
Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML.
Contact: alexandros.stamatakis@h-its.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Received on December 22, 2013; revised and accepted on January 14, 2014

1 INTRODUCTION
RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analysis of large datasets under maximum likelihood. Its major strength is a fast maximum likelihood tree search algorithm that returns trees with good likelihood scores. Since the last RAxML paper (Stamatakis, 2006), it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. In the following, I will present some of the most notable new features and extensions of RAxML.

2 NEW FEATURES

2.1 Bootstrapping and support values
RAxML offers four different ways to obtain bootstrap support. It implements the standard non-parametric bootstrap and also the so-called rapid bootstrap (Stamatakis et al., 2008), which is a standard bootstrap search that relies on algorithmic shortcuts and approximations to speed up the search process. It also offers an option to calculate the so-called SH-like support values (Guindon et al., 2010). I recently implemented a method that allows for computing RELL (Resampling Estimated Log Likelihoods) bootstrap support as described by Minh et al. (2013).
Apart from this, RAxML also offers a so-called bootstopping option (Pattengale et al., 2010). When this option is used, RAxML will automatically determine how many bootstrap replicates are required to obtain stable support values.

2.2 Models and data types
Apart from DNA and protein data, RAxML now also supports binary, multi-state morphological and RNA secondary structure data. It can correct for ascertainment bias (Lewis, 2001) for all of the above data types. This might be useful not only for morphological data matrices that only contain variable sites but also for alignments of SNPs.
The number of available protein substitution models has been significantly extended and comprises a general time reversible (GTR) model, as well as the computationally more complex LG4M and LG4X models (Le et al., 2012). RAxML can also automatically determine the best-scoring protein substitution model. Finally, a new option for conducting a maximum likelihood estimate of the base frequencies has become available.

2.3 Parallel versions
RAxML offers a fine-grain parallelization of the likelihood function for multi-core systems via the PThreads-based version and a coarse-grain parallelization of independent tree searches via MPI (Message Passing Interface). It also supports coarse-grain/fine-grain parallelism via the hybrid MPI/PThreads version (Pfeiffer and Stamatakis, 2010).
Note that, for extremely large analyses on supercomputers, using the dedicated sister program ExaML [Exascale Maximum Likelihood (Stamatakis and Aberer, 2013)] is recommended.

2.4 Post-analysis of trees
RAxML offers a plethora of post-analysis functions for sets of trees. Apart from standard statistical significance tests, it offers efficient (and partially parallelized) operations for computing Robinson–Foulds (RF) distances, as well as extended majority rule, majority rule and strict consensus trees (Aberer et al., 2010).
Beyond this, it implements a method for identifying the so-called rogue taxa (Pattengale et al., 2011), and I recently implemented options for calculating the TC (Tree Certainty) and IC (Internode Certainty) measures as introduced by Salichos and Rokas (2013).
Finally, there is the new plausibility checker option (Dao et al., 2013) that allows computing the RF distances between a huge phylogeny with tens of thousands of taxa and several smaller, more accurate reference phylogenies that contain a strict subset of the taxa in the huge tree. This option can be used to automatically assess the quality of huge trees that cannot be inspected by eye.

2.5 Analyzing next-generation sequencing data
RAxML offers two algorithms for preparing and analyzing next-generation sequencing data. A sliding-window approach (unpublished) is available to assess which regions of a gene (e.g. 16S) exhibit strong and stable phylogenetic signal to support decisions about which regions to amplify. Apart from that, RAxML also implements parsimony and maximum likelihood flavors of the evolutionary placement algorithm [EPA (Berger et al., 2011)] that places short reads into a given reference phylogeny obtained from full-length sequences to determine the evolutionary origin of the reads. It also offers placement support statistics for those reads by calculating likelihood weights. This option can also be used to place fossils into a given phylogeny (Berger and Stamatakis, 2010) or to insert different outgroups into the tree a posteriori, that is, after the inference of the ingroup phylogeny.

2.6 Vector intrinsics
RAxML uses manually inserted and optimized x86 vector intrinsics to accelerate the parsimony and likelihood calculations. It supports SSE3, AVX and AVX2 (using fused multiply-add instructions) intrinsics. For a small single-gene DNA alignment using the Γ model of rate heterogeneity, the unvectorized version of RAxML requires 111.5 s, the SSE3 version 84.4 s and the AVX version 66.22 s to complete a simple tree search on an Intel i7-2620M core running at 2.70 GHz under Ubuntu Linux. The differences between AVX and AVX2 are less pronounced and are typically below 5% run time improvement.

2.7 Saving memory
Because memory shortage is becoming an issue due to the growing dataset sizes, RAxML implements an option for reducing memory footprints and potentially run times on large phylogenomic datasets with missing data. The memory savings are proportional to the amount of missing data in the alignment (Izquierdo-Carrasco et al., 2011).

2.8 Miscellaneous new options
RAxML offers options to conduct fast and more superficial tree searches on datasets with tens of thousands of taxa. It can also compute marginal ancestral states and offers an algorithm for rooting trees. Furthermore, it implements a sequential, PThreads-parallelized and MPI-parallelized algorithm for computing all quartets or a subset of quartets for a given alignment.

3 USER SUPPORT AND FUTURE WORK
User support is provided via the RAxML Google group at: https://groups.google.com/forum/?hl=en#!forum/raxml. The RAxML source code contains a comprehensive manual and there is a step-by-step tutorial with some basic commands available at http://www.exelixis-lab.org/web/software/raxml/hands_on.html. Further resources are available via the RAxML software page at http://www.exelixis-lab.org/web/software/raxml/
Future work includes the continued maintenance of RAxML, the adaptation to novel computer architectures and the implementation of novel models and datatypes, in particular codon models.

ACKNOWLEDGEMENT
The author thanks several colleagues for contributing code to RAxML: Andre J. Aberer, Simon Berger, Alexey Kozlov, Nick Pattengale, Wayne Pfeiffer, Akifumi S. Tanabe, David Dao and Charlie Taylor.
Funding: This work was funded by institutional funding provided by the Heidelberg Institute for Theoretical Studies.
Conflict of Interest: none declared.

© The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

REFERENCES
Aberer,A.J. et al. (2010) Parallelized phylogenetic post-analysis on multi-core architectures. J. Comput. Sci., 1, 107–114.
Berger,S.A. and Stamatakis,A. (2010) Accuracy of morphology-based phylogenetic fossil placement under maximum likelihood. In: International Conference on Computer Systems and Applications (AICCSA), 2010 IEEE/ACS. IEEE, New York, USA, pp. 1–9.
Berger,S.A. et al. (2011) Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Syst. Biol., 60, 291–302.
Dao,D. et al. (2013) Automated plausibility analysis of large phylogenies. Technical report. Karlsruhe Institute of Technology.
Guindon,S. et al. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml 3.0. Syst. Biol., 59, 307–321.
Izquierdo-Carrasco,F. et al. (2011) Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees. BMC Bioinformatics, 12, 470.
Le,S.Q. et al. (2012) Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol. Biol. Evol., 29, 2921–2936.
Lewis,P.O. (2001) A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol., 50, 913–925.
Minh,B.Q. et al. (2013) Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol., 30, 1188–1195.
Pattengale,N.D. et al. (2010) How many bootstrap replicates are necessary? J. Comput. Biol., 17, 337–354.
Pattengale,N.D. et al. (2011) Uncovering hidden phylogenetic consensus in large data sets. IEEE/ACM Trans. Comput. Biol. Bioinforma., 8, 902–911.
Pfeiffer,W. and Stamatakis,A. (2010) Hybrid mpi/pthreads parallelization of the raxml phylogenetics code. In: International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE. IEEE, New York, USA, pp. 1–8.
Salichos,L. and Rokas,A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature, 497, 327–331.
Stamatakis,A. (2006) Raxml-vi-hpc: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics, 22, 2688–2690.
Stamatakis,A. and Aberer,A. (2013) Novel parallelization schemes for large-scale likelihood-based phylogenetic inference. In: IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), 2013. pp. 1195–1204.
Stamatakis,A. et al. (2008) A rapid bootstrap algorithm for the raxml web servers. Syst. Biol., 57, 758–771.
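The run times quoted in Section 2.6 can be turned into speedup factors with a little arithmetic. The following is a minimal sketch in Python; only the three timings come from the paper, while the function and variable names are illustrative assumptions.

```python
# Run times in seconds for one simple tree search, as reported in Section 2.6.
run_times = {"unvectorized": 111.5, "SSE3": 84.4, "AVX": 66.22}


def speedup(baseline: float, optimized: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    return baseline / optimized


def speedups_vs_unvectorized(times: dict) -> dict:
    """Speedup of every build relative to the unvectorized build."""
    base = times["unvectorized"]
    return {name: speedup(base, seconds) for name, seconds in times.items()}


if __name__ == "__main__":
    for name, factor in speedups_vs_unvectorized(run_times).items():
        print(f"{name}: {factor:.2f}x")
```

On these numbers the SSE3 build is about 1.32 times and the AVX build about 1.68 times as fast as the unvectorized build. The figures apply to the single alignment and machine described in the paper and need not carry over to other datasets.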
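To make the Robinson–Foulds (RF) distance mentioned in Section 2.4 concrete, here is a small Python sketch. It is not RAxML's implementation, which is written in C and partially parallelized; the nested-tuple tree representation and all function names are assumptions made for illustration. The sketch treats trees as unrooted, collects their non-trivial bipartitions and counts those found in only one of the two trees.

```python
def leaf_set(tree):
    """Return the set of taxon names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions (splits) of a tree.

    Each split is stored as the side that does NOT contain a fixed
    anchor taxon, so the result does not depend on where the tree is rooted.
    """
    taxa = leaf_set(tree)
    anchor = min(taxa)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = taxa - clade if anchor in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(taxa) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must contain the same taxa")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon example trees.
tree_1 = (("A", "B"), ("C", ("D", "E")))
tree_2 = (("A", "C"), ("B", ("D", "E")))
tree_1_rerooted = ("A", ("B", ("C", ("D", "E"))))

if __name__ == "__main__":
    print(rf_distance(tree_1, tree_2))
    print(rf_distance(tree_1, tree_1_rerooted))
```

The two example trees share the split separating D and E from the rest and differ in one split each, so their RF distance is 2. Rerooting a tree leaves its bipartitions unchanged, so the distance between `tree_1` and `tree_1_rerooted` is 0. The plausibility checker described in Section 2.4 additionally has to restrict the large tree to the taxa of each reference tree, which this sketch does not attempt.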
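The standard non-parametric bootstrap referred to in Section 2.1 rests on resampling alignment columns with replacement. The sketch below shows only that resampling step in Python, assuming an alignment stored as a dictionary from taxon name to equal-length sequence string. The names are hypothetical, and the sketch covers neither the tree search that is run on each replicate nor the rapid bootstrap shortcuts or the bootstopping criterion.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Build one bootstrap replicate by sampling columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    # Draw as many column indices as the alignment has sites.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment with three taxa and six sites.
toy_alignment = {"t1": "ACGTAC", "t2": "ACGAAC", "t3": "TCGTAA"}

if __name__ == "__main__":
    replicate = bootstrap_replicate(toy_alignment, random.Random(12345))
    for name, seq in replicate.items():
        print(name, seq)
```

Every taxon is resampled with the same column indices, so each column of the replicate is an intact column of the original alignment. Support for a branch is then the fraction of replicate trees that contain the corresponding bipartition.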

Bioinformatics, Volume 30 (9) – Jan 21, 2014


Publisher
Pubmed Central
Copyright
© The Author 2014. Published by Oxford University Press.
ISSN
1367-4803
eISSN
1367-4811
DOI
10.1093/bioinformatics/btu033


Journal

Bioinformatics (PubMed Central)

Published: Jan 21, 2014
