TY - JOUR
AU1 - Holland, Barbara R.
AB - When doing phylogenetic analyses, biologists usually restrict themselves to “canned” packages, such as PAUP*, PAML, or MrBayes. Canned packages are typically fairly easy to learn and use, and they offer a range of analysis methods. The great advantage of these packages is that they allow the practitioner to focus on evolutionary questions without needing to understand the mathematics and computer science behind the algorithms they are applying. Developers of phylogenetic methods, on the other hand, have no choice but to learn programming languages. This might seem like an obvious division, but relatively new programming environments like Python and R mean that the boundary between biologist and programmer is becoming much blurrier. These languages are built up from packaged modules, which can be combined in any way the user desires, and it is straightforward for users to create new modules of their own. Languages like R take longer to learn than canned packages, but their modular nature allows the user to “mix and match” a wide range of methods that have been developed for phylogenetic analysis or to develop their own methods. The real question is whether the advantages of doing phylogenetics in an environment like R outweigh the increased learning cost. Emmanuel Paradis would say an emphatic “yes,” and he pursues this line of thought in the book being reviewed here. Paradis is one of the authors of the R package “APE: Analyses of Phylogenetics and Evolution”; in his new book he sets out how this package and other features of R can be used to perform phylogenetic analyses. R, according to its Internet homepage, is “a language and environment for statistical computing and graphics … that provides a wide variety of statistical and graphical techniques, and is highly extensible.” Importantly, R is free as well as open source.
The fact that modern phylogenetics is a statistically based discipline makes R a natural choice for developing phylogenetic tools, and so a book that describes the current state of play is timely. This book will appeal to two main audiences: phylogenetic practitioners seeking to perform analyses without worrying about the intricacies of how different R modules get things done, and developers of phylogenetic algorithms. Doing phylogenetics in R is a fundamentally different approach from using packaged software such as PAUP* or MrBayes. In the latter programs you load your data, perform some analysis that the designer of the program has predefined (e.g., a search for the maximum-likelihood tree under some explicit model), and then output the result (e.g., a tree with branch lengths). Certainly, the program will manipulate the data along the way, but the user has no access to the data or the manipulations. The use of scripts (e.g., PAUP blocks) makes these programs potentially very powerful, but you are still limited to the set of commands provided. In R the data set is stored in active memory, and the ways you can manipulate it are limited only by your imagination, programming skills, and the methods that others have made available. In some cases these limitations may be severe! The downside of this flexibility is that there is going to be a long learning curve for users unfamiliar with R: an R newbie is not going to begin by designing new packages. A more likely scenario is that they will use some package that has already been developed. After learning to use that package effectively, they will gain confidence to use other packages and perhaps eventually design their own analyses using the general features of R. The book by Paradis seems to have been written with this scenario in mind: it is filled with examples and case studies illustrating how to use features of the existing packages.
It is the sort of book where you sit in front of your computer and read a bit, tinker a bit, read a bit more, and gradually increase your confidence with the R environment. The book begins with a section on why Paradis thinks R is an effective environment for phylogenetic analysis. He makes a good case for using R as a platform, emphasizing R's flexibility and the opportunity to do highly integrated analyses. Some of the reasons for using R are forward-looking in the sense that, although R doesn't currently do as many different phylogenetic analyses as people might like, in time a greater range of methods will be implemented. This introductory chapter includes details on how to find and install the basic R system and the packages required for phylogenetics. Starting from scratch on a computer without R, I found it fairly straightforward to get up and running. Chapter 2 gives a whirlwind introduction to the general features of R. Most people would want to look at some other tutorial material in addition to what is presented here. This is not a criticism, and indeed Paradis suggests where to look for such tutorials. Chapter 3 introduces us to the two main R packages that have been developed for doing phylogenetics, APE and ADE4, and to the phylogenetic data structures they use to represent trees and sequences. Along with simple reading, writing, and file-format conversion, R has a number of useful features for manipulating trees, including converting between dichotomous and multichotomous trees, removing particular taxa, and rooting and unrooting trees. These seem like little things, but they rapidly become tedious and error prone if you attempt to do them by manually editing Newick-format files. I found Chapter 4 particularly interesting; it deals with plotting phylogenies. Here we are playing to one of R's strengths in an area that has, as Paradis points out, been neglected in the phylogenetics discipline. R provides very flexible tools for plotting phylogenies.
The book gives examples of how to produce trees that are rooted, unrooted, or radial, with nodes annotated with text, colored circles, or even mini-barcharts. There are also facilities for exploring and displaying very large phylogenies by zooming in on subtrees. This chapter will appeal to anyone who has had to prepare all but the most basic tree for publication. Originally I thought it was odd to have the chapter on plotting trees before the chapter on phylogeny estimation, but having read it I think that the plotting capabilities of R could well be the hook that gets many phylogeneticists to start using it. Chapter 5 covers phylogeny estimation in R. To date, there are some distance-based methods available (NJ and simple clustering algorithms) as well as maximum likelihood. Paradis suggests that Bayesian methods, although not currently available in R, will be straightforward to implement, because R was designed with statistical estimation in mind and many of the required ingredients already exist. A section of the chapter is devoted to models of DNA substitution. The DNAmodel function in APE allows very flexible model specification, including both partitions and mixtures. There is an example of implementing a partitioned model, but I would have liked an example showing how to create a mixture model as well. Other topics covered include model testing, bootstrap, consensus, and molecular dating. For most biologists finding the best phylogenetic tree to describe some set of species is not an end in itself, but rather is a stepping stone to answering some question about evolution. Chapter 6 shows how methods in the APE and ADE4 packages use R to help answer macroevolutionary questions. This is obviously a key interest of the developers of these packages, as there is a wide range of methods available for comparing observations across species with respect to an underlying phylogeny. Chapter 7 is specifically aimed at people wanting to develop algorithms in R.
It doesn't aim to be comprehensive but rather gives a number of useful suggestions, strategies, and pointers to the appropriate literature. I think people will find Analysis of Phylogenetics and Evolution with R to be a very useful reference book. It doesn't try to be everything to both the phylogenetic practitioner and the algorithm developer, but instead aims to provide enough to get everyone interested and some way down the road of using R productively. There are plenty of signposts throughout the book to let you know where to look for further information. Supporting the use of a language like R has the potential to be very useful to the phylogenetics community. Firstly, it helps the developers of new algorithms by providing an environment where it is relatively easy to develop code without having to reinvent the wheel to do basic things like input and output of phylogenetic data, or even less basic things like defining models of DNA substitution. As an algorithm developer, working in R is likely to speed up your ability to turn an idea for an algorithm into something that can be used by others. The corollary of this is that R has the potential to be very good for practitioners in that, once they have learned to navigate the R environment, they get great flexibility and access to an ever-increasing number of specialist packages. Certainly, the cost in time of learning a language like R exceeds the cost of learning a program like PAUP*, but the long-term benefits are also likely to be greater, and the more people who get involved in doing phylogenetics in R, the more modules will become available. So overcome that activation-energy hump, buy the book, and get involved! © 2007 Society of Systematic Biologists
TI - Analysis of Phylogenetics and Evolution with R
JF - Systematic Biology
DO - 10.1080/10635150701475589
DA - 2007-08-01
UR - https://www.deepdyve.com/lp/oxford-university-press/analysis-of-phylogenetics-and-evolution-with-r-KbkGYNmUfs
SP - 694
EP - 696
VL - 56
IS - 4
DP - DeepDyve
ER -