The Limits of the Metapopulation: Lineage Fragmentation in a Widespread Terrestrial Salamander (Plethodon cinereus)
Waldron, Brian P; Watts, Emily F; Morgan, Donald J; Hantak, Maggie M; Lemmon, Alan R; Lemmon, Emily C Moriarty; Kuchta, Shawn R
doi: 10.1093/sysbio/syae053 pmid: 39250721
In vicariant species formation, divergence results primarily from periods of allopatry and restricted gene flow. Widespread species harboring differentiated, geographically distinct sublineages offer a window into what may be a common mode of species formation, whereby a species originates, spreads across the landscape, then fragments into multiple units. However, incipient lineages usually lack reproductive barriers that prevent their fusion upon secondary contact, blurring the boundaries between a single, large metapopulation-level lineage and multiple independent species. Here, we explore this model of species formation in the Eastern Red-backed Salamander (Plethodon cinereus), a widespread terrestrial vertebrate with at least 6 divergent mitochondrial clades throughout its range. Using anchored hybrid enrichment data, we applied phylogenomic and population genomic approaches to investigate patterns of divergence, gene flow, and secondary contact. Genomic data broadly match most mitochondrial groups but reveal mitochondrial introgression and extensive admixture at several contact zones. While species delimitation analyses in Bayesian Phylogenetics and Phylogeography supported 5 lineages of P. cinereus, genealogical divergence indices (gdi) were highly sensitive to the inclusion of admixed samples and the geographic representation of candidate species, with increasing support for multiple species when removing admixed samples or limiting sampling to a single locality per group. An analysis of morphometric data revealed differences in body size and limb proportions among groups, with a reduction of forelimb length among warmer and drier localities consistent with increased fossoriality. We conclude that P. cinereus is a single species, but one with highly structured component lineages of various degrees of independence.
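As an illustration of the genealogical divergence index mentioned above, here is a minimal sketch assuming the commonly used definition gdi = 1 - exp(-2τ/θ), where τ is the divergence time and θ the population size parameter of the focal lineage. The formula, the 0.2 and 0.7 cut-offs, and the function names are not taken from the abstract; they are assumptions based on the usual heuristic in the species delimitation literature.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """Genealogical divergence index, assuming gdi = 1 - exp(-2 * tau / theta).

    tau   -- divergence time of the focal lineage (expected substitutions per site)
    theta -- population size parameter of the focal lineage (same units)
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    if tau < 0:
        raise ValueError("tau must be non-negative")
    return 1.0 - math.exp(-2.0 * tau / theta)


def interpret(value: float, low: float = 0.2, high: float = 0.7) -> str:
    """Apply the heuristic cut-offs (assumed here, not stated in the abstract)."""
    if value < low:
        return "single species"
    if value > high:
        return "distinct species"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the three outcomes.
    for tau, theta in [(0.0005, 0.01), (0.004, 0.01), (0.02, 0.01)]:
        value = gdi(tau, theta)
        print(f"tau={tau}, theta={theta}: gdi={value:.3f} ({interpret(value)})")
```

Because gdi depends on the ratio τ/θ, anything that inflates the estimated θ of a candidate lineage (for example, pooling admixed or geographically distant samples) would be expected to lower the index, which is consistent with the sensitivity the abstract reports.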
Phylogenomics of Bivalvia Using Ultraconserved Elements Reveal New Topologies for Pteriomorphia and Imparidentia
Li, Yi-Xuan; Ip, Jack Chi-Ho; Chen, Chong; Xu, Ting; Zhang, Qian; Sun, Yanan; Ma, Pei-Zhen; Qiu, Jian-Wen
doi: 10.1093/sysbio/syae052 pmid: 39283716
Despite significant advances in phylogenetics over the past decades, the deep relationships within Bivalvia (phylum Mollusca) remain inconclusive. Previous efforts based on morphology or several genes have failed to resolve many key nodes in the phylogeny of Bivalvia. Advances have been made recently using transcriptome data, but the phylogenetic relationships within Bivalvia historically lacked consensus, especially within Pteriomorphia and Imparidentia. Here, we inferred the relationships of key lineages within Bivalvia using matrices generated from specifically designed ultraconserved elements (UCEs) with 16 available genomic resources and 85 newly sequenced specimens from 55 families. Our new probes (Bivalve UCE 2k v.1) for target sequencing captured an average of 849 UCEs with 1085 bp in mean length from in vitro experiments. Our results introduced novel schemes from 6 major clades (Protobranchina, Pteriomorphia, Palaeoheterodonta, Archiheterodonta, Anomalodesmata, and Imparidentia), though some inner nodes were poorly resolved, such as paraphyletic Heterodonta in some topologies potentially due to insufficient taxon sampling. The resolution increased when analyzing specific matrices for Pteriomorphia and Imparidentia. We recovered 3 Pteriomorphia topologies different from previously published trees, with the strongest support for ((Ostreida + (Arcida + Mytilida)) + (Pectinida + (Limida + Pectinida))). Limida were nested within Pectinida, warranting further studies. For Imparidentia, our results strongly supported the new hypothesis of (Galeommatida + (Adapedonta + Cardiida)), while the possible non-monophyly of Lucinida was inferred but poorly supported. Overall, our results provide important insights into the phylogeny of Bivalvia and show that target enrichment sequencing of UCEs can be broadly applied to study both deep and shallow phylogenetic relationships.
Assessing the Adequacy of Morphological Models Using Posterior Predictive Simulations
Mulvey, Laura P A; May, Michael R; Brown, Jeremy M; Höhna, Sebastian; Wright, April M; Warnock, Rachel C M
doi: 10.1093/sysbio/syae055 pmid: 39374100
Reconstructing the evolutionary history of different groups of organisms provides insight into how life originated and diversified on Earth. Phylogenetic trees are commonly used to estimate this evolutionary history. Within Bayesian phylogenetics, a major step in estimating a tree is choosing an appropriate model of character evolution. While the most common character data used is molecular sequence data, morphological data remains a vital source of information. The use of morphological characters allows for the incorporation of fossil taxa, and despite advances in molecular sequencing, continues to play a significant role in neontology. Moreover, it is the main data source that allows us to unite extinct and extant taxa directly under the same generating process. We therefore require suitable models of morphological character evolution, the most common being the Mk Lewis model. While it is frequently used in both palaeobiology and neontology, it is not known whether the simple Mk substitution model, or any extensions to it, provide a sufficiently good description of the process of morphological evolution. In this study we investigate the impact of different morphological models on empirical tetrapod datasets. Specifically, we compare unpartitioned Mk models with those where characters are partitioned by the number of observed states, both with and without allowing for rate variation across sites and accounting for ascertainment bias. We show that the choice of substitution model has an impact on both topology and branch lengths, highlighting the importance of model choice. Through simulations, we validate the use of the model adequacy approach, posterior predictive simulations, for choosing an appropriate model. Additionally, we compare the performance of model adequacy with Bayesian model selection.
We demonstrate how model selection approaches based on marginal likelihoods are not appropriate for choosing between models with partition schemes that vary in character state space (i.e., that vary in Q-matrix state size). Using posterior predictive simulations, we found that current variations of the Mk model are often performing adequately in capturing the evolutionary dynamics that generated our data. We do not find any preference for a particular model extension across multiple datasets, indicating that there is no “one size fits all” when it comes to morphological data and that careful consideration should be given to choosing models of discrete character evolution. By using suitable models of character evolution, we can increase our confidence in our phylogenetic estimates, which should in turn allow us to gain more accurate insights into the evolutionary history of both extinct and extant taxa.
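To make the role of Q-matrix state size concrete, the following is a minimal sketch of the transition probabilities under an Mk model with k states, assuming the standard parameterization in which all changes are equally likely and the branch length t is measured in expected changes per character. The function name and example values are illustrative and do not come from the study.

```python
import math


def mk_transition_probs(k: int, t: float) -> tuple[float, float]:
    """Return (p_same, p_diff) for an Mk model with k states over branch length t.

    Assumes the rate matrix is scaled so that t is the expected number of
    changes per character: off-diagonal rates 1/(k-1), diagonal rates -1.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if t < 0:
        raise ValueError("t must be non-negative")
    decay = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * decay   # probability of ending in the start state
    p_diff = 1.0 / k - decay / k             # probability of ending in one specific other state
    return p_same, p_diff


if __name__ == "__main__":
    # The same branch length implies different probabilities for different k,
    # which is why partitioning characters by state number changes the model.
    for k in (2, 3, 4):
        p_same, p_diff = mk_transition_probs(k, 0.5)
        print(f"k={k}: p_same={p_same:.4f}, p_diff={p_diff:.4f}")
```

With k = 4 this reduces to the Jukes-Cantor form familiar from nucleotide models. Since the per-character likelihood is built from these probabilities, two partition schemes that assign the same character to matrices of different size are not evaluating it on the same state space.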
Inference of Phylogenetic Networks From Sequence Data Using Composite Likelihood
Kong, Sungsik; Swofford, David L; Kubatko, Laura S
doi: 10.1093/sysbio/syae054 pmid: 39387633
While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between 2 species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing 2 branches to merge into 1, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes–Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than 2 existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model).
We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
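The difference between the two search strategies named above can be sketched with a generic acceptance rule. This is not PhyNEST's implementation (which is a Julia package); it is a textbook Metropolis-style criterion, and the function names, the temperature parameter, and the example log-likelihood values are all assumptions made for illustration.

```python
import math


def hill_climbing_accepts(current_loglik: float, proposed_loglik: float) -> bool:
    """Hill climbing: accept a proposed network only if it scores strictly better."""
    return proposed_loglik > current_loglik


def annealing_acceptance_probability(
    current_loglik: float, proposed_loglik: float, temperature: float
) -> float:
    """Simulated annealing: always accept improvements, and accept a worse
    proposal with probability exp(delta / temperature), where delta < 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    delta = proposed_loglik - current_loglik
    if delta >= 0:
        return 1.0
    return math.exp(delta / temperature)


if __name__ == "__main__":
    # Hypothetical log composite likelihoods for a current and a worse network.
    current, proposed = -1000.0, -1002.0
    print("hill climbing accepts:", hill_climbing_accepts(current, proposed))
    for temperature in (10.0, 1.0, 0.1):
        p = annealing_acceptance_probability(current, proposed, temperature)
        print(f"T={temperature}: accept with probability {p:.4g}")
```

As the temperature is lowered, the annealing rule approaches hill climbing; at higher temperatures it can step through worse networks, which is the usual motivation for using it to escape local optima.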
A Phylogenomic Backbone for Acoelomorpha Inferred From Transcriptomic Data
Abalde, Samuel; Jondelius, Ulf
doi: 10.1093/sysbio/syae057 pmid: 39451056
Xenacoelomorpha are mostly microscopic, morphologically simple worms, lacking many structures typical of other bilaterians. Xenacoelomorphs—which include three main groups, namely Acoela, Nemertodermatida, and Xenoturbella—have been proposed to be an early diverging Bilateria, sister to protostomes and deuterostomes, but other phylogenomic analyses have recovered this clade nested within the deuterostomes, as sister to Ambulacraria. The position of Xenacoelomorpha within the metazoan tree has understandably attracted a lot of attention, overshadowing the study of phylogenetic relationships within this group. Given that Xenoturbella includes only six species whose relationships are well understood, we decided to focus on the most speciose Acoelomorpha (Acoela + Nemertodermatida). Here, we have sequenced 29 transcriptomes, doubling the number of sequenced species, to infer a backbone tree for Acoelomorpha based on genomic data. The recovered topology is mostly congruent with previous studies. The most important difference is the recovery of Paratomella as the first off-shoot within Acoela, dramatically changing the reconstruction of the ancestral acoel. Besides, we have detected incongruence between the gene trees and the species tree, likely linked to incomplete lineage sorting, and some signal of introgression between the families Dakuidae and Mecynostomidae, which hampers inferring the correct placement of this family and, particularly, of the genus Notocelis. We have also used this dataset to infer for the first time diversification times within Acoelomorpha, which coincide with known bilaterian diversification and extinction events. Given the importance of morphological data in acoelomorph phylogenetics, we tested several partitions and models. Although morphological data failed to recover a robust phylogeny, phylogenetic placement has proven to be a suitable alternative when a reference phylogeny is available.
Complex Models of Sequence Evolution Improve Fit, But Not Gene Tree Discordance, for Tetrapod Mitogenomes
Toups, Benjamin S; Thomson, Robert C; Brown, Jeremy M
doi: 10.1093/sysbio/syae056 pmid: 39392926
Variation in gene tree estimates is widely observed in empirical phylogenomic data and is often assumed to be the result of biological processes. However, a recent study using tetrapod mitochondrial genomes to control for biological sources of variation due to their haploid, uniparentally inherited, and non-recombining nature found that levels of discordance among mitochondrial gene trees were comparable to those found in studies that assume only biological sources of variation. Additionally, they found that several of the models of sequence evolution chosen to infer gene trees were doing an inadequate job of fitting the sequence data. These results indicated that significant amounts of gene tree discordance in empirical data may be due to poor fit of sequence evolution models and that more complex and biologically realistic models may be needed. To test how the fit of sequence evolution models relates to gene tree discordance, we analyzed the same mitochondrial data sets as the previous study using 2 additional, more complex models of sequence evolution that each include a different biologically realistic aspect of the evolutionary process: A covarion model to incorporate site-specific rate variation across lineages (heterotachy), and a partitioned model to incorporate variable evolutionary patterns by codon position. Our results show that both additional models fit the data better than the models used in the previous study, with the covarion being consistently and strongly preferred as tree size increases. However, even these more preferred models still inferred highly discordant mitochondrial gene trees, thus deepening the mystery around what we label the “Mito-Phylo Paradox” and leading us to ask whether the observed variation could, in fact, be biological in nature after all.
Phylogenetic Tree Instability After Taxon Addition: Empirical Frequency, Predictability, and Consequences For Online Inference
Collienne, Lena; Barker, Mary; Suchard, Marc A; Matsen, Frederick A
doi: 10.1093/sysbio/syae059 pmid: 39453463
Online phylogenetic inference methods add sequentially arriving sequences to an inferred phylogeny without the need to recompute the entire tree from scratch. Some online method implementations exist already, but there remains concern that additional sequences may change the topological relationship among the original set of taxa. We call such a change in tree topology a lack of stability for the inferred tree. In this article, we analyze the stability of single taxon addition in a Maximum Likelihood framework across 1000 empirical datasets. We find that instability occurs in almost 90% of our examples, although observed topological differences do not always reach significance under the approximately unbiased (AU) test. Changes in tree topology after addition of a taxon rarely occur close to its attachment location, and are more frequently observed in more distant tree locations carrying low bootstrap support. To investigate whether instability is predictable, we hypothesize sources of instability and design summary statistics addressing these hypotheses. Using these summary statistics as input features for machine learning under random forests, we are able to predict instability and can identify the most influential features. In summary, it does not appear that a strict insertion-only online inference method will deliver globally optimal trees, although relaxing insertion strictness by allowing for a small number of final tree rearrangements or accepting slightly suboptimal solutions appears feasible.
Complex Hybridization in a Clade of Polytypic Salamanders (Plethodontidae: Desmognathus) Uncovered by Estimating Higher-Level Phylogenetic Networks
Pyron, R Alexander; O’Connell, Kyle A; Myers, Edward A; Beamer, David A; Baños, Hector
doi: 10.1093/sysbio/syae060 pmid: 39468736
Reticulation between radiating lineages is a common feature of diversification. We examine these phenomena in the Pisgah clade of Desmognathus salamanders from the southern Appalachian Mountains of the eastern United States. The group contains 4–7 species exhibiting 2 discrete phenotypes, aquatic “shovel-nosed” and semi-aquatic “black-bellied” forms. These ecomorphologies are ancient and have apparently been transmitted repeatedly between lineages through introgression. Geographically proximate populations of both phenotypes exhibit admixture, and at least 2 black-bellied lineages have been produced via reticulations between shovel-nosed parentals, suggesting potential hybrid speciation dynamics. However, computational constraints currently limit our ability to reconstruct network radiations from gene-tree data. Available methods are limited to level-1 networks wherein reticulations do not share edges, and higher-level networks may be non-identifiable in many cases. We present a heuristic approach to recover information from higher-level networks across a range of potentially identifiable empirical scenarios, supported by theory and simulation. When extrinsic information indicates the location and direction of reticulations, our method can successfully estimate a reduced possible set of nonlevel-1 networks. Phylogenomic data support a single backbone topology with up to 5 overlapping hybrid edges in the Pisgah clade. These results suggest an unusual mechanism of ecomorphological hybrid speciation, wherein a binary threshold trait causes some hybrid populations to shift between microhabitat niches, promoting ecological divergence between sympatric hybrids and parentals. This contrasts with other well-known systems in which hybrids exhibit intermediate, novel, or transgressive phenotypes. The genetic basis of these phenotypes is unclear and further data are needed to clarify the evolutionary basis of morphological changes with ecological consequences.
Rapid Evolution of Host Repertoire and Geographic Range in a Young and Diverse Genus of Montane Butterflies
Mo, Shifang; Zhu, Yaowei; Braga, Mariana P; Lohman, David J; Nylin, Sören; Moumou, Ashraf; Wheat, Christopher W; Wahlberg, Niklas; Wang, Min; Ma, Fangzhou; Zhang, Peng; Wang, Houshuai
doi: 10.1093/sysbio/syae061 pmid: 39484941
Evolutionary changes in geographic distribution and larval host plants may promote the rapid diversification of montane insects, but this scenario has been rarely investigated. We studied the rapid radiation of the butterfly genus Colias, which has diversified in mountain ecosystems in Eurasia, Africa, and the Americas. Based on a data set of 150 nuclear protein-coding genetic loci and mitochondrial genomes, we constructed a time-calibrated phylogenetic tree of Colias species with broad taxon sampling. We then inferred their ancestral geographic ranges, historical diversification rates, and the evolution of host use. We found that the most recent common ancestor of Colias was likely geographically widespread and originated ~3.5 Ma. The group subsequently diversified in different regions across the world, often in tandem with geographic expansion events. No aspect of elevation was found to have a direct effect on diversification. The genus underwent a burst of diversification soon after the divergence of the Neotropical lineage, followed by an exponential decline in diversification rate toward the present. The ancestral host repertoire included the legume genera Astragalus and Trifolium but later expanded to include a wide range of Fabaceae genera and plants in more distantly related families, punctuated with periods of host range expansion and contraction. We suggest that the widespread distribution of the ancestor of all extant Colias lineages set the stage for diversification by isolation of populations that locally adapted to the various different environments they encountered, including different host plants. In this scenario, elevation is not the main driver but might have accelerated diversification by isolating populations.
How to Validate a Bayesian Evolutionary Model
Mendes, Fábio K; Bouckaert, Remco; Carvalho, Luiz M; Drummond, Alexei J
doi: 10.1093/sysbio/syae064 pmid: 39506375
Biology has become a highly mathematical discipline in which probabilistic models play a central role. As a result, research in the biological sciences is now dependent on computational tools capable of carrying out complex analyses. These tools must be validated before they can be used, but what is understood as validation varies widely among methodological contributions. This may be a consequence of the still embryonic stage of the literature on statistical software validation for computational biology. Our manuscript aims to advance this literature. Here, we describe, illustrate, and introduce new good practices for assessing the correctness of a model implementation with an emphasis on Bayesian methods. We also introduce a suite of functionalities for automating validation protocols. It is our hope that the guidelines presented here help sharpen the focus of discussions on (as well as elevate) expected standards of statistical software for biology.
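One widely used check on the correctness of a Bayesian implementation is a coverage test: simulate parameters from the prior, simulate data given those parameters, compute the posterior, and count how often the 95% credible interval contains the true value. The sketch below is a toy version under stated assumptions, using a conjugate normal model with a known closed-form posterior in place of an evolutionary model; it is not the authors' software, and all names and settings are hypothetical.

```python
import math
import random


def coverage_of_95_interval(n_reps: int = 2000, n_obs: int = 5, seed: int = 1) -> float:
    """Fraction of replicates whose 95% credible interval contains the true mean.

    Toy model (an assumption for illustration): mu ~ Normal(0, 1) and
    y_i | mu ~ Normal(mu, 1), so the posterior is
    Normal(sum(y) / (n_obs + 1), 1 / (n_obs + 1)).
    """
    rng = random.Random(seed)
    z = 1.959964  # two-sided 95% quantile of the standard normal
    hits = 0
    for _ in range(n_reps):
        mu = rng.gauss(0.0, 1.0)                        # draw the "true" parameter from the prior
        ys = [rng.gauss(mu, 1.0) for _ in range(n_obs)]  # simulate data given that parameter
        post_var = 1.0 / (1.0 + n_obs)
        post_mean = post_var * sum(ys)
        half_width = z * math.sqrt(post_var)
        if post_mean - half_width <= mu <= post_mean + half_width:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    print(f"empirical coverage: {coverage_of_95_interval():.3f}")
```

A correct implementation should give coverage close to 0.95, up to Monte Carlo error; coverage well above or below that value suggests a mismatch between the simulator and the inference code.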