The nuclear ribosomal internal transcribed spacer (ITS) region is the formal fungal barcode and in most cases the marker of choice for the exploration of fungal diversity in environmental samples. Two problems are particularly acute in the pursuit of satisfactory taxonomic assignment of newly generated ITS sequences: (i) the lack of an inclusive, reliable public reference data set and (ii) the lack of means to refer, in a standardized and stable way, to fungal species for which no Latin name is available. Here, we report on progress in these regards through further development of the UNITE database (http://unite.ut.ee) for molecular identification of fungi. All fungal species represented by at least two ITS sequences in the international nucleotide sequence databases are now given a unique, stable name of the accession number type (e.g. Hymenoscyphus pseudoalbidus|GU586904|SH133781.05FU), and their taxonomic and ecological annotations were corrected as far as possible through a distributed, third‐party annotation effort. We introduce the term ‘species hypothesis’ (SH) for the taxa discovered in clustering on different similarity thresholds (97–99%). An automatically or manually designated sequence is chosen to represent each such SH. These reference sequences are released (http://unite.ut.ee/repository.php) for use by the scientific community in, for example, local sequence similarity searches and in the QIIME pipeline. The system and the data will be updated automatically as the number of public fungal ITS sequences grows. We invite everybody in the position to improve the annotation or metadata associated with their particular fungal lineages of expertise to do so through the new Web‐based sequence management system in UNITE.
Massively parallel short‐read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype‐by‐sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP‐by‐SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
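The smoothed sliding window mentioned above can be illustrated with a minimal sketch. It assumes a Gaussian kernel centred on each SNP and truncated at three standard deviations; the function name `kernel_smooth`, the default `sigma` and the example values are hypothetical and are not taken from the Stacks source code.

```python
import math

def kernel_smooth(positions, values, sigma=150_000.0):
    """Gaussian-weighted average of a per-SNP statistic, centred on each SNP.

    positions: base-pair coordinates on one chromosome (assumed input).
    values:    the statistic (for example F_ST) at each position.
    sigma:     kernel standard deviation in bp (hypothetical default).
    """
    smoothed = []
    for centre in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            dist = abs(pos - centre)
            if dist > 3 * sigma:
                continue  # ignore SNPs outside the truncated window
            weight = math.exp(-(dist ** 2) / (2 * sigma ** 2))
            weighted_sum += weight * val
            weight_total += weight
        # weight_total >= 1 because the centre SNP always has weight 1
        smoothed.append(weighted_sum / weight_total)
    return smoothed

# Hypothetical data: one SNP with a high value among low-valued neighbours.
positions = [0, 100_000, 200_000, 300_000, 400_000]
values = [0.1, 0.1, 0.9, 0.1, 0.1]
print([round(v, 3) for v in kernel_smooth(positions, values)])
```

Smoothing pulls the isolated high value towards its neighbours, which is why windowed statistics are less sensitive to single noisy SNPs than SNP‐by‐SNP values.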
Within uncharacterized groups, DNA barcodes, short DNA sequences that are present in a wide range of species, can be used to assign organisms into species. We propose an automatic procedure that sorts the sequences into hypothetical species based on the barcode gap, which can be observed whenever the divergence among organisms belonging to the same species is smaller than divergence among organisms from different species. We use a range of prior intraspecific divergence to infer from the data a model‐based one‐sided confidence limit for intraspecific divergence. The method, called Automatic Barcode Gap Discovery (ABGD), then detects the barcode gap as the first significant gap beyond this limit and uses it to partition the data. Inference of the limit and gap detection are then recursively applied to previously obtained groups to get finer partitions until there is no further partitioning. Using six published data sets of metazoans, we show that ABGD is computationally efficient and performs well for standard prior maximum intraspecific divergences (a few per cent of divergence for the five data sets), except for one data set where fewer than three sequences per species were sampled. We further explore the theoretical limitations of ABGD through simulation of explicit speciation and population genetics scenarios. Our results emphasize in particular the sensitivity of the method to the presence of recent speciation events, via (unrealistically) high rates of speciation or large numbers of species. In conclusion, ABGD is a fast, simple method to split a sequence alignment data set into candidate species that should be complemented with other evidence in an integrative taxonomic approach.
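The gap‐then‐partition idea can be sketched as follows. This is a deliberately simplified illustration, not the ABGD implementation: the limit on intraspecific divergence is passed in as a fixed number (`prior_max`) rather than inferred from a model, a gap counts as "significant" when it exceeds a hypothetical `min_width`, and the recursive step is omitted. All names and distances are invented for the example.

```python
from itertools import combinations

def find_gap(distances, prior_max, min_width):
    """Return a threshold in the first gap of at least min_width that
    extends beyond prior_max in the sorted pairwise distances, or None."""
    ranked = sorted(distances)
    for lower, upper in zip(ranked, ranked[1:]):
        if upper <= prior_max:
            continue  # both distances still look intraspecific
        if upper - lower >= min_width:
            return (lower + upper) / 2.0
    return None

def partition(names, dist, threshold):
    """Group sequences connected by distances below threshold (single linkage)."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

# Hypothetical pairwise distances for four sequences from two species.
names = ["a1", "a2", "b1", "b2"]
dist = {
    frozenset(("a1", "a2")): 0.01,
    frozenset(("b1", "b2")): 0.02,
    frozenset(("a1", "b1")): 0.10,
    frozenset(("a1", "b2")): 0.11,
    frozenset(("a2", "b1")): 0.12,
    frozenset(("a2", "b2")): 0.10,
}
threshold = find_gap(dist.values(), prior_max=0.03, min_width=0.03)
print(threshold, partition(names, dist, threshold))
```

When species are sampled by very few sequences, the small intraspecific distances that define the lower edge of the gap are scarce, which is consistent with the weaker performance reported above for the sparsely sampled data set.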
The metaphor of ‘genomic islands of speciation’ was first used to describe heterogeneous differentiation among loci between the genomes of closely related species. The biological model proposed to explain these differences was that the regions showing high levels of differentiation were resistant to gene flow between species, while the remainder of the genome was being homogenized by gene flow and consequently showed lower levels of differentiation. However, the conditions under which such differentiation can occur at multiple unlinked loci are restrictive; additionally, essentially all previous analyses have been carried out using relative measures of divergence, which can be misleading when regions with different levels of recombination are compared. Here, we test the model of differential gene flow by asking whether absolute divergence is also higher in the previously identified ‘islands’. Using five species pairs for which full sequence data are available, we find that absolute measures of divergence are not higher in genomic islands. Instead, in all cases examined, we find reduced diversity in these regions, a consequence of which is that relative measures of divergence are abnormally high. These data therefore do not support a model of differential gene flow among loci, although islands of relative divergence may represent loci involved in local adaptation. Simulations using the program IMa2 further suggest that inferences of any gene flow may be incorrect in many comparisons. We instead present an alternative explanation for heterogeneous patterns of differentiation, one in which postspeciation selection generates patterns consistent with multiple aspects of the data.
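The distinction between relative and absolute divergence can be made concrete with a small numerical sketch. It assumes one common form of relative measure, (d_xy − mean π_within) / d_xy, where d_xy is the absolute divergence between species and π is within‐species nucleotide diversity; this particular form and all numbers below are illustrative assumptions rather than values from the study.

```python
def relative_divergence(pi_x, pi_y, d_xy):
    """Relative divergence: the share of between-species differences
    that is not matched by within-species diversity."""
    pi_within = (pi_x + pi_y) / 2.0
    return (d_xy - pi_within) / d_xy

# Hypothetical per-site values: the 'island' has the same absolute
# divergence as the background but much lower within-species diversity.
background = dict(pi_x=0.010, pi_y=0.010, d_xy=0.012)
island = dict(pi_x=0.002, pi_y=0.002, d_xy=0.012)

for label, region in (("background", background), ("island", island)):
    print(label,
          "d_xy =", region["d_xy"],
          "relative =", round(relative_divergence(**region), 3))
```

With identical d_xy in both regions, the relative measure rises from about 0.17 to about 0.83 solely because diversity is reduced, which is the pattern the abstract describes for previously identified islands.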
Species delimitation is the act of identifying species‐level biological diversity. In recent years, the field has witnessed a dramatic increase in the number of methods available for delimiting species. However, most recent investigations only utilize a handful (i.e. 2–3) of the available methods, often for unstated reasons. Because the parameter space that is potentially relevant to species delimitation far exceeds the parameterization of any existing method, a given method necessarily makes a number of simplifying assumptions, any one of which could be violated in a particular system. We suggest that researchers should apply a wide range of species delimitation analyses to their data and place their trust in delimitations that are congruent across methods. Incongruence across the results from different methods is evidence either of a difference in the power to detect cryptic lineages across one or more of the approaches used to delimit species or of a violation of the assumptions of one or more of the methods. In either case, the inferences drawn from species delimitation studies should be conservative, for in most contexts it is better to fail to delimit species than it is to falsely delimit entities that do not represent actual evolutionary lineages.
Combining nuclear (nuDNA) and mitochondrial DNA (mtDNA) markers has improved the power of molecular data to test phylogenetic and phylogeographic hypotheses and has highlighted the limitations of studies using only mtDNA markers. In fact, in the past decade, many conflicting geographic patterns between mitochondrial and nuclear genetic markers have been identified (i.e. mito‐nuclear discordance). Our goals in this synthesis are to (i) review known cases of mito‐nuclear discordance in animal systems, (ii) summarize the biogeographic patterns in each instance and (iii) identify common drivers of discordance in various groups. In total, we identified 126 cases in animal systems with strong evidence of discordance between the biogeographic patterns obtained from mitochondrial DNA and those observed in the nuclear genome. In most cases, these patterns are attributed to adaptive introgression of mtDNA, demographic disparities and sex‐biased asymmetries, with some studies also implicating hybrid zone movement, human introductions and Wolbachia infection in insects. We also discuss situations where divergent mtDNA clades seem to have arisen in the absence of geographic isolation. For those cases where foreign mtDNA haplotypes are found deep within the range of a second taxon, data suggest that those mtDNA haplotypes are more likely to be at a high frequency and are commonly driven by sex‐biased asymmetries and/or adaptive introgression. In addition, we discuss the problems with inferring the processes causing discordance from biogeographic patterns that are common in many studies. In many cases, authors presented more than one explanation for discordant patterns in a given system, which indicates that more data are likely required. Ideally, to resolve this issue, we see important future work shifting focus from documenting the prevalence of mito‐nuclear discordance towards testing hypotheses regarding the drivers of discordance. Indeed, there is great potential for certain cases of mitochondrial introgression to become important natural systems within which to test the effect of different mitochondrial genotypes on whole‐animal phenotypes.
Global biodiversity in freshwater and the oceans is declining at high rates. Reliable tools for assessing and monitoring aquatic biodiversity, especially for rare and secretive species, are important for efficient and timely management. Recent advances in DNA sequencing have provided a new tool for species detection from DNA present in the environment. In this study, we tested whether an environmental DNA (eDNA) metabarcoding approach, using water samples, can be used for addressing significant questions in ecology and conservation. Two key aquatic vertebrate groups were targeted: amphibians and bony fish. The reliability of this method was cautiously validated in silico, in vitro and in situ. When compared with traditional surveys or historical data, eDNA metabarcoding showed a much better detection probability overall. For amphibians, the detection probability with eDNA metabarcoding was 0.97 (CI = 0.90–0.99) vs. 0.58 (CI = 0.50–0.63) for traditional surveys. For fish, in 89% of the studied sites, the number of taxa detected using the eDNA metabarcoding approach was higher than or identical to the number detected using traditional methods. We argue that the proposed DNA‐based approach has the potential to become the next‐generation tool for ecological studies and standardized biodiversity monitoring in a wide range of aquatic ecosystems. See also the Perspective by Hoffmann, Schubert and Calvignac‐Spencer.
Virtually all empirical ecological studies require species identification during data collection. DNA metabarcoding refers to the automated identification of multiple species from a single bulk sample containing entire organisms or from a single environmental sample containing degraded DNA (soil, water, faeces, etc.). It can be implemented for both modern and ancient environmental samples. The availability of next‐generation sequencing platforms and the ecologists’ need for high‐throughput taxon identification have facilitated the emergence of DNA metabarcoding. The potential power of DNA metabarcoding as it is implemented today is limited mainly by its dependency on PCR and by the considerable investment needed to build comprehensive taxonomic reference libraries. Further developments associated with the impressive progress in DNA sequencing will eliminate the currently required DNA amplification step, and comprehensive taxonomic reference libraries composed of whole organellar genomes and repetitive ribosomal nuclear DNA can be built based on the well‐curated DNA extract collections maintained by standardized barcoding initiatives. The near‐term future of DNA metabarcoding has an enormous potential to boost data acquisition in biodiversity research.
The analysis of food webs and their dynamics facilitates understanding of the mechanistic processes behind community ecology and ecosystem functions. Having accurate techniques for determining dietary ranges and components is critical for this endeavour. While visual analyses and early molecular approaches are highly labour intensive and often lack resolution, recent DNA‐based approaches potentially provide more accurate methods for dietary studies. A suite of approaches has been used based on the identification of consumed species by characterization of DNA present in gut or faecal samples. In one approach, a standardized DNA region (DNA barcode) is PCR amplified, and the amplicons are sequenced and then compared to a reference database for identification. Initially, this involved sequencing clones from PCR products, and studies were limited in scale because of the costs and effort required. The recent development of next generation sequencing (NGS) has made this approach much more powerful, by allowing the direct characterization of dozens of samples with several thousand sequences per PCR product, and has the potential to reveal many consumed species simultaneously (DNA metabarcoding). Continual improvement of NGS technologies, on‐going decreases in costs and current massive expansion of reference databases make this approach promising. Here we review the power and pitfalls of NGS diet methods. We present the critical factors to take into account when choosing or designing a suitable barcode. Then, we consider both technical and analytical aspects of NGS diet studies. Finally, we discuss the validation of data accuracy including the viability of producing quantitative data.
Freshwater ecosystems are among the most endangered habitats on Earth, with thousands of animal species known to be threatened or already extinct. Reliable monitoring of threatened organisms is crucial for data‐driven conservation actions but remains a challenge owing to nonstandardized methods that depend on practical and taxonomic expertise, which is rapidly declining. Here, we show that a diversity of rare and threatened freshwater animals—representing amphibians, fish, mammals, insects and crustaceans—can be detected and quantified based on DNA obtained directly from small water samples of lakes, ponds and streams. We successfully validate our findings in a controlled mesocosm experiment and show that DNA becomes undetectable within 2 weeks after removal of animals, indicating that DNA traces are near contemporary with presence of the species. We further demonstrate that entire faunas of amphibians and fish can be detected by high‐throughput sequencing of DNA extracted from pond water. Our findings underpin the ubiquitous nature of DNA traces in the environment and establish environmental DNA as a tool for monitoring rare and threatened species across a wide range of taxonomic groups.
F-ST outlier tests are a potentially powerful way to detect genetic loci under spatially divergent selection. Unfortunately, the extent to which these tests are robust to nonequilibrium demographic histories has been understudied. We developed a landscape genetics simulator to test the effects of isolation by distance (IBD) and range expansion on F-ST outlier methods. We evaluated the two most commonly used methods for the identification of F-ST outliers (FDIST2 and BayeScan, which assume samples are evolutionarily independent) and two recent methods (FLK and Bayenv2, which estimate and account for evolutionary nonindependence). Parameterization with a set of neutral loci ('neutral parameterization') always improved the performance of FLK and Bayenv2, while neutral parameterization caused FDIST2 to actually perform worse in the cases of IBD or range expansion. BayeScan was improved when the prior odds on neutrality were increased, regardless of the true odds in the data. Even at their best performance, however, the widely used methods had high false-positive rates for IBD and range expansion and were outperformed by methods that accounted for evolutionary nonindependence. In addition, default settings in FDIST2 and BayeScan resulted in many false positives suggesting balancing selection. However, all methods did very well if a large set of neutral loci was available to create empirical P-values. We conclude that in species that exhibit IBD or have undergone range expansion, many of the published F-ST outliers based on FDIST2 and BayeScan are probably false positives, but FLK and Bayenv2 show great promise for accurately identifying loci under spatially divergent selection.
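The use of neutral loci to create empirical P-values can be sketched as a simple rank calculation. The example assumes that a per-locus test statistic (for instance F-ST) is already available for a set of putatively neutral loci and for the loci being tested; the function name, the add-one correction and the numbers are illustrative assumptions, not the procedure of any of the four programs named above.

```python
from bisect import bisect_left

def empirical_p_values(neutral_stats, test_stats):
    """For each test statistic, the proportion of neutral loci with a
    statistic at least as large, with an add-one correction so that no
    P-value is exactly zero."""
    neutral = sorted(neutral_stats)
    n = len(neutral)
    p_values = []
    for stat in test_stats:
        # number of neutral values >= stat
        n_at_least = n - bisect_left(neutral, stat)
        p_values.append((n_at_least + 1) / (n + 1))
    return p_values

# Hypothetical statistics for nine neutral loci and two candidate loci.
neutral = [0.01, 0.02, 0.03, 0.05, 0.08, 0.02, 0.04, 0.03, 0.06]
candidates = [0.50, 0.03]
print(empirical_p_values(neutral, candidates))
```

Because the neutral loci share the demographic history of the candidates, the null distribution already reflects IBD or range expansion. The smallest attainable P-value is 1/(n + 1), so a large neutral set is needed to resolve strong outliers.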
The interactions between organisms and their environments can shape distributions of spatial genetic variation, resulting in patterns of isolation by environment (IBE) in which genetic and environmental distances are positively correlated, independent of geographic distance. IBE represents one of the most important patterns that results from the ways in which landscape heterogeneity influences gene flow and population connectivity, but it has only recently been examined in studies of ecological and landscape genetics. Nevertheless, the study of IBE presents valuable opportunities to investigate how spatial heterogeneity in ecological processes, agents of selection and environmental variables contributes to genetic divergence in nature. New and increasingly sophisticated studies of IBE in natural systems are poised to make significant contributions to our understanding of the role of ecology in genetic divergence and of modes of differentiation both within and between species. Here, we describe the underlying ecological processes that can generate patterns of IBE, examine its implications for a wide variety of disciplines and outline several areas of future research that can answer pressing questions about the ecological basis of genetic diversity.
Many species have fragmented distributions with small isolated populations suffering inbreeding depression and/or reduced ability to evolve. Without gene flow from another population within the species (genetic rescue), these populations are likely to be extirpated. However, there have been only ~20 published cases of such outcrossing for conservation purposes, probably a very low proportion of populations that would potentially benefit. As one impediment to genetic rescues is the lack of an overview of the magnitude and consistency of genetic rescue effects in wild species, I carried out a meta‐analysis. Outcrossing of inbred populations resulted in beneficial effects in 92.9% of 156 cases screened as having a low risk of outbreeding depression. The median increase in composite fitness (combined fecundity and survival) following outcrossing was 148% in stressful environments and 45% in benign ones. Fitness benefits also increased significantly with maternal ΔF (reduction in inbreeding coefficient due to gene flow) and for naturally outbreeding versus inbreeding species. However, benefits did not differ significantly among invertebrates, vertebrates and plants. Evolutionary potential for fitness characters in inbred populations also benefited from gene flow. There are no scientific impediments to the widespread use of outcrossing to genetically rescue inbred populations of naturally outbreeding species, provided potential crosses have a low risk of outbreeding depression. I provide revised guidelines for the management of genetic rescue attempts. See also the Perspective by Waller.
Since 2005, advances in next‐generation sequencing technologies have revolutionized biological science. The analysis of environmental DNA through the use of specific gene markers such as species‐specific DNA barcodes has been a key application of next‐generation sequencing technologies in ecological and environmental research. Access to massive amounts of parallel sequencing data, as well as subsequent improvements in read length and throughput of different sequencing platforms, is leading to a better representation of sample diversity at a reasonable cost. New technologies are being developed rapidly and have the potential to dramatically accelerate ecological and environmental research. The fast pace of development and improvement in next‐generation sequencing technologies can translate into broader and more robust applications in environmental DNA research. Here, we review the advantages and limitations of current next‐generation sequencing technologies in regard to their application for environmental DNA analysis.
G(ST) and its relatives are often interpreted as measures of differentiation between subpopulations, with values near zero supposedly indicating low differentiation. However, G(ST) necessarily approaches zero when gene diversity is high, even if subpopulations are completely differentiated, and it is not monotonic with increasing differentiation. Likewise, when diversity is equated with heterozygosity, standard similarity measures formed by taking the ratio of mean within-subpopulation diversity to total diversity necessarily approach unity when diversity is high, even if the subpopulations are completely dissimilar (no shared alleles). None of these measures can be interpreted as measures of differentiation or similarity. The derivations of these measures contain two subtle misconceptions which cause their paradoxical behaviours. Conclusions about population differentiation, gene flow, relatedness, and conservation priority will often be wrong when based on these fixation indices or similarity measures. These are not statistical issues; the problems persist even when true population frequencies are used in the calculations. Recent advances in the mathematics of diversity identify the misconceptions, and yield mathematically consistent descriptive measures of population structure which eliminate the paradoxes produced by standard measures. These measures can be directly related to the migration and mutation rates of the finite-island model.
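The behaviour described above can be checked with a short numerical sketch. It assumes two equally sized subpopulations with known allele frequencies and no shared alleles, computes G(ST) as (H_T − H_S) / H_T, and, as one example of an alternative measure, computes D = [(H_T − H_S) / (1 − H_S)] · [n / (n − 1)]; the choice of D to stand for the "mathematically consistent" measures and the helper names are assumptions made for illustration.

```python
def heterozygosity(freqs):
    """Expected heterozygosity (gene diversity): 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def gst_and_d(subpops):
    """Return (G_ST, D) for equally weighted subpopulations.

    subpops: list of allele-frequency lists, all over the same set of alleles.
    """
    n = len(subpops)
    n_alleles = len(subpops[0])
    h_s = sum(heterozygosity(p) for p in subpops) / n
    pooled = [sum(p[i] for p in subpops) / n for i in range(n_alleles)]
    h_t = heterozygosity(pooled)
    gst = (h_t - h_s) / h_t
    d = (h_t - h_s) / (1.0 - h_s) * n / (n - 1)
    return gst, d

def private_alleles(k):
    """Two subpopulations, each with k equally frequent alleles, none shared."""
    first = [1.0 / k] * k + [0.0] * k
    second = [0.0] * k + [1.0 / k] * k
    return [first, second]

for k in (1, 2, 10, 50):
    gst, d = gst_and_d(private_alleles(k))
    print(f"k={k:2d}  G_ST={gst:.3f}  D={d:.3f}")
```

In every case the subpopulations share no alleles, yet G(ST) falls from 1 towards zero as the number of alleles (and hence within-subpopulation diversity) grows, while D stays at 1.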
Adaptive genetic variation has been thought to originate primarily from either new mutation or standing variation. Another potential source of adaptive variation is adaptive variants from other (donor) species that are introgressed into the (recipient) species, termed adaptive introgression. Here, the various attributes of these three potential sources of adaptive variation are compared. For example, the rate of adaptive change is generally thought to be faster from standing variation, slower from mutation and potentially intermediate from adaptive introgression. Additionally, the higher initial frequency of adaptive variation from standing variation and lower initial frequency from mutation might result in a higher probability of fixation of the adaptive variants for standing variation. Adaptive variation from introgression might have higher initial frequency than new adaptive mutations but lower than that from standing variation, again making the impact of adaptive introgression variation potentially intermediate. Adaptive introgressive variants might have multiple changes within a gene and affect multiple loci, an advantage also potentially found for adaptive standing variation but not for new adaptive mutants. The processes that might produce a common variant in two taxa (convergence, trans‐species polymorphism from incomplete lineage sorting or from balancing selection, and adaptive introgression) are also compared. Finally, potential examples of adaptive introgression in animals are examined, including balancing selection for multiple alleles for major histocompatibility complex (MHC), S and csd genes, pesticide resistance in mice, black colour in wolves and white colour in coyotes, Neanderthal or Denisovan ancestry in humans, mimicry genes in Heliconius butterflies, beak traits in Darwin's finches, yellow skin in chickens and non‐native ancestry in an endangered native salamander.