Genotyping errors occur when the genotype determined after molecular analysis does not correspond to the real genotype of the individual under consideration. Virtually every genetic data set includes some erroneous genotypes, but genotyping errors remain a taboo subject in population genetics, even though they might greatly bias the final conclusions, especially for studies based on individual identification. Here, we consider four case studies representing a large variety of population genetics investigations differing in their sampling strategies (noninvasive or traditional), in the type of organism studied (plant or animal) and the molecular markers used [microsatellites or amplified fragment length polymorphisms (AFLPs)]. In these data sets, the estimated genotyping error rate ranges from 0.8% for microsatellite loci from bear tissues to 2.6% for AFLP loci from dwarf birch leaves. Main sources of errors were allelic dropouts for microsatellites and differences in peak intensities for AFLPs, but in both cases human factors were non‐negligible error generators. Therefore, tracking genotyping errors and identifying their causes are necessary to clean up the data sets and validate the final results according to the precision required. In addition, we propose the outline of a protocol designed to limit and quantify genotyping errors at each step of the genotyping process. In particular, we recommend (i) several efficient precautions to prevent contaminations and technical artefacts; (ii) systematic use of blind samples and automation; (iii) experience and rigor for laboratory work and scoring; and (iv) systematic reporting of the error rate in population genetics studies.
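One way to report the error rate recommended above is to compare replicate genotypes of the same samples. The following is a minimal sketch, assuming a per-allele definition (mismatched alleles divided by alleles compared) and a hypothetical data layout keyed by sample and locus; the function name and layout are illustrative and not taken from the study.

```python
def allelic_error_rate(reference, replicate):
    """Return mismatched alleles / compared alleles for two genotype sets.

    Both arguments map (sample, locus) -> tuple of two alleles.
    Assumption: only genotypes present in both sets are compared.
    """
    mismatches = 0
    compared = 0
    for key, ref_genotype in reference.items():
        if key not in replicate:
            continue  # no replicate available for this sample/locus
        remaining = list(replicate[key])
        for allele in ref_genotype:
            if allele in remaining:
                remaining.remove(allele)  # allele recovered in the replicate
            else:
                mismatches += 1  # e.g. an allelic dropout or a scoring error
        compared += len(ref_genotype)
    if compared == 0:
        raise ValueError("no genotypes in common")
    return mismatches / compared


if __name__ == "__main__":
    reference = {("s1", "locA"): (100, 104), ("s1", "locB"): (200, 200)}
    replicate = {("s1", "locA"): (100, 100), ("s1", "locB"): (200, 200)}
    print(allelic_error_rate(reference, replicate))
```

Other definitions (per locus, per multilocus genotype) give different values for the same data, so the definition used should be stated alongside the rate.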
Mitochondrial DNA (mtDNA) has been used to study molecular ecology and phylogeography for 25 years. Much important information has been gained in this way, but it is time to reflect on the biology of the mitochondrion itself and consider opportunities for evolutionary studies of the organelle itself and its ecology, biochemistry and physiology. This review has four sections. First, we review aspects of the natural history of mitochondria and their DNA to show that it is a unique molecule with specific characteristics that differ from nuclear DNA. We do not attempt to cover the plethora of differences between mitochondrial and nuclear DNA; rather we spotlight differences that can cause significant bias when inferring demographic properties of populations and/or the evolutionary history of species. We focus on recombination, effective population size and mutation rate. Second, we explore some of the difficulties in interpreting phylogeographical data from mtDNA data alone and suggest a broader use of multiple nuclear markers. We argue that mtDNA is not a sufficient marker for phylogeographical studies if the focus of the investigation is the species and not the organelle. We focus on the potential bias caused by introgression. Third, we show that it is not safe to assume a priori that mtDNA evolves as a strictly neutral marker because both direct and indirect selection influence mitochondria. We outline some of the statistical tests of neutrality that can, and should, be applied to mtDNA sequence data prior to making any global statements concerning the history of the organism. We conclude with a critical examination of the neglected biology of mitochondria and point out several surprising gaps in the state of our knowledge about this important organelle. Here we limelight mitochondrial ecology, sexually antagonistic selection, life‐history evolution including ageing and disease, and the evolution of mitochondrial inheritance.
A compilation was made of 307 studies using nuclear DNA markers for evaluating among‐ and within‐population diversity in wild angiosperms and gymnosperms. Estimates derived by the dominantly inherited markers (RAPD, AFLP, ISSR) are very similar and may be directly comparable. STMS analysis yields almost three times higher values for within‐population diversity whereas among‐population diversity estimates are similar to those derived by the dominantly inherited markers. Number of sampled plants per population and number of scored microsatellite DNA alleles are correlated with some of the population genetics parameters. In addition, maximum geographical distance between sampled populations has a strong positive effect on among‐population diversity. As previously verified with allozyme data, RAPD‐ and STMS‐based analyses show that long‐lived, outcrossing, late successional taxa retain most of their genetic variability within populations. By contrast, annual, selfing and/or early successional taxa allocate most of the genetic variability among populations. Estimates for among‐ and within‐population diversity, respectively, were negatively correlated. The only major discrepancy between allozymes and STMS on the one hand, and RAPD on the other hand, concerns geographical range; within‐population diversity was strongly affected when the former methods were used but not so in the RAPD‐based studies. Direct comparisons between the different methods, when applied to the same plant material, indicate large similarities between the dominant markers and somewhat lower similarity with the STMS‐based data, presumably due to insufficient number of analysed microsatellite DNA loci in many studies.
Genetic assignment methods use genotype likelihoods to draw inference about where individuals were or were not born, potentially allowing direct, real-time estimates of dispersal. We used simulated data sets to test the power and accuracy of Monte Carlo resampling methods in generating statistical thresholds for identifying F0 immigrants in populations with ongoing gene flow, and hence for providing direct, real-time estimates of migration rates. The identification of accurate critical values required that resampling methods preserved the linkage disequilibrium deriving from recent generations of immigrants and reflected the sampling variance present in the data set being analysed. A novel Monte Carlo resampling method taking into account these aspects was proposed and its efficiency was evaluated. Power and error were relatively insensitive to the frequency assumed for missing alleles. Power to identify F0 immigrants was improved by using large sample size (up to about 50 individuals) and by sampling all populations from which migrants may have originated. A combination of plotting genotype likelihoods and calculating mean genotype likelihood ratios (DLR) appeared to be an effective way to predict whether F0 immigrants could be identified for a particular pair of populations using a given set of markers.
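The genotype likelihoods mentioned above can be illustrated with a simple frequency-based calculation. This is a sketch under stated assumptions: Hardy-Weinberg proportions within populations, independent loci, and a fixed frequency substituted for alleles missing from a population. It is not the Monte Carlo resampling method evaluated in the study, and all names and the default `missing_freq` are hypothetical.

```python
import math


def genotype_log_likelihood(genotype, allele_freqs, missing_freq=0.01):
    """log10 likelihood of a multilocus diploid genotype in one population.

    genotype:     dict locus -> (allele_a, allele_b)
    allele_freqs: dict locus -> dict allele -> frequency
    missing_freq: frequency assumed for alleles absent from the population
                  (an arbitrary illustrative default)
    """
    total = 0.0
    for locus, (a, b) in genotype.items():
        freqs = allele_freqs[locus]
        pa = freqs.get(a, 0.0) or missing_freq
        pb = freqs.get(b, 0.0) or missing_freq
        # Hardy-Weinberg expectation: p^2 for homozygotes, 2pq for heterozygotes
        likelihood = pa * pa if a == b else 2.0 * pa * pb
        total += math.log10(likelihood)
    return total


def assign(genotype, populations, missing_freq=0.01):
    """Return the most likely population and the per-population scores."""
    scores = {
        name: genotype_log_likelihood(genotype, freqs, missing_freq)
        for name, freqs in populations.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

The difference between two populations' scores for one individual is a log likelihood ratio; deciding whether that individual is an immigrant still requires a critical value, which is what the resampling methods above are meant to supply.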
Many empirical studies have assessed fine‐scale spatial genetic structure (SGS), i.e. the nonrandom spatial distribution of genotypes, within plant populations using genetic markers and spatial autocorrelation techniques. These studies mostly provided qualitative descriptions of SGS, rendering quantitative comparisons among studies difficult. The theory of isolation by distance can predict the pattern of SGS under limited gene dispersal, suggesting new approaches, based on the relationship between pairwise relatedness coefficients and the spatial distance between individuals, to quantify SGS and infer gene dispersal parameters. Here we review the theory underlying such methods and discuss issues about their application to plant populations, such as the choice of the relatedness statistics, the sampling scheme to adopt, the procedure to test SGS, and the interpretation of spatial autocorrelograms. We propose to quantify SGS by an ‘Sp’ statistic primarily dependent upon the rate of decrease of pairwise kinship coefficients between individuals with the logarithm of the distance in two dimensions. Under certain conditions, this statistic estimates the reciprocal of the neighbourhood size. Reanalysing data from, mostly, published studies, the Sp statistic was assessed for 47 plant species. It was found to be significantly related to the mating system (higher in selfing species) and to the life form (higher in herbs than trees), as well as to the population density (higher under low density). We discuss the necessity for comparing SGS with direct estimates of gene dispersal distances, and show how the approach presented can be extended to assess (i) the level of biparental inbreeding, and (ii) the kurtosis of the gene dispersal distribution.
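As an illustration of the Sp statistic, the sketch below assumes the commonly cited form Sp = −b / (1 − F1), where b is the least-squares slope of pairwise kinship on the natural logarithm of distance and F1 is the mean kinship among pairs in the first distance class. The abstract itself states only that Sp depends primarily on that rate of decrease, so the exact formula, the function name and the `first_class_max` parameter are assumptions here.

```python
import math


def sp_statistic(pairs, first_class_max):
    """Compute Sp = -b / (1 - F1) from (distance, kinship) pairs.

    pairs:           list of (distance, kinship) tuples, distance > 0
    first_class_max: upper bound of the first distance class (assumed input)
    """
    xs = [math.log(d) for d, _ in pairs]
    ys = [k for _, k in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # regression slope b of kinship on ln(distance)

    near = [k for d, k in pairs if d <= first_class_max]
    if not near:
        raise ValueError("no pairs fall in the first distance class")
    f1 = sum(near) / len(near)  # mean kinship between neighbours
    return -slope / (1.0 - f1)
```

A steeper decline of kinship with distance gives a larger Sp, which matches the reported pattern of higher values in selfing species and at low density.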
The identification of signatures of natural selection in genomic surveys has become an area of intense research, stimulated by the increasing ease with which genetic markers can be typed. Loci identified as subject to selection may be functionally important, and hence (weak) candidates for involvement in disease causation. They can also be useful in determining the adaptive differentiation of populations, and exploring hypotheses about speciation. Adaptive differentiation has traditionally been identified from differences in allele frequencies among different populations, summarised by an estimate of FST. Low outliers relative to an appropriate neutral population‐genetics model indicate loci subject to balancing selection, whereas high outliers suggest adaptive (directional) selection. However, the problem of identifying statistically significant departures from neutrality is complicated by confounding effects on the distribution of FST estimates, and current methods have not yet been tested in large‐scale simulation experiments. Here, we simulate data from a structured population at many unlinked, diallelic loci that are predominantly neutral but with some loci subject to adaptive or balancing selection. We develop a hierarchical‐Bayesian method, implemented via Markov chain Monte Carlo (MCMC), and assess its performance in distinguishing the loci simulated under selection from the neutral loci. We also compare this performance with that of a frequentist method, based on moment‐based estimates of FST. We find that both methods can identify loci subject to adaptive selection when the selection coefficient is at least five times the migration rate. Neither method could reliably distinguish loci under balancing selection in our simulations, even when the selection coefficient is twenty times the migration rate.
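To make the FST summary concrete, here is a minimal sketch for a single diallelic locus, assuming equally weighted subpopulations and known allele frequencies, with FST = (HT − HS) / HT. It ignores sampling corrections and is neither the hierarchical-Bayesian method nor the moment-based estimator used in the study; the function name is hypothetical.

```python
def fst_diallelic(freqs):
    """FST at one diallelic locus from per-population allele frequencies.

    freqs: list of frequencies of one allele, one value per subpopulation.
    Assumes equal weights and no correction for finite sample size.
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, total population
    if h_t == 0.0:
        raise ValueError("locus is monomorphic; FST is undefined")
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / n  # mean within-population value
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    for locus_freqs in ([0.5, 0.5], [0.2, 0.8], [0.0, 1.0]):
        print(locus_freqs, fst_diallelic(locus_freqs))
```

In an outlier scan, values like these would be compared with the distribution expected under a neutral model, which is the step the two methods in the study handle differently.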
Nested clade phylogeographical analysis (NCPA) has become a common tool in intraspecific phylogeography. To evaluate the validity of its inferences, NCPA was applied to actual data sets with 150 strong a priori expectations, the majority of which had not been analysed previously by NCPA. NCPA did well overall, but it sometimes failed to detect an expected event and less commonly resulted in a false positive. An examination of these errors suggested some alterations in the NCPA inference key, and these modifications reduce the incidence of false positives at the cost of a slight reduction in power. Moreover, NCPA does equally well in inferring events regardless of the presence or absence of other, unrelated events. A reanalysis of some recent computer simulations that are seemingly discordant with these results revealed that NCPA performed appropriately in these simulated samples and was not prone to a high rate of false positives under sampling assumptions that typify real data sets. NCPA makes a posteriori use of an explicit inference key for biological interpretation after statistical hypothesis testing. Alternatives to NCPA that claim that biological inference emerges directly from statistical testing are shown in fact to use an a priori inference key, albeit implicitly. It is argued that the a priori and a posteriori approaches to intraspecific phylogeography are complementary, not contradictory. Finally, cross‐validation using multiple DNA regions is shown to be a powerful method of minimizing inference errors. A likelihood ratio hypothesis testing framework has been developed that allows testing of phylogeographical hypotheses, extends NCPA to testing specific hypotheses not within the formal inference key (such as the out‐of‐Africa replacement hypothesis of recent human evolution) and integrates intra‐ and interspecific phylogeographical inference.
It has long been recognized that population demographic expansions lead to distinctive features in the molecular diversity of populations. However, recent simulation results have suggested that a distinction could be made between a pure demographic expansion in an unsubdivided population, and a range expansion in a subdivided population, both leading to a large increase in the total number of individuals. In order to better characterize the effect of a range expansion, I introduce a simple model of instantaneous expansion under an infinite‐island model, under which I derive the distribution of the number of mutation differences between pairs of genes (the mismatch distribution), the heterozygosity, the average number of pairwise differences, and the fixation index FST. These derivations are checked against simulations, and are shown to lead to results qualitatively similar to those one would obtain after a range expansion in a 2‐dimensional stepping‐stone model. I then apply these results to estimate immigration rates in hunter‐gatherer and post‐Neolithic human populations from patterns of mitochondrial (mtDNA) diversity. Potential problems with this estimation procedure are also discussed.
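The empirical counterpart of the mismatch distribution is obtained by counting differences between every pair of aligned sequences. The sketch below assumes equal-length aligned sequences given as strings; it computes observed quantities only and does not implement the expansion model derived in the paper. Function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(sequences):
    """Count sequence pairs by their number of differing sites."""
    counts = Counter()
    for a, b in combinations(sequences, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        differences = sum(x != y for x, y in zip(a, b))
        counts[differences] += 1
    return counts


def mean_pairwise_differences(sequences):
    """Average number of pairwise differences across all pairs."""
    counts = mismatch_distribution(sequences)
    n_pairs = sum(counts.values())
    return sum(k * c for k, c in counts.items()) / n_pairs


if __name__ == "__main__":
    seqs = ["AAAA", "AAAT", "AATT"]
    print(dict(mismatch_distribution(seqs)))
    print(mean_pairwise_differences(seqs))
```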
Many recent studies report that individual heterozygosity at a handful of apparently neutral microsatellite markers is correlated with key components of fitness, with most studies invoking inbreeding depression as the likely underlying mechanism. The implicit assumption is that an individual's inbreeding coefficient can be estimated reliably using only 10 or so markers, but the validity of this assumption is unclear. Consequently, we have used individual‐based simulations to examine the conditions under which heterozygosity and inbreeding are likely to be correlated. Our results indicate that the parameter space in which this occurs is surprisingly narrow, requiring that inbreeding events are both frequent and severe, for example, through selfing, strong population structure and/or high levels of polygyny. Even then, the correlations are strong only when large numbers of loci (~200) can be deployed to estimate heterozygosity. With the handful of markers used in most studies, correlations only become likely under the most extreme scenario we looked at, namely 20 demes of 20 individuals coupled with strong polygyny. This finding is supported by the observation that heterozygosity is only weakly correlated among markers within an individual, even in a dataset comprising 400 markers typed in diverse human populations, some of which favour consanguineous marriages. If heterozygosity and inbreeding coefficient are generally uncorrelated, then heterozygosity–fitness correlations probably have little to do with inbreeding depression. Instead, one would need to invoke chance linkage between the markers used and one or more gene(s) experiencing balancing selection. Unfortunately, both explanations sit somewhat uncomfortably with current understanding. If inbreeding is the dominant mechanism, then our simulations indicate that consanguineous mating would have to be vastly more common than is predicted for most realistic populations. Conversely, if heterosis provides the answer, there need to be many more polymorphisms with major fitness effects and higher levels of linkage disequilibrium than are generally assumed.
The degree to which widespread avian blood parasites in the genera Plasmodium and Haemoproteus pose a threat to novel hosts depends in part on the degree to which they are constrained to a particular host or host family. We examined the host distribution and host‐specificity of these parasites in birds from two relatively understudied and isolated locations: Australia and Papua New Guinea. Using polymerase chain reaction (PCR), we detected infection in 69 of 105 species, representing 44% of individuals surveyed (n = 428). Across host families, prevalence of Haemoproteus ranged from 13% (Acanthizidae) to 56% (Petroicidae) while prevalence of Plasmodium ranged from 3% (Petroicidae) to 47% (Ptilonorhynchidae). We recovered 78 unique mitochondrial lineages from 155 sequences. Related lineages of Haemoproteus were more likely to derive from the same host family than predicted by chance at shallow (average LogDet genetic distance = 0, n = 12, P = 0.001) and greater depths (average distance = 0.014, n = 11, P < 0.001) within the parasite phylogeny. Within two major Haemoproteus subclades identified in a maximum likelihood phylogeny, host‐specificity was evident up to parasite genetic distances of 0.029 and 0.007 based on logistic regression. We found no significant host relationship among lineages of Plasmodium by any method of analysis. These results support previous evidence of strong host‐family specificity in Haemoproteus and suggest that lineages of Plasmodium are more likely to form evolutionarily stable associations with novel hosts.
The traditional view that species are held together through gene flow has been challenged by observations that migration is too restricted among populations of many species to prevent local divergence. However, only very low levels of gene flow are necessary to permit the spread of highly advantageous alleles, providing an alternative means by which low‐migration species might be held together. We re‐evaluate these arguments given the recent and wide availability of indirect estimates of gene flow. Our literature review of FST values for a broad range of taxa suggests that gene flow in many taxa is considerably greater than suspected from earlier studies and often is sufficiently high to homogenize even neutral alleles. However, there are numerous species from essentially all organismal groups that lack sufficient gene flow to prevent divergence. Crude estimates on the strength of selection on phenotypic traits and effect sizes of quantitative trait loci (QTL) suggest that selection coefficients for leading QTL underlying phenotypic traits may be high enough to permit their rapid spread across populations. Thus, species may evolve collectively at major loci through the spread of favourable alleles, while simultaneously differentiating at other loci due to drift and local selection.
Changes in agricultural practices and forest fragmentation can have a dramatic effect on landscape connectivity and the dispersal of animals, potentially reducing gene flow within populations. In this study, we assessed the influence of woodland connectivity on gene flow in a traditionally forest‐dwelling species — the European roe deer — in a fragmented landscape. From a sample of 648 roe deer spatially referenced within a study area of 55 × 40 km, interindividual genetic distances were calculated from genotypes at 12 polymorphic microsatellite loci. We calculated two geographical distances between each pair of individuals: the Euclidean distance (straight line) and the ‘least cost distance’ (the trajectory that maximizes the use of wooded corridors). We tested the correlation between genetic pairwise distances and the two types of geographical pairwise distance using Mantel tests. The correlation was better using the least cost distance, which takes into account the distribution of wooded patches, especially for females (the correlation was stronger but not significant for males). These results suggest that in a fragmented woodland area roe deer dispersal is strongly linked to wooded structures and hence that gene flow within the roe deer population is influenced by the connectivity of the landscape.
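The Mantel tests used above can be sketched as a permutation test on two distance matrices. This is a minimal one-tailed version, assuming symmetric square matrices given as nested lists and a Pearson correlation on the upper triangles; the number of permutations, the seed and all function names are illustrative choices, not details from the study.

```python
import random


def _upper(matrix):
    """Flatten the upper triangle (excluding the diagonal) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (observed r, one-tailed P) for two distance matrices."""
    rng = random.Random(seed)
    n = len(genetic)
    geo = _upper(geographic)
    observed = _pearson(_upper(genetic), geo)
    at_least = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of one matrix together to keep it a valid
        # distance matrix among relabelled individuals
        permuted = [[genetic[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if _pearson(_upper(permuted), geo) >= observed:
            at_least += 1
    return observed, at_least / (permutations + 1)
```

Running the function once with Euclidean distances and once with least cost distances as the `geographic` argument gives the two correlations that the study compares.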
Microsatellites are powerful molecular markers, used commonly to estimate intraspecific genetic distances. With the exception of band sharing similarity index, available distance measures were developed specifically for diploid organisms and are unsuited for comparisons of polyploids. Here, we present a simple method for calculation of microsatellite genotype distances, which takes into account mutation processes and permits comparison of individuals with different ploidy levels. This method should provide a valuable tool for intraspecific analyses of polyploid organisms, which are widespread among plants and some animal taxa. An illustration is given using data from the planarian flatworm Schmidtea polychroa (Platyhelminthes).
The use of noninvasively collected samples greatly expands the range of ecological issues that may be investigated through population genetics. Furthermore, the difficulty of obtaining reliable genotypes with samples containing low quantities of amplifiable DNA may be overcome by designing optimal genotyping schemes. Such protocols are mainly determined by the rates of genotyping errors caused by false alleles and allelic dropouts. These errors may not be avoided through laboratory procedure and hence must be quantified. However, the definition of genotyping error rates remains elusive and various estimation methods have been reported in the literature. In this paper we proposed accurate codification for the frequencies of false alleles and allelic dropouts. We then reviewed other estimation methods employed in hair‐ or faeces‐based population genetics studies and modelled the bias associated with erroneous methods. It is emphasized that error rates may be substantially underestimated when using an erroneous approach. Genotyping error rates may be important determinants of the outcome of noninvasive studies and hence should be carefully computed and reported.
Genetic analysis using noninvasively collected samples such as faeces continues to pose a formidable challenge because of unpredictable variation in the extent to which usable DNA is obtained. We investigated the influence of multiple variables on the quantity of DNA extracted from faecal samples from wild mountain gorillas and chimpanzees. There was a small negative correlation between temperature at time of collection and the amount of DNA obtained. Storage of samples either in RNAlater solution or dried using silica gel beads produced similar results, but significantly higher amounts of DNA were obtained using a novel protocol that combines a short period of storage in ethanol with subsequent desiccation using silica.
Individual‐based assignment tests are now standard tools in molecular ecology and have several applications, including the study of dispersal. The measurement of natal dispersal is vital to understanding the ecology of many species, yet the accuracy of assignment tests in situations where natal dispersal is common remains untested in the field. We studied a metapopulation of the grand skink, Oligosoma grande, a large territorial lizard from southern New Zealand. Skink populations occur on isolated, regularly spaced rock outcrops and are characterized by frequent interpopulation dispersal. We examined the accuracy of assignment tests at four replicate sites by comparing long‐term mark‐and‐recapture records of natal dispersal with the results of assignment tests based on microsatellite DNA data. Assignment tests correctly identified the natal population of most individuals (65–100%, depending on the method of assignment), even when interpopulation dispersal was common (5–20% dispersers). They also provided similar estimates of the proportions of skinks dispersing to those estimated by the long‐term mark‐and‐recapture data. Fully and partially Bayesian assignment methods were equally accurate but their accuracy depended on the stringency applied, the degree of genetic differentiation between populations, and the number of loci used. In addition, when assignments required high confidence, the method of assignment (fully or partially Bayesian) had a large bearing on the number of individuals that could be assigned. Because assignment tests require significantly less fieldwork than traditional mark‐and‐recapture approaches (in this study 7 years), they will provide useful dispersal data in many applied and theoretical situations.
'Candidatus Cardinium', a recently described bacterium from the Bacteroidetes group, is involved in diverse reproduction alterations of its arthropod hosts, including cytoplasmic incompatibility, parthenogenesis and feminization. To estimate the incidence rate of Cardinium and explore the limits of its host range, 99 insect and mite species were screened, using primers designed to amplify a portion of Cardinium 16S ribosomal DNA (rDNA). These arthropods were also screened for the presence of the better-known reproductive manipulator, Wolbachia. Six per cent of the species screened tested positive for Cardinium, compared with 24% positive for Wolbachia. Of the 85 insects screened, Cardinium was found in four parasitic wasp species and one armoured scale insect. Of the 14 mite species examined, one predatory mite was found to carry the symbiont. A phylogenetic analysis of all known Cardinium 16S rDNA sequences shows that distantly related arthropods can harbour closely related symbionts, a pattern typical of horizontal transmission. However, closely related Cardinium were found to cluster among closely related hosts, suggesting host specialization and horizontal transmission among closely related hosts. Finally, the primers used revealed the presence of a second lineage of Bacteroidetes symbionts, not related to Cardinium, in two insect species. This second symbiont lineage is closely allied with other arthropod symbionts, such as Blattabacterium, the primary symbionts of cockroaches, and male-killing symbionts of ladybird beetles. The combined data suggest the presence of a diverse assemblage of arthropod-associated Bacteroidetes bacteria that are likely to strongly influence their hosts' biology.
A long‐standing issue in marine biology is identifying spatial scales at which populations of sessile adults are connected by planktonic offspring. We examined the genetic continuity of the acorn barnacle Balanus glandula, an abundant member of rocky intertidal communities of the northeastern Pacific Ocean, and compared these genetic patterns to the nearshore oceanography described by trajectories of surface drifters. Consistent with its broad dispersal potential, barnacle populations are genetically similar at both mitochondrial (cytochrome oxidase I) and nuclear (elongation factor 1‐alpha) loci across broad swaths of the species’ range. In central California, however, there is a striking genetic cline across 475 km of coastline between northern and southern populations. These patterns indicate that gene flow within central California is far more restricted spatially than among other populations. Possible reasons for the steep cline include the slow secondary introgression of historically separated populations, a balance between diversifying selection and dispersal, or some mix of both. Geographical trajectories of oceanic drifters closely parallel geographical patterns of gene flow. Drifters placed to the north (Oregon; ∼44°N) and south (Santa Barbara, California; ∼34°N) of the cline disperse hundreds of kilometres within 40 days, yet over the long term their trajectories never overlapped. The lack of communication between waters originating in Oregon and southern California probably helps to maintain strong genetic differentiation between these regions. More broadly, the geographical variation in gene flow implies that focusing on species‐level averages of gene flow can mask biologically important variance within species which reflects local environmental conditions and historical events.