The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD‐WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD‐WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen .
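As an illustration of why tightened definitions help automatic data processing, the following is a minimal Python sketch that parses one narrow class of variant descriptions: a single-nucleotide substitution of the form `reference:type.positionRef>Alt`. The regular expression, the `Substitution` type, and the `parse_substitution` helper are assumptions made for this example only; they are not part of the recommendations and deliberately ignore intronic offsets, ranges, deletions, duplications, insertions, and protein-level descriptions.

```python
import re
from typing import NamedTuple, Optional

# Simplified, hypothetical pattern: <reference>.<version>:<type>.<position><ref>><alt>
# It covers only simple substitutions at plain integer positions.
_SUBSTITUTION = re.compile(
    r"^(?P<reference>[A-Za-z0-9_]+\.\d+):"  # versioned reference sequence
    r"(?P<seq_type>[cgmn])\."               # coordinate type prefix
    r"(?P<position>\d+)"                    # plain integer position
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"     # reference and variant nucleotide
)


class Substitution(NamedTuple):
    reference: str
    seq_type: str
    position: int
    ref: str
    alt: str


def parse_substitution(description: str) -> Optional[Substitution]:
    """Return the parsed parts, or None if the text is not a simple substitution."""
    match = _SUBSTITUTION.match(description.strip())
    if match is None:
        return None
    return Substitution(
        reference=match["reference"],
        seq_type=match["seq_type"],
        position=int(match["position"]),
        ref=match["ref"],
        alt=match["alt"],
    )


parsed = parse_substitution("NM_004006.2:c.4375C>T")
unsupported = parse_substitution("NM_004006.2:c.4375_4379del")
```

Here `parsed` holds the reference sequence, coordinate type, position, and both nucleotides, while `unsupported` is `None` because deletions fall outside this sketch. A production tool would need the full grammar given in the online recommendations.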
The purpose of dbNSFP is to provide a one‐stop resource for functional predictions and annotations for human nonsynonymous single‐nucleotide variants (nsSNVs) and splice‐site variants (ssSNVs), and to facilitate the steps of filtering and prioritizing SNVs from a large list of SNVs discovered in an exome‐sequencing study. A list of all potential nsSNVs and ssSNVs based on the human reference sequence was created, and functional predictions and annotations were curated and compiled for each SNV. Here, we report a recent major update of the database to version 3.0, together with some preliminary analyses comparing the 24 functional prediction scores and conservation scores in dbNSFP v3.0. The SNV list has been rebuilt based on GENCODE 22, and the database currently includes 82,832,027 nsSNVs and ssSNVs. An attached database, dbscSNV, which compiles all potential human SNVs within splicing consensus regions and their deleteriousness predictions, adds another 15,030,459 potentially functional SNVs. Eleven prediction scores (MetaSVM, MetaLR, CADD, VEST3, PROVEAN, 4× fitCons, fathmm‐MKL, and DANN) and allele frequencies from the UK10K cohorts and the Exome Aggregation Consortium (ExAC), among others, have been added. The original seven prediction scores in v2.0 (SIFT, 2× Polyphen2, LRT, MutationTaster, MutationAssessor, and FATHMM) as well as many SNV and gene functional annotations have been updated. dbNSFP v3.0 is freely available at http://sites.google.com/site/jpopgen/dbNSFP .
The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole‐genome/whole‐exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever‐increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species‐independent method with optional species‐specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state‐of‐the‐art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high‐throughput/large‐scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web‐based implementation of FATHMM, including a high‐throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.
TP53 gene mutations are one of the most frequent somatic events in cancer. The IARC TP53 Database (http://p53.iarc.fr) is a popular resource that compiles occurrence and phenotype data on TP53 germline and somatic variations linked to human cancer. The deluge of data coming from cancer genomic studies generates new data on TP53 variations and attracts a growing number of database users for the interpretation of TP53 variants. Here, we present the current contents and functionalities of the IARC TP53 Database and perform a systematic analysis of TP53 somatic mutation data extracted from this database and from genomic data repositories. This analysis showed that the IARC TP53 Database has more TP53 somatic mutation data than genomic repositories (29,000 vs. 4,000). However, the more complete screening achieved by genomic studies confirmed that TP53 is the most frequently mutated cancer gene and highlighted some overlooked facts about TP53 mutations, such as the presence of a significant number of mutations occurring outside the DNA‐binding domain in specific cancer types. We also provide an update on TP53 inherited variants, including the ones that should be considered as neutral frequent variations. We thus provide an update of current knowledge on TP53 variations in human cancer and inform users about the efficient use of the IARC TP53 Database.
Here, we provide an overview of and update on GeneMatcher (http://www.genematcher.org), a freely accessible Web‐based tool developed as part of the Baylor‐Hopkins Center for Mendelian Genomics. We created GeneMatcher with the goal of identifying additional individuals with rare phenotypes who had variants in the same candidate disease gene. We also wanted to facilitate connections to basic scientists working on orthologous genes in model systems with the goal of connecting their work to human Mendelian phenotypes. Meeting these goals will enhance the identification of novel Mendelian genes. Launched in September 2013, GeneMatcher had 2,178 candidate genes from 486 submitters spread across 38 countries entered in the database as of June 1, 2015. GeneMatcher is also part of the Matchmaker Exchange (http://matchmakerexchange.org/), with an application programming interface (API) enabling submitters to query other databases of genetic variants and phenotypes without having to create accounts and data entries in multiple systems.
Oncotator is a tool for annotating genomic point mutations and short nucleotide insertions/deletions (indels) with variant‐ and gene‐centric information relevant to cancer researchers. This information is drawn from 14 different publicly available resources that have been pooled and indexed, and we provide an extensible framework for adding further data sources. Annotations linked to variants range from basic information, such as gene names and functional classification (e.g., missense), to cancer‐specific data from resources such as the Catalogue of Somatic Mutations in Cancer (COSMIC), the Cancer Gene Census, and The Cancer Genome Atlas (TCGA). For local use, Oncotator is freely available as a Python module hosted on GitHub (https://github.com/broadinstitute/oncotator). Oncotator is also available as a web service and web application at http://www.broadinstitute.org/oncotator/ .
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
Human mitochondrial DNA (mtDNA) is widely used as a tool in many fields, including evolutionary anthropology and population history, medical genetics, genetic genealogy, and forensic science. Many applications require detailed knowledge about the phylogenetic relationship of mtDNA variants. Although the phylogenetic resolution of global human mtDNA diversity has greatly improved as a result of increasing efforts to sequence complete mtDNA genomes, an updated overall mtDNA tree is currently not available. To facilitate better use of known mtDNA variation, we have constructed an updated comprehensive phylogeny of global human mtDNA variation, based on both coding‐ and control‐region mutations. This complete mtDNA tree includes previously published as well as newly identified haplogroups, is easily navigable, will be regularly updated, and is available online at http://www.phylotree.org. (C) 2008 Wiley-Liss, Inc.
Analyzing the type and frequency of patient‐specific mutations that give rise to Duchenne muscular dystrophy (DMD) is an invaluable tool for diagnostics, basic scientific research, trial planning, and improved clinical care. Locus‐specific databases allow for the collection, organization, storage, and analysis of genetic variants of disease. Here, we describe the development and analysis of the TREAT‐NMD DMD Global database (http://umd.be/TREAT_DMD/). We analyzed genetic data for 7,149 DMD mutations held within the database. A total of 5,682 large mutations were observed (80% of total mutations), of which 4,894 (86%) were deletions (1 exon or larger) and 784 (14%) were duplications (1 exon or larger). There were 1,445 small mutations (smaller than 1 exon, 20% of all mutations), of which 358 (25%) were small deletions, 132 (9%) were small insertions, and 199 (14%) affected the splice sites. Point mutations totalled 756 (52% of small mutations), with 726 (50%) nonsense mutations and 30 (2%) missense mutations. Finally, 22 (0.3%) mid‐intronic mutations were observed. In addition, mutations were identified within the database that would potentially benefit from novel genetic therapies for DMD, including stop codon read‐through therapies (10% of total mutations) and exon skipping therapy (80% of deletions and 55% of total mutations).
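The counts reported for the TREAT‐NMD DMD Global database can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the figures stated in the abstract; the variable names and the `share` helper are illustrative assumptions. It checks that the three top-level categories sum to the 7,149 mutations analyzed, that the small-mutation subtypes sum to 1,445, and it recomputes the percentage shares. The large-mutation share computes to roughly 79.5%, which the abstract reports as 80%.

```python
# Counts taken from the abstract; names are illustrative only.
TOTAL = 7149
LARGE_TOTAL = 5682
LARGE = {"deletions": 4894, "duplications": 784}
SMALL_TOTAL = 1445
SMALL = {
    "small_deletions": 358,
    "small_insertions": 132,
    "splice_site": 199,
    "point_mutations": 756,
}
POINT = {"nonsense": 726, "missense": 30}
MID_INTRONIC = 22


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part` (unrounded)."""
    return 100.0 * part / whole


# Consistency checks on the reported totals.
categories_sum = LARGE_TOTAL + SMALL_TOTAL + MID_INTRONIC
small_sum = sum(SMALL.values())
point_sum = sum(POINT.values())

# Recompute the shares quoted in the text.
for label, count in LARGE.items():
    print(f"{label}: {share(count, LARGE_TOTAL):.1f}% of large mutations")
for label, count in SMALL.items():
    print(f"{label}: {share(count, SMALL_TOTAL):.1f}% of small mutations")
print(f"large: {share(LARGE_TOTAL, TOTAL):.1f}% of all mutations")
print(f"small: {share(SMALL_TOTAL, TOTAL):.1f}% of all mutations")
print(f"mid-intronic: {share(MID_INTRONIC, TOTAL):.1f}% of all mutations")
```

Note that deletions and duplications together account for 5,678 of the 5,682 large mutations; the abstract does not describe the remaining four.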
The advent of massively parallel sequencing is rapidly changing the strategies employed for the genetic diagnosis and research of rare diseases that involve a large number of genes. So far, it is not clear whether these approaches perform significantly better than conventional single‐gene testing as requested by clinicians. The current yield of this traditional diagnostic approach depends on a complex set of factors that include gene‐specific phenotype traits and the relative frequency of the involvement of specific genes. To gauge the impact of the paradigm shift that is occurring in molecular diagnostics, we assessed traditional Sanger‐based sequencing (in 2011) and exome sequencing followed by targeted bioinformatics analysis (in 2012) for five different conditions that are highly heterogeneous and for which our center provides molecular diagnosis. We find that exome sequencing has a much higher diagnostic yield than Sanger sequencing for deafness, blindness, mitochondrial disease, and movement disorders. For microsatellite‐stable colorectal cancer, the diagnostic yield was low under both strategies. Even if all genes that could have been ordered by physicians had been tested, the larger number of genes captured by the exome would still have led to a clearly superior diagnostic yield at a fraction of the cost.
The tumor suppressor gene TP53 is frequently mutated in human cancers. More than 75% of all mutations are missense substitutions that have been extensively analyzed in various yeast and human cell assays. The International Agency for Research on Cancer (IARC) TP53 database compiles all genetic variations that have been reported in TP53. Here, we present recent database developments that include new annotations on the functional properties of mutant proteins, and we perform a systematic analysis of the database to determine the functional properties that contribute to the occurrence of mutational “hotspots” in different cancer types and to the phenotype of tumors. This analysis showed that loss of transactivation capacity is a key factor for the selection of missense mutations, and that differences in mutation frequencies are closely related to nucleotide substitution rates along the TP53 coding sequence. An interesting new finding is that in patients with an inherited missense mutation, the age at onset of tumors was related to the functional severity of the mutation: mutations with total loss of transactivation activity were associated with earlier cancer onset than mutations that retain partial transactivation capacity. Furthermore, 80% of the most common mutants show a capacity to exert a dominant‐negative effect (DNE) over wild‐type p53, compared to only 45% of the less frequent mutants studied, suggesting that DNE may play a role in shaping mutation patterns. These results provide new insights into the factors that shape mutation patterns and influence mutation phenotype, which may have clinical interest. Hum Mutat 28(6), 622–629, 2007. Published 2007 Wiley‐Liss, Inc.
MicroRNAs (miRNAs) are studied as key regulators of gene expression involved in different diseases. Several single nucleotide polymorphisms (SNPs) in miRNA genes or target sites (miRNA-related SNPs) have been shown to be associated with human diseases by affecting miRNA-mediated regulatory function. To systematically analyze miRNA-related SNPs and their effects, we performed a genome-wide scan for SNPs in human pre-miRNAs, miRNA flanking regions, and target sites, and designed a pipeline to predict their effects on miRNA-target interaction. As a result, we identified 48 SNPs in human miRNA seed regions and thousands of SNPs in 3' untranslated regions with the potential to either disturb or create miRNA-target interactions. Furthermore, we experimentally confirmed seven loss-of-function SNPs and one gain-of-function SNP by luciferase assay. This is the first case of experimental validation of an SNP in an miRNA creating a novel miRNA target binding. All useful data were compiled into miRNASNP, a user-friendly free online database (http://www.bioguo.org/miRNASNP/). These data will be a useful resource for studying miRNA function, identifying disease-associated miRNAs, and advancing personalized medicine. Hum Mutat 33: 254-263, 2012. (C) 2011 Wiley Periodicals, Inc.
Mutation detection through exome sequencing allows simultaneous analysis of all coding sequences of genes. However, it cannot yet replace Sanger sequencing (SS) in diagnostics because incomplete representation and coverage of exons lead to clinically relevant mutations being missed. Targeted next‐generation sequencing (NGS), in which a selected fraction of genes is sequenced, may circumvent these shortcomings. We aimed to determine whether the sensitivity and specificity of targeted NGS are equal to those of SS. We constructed a targeted enrichment kit that includes 48 genes associated with hereditary cardiomyopathies. In total, 84 individuals with cardiomyopathies were sequenced using 151 bp paired‐end reads on an Illumina MiSeq sequencer. The reproducibility was tested by repeating the entire procedure for five patients. The coverage of ≥30 reads per nucleotide, our major quality criterion, was 99%, and in total ∼21,000 variants were identified. Confirmation with SS was performed for 168 variants (155 substitutions, 13 indels). All were confirmed, including a deletion of 18 bp and an insertion of 6 bp; no false positives or false negatives were detected. The reproducibility was nearly 100%. We demonstrate that targeted NGS of a disease‐specific subset of genes is equal in quality to SS and can therefore be reliably implemented as a stand‐alone diagnostic test.
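The quality criterion used in the cardiomyopathy study, the fraction of nucleotides covered by at least 30 reads, is simple to compute once per-position read depths are available. The following Python sketch shows the calculation on a toy list of depths; the `fraction_covered` function, its parameter names, and the example values are hypothetical and are not taken from the study's pipeline or data.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 30) -> float:
    """Fraction of positions whose read depth is at least `min_depth`."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Hypothetical per-nucleotide depths for ten positions (illustration only).
toy_depths = [45, 120, 30, 12, 88, 31, 29, 200, 64, 35]
toy_fraction = fraction_covered(toy_depths)
print(f"{toy_fraction:.0%} of positions have >= 30 reads")
```

In practice, the depths would come from a coverage report restricted to the targeted regions, and the threshold would be set by the laboratory's own validation criteria.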
The current version of the androgen receptor gene (AR) mutations database is described. A major change to the database is that the nomenclature and numbering scheme now conforms to all Human Genome Variation Society norms. The total number of reported mutations has risen from 605 to 1,029 since 2004. The database now contains a number of mutations that are associated with prostate cancer (CaP) treatment regimens, while the number of AR mutations found in CaP tissues has more than doubled from 76 to 159. In addition, in a number of androgen insensitivity syndrome (AIS) and CaP cases, multiple mutations have been found within the same tissue samples. For the first time, we report on a disconnect within the AIS phenotype-genotype relationship in our own patient database, in that over 40% of our patients with classic complete AIS or partial AIS phenotypes did not appear to have a mutation in their AR gene. The implications of this phenomenon for future locus-specific mutation database (LSDB) development are discussed, together with the concept that mutations can be associated with both loss- and gain-of-function, and the effect of multiple AR mutations within individuals. The database is available on the internet (http://androgendb.mcgill.ca), and a web-based LSDB with the variants using the Leiden Open Variation Database platform is available at http://www.lovd.nl/AR. Hum Mutat 33: 887-894, 2012. (C) 2012 Wiley Periodicals, Inc.