Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high‐quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high‐quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam. Multiple sequence alignments are fundamental to many sequence analysis methods. The new program Clustal Omega can align virtually any number of protein sequences quickly and has powerful features for adding sequences to existing precomputed alignments.
While the number and identity of proteins expressed in a single human cell type is currently unknown, this fundamental question can be addressed by advanced mass spectrometry (MS)‐based proteomics. Online liquid chromatography coupled to high‐resolution MS and MS/MS yielded 166 420 peptides with unique amino‐acid sequence from HeLa cells. These peptides identified 10 255 different human proteins encoded by 9207 human genes, providing a lower limit on the proteome in this cancer cell line. Deep transcriptome sequencing revealed transcripts for nearly all detected proteins. We calculate copy numbers for the expressed proteins and show that the abundances of >90% of them are within a factor 60 of the median protein expression level. Comparisons of the proteome and the transcriptome, and analysis of protein complex databases and GO categories, suggest that we achieved deep coverage of the functional transcriptome and the proteome of a single cell type. More than 10 000 proteins were identified by high‐resolution mass spectrometry in a human cancer cell line. The data cover most of the functional proteome as judged by RNA‐seq data and it reveals the expression range of different protein classes.
The initial genome‐scale reconstruction of the metabolic network of Escherichia coli K‐12 MG1655 was assembled in 2000. It has been updated and periodically released since then based on new and curated genomic and biochemical knowledge. An update has now been built, named iJO1366, which accounts for 1366 genes, 2251 metabolic reactions, and 1136 unique metabolites. iJO1366 was (1) updated in part using a new experimental screen of 1075 gene knockout strains, illuminating cases where alternative pathways and isozymes are yet to be discovered, (2) compared with its predecessor and to experimental data sets to confirm that it continues to make accurate phenotypic predictions of growth on different substrates and for gene knockout strains, and (3) mapped to the genomes of all available sequenced E. coli strains, including pathogens, leading to the identification of hundreds of unannotated genes in these organisms. Like its predecessors, the iJO1366 reconstruction is expected to be widely deployed for studying the systems biology of E. coli and for metabolic engineering applications.
The generation of mathematical models of biological processes, the simulation of these processes under different conditions, and the comparison and integration of multiple data sets are explicit goals of systems biology that require the knowledge of the absolute quantity of the system's components. To date, systematic estimates of cellular protein concentrations have been exceptionally scarce. Here, we provide a quantitative description of the proteome of a commonly used human cell line in two functional states, interphase and mitosis. We show that these human cultured cells express at least ∼10 000 proteins and that the quantified proteins span a concentration range of seven orders of magnitude up to 20 000 000 copies per cell. We discuss how protein abundance is linked to function and evolution. The majority of all proteins expressed in the human osteosarcoma cell line U2OS were absolutely quantified by mass spectrometry. The quantified proteins span a concentration range of seven orders of magnitude up to 20 000 000 copies per cell.
Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them, and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background of what deep learning is, and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists when and how to make the most use of this new technology. Deep learning, a class of modern machine learning methods, has become a go‐to approach for analysing large‐scale high‐dimensional data. This review discusses its applications in biology, focusing on regulatory genomics and cellular imaging, and gives guidelines for practitioners.
The plant hormone auxin is thought to provide positional information for patterning during development. It is still unclear, however, precisely how auxin is distributed across tissues and how the hormone is sensed in space and time. The control of gene expression in response to auxin involves a complex network of over 50 potentially interacting transcriptional activators and repressors, the auxin response factors (ARFs) and Aux/IAAs. Here, we perform a large‐scale analysis of the Aux/IAA‐ARF pathway in the shoot apex of Arabidopsis , where dynamic auxin‐based patterning controls organogenesis. A comprehensive expression map and full interactome uncovered an unexpectedly simple distribution and structure of this pathway in the shoot apex. A mathematical model of the Aux/IAA‐ARF network predicted a strong buffering capacity along with spatial differences in auxin sensitivity. We then tested and confirmed these predictions using a novel auxin signalling sensor that reports input into the signalling pathway, in conjunction with the published DR5 transcriptional output reporter. Our results provide evidence that the auxin signalling network is essential to create robust patterns at the shoot apex. The plant hormone auxin is a key morphogenetic signal involved in the control of cell identity throughout development. A striking example of auxin action is at the shoot apical meristem (SAM), a population of stem cells generating the aerial parts of the plant. Organ positioning and patterning depends on local accumulations of auxin in the SAM, generated by polar transport of auxin ( Vernoux , 2010 ). However, it is still unclear how auxin is distributed at cell resolution in tissues and how the hormone is sensed in space and time during development. A complex ensemble of 29 Aux/IAAs and 23 ARFs is central to the regulation of gene transcription in response to auxin (for review, see Leyser, 2006 ; Guilfoyle and Hagen, 2007 ; Chapman and Estelle, 2009 ). 
Protein–protein interactions govern the properties of this transduction pathway ( Del Bianco and Kepinski, 2011 ). Limited interaction studies suggest that, in the absence of auxin, the Aux/IAA repressors form heterodimers with the ARF transcription factors, preventing them from regulating target genes. In the presence of auxin, the Aux/IAA proteins are targeted to the proteasome by an SCF E3 ubiquitin ligase complex ( Chapman and Estelle, 2009 ; Leyser, 2006 ). In this process, auxin promotes the interaction between Aux/IAA proteins and the TIR1 F‐box of the SCF complex (or its AFB homologues) that acts as an auxin co‐receptor ( Dharmasiri , 2005a , 2005b ; Kepinski and Leyser, 2005 ; Tan , 2007 ). The auxin‐induced degradation of Aux/IAAs would then release ARFs to regulate transcription of their target genes. This includes activation of most of the Aux/IAA genes themselves, thus establishing a negative feedback loop ( Guilfoyle and Hagen, 2007 ). Although this general scenario provides a framework for understanding gene regulation by auxin, the underlying protein–protein network remains to be fully characterized. In this paper, we combined experimental and theoretical analyses to understand how this pathway contributes to sensing auxin in space and time ( Figure 1 ). We first analysed the expression patterns of the ARFs , Aux/IAAs and TIR1 / AFBs genes in the SAM. Our results demonstrate a general tendency for most of the 25 ARFs and Aux/IAAs detected in the SAM: a differential expression with low levels at the centre of the meristem (where the stem cells are located) and high levels at the periphery of the meristem (where organ initiation takes place). We also observed a similar differential expression for TIR1/AFB co‐receptors. 
To understand the functional significance of the distribution of ARFs and Aux/IAAs in the SAM, we next investigated the global structure of the Aux/IAA‐ARF network using a high‐throughput yeast two‐hybrid approach and uncovered a rather simple topology that relies on three basic generic features: (i) Aux/IAA proteins interact with themselves, (ii) Aux/IAA proteins interact with ARF activators and (iii) ARF repressors have no or very limited interactions with other proteins in the network. The results of our interaction analysis suggest a model for the Aux/IAA‐ARF signalling pathway in the SAM, where transcriptional activation by ARF activators would be negatively regulated by two independent systems, one involving the ARF repressors, the other the Aux/IAAs. The presence of auxin would remove the inhibitory action of Aux/IAAs, but leave the ARF repressors to compete with ARF activators for promoter‐binding sites. To explore the regulatory properties of this signalling network, we developed a mathematical model to describe the transcriptional output as a function of the signalling input, which is the combinatorial effect of auxin concentration and of its perception. We then used the model and a simplified view of the meristem (where the same population of Aux/IAAs and ARFs exhibits a low expression at the centre and a high expression in the peripheral zone) to investigate the role of auxin signalling in SAM function. We show that in the model, for a given ARF activator‐to‐repressor ratio, the gene induction capacity increases with the absolute levels of ARF proteins. We thus predict that the differential expression of the ARFs generates differences in auxin sensitivities between the centre (low sensitivity) and the periphery (high sensitivity), and that the expression of TIR1/AFB participates in this regulation (prediction 1). We also use the model to analyse the transcriptional response to rapidly changing auxin concentrations. 
By simulating situations equivalent either to the centre or the periphery of our simplified representation of the SAM, we predict that the signalling pathway buffers its response to the auxin input via the balance between ARF activators and repressors, in turn generated by their differential spatial distributions (prediction 2). To test the predictions from the model experimentally, we needed to assess both the input (auxin level and/or perception) and the output (target gene induction) of the signalling cascade. For measuring the transcriptional output, the widely used DR5 reporter is perfectly adapted ( Figure 5 ) ( Ulmasov , 1997 ; Sabatini , 1999 ; Benkova , 2003 ; Heisler , 2005 ). For assaying pathway input, we designed DII‐VENUS, a novel auxin signalling sensor that comprises a constitutively expressed fusion of the auxin‐binding domain (termed domain II or DII) ( Dreher , 2006 ; Tan , 2007 ) of an IAA to a fast‐maturating variant of YFP, VENUS ( Figure 5 ). The degradation patterns from DII‐VENUS indicate a high auxin signalling input both in flower primordia and at the centre of the SAM. This is in contrast to the organ‐specific expression pattern of DR5::VENUS ( Figure 5 ). These results indicate that the signalling pathway limits gene activation in response to auxin at the meristem centre and confirm the differential sensitivity to auxin between the centre and the periphery (prediction 1). We further confirmed the buffering capacities of the signalling pathway (prediction 2) by carrying out live imaging experiments to monitor DII‐VENUS and DR5::VENUS expression in real time ( Figure 5 ). This analysis reveals the presence of important temporal variations of DII‐VENUS fluorescence, while DR5::VENUS does not show such global variations. Our approach thus provides evidence that the Aux/IAA‐ARF pathway has a key role in patterning in the SAM, alongside the auxin transport system. 
Our results illustrate how the tight spatio‐temporal regulation of both the distribution of a morphogenetic signal and the activity of the downstream signalling pathway provides robustness to a dynamic developmental process. We provide a comprehensive expression map of the different genes (TIR1/AFBs, ARFs and Aux/IAAs) involved in the signalling pathway regulating gene transcription in response to auxin in the shoot apical meristem (SAM). We demonstrate a relatively simple structure of this pathway using a high‐throughput yeast two‐hybrid approach to obtain the Aux/IAA‐ARF full interactome. The topology of the signalling network was used to construct a model for auxin signalling and to predict a role for the spatial regulation of auxin signalling in patterning of the SAM. We used a new sensor to monitor the input in the auxin signalling pathway and to confirm the model prediction, thus demonstrating that auxin signalling is essential to create robust patterns at the SAM.
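The model statement above, that induction capacity rises with absolute ARF levels at a fixed activator‐to‐repressor ratio, can be illustrated with a toy competitive‐binding calculation. This is a sketch under stated assumptions and not the authors' model: it assumes that ARF activators and repressors compete for a single promoter site with the same dissociation constant `K`, that Aux/IAA inhibition has been fully removed (saturating auxin), and that transcriptional output is proportional to activator occupancy. The function names and all parameter values are hypothetical.

```python
def activator_occupancy(activator, repressor, K=1.0):
    """Fraction of promoter sites bound by ARF activators.

    Assumes one binding site, equal affinity K for activators and
    repressors, and no Aux/IAA inhibition (saturating auxin).
    """
    return activator / (K + activator + repressor)


def induction_capacity(total_arf, ratio, K=1.0):
    """Activator occupancy at a fixed activator:repressor ratio.

    total_arf is the summed ARF level; ratio is activator / repressor.
    """
    activator = total_arf * ratio / (1.0 + ratio)
    repressor = total_arf / (1.0 + ratio)
    return activator_occupancy(activator, repressor, K)


# Same 1:1 ratio, low (centre-like) versus high (periphery-like) ARF levels.
# The numeric levels are arbitrary illustration values.
centre = induction_capacity(total_arf=0.5, ratio=1.0)
periphery = induction_capacity(total_arf=5.0, ratio=1.0)
print(centre, periphery)
```

Under these assumptions, occupancy equals ρ/(1+ρ) · T/(K+T) for ratio ρ and total ARF level T, so it grows with T and saturates at ρ/(1+ρ). That matches the qualitative behaviour the text describes, with lower sensitivity where ARF levels are low and higher sensitivity where they are high.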
Many compounds being considered as candidates for advanced biofuels are toxic to microorganisms. This introduces an undesirable trade‐off when engineering metabolic pathways for biofuel production because the engineered microbes must balance production against survival. Cellular export systems, such as efflux pumps, provide a direct mechanism for reducing biofuel toxicity. To identify novel biofuel pumps, we used bioinformatics to generate a list of all efflux pumps from sequenced bacterial genomes and prioritized a subset of targets for cloning. The resulting library of 43 pumps was heterologously expressed in Escherichia coli, where we tested it against seven representative biofuels. By using a competitive growth assay, we efficiently distinguished pumps that improved survival. For two of the fuels (n‐butanol and isopentanol), none of the pumps improved tolerance. For all other fuels, we identified pumps that restored growth in the presence of biofuel. We then tested a beneficial pump directly in a production strain and demonstrated that it improved biofuel yields. Our findings introduce new tools for engineering production strains and utilize the increasingly large database of sequenced genomes. Biofuels can be produced by microbes, but biofuel toxicity is a major obstacle to efficient production. Here, the authors identify efflux pumps that can effectively export biofuels, improving cell viability and increasing biofuel yields.
Inferring potential drug indications, for either novel or approved drugs, is a key step in drug development. Previous computational methods in this domain have focused on either drug repositioning or matching drug and disease gene expression profiles. Here, we present a novel method for the large‐scale prediction of drug indications (PREDICT) that can handle both approved drugs and novel molecules. Our method is based on the observation that similar drugs are indicated for similar diseases, and utilizes multiple drug–drug and disease–disease similarity measures for the prediction task. On cross‐validation, it obtains high specificity and sensitivity (AUC=0.9) in predicting drug indications, surpassing existing methods. We validate our predictions by their overlap with drug indications that are currently under clinical trials, and by their agreement with tissue‐specific expression information on the drug targets. We further show that disease‐specific genetic signatures can be used to accurately predict drug indications for new diseases (AUC=0.92). This lays the computational foundation for future personalized drug treatments, where gene expression signatures from individual patients would replace the disease‐specific signatures. Predicting indications for new molecules or finding alternative indications for approved drugs is a laborious and costly process ( DiMasi , 2003 ), calling for computational solutions that would minimize production time and development costs ( Terstappen and Reggiani, 2001 ). Here, we present a novel method for predicting drug indications, PREDICT, capable of handling both approved drugs and novel molecules. Our method is based on the assumption that similar drugs are indicated for similar diseases. To score a possible drug–disease association, we compute its similarity to known associations by combining drug–drug and disease–disease similarity computations. 
This strategy achieves high specificity and sensitivity rates in a cross‐validation setting, where part of the known associations are hidden and the method is assessed on how well it can retrieve them from the rest of the associations. Assessing its predictions of novel indications for existing drugs, we find that it covers a significant portion (27%, P < 2 × 10^−220) of drug indications currently tested in clinical trials. Examples of such predictions include: (i) Cabergoline, indicated for Hyperprolactinemia, which is predicted to treat Migraine, a prediction supported by two separate studies ( Verhelst , 1999 ; Cavestro , 2006 ) and (ii) Progesterone, which is predicted to treat renal cell cancer, non‐papillary (npRCC), supported by the study of Izumi (2007) . In addition, we provide indication predictions for novel molecules. For example, Cycloleucine is predicted for the treatment of Alzheimer's disease (AD); indeed, Cycloleucine was found to be a potent and selective antagonist of NMDA receptor‐mediated responses ( Hershkowitz and Rogawski, 1989 ), a new promising class of chemicals for the treatment of AD ( Farlow, 2004 ). As another example, Hyperforin, St John's wort extract, is predicted to treat hyperthermia. Interestingly, St John's wort extract was found to have anxiolytic effects on stress‐induced hyperthermia in mice ( Grundmann , 2006 ). We further introduce a disease–disease similarity measure based on disease‐specific gene signatures and show that such a measure can be used by our method to accurately predict drug indications. Importantly, this suggests the potential utility of our approach also in a personalized medicine setting, whereby future gene expression signatures from individual patients would replace these disease‐specific signatures. We present a novel method for the large‐scale prediction of drug indications that can handle both approved drugs and novel molecules. 
Our method utilizes multiple drug–drug and disease–disease similarity measures for the prediction task, obtaining high specificity and sensitivity rates (AUC=0.9). Our drug repositioning predictions cover 27% of the indications currently tested in clinical trials (P < 2 × 10^−220). We show comparable performance using a gene expression signature‐based disease–disease similarity, laying the computational foundation for predicting patient‐specific indications.
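The scoring idea described above, rating a candidate drug–disease pair by its similarity to known associations, can be sketched as follows. This is a minimal illustration and not the PREDICT implementation: the geometric‐mean combination rule, the maximum over known associations, and every drug name, disease name and similarity value are assumptions invented for the example.

```python
from math import sqrt

# Hypothetical similarity tables (symmetric, values in [0, 1]); not real data.
DRUG_SIM = {
    ("drugA", "drugB"): 0.8,
    ("drugA", "drugC"): 0.1,
    ("drugB", "drugC"): 0.3,
}
DISEASE_SIM = {
    ("dis1", "dis2"): 0.9,
    ("dis1", "dis3"): 0.2,
    ("dis2", "dis3"): 0.4,
}

# Hypothetical known drug-disease associations.
KNOWN = [("drugB", "dis2"), ("drugC", "dis3")]


def sim(table, a, b):
    """Symmetric lookup; an item is maximally similar to itself."""
    if a == b:
        return 1.0
    return table.get((a, b), table.get((b, a), 0.0))


def association_score(drug, disease, known=KNOWN):
    """Score a candidate pair by its closest known association.

    Assumed combination rule: geometric mean of the drug-drug and
    disease-disease similarities, maximised over all known
    associations other than the candidate itself.
    """
    best = 0.0
    for k_drug, k_disease in known:
        if (k_drug, k_disease) == (drug, disease):
            continue  # a pair must not vouch for itself
        s = sqrt(sim(DRUG_SIM, drug, k_drug) * sim(DISEASE_SIM, disease, k_disease))
        best = max(best, s)
    return best


print(association_score("drugA", "dis1"))
```

Skipping the candidate itself mirrors the cross‐validation setting, in which hidden associations must be recovered from the remaining ones. With several similarity measures, as the text describes, each pairing of a drug measure with a disease measure would yield one such score; how those scores are combined into a final prediction is not specified here.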
Several bacterial species have been implicated in the development of colorectal carcinoma ( CRC ), but CRC ‐associated changes of fecal microbiota and their potential for cancer screening remain to be explored. Here, we used metagenomic sequencing of fecal samples to identify taxonomic markers that distinguished CRC patients from tumor‐free controls in a study population of 156 participants. Accuracy of metagenomic CRC detection was similar to the standard fecal occult blood test ( FOBT ) and when both approaches were combined, sensitivity improved > 45% relative to the FOBT , while maintaining its specificity. Accuracy of metagenomic CRC detection did not differ significantly between early‐ and late‐stage cancer and could be validated in independent patient and control populations ( N = 335) from different countries. CRC ‐associated changes in the fecal microbiome at least partially reflected microbial community composition at the tumor itself, indicating that observed gene pool differences may reveal tumor‐related host–microbe interactions. Indeed, we deduced a metabolic shift from fiber degradation in controls to utilization of host carbohydrates and amino acids in CRC patients, accompanied by an increase of lipopolysaccharide metabolism. Metagenomic profiling of fecal samples from colorectal cancer ( CRC ) patients in comparison with tumor‐free controls reveals strong associations between the gut microbiota and cancer. Their potential for noninvasive cancer screening is explored systematically. A classification model based on gut microbial marker species distinguishes CRC patients from controls with similar accuracy as the fecal occult blood test ( FOBT ), routinely used for clinical screening. Combining metagenomic data with the FOBT leads to a relative improvement in sensitivity of > 45% over the FOBT alone at identical specificity. 
Detection accuracy of the metagenomic test is maintained in an independent study population and is still high for alternative microbiome readouts, such as the abundance of 16S rRNA OTUs or families of functionally related genes. Functional metagenomic analysis indicates an increased potential of CRC ‐associated microbiota for degradation of host glycans and amino acids and for pro‐inflammatory lipopolysaccharide metabolism.
Protein and genetic interaction maps can reveal the overall physical and functional landscape of a biological system. To date, these interaction maps have typically been generated under a single condition, even though biological systems undergo differential change that is dependent on environment, tissue type, disease state, development or speciation. Several recent interaction mapping studies have demonstrated the power of differential analysis for elucidating fundamental biological responses, revealing that the architecture of an interactome can be massively re‐wired during a cellular or adaptive response. Here, we review the technological developments and experimental designs that have enabled differential network mapping at very large scales and highlight biological insight that has been derived from this type of analysis. We argue that differential network mapping, which allows for the interrogation of previously unexplored interaction spaces, will become a standard mode of network analysis in the future, just as differential gene expression and protein phosphorylation studies are already pervasive in genomic and proteomic analysis. Protein and genetic interaction maps have typically been generated under a single condition, providing a static view of the interactome. Recent studies employing differential analysis, however, have revealed that widespread re‐wiring of the interactome underlies key biological responses.
The interest in studying metabolic alterations in cancer and their potential role as novel targets for therapy has been rejuvenated in recent years. Here, we report the development of the first genome‐scale network model of cancer metabolism, validated by correctly identifying genes essential for cellular proliferation in cancer cell lines. The model predicts 52 cytostatic drug targets, of which 40% are targeted by known, approved or experimental anticancer drugs, and the rest are new. It further predicts combinations of synthetic lethal drug targets, whose synergy is validated using available drug efficacy and gene expression measurements across the NCI‐60 cancer cell line collection. Finally, potential selective treatments for specific cancers that depend on cancer type‐specific downregulation of gene expression and somatic mutations are compiled. During tumor development, cancer cells modify their metabolism to meet the requirements of cellular proliferation, thus facilitating the uptake and conversion of nutrients into biomass. Many key metabolic alterations are similar across tumor cells, including changes in glucose metabolism that give rise to the Warburg effect, and an increase in biosynthetic activities (such as nucleotide, lipids and amino‐acid synthesis) ( DeBerardinis , 2008 ; Tennant , 2009 ; Vander Heiden , 2009 ). The observation that many types of cancer cells adapt their metabolism toward increased proliferation makes flux balance analysis (FBA), a constraint‐based modeling (CBM) approach, suitable for modeling cancer metabolism as it assumes that cells are under selective pressure to increase their growth rate ( Price , 2003 ). Building on previous reconstructions of a generic (non‐tissue specific) human metabolic network ( Duarte , 2007 ; Ma , 2007 ), we develop here the first large‐scale FBA model of cancer metabolism that aims to capture the main metabolic alterations that are common across many cancer types. 
The model reconstruction is based on our recent computational method for the automatic reconstruction of human tissue metabolic models ( Jerby , 2010 ), integrating the human metabolic model with cancer gene expression data. The construction of the cancer model focuses on activating a core set of metabolic enzyme‐coding genes that are highly expressed across cancer cell lines in the NCI‐60 collection, with additional reactions enabling their activation and the biosynthesis of a set of biomass compounds required for cellular proliferation. This generic cancer model enables the successful prediction of the metabolic state of cancer cells across different gene knockdowns and modeling the effects of drug applications on a large scale. As a first demonstration of the predictive performance of the cancer model, we applied it to predict 199 growth‐supporting genes whose knockdown is expected to inhibit cellular proliferation, showing that the model predictions indeed match results of shRNA gene silencing experiments ( Luo , 2008 ). To identify viable anticancer drug targets, we predicted whether the knockdown of the growth‐supporting genes is likely to be toxic to normal cells. Out of the 199 genes that are predicted to be growth supporting in the cancer model, 52 are predicted to have negligible effects on energy production in normal cells. However, the knockdown of the majority of the latter is yet predicted to potentially cause damage to proliferation of normal cells, suggesting that the targeting of these genes would cause similar side effects to those observed with current cytostatic drugs ( Partridge , 2001 ). 
Next, we predicted 342 synthetic lethal drug targets, whose predicted synergy was validated based on (i) comparison with genetic interactions between the corresponding yeast orthologs ( Costanzo , 2010 ) and (ii) by analyzing the efficacy of metabolic drugs targeting these genes, finding that drugs that target a single gene (participating in a predicted synthetic lethal pair) indeed have higher efficacy in cell lines in which the synergistic gene is lowly expressed. In contrast to the single targets described above, the knockdown of a third of these synergistic pairs is predicted to leave the proliferation of normal cells intact. Most importantly, the specific targeting of a gene participating in a synergistic pair is especially appealing in tumors in which its interacting gene is specifically inactivated—the targeting of such a gene solely is likely to selectively damage the tumor, without affecting the function of healthy tissues in which the interacting gene in the pair is active. We utilized genomic and transcriptomic data to infer gene inactivation across an array of cancers, which has led to the identification of cancer type‐specific targets based on the intersection of this data with our predicted synergistic gene pairs. In summary, the model presented here lays down a fundamental computational approach for interpreting the rapidly accumulating proteomics ( Bichsel , 2001 ) and metabolomics ( Fan , 2009 ) data characterizing cancer metabolic alterations. We hope that the publication of this first step will spur further studies aimed at obtaining a systems level understanding of cancer metabolism and at designing new therapeutic means that selectively target them. The first genome‐scale network model of cancer metabolism is developed and validated by successfully identifying genes essential for cellular proliferation in cancer cell lines. 
The model predicts 52 cytostatic drug targets, of which 40% are targeted by known, approved or experimental anticancer drugs, and the rest are new. Combinations of synthetic lethal drug targets are predicted, whose synergy is validated using available drug efficacy and gene expression measurements across the NCI‐60 cancer cell line collection. Potential selective treatments for specific cancers that depend on cancer type‐specific downregulation of gene expression and somatic mutations are compiled.
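The knockdown screens described above (single growth‐supporting genes, then synthetic lethal pairs) can be illustrated on a toy network. This sketch does not reproduce the paper's model. FBA solves a linear program over reaction fluxes; to stay dependency‐free, the example swaps in a simpler Boolean producibility check (network expansion from nutrients), which asks only whether biomass can be made at all. The four‐gene network, gene names and metabolites are hypothetical.

```python
from itertools import combinations

# Hypothetical toy network: gene -> (substrates, products). Not a real model.
REACTIONS = {
    "geneA": ({"glucose"}, {"pyruvate"}),
    "geneB": ({"pyruvate"}, {"nucleotide"}),
    "geneC": ({"glutamine"}, {"nucleotide"}),
    "geneD": ({"nucleotide", "pyruvate"}, {"biomass"}),
}
NUTRIENTS = {"glucose", "glutamine"}


def can_grow(knocked_out=frozenset()):
    """Return True if biomass is producible with the given genes removed.

    Boolean network expansion: repeatedly fire every active reaction
    whose substrates are all available, until nothing new appears.
    """
    available = set(NUTRIENTS)
    changed = True
    while changed:
        changed = False
        for gene, (substrates, products) in REACTIONS.items():
            if gene in knocked_out:
                continue
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return "biomass" in available


def essential_genes():
    """Genes whose single knockout abolishes biomass production."""
    return sorted(g for g in REACTIONS if not can_grow({g}))


def synthetic_lethal_pairs():
    """Pairs of individually dispensable genes that are lethal together."""
    single = set(essential_genes())
    candidates = sorted(g for g in REACTIONS if g not in single)
    return [
        (a, b)
        for a, b in combinations(candidates, 2)
        if not can_grow({a, b})
    ]


print(essential_genes())
print(synthetic_lethal_pairs())
```

In this toy network the two nucleotide routes back each other up, so neither gene is essential alone but the pair is lethal together. A flux‐based model would additionally quantify growth rate, which this Boolean check cannot do.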
Proper functioning of biological cells requires that the process of protein expression be carried out with high efficiency and fidelity. Given an amino‐acid sequence of a protein, multiple degrees of freedom still remain that may allow evolution to tune efficiency and fidelity for each gene under various conditions and cell types. Particularly, the redundancy of the genetic code allows the choice between alternative codons for the same amino acid, which, although ‘synonymous,’ may exert dramatic effects on the process of translation. Here we review modern developments in genomics and systems biology that have revolutionized our understanding of the multiple means by which translation is regulated. We suggest new means to model the process of translation in a richer framework that will incorporate information about gene sequences, the tRNA pool of the organism and the thermodynamic stability of the mRNA transcripts. A practical demonstration of a better understanding of the process would be a more accurate prediction of the proteome, given the transcriptome at a diversity of biological conditions.
Oncogenes such as K‐ras mediate cellular and metabolic transformation during tumorigenesis. To analyze K‐Ras‐dependent metabolic alterations, we employed 13C metabolic flux analysis (MFA), non‐targeted tracer fate detection (NTFD) of 15N‐labeled glutamine, and transcriptomic profiling in mouse fibroblast and human carcinoma cell lines. Stable isotope‐labeled glucose and glutamine tracers and computational determination of intracellular fluxes indicated that cells expressing oncogenic K‐Ras exhibited enhanced glycolytic activity, decreased oxidative flux through the tricarboxylic acid (TCA) cycle, and increased utilization of glutamine for anabolic synthesis. Surprisingly, a non‐canonical labeling of TCA cycle‐associated metabolites was detected in both transformed cell lines. Transcriptional profiling detected elevated expression of several genes associated with glycolysis, glutamine metabolism, and nucleotide biosynthesis upon transformation with oncogenic K‐Ras. Chemical perturbation of enzymes along these pathways further supports the decoupling of glycolysis and TCA metabolism, with glutamine supplying increased carbon to drive the TCA cycle. These results provide evidence for a role of oncogenic K‐Ras in the metabolic reprogramming of cancer cells. The ras and myc oncogenes drive pleiotropic changes in cell signaling, nutrient uptake, and intracellular metabolism ( Chiaradonna , 2006b ; Yuneva , 2007 ; Wise , 2008 ; Vander Heiden , 2009 ). Mutated ras proteins, identified in 25% of human cancers ( Bos, 1989 ; Downward, 2003 ), correlate with an increased rate of glucose consumption, lactate accumulation, altered expression of mitochondrial genes, increased ROS production, and reduced mitochondrial activity ( Bos, 1989 ; Downward, 2003 ; Vizan , 2005 ; Chiaradonna , 2006a ; Yun , 2009 ; Baracca , 2010 ; Weinberg , 2010 ). 
Furthermore, K‐Ras transformed cancer cells are dependent upon glucose and glutamine availability, since their withdrawal induces apoptosis and cell‐cycle arrest, respectively ( Ramanathan , 2005 ; Telang , 2006 ; Yun , 2009 ). However, the precise metabolic effects downstream of oncogenic Ras signaling as well as the mechanisms by which intracellular glucose and glutamine metabolism change have not been completely elucidated. In this report, we have investigated the reprogramming of central carbon metabolism in cancer cells and its regulation by the K‐ ras oncogene, applying a systems level approach using 13 C metabolic flux analysis (MFA), non‐targeted tracer fate detection (NTFD), and transcriptional profiling. These data reveal a coordinated decoupling of glycolysis and the tricarboxylic acid (TCA) cycle. K‐Ras transformed mouse and human cells exhibited a high glucose to lactate flux and relatively lower oxidative metabolism of pyruvate. Such changes were supported by increased expression of glycolytic genes as well as several pyruvate dehydrogenase kinases. In contrast to glucose, the contribution of glutamine carbon to TCA cycle intermediates through both oxidative and reductive metabolism was significantly increased upon K‐Ras transformation. Despite this increase in glutamine anaplerosis, oxidative TCA flux was significantly decreased. Additionally, we observed elevated levels of glutamine‐derived nitrogen in various biosynthetic metabolites in transformed cells, including amino acids, 5‐oxoproline, and the nucleobase adenine. Consistent with these changes, we detected increased transcription of genes associated with glutamine metabolism and nucleotide biosynthesis in cells expressing oncogenic K‐Ras. Taken together, these findings indicate an important role of oncogenic K‐Ras in cancer cell metabolism. 
The observed decoupling of glucose and glutamine metabolism enables the efficient utilization of both carbon and nitrogen from glutamine for biosynthetic processes. In accord with these alterations, oncogenic K‐Ras induces gene expression changes that may drive this metabolic reprogramming. Finally, these results may enable the identification of metabolic and transcriptional targets throughout the network and allow more effective cancer therapies. A systems approach using 13C metabolic flux analysis (MFA), non‐targeted tracer fate detection (NTFD), and transcriptional profiling was applied to investigate the role of oncogenic K‐Ras in metabolic transformation. K‐Ras transformed cells exhibit an increased glycolytic rate and lower flux through the oxidative tricarboxylic acid (TCA) cycle. K‐Ras transformed cells show a relative increase in glutamine anaplerosis and reductive TCA metabolism. Transcriptional changes driven by oncogenic K‐Ras suggest control nodes associated with the metabolic reprogramming of cancer cells.
Type 2 diabetes (T2D) can be prevented in pre‐diabetic individuals with impaired glucose tolerance (IGT). Here, we have used a metabolomics approach to identify candidate biomarkers of pre‐diabetes. We quantified 140 metabolites for 4297 fasting serum samples in the population‐based Cooperative Health Research in the Region of Augsburg (KORA) cohort. Our study revealed significant metabolic variation in pre‐diabetic individuals that is distinct from known diabetes risk indicators, such as glycosylated hemoglobin levels, fasting glucose and insulin. We identified three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine) that had significantly altered levels in IGT individuals as compared to those with normal glucose tolerance, with P‐values ranging from 2.4 × 10⁻⁴ to 2.1 × 10⁻¹³. Lower levels of glycine and LPC were found to be predictors not only for IGT but also for T2D, and were independently confirmed in the European Prospective Investigation into Cancer and Nutrition (EPIC)‐Potsdam cohort. Using metabolite–protein network analysis, we identified seven T2D‐related genes that are associated with these three IGT‐specific metabolites by multiple interactions with four enzymes. The expression levels of these enzymes correlate with changes in the metabolite concentrations linked to diabetes. Our results may help in developing novel strategies to prevent T2D. A targeted metabolomics approach was used to identify candidate biomarkers of pre‐diabetes. The relevance of the identified metabolites is further corroborated with a protein‐metabolite interaction network and gene expression data.
Three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine C2) were found with significantly altered levels in pre‐diabetic individuals compared with normal controls. Lower levels of glycine and LPC (18:2) were found to predict risks for pre‐diabetes and type 2 diabetes (T2D). Seven T2D‐related genes ( PPARG , TCF7L2 , HNF1A , GCK , IGF1 , IRS1 and IDE ) are functionally associated with the three identified metabolites. The unique combination of methodologies, including prospective population‐based and nested case–control, as well as cross‐sectional studies, was essential for the identification of the reported biomarkers.
The molecular understanding of phenotypes caused by drugs in humans is essential for elucidating mechanisms of action and for developing personalized medicines. Side effects of drugs (also known as adverse drug reactions) are an important source of human phenotypic information, but so far research on this topic has been hampered by insufficient accessibility of data. Consequently, we have developed a public, computer‐readable side effect resource (SIDER) that connects 888 drugs to 1450 side effect terms. It contains information on frequency in patients for one‐third of the drug–side effect pairs. For 199 drugs, the side effect frequency of placebo administration could also be extracted. We illustrate the potential of SIDER with a number of analyses. The resource is freely available for academic research at http://sideeffects.embl.de.
The genome‐scale model (GEM) of metabolism in the bacterium Escherichia coli K‐12 has been in development for over a decade and is now in wide use. GEM‐enabled studies of E. coli have been primarily focused on six applications: (1) metabolic engineering, (2) model‐driven discovery, (3) prediction of cellular phenotypes, (4) analysis of biological network properties, (5) studies of evolutionary processes, and (6) models of interspecies interactions. In this review, we provide an overview of these applications along with a critical assessment of their successes and limitations, and a perspective on likely future developments in the field. Taken together, the studies performed over the past decade have established a genome‐scale mechanistic understanding of genotype–phenotype relationships in E. coli metabolism that forms the basis for similar efforts for other microbial species. Future challenges include the expansion of GEMs by integrating additional cellular processes beyond metabolism, the identification of key constraints based on emerging data types, and the development of computational methods able to handle such large‐scale network models with sufficient accuracy. This review summarizes the applications enabled by genome‐scale models of metabolism for the bacterium E. coli . It provides an overview of the applications along with a critical assessment of their successes and limitations, and a perspective on likely future developments in the field.
Metabolic network reconstruction encompasses existing knowledge about an organism's metabolism and genome annotation, providing a platform for omics data analysis and phenotype prediction. The model alga Chlamydomonas reinhardtii is employed to study diverse biological processes from photosynthesis to phototaxis. Recent heightened interest in this species results from an international movement to develop algal biofuels. Integrating biological and optical data, we reconstructed a genome‐scale metabolic network for this alga and devised a novel light‐modeling approach that enables quantitative growth prediction for a given light source, resolving wavelength and photon flux. We experimentally verified transcripts accounted for in the network and physiologically validated model function through simulation and generation of new experimental growth data, providing high confidence in network contents and predictive applications. The network offers insight into algal metabolism and potential for genetic engineering and efficient light source design, a pioneering resource for studying light‐driven metabolism and quantitative systems biology. Algae have garnered significant interest in recent years, especially for their potential application in biofuel production. The hallmark, model eukaryotic microalgae Chlamydomonas reinhardtii has been widely used to study photosynthesis, cell motility and phototaxis, cell wall biogenesis, and other fundamental cellular processes ( Harris, 2001 ). Characterizing algal metabolism is key to engineering production strains and understanding photobiological phenomena. Based on extensive literature on C. reinhardtii metabolism, its genome sequence ( Merchant , 2007 ), and gene functional annotation, we have reconstructed and experimentally validated the genome‐scale metabolic network for this alga, i RC1080, the first network to account for detailed photon absorption permitting growth simulations under different light sources. 
i RC1080 accounts for 1080 genes, associated with 2190 reactions and 1068 unique metabolites and encompasses 83 subsystems distributed across 10 cellular compartments ( Figure 1A ). Its >32% coverage of estimated metabolic genes is a tremendous expansion over previous algal reconstructions ( Boyle and Morgan, 2009 ; Manichaikul , 2009 ). The lipid metabolic pathways of i RC1080 are considerably expanded relative to existing networks, and chemical properties of all metabolites in these pathways are accounted for explicitly, providing sufficient detail to completely specify all individual molecular species: backbone molecule and stereochemical numbering of acyl‐chain positions; acyl‐chain length; and number, position, and cis – trans stereoisomerism of carbon–carbon double bonds. Such detail in lipid metabolism will be critical for model‐driven metabolic engineering efforts. We experimentally verified transcripts accounted for in the network under permissive growth conditions, detecting >90% of tested transcript models ( Figure 1B ) and providing validating evidence for the contents of i RC1080. We also analyzed the extent of transcript verification by specific metabolic subsystems. Some subsystems stood out as more poorly verified, including chloroplast and mitochondrial transport systems and sphingolipid metabolism, all of which exhibited 32% of the estimated metabolic genes encoded in the genome, and including extensive details of lipid metabolic pathways. This is the first metabolic network to explicitly account for stoichiometry and wavelengths of metabolic photon usage, providing a new resource for research of C. reinhardtii metabolism and developments in algal biotechnology. Metabolic functional annotation and the largest transcript verification of a metabolic network to date was performed, at least partially verifying >90% of the transcripts accounted for in i RC1080. 
Analysis of the network supports hypotheses concerning the evolution of latent lipid pathways in C. reinhardtii , including very long‐chain polyunsaturated fatty acid and ceramide synthesis pathways. A novel approach for modeling light‐driven metabolism was developed that accounts for both light source intensity and spectral quality of emitted light. The constructs resulting from this approach, termed prism reactions, were shown to significantly improve the accuracy of model predictions, and their use was demonstrated for evaluation of light source efficiency and design.
To obtain rates of mRNA synthesis and decay in yeast, we established dynamic transcriptome analysis (DTA). DTA combines non‐perturbing metabolic RNA labeling with dynamic kinetic modeling. DTA reveals that most mRNA synthesis rates are around several transcripts per cell and cell cycle, and most mRNA half‐lives range around a median of 11 min. DTA can monitor the cellular response to osmotic stress with higher sensitivity and temporal resolution than standard transcriptomics. In contrast to monotonically increasing total mRNA levels, DTA reveals three phases of the stress response. During the initial shock phase, mRNA synthesis and decay rates decrease globally, resulting in mRNA storage. During the subsequent induction phase, both rates increase for a subset of genes, resulting in production and rapid removal of stress‐responsive mRNAs. During the recovery phase, decay rates are largely restored, whereas synthesis rates remain altered, apparently enabling growth at high salt concentration. Stress‐induced changes in mRNA synthesis rates are predicted from gene occupancy with RNA polymerase II. DTA‐derived mRNA synthesis rates identified 16 stress‐specific pairs/triples of cooperative transcription factors, of which seven were known. Thus, DTA realistically monitors the dynamics in mRNA metabolism that underlie gene regulatory systems. Nascent transcriptome analysis reveals dynamics of mRNA synthesis and decay in yeast. The first step in the expression of the genome is the synthesis of messenger‐RNA (mRNA). In all cells, the regulation of mRNA levels in response to changing environmental conditions is a fundamental process. Classical methods to study such changes in mRNA levels, however, fail to unravel whether such changes are due to changes in mRNA synthesis (transcription) or changes in mRNA decay, which both contribute to setting mRNA levels. 
Therefore, the regulation of mRNA stability and turnover is poorly understood, and new methods for a quantitative analysis of mRNA synthesis and decay are urgently sought. In this study, we describe a novel method termed dynamic transcriptome analysis (DTA), which can be used to determine synthesis and decay rates of mRNAs on a genome‐wide level in yeast and other eukaryotic cells. We applied DTA to the model organism Saccharomyces cerevisiae and analyzed the dynamics of the transcriptome under standard growth conditions as well as under osmotic stress conditions. DTA relies on a combination of biochemistry, high‐throughput data acquisition, and computational biology. It uses metabolic labeling of newly synthesized RNA with the nucleoside analog 4‐thiouridine (4sU), purification of labeled, newly synthesized RNA, and subsequent microarray hybridization. An improved mathematical model enables synthesis and decay rates of essentially all mRNAs in the cell to be determined with accuracy. In this study, we found that under normal growth conditions the synthesis rates for most mRNAs are low and that the decay rates are not correlated with synthesis. Addition of salt to the culture, however, induced three phases of changes in mRNA synthesis and decay. During the initial shock phase, there is a global repression of synthesis and a reduction of decay of most mRNAs. The subsequent induction phase involves strongly increased synthesis of stress mRNAs, which are also destabilized. Finally, the recovery phase restores decay rates, but leaves synthesis rates altered, apparently to allow for cellular growth under the new conditions. DTA shows a higher sensitivity and better temporal resolution than classical methods such as transcriptomics. Also, DTA is non‐perturbing and allows for an unbiased monitoring of genomic regulatory systems in living cells. Previously used methods are invasive and likely alter cellular physiology and thereby mRNA dynamics.
DTA has a high potential to become a standard technique in molecular biology that may replace standard transcriptomics to study gene regulatory systems. In the future, DTA may be used to study dynamic changes in cellular mRNA metabolism induced by chemical inhibitors or defined mutations or changes in the environment. Rates of mRNA synthesis and decay can be measured on a genome‐wide scale in yeast by dynamic transcriptome analysis (DTA), which combines non‐perturbing metabolic RNA labeling with dynamic kinetic modeling. DTA reveals that most mRNA synthesis rates are around several transcripts per cell and cell cycle, and most mRNA half‐lives range around a median of 11 min. DTA realistically monitors the cellular response to osmotic stress with higher sensitivity and temporal resolution than transcriptomics, and can be used to follow changes in RNA metabolism in gene regulatory systems.
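The relationship between synthesis rate, decay rate, half‐life and mRNA level can be illustrated with a simple first‐order kinetic sketch. This is a deliberate simplification and not the DTA model itself: it assumes a constant synthesis rate, exponential decay, and no dilution by cell growth, and the function names and the example synthesis rate are hypothetical. Only the 11 min median half‐life is taken from the text above.

```python
import math


def decay_rate(half_life_min):
    """First-order decay constant (per minute) for a given half-life in minutes."""
    return math.log(2) / half_life_min


def steady_state_level(synthesis_rate, half_life_min):
    """Level at which synthesis balances decay: m_ss = synthesis_rate / decay_rate."""
    return synthesis_rate / decay_rate(half_life_min)


def level_at(m0, synthesis_rate, half_life_min, t_min):
    """Solution of dm/dt = synthesis_rate - decay_rate * m with m(0) = m0."""
    lam = decay_rate(half_life_min)
    m_ss = synthesis_rate / lam
    return m_ss + (m0 - m_ss) * math.exp(-lam * t_min)


# Median half-life reported in the text; the synthesis rate is an arbitrary example value.
lam = decay_rate(11.0)
m_ss = steady_state_level(0.1, 11.0)
print(round(lam, 4), round(m_ss, 2))
```

In this simplified picture, the same change in mRNA level can arise from a change in synthesis or in decay, which is why measuring total levels alone cannot separate the two contributions.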
The expression level of a gene is often used as a proxy for determining whether the protein or RNA product is functional in a cell or tissue. Therefore, it is of fundamental importance to understand the global distribution of gene expression levels, and to be able to interpret it mechanistically and functionally. Here we use RNA sequencing (RNA‐seq) of mouse Th2 cells, coupled with a range of other techniques, to show that all genes can be separated, based on their expression abundance, into two distinct groups: one group comprised of lowly expressed and putatively non‐functional mRNAs, and the other of highly expressed mRNAs with active chromatin marks at their promoters. These observations are confirmed in many other microarray and RNA‐seq data sets of metazoan cell types. The authors show that genes can be separated into distinct low or high expression abundance groups. Histone marks reveal that this switch‐like transition from low to high expression goes hand‐in‐hand with a change in chromatin status.
Systems biology relies on data sets in which the same group of proteins is consistently identified and precisely quantified across multiple samples, a requirement that is only partially achieved by current proteomics approaches. Selected reaction monitoring (SRM)—also called multiple reaction monitoring—is emerging as a technology that ideally complements the discovery capabilities of shotgun strategies by its unique potential for reliable quantification of analytes of low abundance in complex mixtures. In an SRM experiment, a predefined precursor ion and one of its fragments are selected by the two mass filters of a triple quadrupole instrument and monitored over time for precise quantification. A series of transitions (precursor/fragment ion pairs) in combination with the retention time of the targeted peptide can constitute a definitive assay. Typically, a large number of peptides are quantified during a single LC‐MS experiment. This tutorial explains the application of SRM for quantitative proteomics, including the selection of proteotypic peptides and the optimization and validation of transitions. Furthermore, normalization and various factors affecting sensitivity and accuracy are discussed.
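As a rough illustration of how an SRM assay can be represented, the sketch below models a transition as a precursor/fragment m/z pair with an expected retention time, and checks whether an observed signal falls within tolerance. The class name, function name, tolerances, and all numeric values are hypothetical assumptions chosen for illustration; they are not taken from the tutorial or from any instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One SRM transition: the precursor ion selected by the first mass filter
    and the fragment ion selected by the second."""
    peptide: str
    precursor_mz: float
    fragment_mz: float
    expected_rt_min: float  # expected retention time of the peptide, in minutes


def matches(tr, precursor_mz, fragment_mz, rt_min, mz_tol=0.35, rt_window_min=2.0):
    """True if an observed signal lies within the m/z tolerance of both mass
    filters and inside the retention-time window centered on the expected time."""
    return (
        abs(precursor_mz - tr.precursor_mz) <= mz_tol
        and abs(fragment_mz - tr.fragment_mz) <= mz_tol
        and abs(rt_min - tr.expected_rt_min) <= rt_window_min / 2
    )


# Illustrative assay: two transitions for one made-up peptide (values are placeholders).
assay = [
    Transition("PEPTIDEK", 464.7, 702.4, 23.5),
    Transition("PEPTIDEK", 464.7, 573.3, 23.5),
]

observed = (464.8, 702.3, 23.9)
hits = [tr for tr in assay if matches(tr, *observed)]
print(len(hits))
```

Requiring several transitions of the same peptide to co‐elute at the expected retention time is what makes the combination specific, in line with the description of a definitive assay above.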