BRB-ArrayTools is an integrated software system for the comprehensive analysis of DNA microarray experiments. It was developed by professional biostatisticians experienced in the design and analysis of DNA microarray studies and incorporates methods developed by leading statistical laboratories. The software is designed for use by biomedical scientists who wish to have access to state-of-the-art statistical methods for the analysis of gene expression data and to receive training in the statistical analysis of high dimensional data. The software provides the most extensive set of tools available for predictive classifier development and complete cross-validation. It offers extensive links to genomic websites for gene annotation and analysis tools for pathway analysis. An archive of over 100 datasets of published microarray data with associated clinical data is provided and BRB-ArrayTools automatically imports data from the Gene Expression Omnibus public archive at the National Center for Biotechnology Information.
Motivation: Triple-negative breast cancer (TNBC) is a heterogeneous breast cancer group, and identification of molecular subtypes is essential for understanding the biological characteristics and clinical behaviors of TNBC as well as for developing personalized treatments. Based on 3,247 gene expression profiles from 21 breast cancer data sets, we discovered six TNBC subtypes from 587 TNBC samples with unique gene expression patterns and ontologies. Cell line models representing each of the TNBC subtypes also displayed different sensitivities to targeted therapeutic agents. Classification of TNBC into subtypes will advance further genomic research and clinical applications. Results: We developed a web-based subtyping tool, TNBCtype, for candidate TNBC samples using our gene expression metadata and classification methods. Given a gene expression data matrix, this tool will display for each candidate sample the predicted subtype, the corresponding correlation coefficient, and the permutation P-value. We offer a user-friendly web interface to predict the subtypes for new TNBC samples that may facilitate diagnostics, biomarker selection, drug discovery, and the more tailored treatment of breast cancer.
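The three values reported per sample (predicted subtype, correlation coefficient, permutation P-value) can be illustrated with a minimal sketch. The abstract does not describe the actual classification or permutation scheme, so everything here is an assumption: the sample is assigned to the centroid with the highest Pearson correlation, and the null distribution comes from shuffling the sample's gene values. The function names and the centroid labels `subtype_A` and `subtype_B` are hypothetical.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def predict_subtype(sample, centroids, n_perm=1000, seed=0):
    """Return (best subtype, its correlation, permutation P-value).

    Assumption: the null is built by shuffling the sample's values across
    genes and recording the best correlation to any centroid.
    """
    rng = random.Random(seed)
    best, r_obs = max(
        ((name, pearson(sample, c)) for name, c in centroids.items()),
        key=lambda item: item[1],
    )
    shuffled = list(sample)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        r_null = max(pearson(shuffled, c) for c in centroids.values())
        if r_null >= r_obs:
            hits += 1
    # Add-one correction so the estimated P-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return best, r_obs, p_value


# Hypothetical two-subtype example on six genes.
centroids = {
    "subtype_A": [1, 2, 3, 4, 5, 6],
    "subtype_B": [6, 5, 4, 3, 2, 1],
}
sample = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]
subtype, r, p = predict_subtype(sample, centroids)
```

A real expression matrix would be processed one sample (column) at a time in the same way.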
Therapeutic resistance arises as a result of evolutionary processes driven by dynamic feedback between a heterogeneous cell population and environmental selective pressures. Previous studies have suggested that mutations conferring resistance to epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKI) in non-small-cell lung cancer (NSCLC) cells lower the fitness of resistant cells relative to drug-sensitive cells in a drug-free environment. Here, we hypothesize that the local tumor microenvironment could influence the magnitude and directionality of the selective effect, both in the presence and absence of a drug. Using a combined experimental and computational approach, we developed a mathematical model of preexisting drug resistance describing multiple cellular compartments, each representing a specific tumor environmental niche. This model was parameterized using a novel experimental dataset derived from the HCC827 erlotinib-sensitive and -resistant NSCLC cell lines. We found that, in contrast to the drug-free environment, resistant cells may hold a fitness advantage over parental cells in microenvironments deficient in oxygen and nutrients. We then utilized the model to predict the impact of drug and nutrient gradients on tumor composition and recurrence times, demonstrating that these endpoints are strongly dependent on the microenvironment. Our interdisciplinary approach provides a model system to quantitatively investigate the impact of microenvironmental effects on the evolutionary dynamics of tumor cells.
Summary: OmicCircos is an R software package used to generate high-quality circular plots for visualizing genomic variations, including mutation patterns, copy number variations (CNVs), expression patterns, and methylation patterns. Such variations can be displayed as scatterplot, line, or text-label figures. Relationships among genomic features in different chromosome positions can be represented in the forms of polygons or curves. Utilizing the statistical and graphic functions in an R/Bioconductor environment, OmicCircos performs statistical analyses and displays results using cluster, boxplot, histogram, and heatmap formats. In addition, OmicCircos offers a number of unique capabilities, including independent track drawing for easy modification and integration, zoom functions, link-polygons, and position-independent heatmaps supporting detailed visualization. Availability and Implementation: OmicCircos is available through Bioconductor at http://www.bioconductor.org/packages/devel/bioc/html/OmicCircos.html. An extensive vignette in the package describes installation, data formatting, and workflow procedures. The software is open source under the Artistic-2.0 license.
Tijana Milenković1,2, Weng Leong Ng2, Wayne Hayes2,3 and Nataša Pržulj1. 1Department of Computing, Imperial College London SW7 2AZ, UK. 2Department of Computer Science, University of California, Irvine, CA 92697-3435, USA. 3Department of Mathematics, Imperial College London SW7 2AZ, UK. Abstract: Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology.
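The idea of feeding a topology-only cost matrix to the Hungarian algorithm can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not specify the cost function, so a deliberately simple stand-in is used (the absolute difference in node degree), and both toy graphs are hypothetical. The `hungarian` function is a standard O(n^2 m) potentials-based formulation of the algorithm and assumes the cost matrix has at most as many rows as columns.

```python
def degrees(n_nodes, edges):
    """Node degrees of an undirected graph given as an edge list."""
    deg = [0] * n_nodes
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def degree_cost(deg_a, deg_b):
    """Stand-in topological cost: |degree difference| for each node pair."""
    return [[abs(da - db) for db in deg_b] for da in deg_a]


def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.

    Requires len(cost) <= len(cost[0]). Returns a list where entry i is the
    column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)      # row potentials (1-based)
    v = [0] * (m + 1)      # column potentials (1-based)
    p = [0] * (m + 1)      # p[j] = row currently matched to column j
    way = [0] * (m + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path just found.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [None] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


# Hypothetical toy networks: a 4-node path and a 5-node tree.
g1_edges = [(0, 1), (1, 2), (2, 3)]
g2_edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
cost = degree_cost(degrees(4, g1_edges), degrees(5, g2_edges))
alignment = hungarian(cost)
total_cost = sum(cost[i][j] for i, j in enumerate(alignment))
```

Because the algorithm only sees the cost matrix, any other topological node-similarity measure could be substituted for `degree_cost` without changing `hungarian`.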
Tumor immunoscoring is rapidly becoming a universal parameter of prognosis, and T-cells isolated from tumor masses are used for ex vivo amplification and readministration to patients to facilitate an antitumor immune response. We recently exploited The Cancer Genome Atlas (TCGA) RNASeq data to assess T-cell receptor (TcR) expression and, in particular, discovered strong correlations between major histocompatibility complex class II (MHCII) and TcR-α constant region expression levels. In this article, we describe the results of searching TCGA exome files for TcR-α V-regions, followed by searching the V-region datasets for TcR-α-J regions. Both primary and metastatic breast cancer sample files contained recombined TcR-α V-J regions, ranging in read counts from 16–39, at the higher level. Among four such V-J rearrangements, three were productive rearrangements. Rearranged TcR-α V–J regions were also detected in TCGA–bladder cancer, -lung cancer, and -ovarian cancer datasets, as well as in exome files representing bladder cancer in Moffitt Cancer Center patients. These results suggest that a direct search of commonly available, conventional exome files for rearranged TcR segments could play a role in more sophisticated immunoscoring or in identifying particular T-cell clones and TcRs directed against tumor antigens.
Polyphenol plant extracts have previously been demonstrated to act as chemopreventive and anticancer agents. is a rich source of polyphenols, yet its antioxidant and anticancer activities remain poorly characterized. This study aimed to determine the anticancer activity of leaf and fruit extracts by investigating their impact on proliferation, apoptosis, and Huh7it cell necrosis. Leaves and fruits were extracted using methanol, and the phytochemical contents were analyzed using Fourier-transform infrared spectroscopy. The antioxidant activity was measured using the 2,2-diphenyl-1-picrylhydrazyl method. Anticancer activities were examined through MTT (3-(4,5-dimethylthiazol-2yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium) assay on Huh7it liver cancer cells. The apoptosis and necrosis conditions were examined using Annexin V-PI biomarkers and later analyzed by flow cytometry. leaves and fruit examined were found to have strong antioxidant activities with IC50 values of 7.9875 µg/mL and 13.402 µg/mL, respectively. MTT assay results indicated leaves and fruit had IC50 values >653 µg/mL and >2000 µg/mL, respectively. The flow cytometry analysis indicated a higher percentage of Huh7it apoptosis and necrosis in leaf extracts compared with fruit extracts. The difference in anticancer activity was attributed to differing compounds present in each extract.
Visual analytics and visualisation can leverage the human perceptual system to interpret and uncover hidden patterns in big data. The advent of next-generation sequencing technologies has allowed the rapid production of massive amounts of genomic data and created a corresponding need for new tools and methods for visualising and interpreting these data. Visualising genomic data requires more than simply plotting the data: it also involves deciding what message a particular plot should convey, and the methodologies chosen to represent the results must give clinicians, experts, and researchers an easy, clear, and accurate way to interact with the data. Genomic data visual analytics is rapidly evolving in parallel with advances in high-throughput technologies such as artificial intelligence (AI) and virtual reality (VR). Personalised medicine requires new genomic visualisation tools, which can efficiently extract knowledge from the genomic data and speed up expert decisions about the best treatment for an individual patient's needs. However, meaningful visual analytics of such large genomic data remains a serious challenge. This article provides a comprehensive systematic review and discussion of the tools, methods, and trends for visual analytics of cancer-related genomic data. We reviewed methods for genomic data visualisation including traditional approaches such as scatter plots, heatmaps, coordinates, and networks, as well as emerging technologies using AI and VR. We also demonstrate the development of genomic data visualisation tools over time and analyse the evolution of visualising genomic data.
In diffuse large B-cell lymphoma (DLBCL), predictive modeling may contribute to targeted drug development by enrichment of the study populations enrolled in clinical trials of DLBCL investigational drugs to include patients with lower likelihood of responding to standard of care. In clinical practice, predictive modeling has the potential to optimize therapy choices in DLBCL. The objectives of this study were to create a model for predicting health outcomes in patients with DLBCL treated with standard of care and determine informative predictors of health outcomes for patients with DLBCL. This was a retrospective observational study using data extracted from the IMS Health Database between September 2007 and April 2015. Patients were ⩾18 years of age with a DLBCL diagnosis. The index date was the date of the first DLBCL diagnosis. Patients were followed until outcome occurrence, defined as progression to a later line of therapy after ⩾60 days from the end of a previous therapy or stem cell transplantation. Patients were categorized into three cohorts depending on the post-index observation period: ⩽1 year, ⩽3 years, or ⩽5 years. Lasso logistic regression (LASSO), Naive Bayes, gradient-boosting machine (GBM), random forest (RF), and neural network models were performed for each cohort. The best-performing algorithms were predictive models based on GBM and observation periods ⩽1 and ⩽3 years after index date. Informative predictors included myocardial imaging, DLBCL stage IV, bronchiolar and renal disease, a chemotherapy regimen, and exposure to diphenhydramine and vasoprotectives on or before the first DLBCL diagnosis. These predictive models may be applied to targeted drug development and have the potential to optimize therapy choices in DLBCL. They were generated efficiently using a large number of independent variables readily available in standard insurance claims or electronic health record data systems.
Neuroblastoma is a pediatric cancer of the developing sympathetic nervous system. High-risk neuroblastoma patients typically undergo an initial remission in response to treatment, followed by recurrence of aggressive tumors that have become refractory to further treatment. Biomarkers that can identify, at an early phase, patients not responding well to therapy are therefore needed. In this study, we used next generation sequencing technology to determine the expression profiles in high-risk neuroblastoma cell lines established before and after therapy. Using partial least squares-discriminant analysis (PLS-DA) with least absolute shrinkage and selection operator (LASSO) and leave-one-out cross-validation, we identified a panel of 55 messenger RNAs and 17 long non-coding RNAs (lncRNAs) which were significantly altered in expression between cell lines isolated from primary and recurrent tumors. From a neuroblastoma patient cohort, we found 20 of the 55 protein-coding genes to be differentially expressed in patients with unfavorable compared with favorable outcome. We further found a twofold increase or decrease in hazard ratios in these genes when comparing patients with unfavorable and favorable outcome. Gene set enrichment analysis (GSEA) revealed that these genes were involved in proliferation and differentiation and were regulated by Polycomb group (PcG) proteins. Of the 17 lncRNAs, 3 upregulated ( ) and 3 downregulated lncRNAs ( ) were also found to be differentially expressed in favorable compared with unfavorable outcome. Moreover, using expression profiles on both miRNAs and mRNAs in the same cohort of cell lines, we found 13 downregulated and 18 upregulated experimentally observed miRNA target genes targeted by and , - , respectively. Analyzing biomarkers in a clinically relevant neuroblastoma model system enables further studies on the effect of individual genes upon gene perturbation. In summary, this study identified several genes that may aid in the prediction of response to therapy and tumor recurrence.
Cancer chemotherapy dose schedules are conventionally applied intermittently, with dose duration of the order of hours, intervals between doses of days or weeks, and cycles repeated for weeks. The large number of possible combinations of values of duration, interval, and lethality has been an impediment to empirically determining the optimal set of treatment conditions. The purpose of this project was to determine the set of parameters for duration, interval, and lethality that would be most effective for treating early colon cancer. An agent-based computer model that simulated cell proliferation kinetics in normal human colon crypts was calibrated with measurements of human biopsy specimens. Mutant cells were simulated as proliferating and forming an adenoma, or dying if treated with cytotoxic chemotherapy. Using a high-performance computer, a total of 28 800 different parameter sets of duration, interval, and lethality were simulated. The effect of each parameter set on the stability of colon crypts, the time to cure a crypt of mutant cells, and the accumulated dose was determined. Of the 28 800 parameter sets, 434 parameter sets were effective in curing the crypts of mutant cells before they could form an adenoma and allowed the crypt normal cell dynamics to recover to pretreatment levels. A group of 14 similar parameter sets produced a minimal time to cure mutant cells. A different group of nine similar parameter sets produced the least accumulated dose. These parameter sets may be considered as candidate dose schedules to guide clinical trials for early colon cancer.
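The structure of such a parameter sweep (enumerate every combination of duration, interval, and lethality, simulate each, then rank the effective schedules by time to cure and by accumulated dose) can be sketched as below. This is not the agent-based crypt model: a deliberately simplistic deterministic growth-and-kill model is swapped in, recovery of normal crypt dynamics is not modelled, and every number (grid values, growth rate, starting cell count, adenoma threshold, the dose proxy) is a hypothetical placeholder rather than a value from the study.

```python
from itertools import product


def simulate(duration, interval, lethality, n0=100.0, growth=0.01,
             adenoma_size=1000.0, horizon=24 * 7 * 12):
    """Toy hourly model of a mutant cell population under intermittent dosing.

    duration/interval are in hours; lethality is the fraction of mutant
    cells killed per hour while the drug is on. All defaults are hypothetical.
    """
    n = n0
    dose = 0.0
    cycle = duration + interval
    for hour in range(horizon):
        if hour % cycle < duration:
            n *= 1.0 - lethality      # drug on: cells die
            dose += lethality         # crude proxy for accumulated dose
        else:
            n *= 1.0 + growth         # drug off: cells proliferate
        if n < 1.0:
            return {"cured": True, "hours": hour + 1, "dose": dose}
        if n > adenoma_size:
            return {"cured": False, "hours": hour + 1, "dose": dose}
    return {"cured": False, "hours": horizon, "dose": dose}


def sweep(durations, intervals, lethalities):
    """Simulate every (duration, interval, lethality) combination."""
    results = []
    for d, i, l in product(durations, intervals, lethalities):
        results.append({"duration": d, "interval": i, "lethality": l,
                        **simulate(d, i, l)})
    return results


# Hypothetical 3 x 3 x 3 grid (27 parameter sets).
results = sweep([1, 2, 4], [24, 72, 168], [0.1, 0.3, 0.5])
effective = [r for r in results if r["cured"]]
fastest = min(effective, key=lambda r: r["hours"])
lowest_dose = min(effective, key=lambda r: r["dose"])
```

The two rankings generally select different schedules, which mirrors the abstract's observation that the fastest-cure group and the least-dose group were distinct.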
MicroRNAs (miRNAs) are endogenous 22-nucleotide RNAs that can play a fundamental regulatory role in the gene expression of various organisms. Current research suggests that miRNAs can assume pivotal roles in carcinogenesis. In this article, through bioinformatics mining and computational analysis, we determine a single miRNA commonly involved in the development of breast, cervical, endometrial, ovarian, and vulvar cancer, whereas we underline the existence of 7 more miRNAs common in all examined malignancies with the exception of vulvar cancer. Furthermore, we identify their target genes and encoded biological functions. We also analyze common biological processes on which all of the identified miRNAs act and we suggest a potential mechanism of action. In addition, we analyze exclusive miRNAs among the examined malignancies and bioinformatically explore their functionality. Collectively, our data can be employed in in vitro assays as a stepping stone in the identification of a universal machinery that is derailed in female malignancies, whereas exclusive miRNAs may be employed as putative targets for future chemotherapeutic agents or cancer-specific biomarkers.
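The distinction between miRNAs common to all malignancies, common to all but one, and exclusive to a single malignancy reduces to set operations. A minimal sketch follows, assuming each cancer type has already been mapped to a set of associated miRNAs; the function names and the identifiers `miR-a` to `miR-e` are placeholders, not miRNAs reported in the study.

```python
def common_to_all(mirnas_by_cancer):
    """miRNAs present in every cancer type."""
    return set.intersection(*mirnas_by_cancer.values())


def common_except(mirnas_by_cancer, excluded):
    """miRNAs shared by every cancer type except `excluded`, and absent from it."""
    others = [m for c, m in mirnas_by_cancer.items() if c != excluded]
    return set.intersection(*others) - mirnas_by_cancer[excluded]


def exclusive(mirnas_by_cancer):
    """For each cancer type, the miRNAs found in no other type."""
    out = {}
    for cancer, mirnas in mirnas_by_cancer.items():
        rest = set().union(
            *(m for c, m in mirnas_by_cancer.items() if c != cancer)
        )
        out[cancer] = mirnas - rest
    return out


# Placeholder data for three of the cancer types.
mirnas_by_cancer = {
    "breast": {"miR-a", "miR-b", "miR-c"},
    "cervical": {"miR-a", "miR-b", "miR-d"},
    "vulvar": {"miR-a", "miR-e"},
}
shared = common_to_all(mirnas_by_cancer)
shared_not_vulvar = common_except(mirnas_by_cancer, "vulvar")
unique = exclusive(mirnas_by_cancer)
```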
We recently reviewed the current progress in the use of high-throughput molecular "omics" data for the quantitative analysis of molecular pathway activation. These quantitative metrics may be used in many ways, and we focused on their application as tumor biomarkers. Here, we provide an update of the most recent conceptual findings related to pathway analysis in tumor biology, which were not included in the previous review. The major novelties include a method enabling calculation of pathway-scale tumor mutation burden termed "Pathway Instability" and its application for scoring of anticancer target drugs. A new technique termed Shambhala emerged that enables accurate common harmonization of any number of gene expression profiles obtained using any number of experimental platforms. This may be helpful for merging various gene expression data sets and for comparing their pathway activation characteristics. Another recent bioinformatics method, termed FLOating-Window Projective Separator (FloWPS), has the potential to significantly enhance the value of pathway activation profiles as biomarkers of cancer response to treatments. It reduces the minimum required number of training samples needed to construct a machine-learning-based classifier. Finally, several documented clinical cases have been recently published, in which gene-expression-based pathway analysis was successfully used for personalized off-label prescription of target drugs to metastatic cancer patients.
KRAS-activation mutations occur in 25% to 40% of lung adenocarcinomas and are a known mechanism of epidermal growth factor receptor inhibitor (EGFRI) resistance. There are currently no targeted therapies approved specifically for the treatment of KRAS-active non-small cell lung cancers (NSCLC). Attempts to target mutant KRAS have failed in clinical studies, leaving no targeted therapy option for these patients. Rather than targeting KRAS directly, we hypothesized that targeting proteins connected to KRAS function could induce cell death in KRAS-active NSCLC cells. To identify potential targets, we leveraged 2 gene expression data sets derived from NSCLC cell lines either resistant or sensitive to EGFRI treatment. Using a Feasible Solutions Algorithm, we identified genes with deregulated expression in KRAS-active cell lines and used STRING as a source for known protein-protein interactions. This process generated a network of 385 deregulated proteins including KRAS and other known mechanisms of EGFRI resistance. To identify candidate drug targets from the network for further study, we selected proteins that had the greatest number of connections within the network and possessed an enzymatic activity that could be inhibited with an existing pharmacological agent. Of the potential candidates, the pharmacological impact of targeting casein kinase 2 (CK2) as a single target was tested, and we found a modest reduction in viability in KRAS-active NSCLC cells. MEK was chosen as a second target from outside the network because it lies downstream of KRAS and MEK inhibition can overcome resistance to CK2 inhibitors. We found that CK2 and MEK inhibition demonstrates moderate synergy in inducing apoptosis in KRAS-active NSCLC cells. These results suggest promise for a combination inhibitor strategy for treating KRAS-active NSCLC.
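The target-selection step (rank proteins by their number of connections in the network, keeping only those with an inhibitable enzymatic activity) can be sketched with a simple degree count. This is an illustration only: the edge list and the set of "druggable" proteins are hypothetical placeholders (`P1` to `P5`), not interactions from STRING or results from the study, and `rank_hubs` is a name introduced here.

```python
from collections import Counter


def rank_hubs(edges, druggable, top_n=3):
    """Rank druggable proteins by degree in an undirected interaction network.

    edges: iterable of (protein_a, protein_b) pairs.
    druggable: set of proteins assumed to have an existing inhibitor.
    Returns up to top_n (protein, degree) pairs, highest degree first,
    with ties broken alphabetically.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    candidates = [(p, d) for p, d in degree.items() if p in druggable]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:top_n]


# Placeholder network and druggability annotation.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
druggable = {"P2", "P4", "P5"}
top = rank_hubs(edges, druggable, top_n=2)
```

Here `P1` has the highest degree but is filtered out because it is not in the druggable set, which is the point of combining the two criteria.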
Machine learning (ML) is a useful tool for advancing our understanding of the patterns and significance of biomedical data. Given the growing trend in the application of ML techniques in precision medicine, here we present an ML technique which predicts the likelihood of complete remission (CR) in patients diagnosed with acute myeloid leukemia (AML). In this study, we explored the question of whether ML algorithms designed to analyze gene-expression patterns obtained through RNA sequencing (RNA-seq) can be used to accurately predict the likelihood of CR in pediatric AML patients who have received induction therapy. We employed tests of statistical significance to determine which genes were differentially expressed in the samples derived from patients who achieved CR after 2 courses of treatment and the samples taken from patients who did not benefit. We tuned classifier hyperparameters to optimize performance and used multiple methods to guide our feature selection as well as our assessment of algorithm performance. To identify the model which performed best within the context of this study, we plotted receiver operating characteristic (ROC) curves. Using the top 75 genes from the k-nearest neighbors algorithm (K-NN) model (k = 27) yielded the best area-under-the-curve (AUC) score that we obtained: 0.84. When we finally tested the previously unseen test data set, the top 50 genes yielded the best AUC = 0.81. Pathway enrichment analysis for these 50 genes showed that the guanosine diphosphate fucose (GDP-fucose) biosynthesis pathway is the most significant with an adjusted P value = .0092, which may suggest the vital role of -glycosylation in AML.
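How a K-NN model produces a score that can be summarised as an AUC can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's pipeline: the score is taken to be the fraction of the k nearest training samples (Euclidean distance) that achieved CR, the AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), and all data points are hypothetical.

```python
import math


def knn_score(train_x, train_y, query, k):
    """Fraction of the k nearest training samples with label 1."""
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    return sum(y for _, y in neighbours[:k]) / k


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical two-gene expression values; label 1 = complete remission.
train_x = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_y = [0, 0, 1, 1]
score = knn_score(train_x, train_y, (5.0, 5.5), k=3)
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

With real data, `k` and the number of top-ranked genes would be tuned on training folds only, and the held-out test set scored once at the end.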