©2019 by Gregory Way

Supervised Machine Learning to Detect Aberrant Genes and Pathways in Cancer

From a systems biology perspective, gene expression data can be used as a measure of the general state of a biological entity, such as a cell, tissue, or organism. We leverage this property of gene expression data to identify when specific genes or pathways are disrupted.

Alterations in specific genes and pathways drive oncogenesis. We use the downstream transcriptome response to detect when specific alterations are present in a given tumor.

We developed this approach in collaboration with The Cancer Genome Atlas (TCGA) PanCanAtlas Project.

Detecting Ras Pathway Activation

The Ras pathway is a complex system of interacting genes and proteins that together regulate cellular proliferation. The pathway is frequently altered in cancer by various mechanisms, including driver mutations in KRAS, HRAS, or NRAS, and loss-of-function mutations or copy number deletions in NF1.

We hypothesized that the transcriptome could reveal when a tumor has Ras pathway activation. To test this, we trained a classifier on gene expression data from tumors in The Cancer Genome Atlas PanCanAtlas project. Critically, the classifier generalized to never-before-seen cell lines (right). It also identified cell lines without Ras mutations that were sensitive to MEK inhibitors.
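The general idea of training on expression data and then scoring held-out samples can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline from the pancancer repository: the expression values are simulated, the gene counts and effect sizes are arbitrary, and the model is plain L2-penalized logistic regression fit by batch gradient descent. All function names (simulate_expression, train_logistic, and so on) are hypothetical.

```python
import math
import random


def simulate_expression(n_samples, n_genes, n_signal, seed):
    """Simulate centered expression values and binary pathway labels.

    Assumption: the first `n_signal` genes shift up in pathway-active
    samples and down otherwise; the remaining genes are pure noise.
    All values are synthetic.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        shift = 1.0 if label == 1 else -1.0
        row = [rng.gauss(0.0, 1.0) + (shift if g < n_signal else 0.0)
               for g in range(n_genes)]
        X.append(row)
        y.append(label)
    return X, y


def sigmoid(z):
    # Clamp the logit to avoid overflow in math.exp
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, row):
    """Probability that a sample has an active pathway."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train_logistic(X, y, lr=0.1, epochs=100, l2=0.01):
    """Fit L2-penalized logistic regression with batch gradient descent."""
    n, p = len(X), len(X[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = predict_proba(weights, bias, row) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, X, y):
    hits = sum((predict_proba(weights, bias, row) >= 0.5) == (label == 1)
               for row, label in zip(X, y))
    return hits / len(y)


X, y = simulate_expression(n_samples=300, n_genes=20, n_signal=5, seed=0)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]  # held out, never seen during training

weights, bias = train_logistic(X_train, y_train)
test_accuracy = accuracy(weights, bias, X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

In this toy setup the learned weights concentrate on the simulated signal genes, which mirrors the idea of reading pathway activity from a downstream expression signature. Real tumor data would need gene selection, normalization, and covariate adjustment that this sketch omits.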

We describe this approach and results in Way et al. 2018. The source code is publicly available at https://github.com/greenelab/pancancer.

We also applied a similar approach to a gene expression dataset of multiple myeloma (MM) from the Multiple Myeloma Research Foundation CoMMpass Study. We first applied the classifier previously trained on TCGA tumors to the MM data, but it failed to generalize.


In collaboration with Arun Wiita's lab at UCSF, we developed a multiclass classifier, trained within MM, to distinguish KRAS-mutant, NRAS-mutant, and Ras wild-type tumors. Surprisingly, the classifier performed well both in CoMMpass data (top left) and in a group of never-before-seen MM cell lines (bottom left).
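A three-class model of this kind can be sketched with softmax (multinomial logistic) regression. The example below is only an illustration under stated assumptions and is not the code from the multiple-myeloma-classifier repository: the data are simulated, each class is given its own invented block of signature genes (including wild-type, purely for convenience), and every function name is hypothetical.

```python
import math
import random

CLASSES = ["KRAS", "NRAS", "wildtype"]


def simulate_samples(n_samples, n_genes, genes_per_class, seed):
    """Simulate expression where each class raises its own block of genes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randrange(len(CLASSES))
        start = label * genes_per_class
        row = [rng.gauss(0.0, 1.0)
               + (2.0 if start <= g < start + genes_per_class else 0.0)
               for g in range(n_genes)]
        X.append(row)
        y.append(label)
    return X, y


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, b, row):
    """Return one probability per class for a single sample."""
    logits = [b[k] + sum(w * x for w, x in zip(W[k], row))
              for k in range(len(W))]
    return softmax(logits)


def predict_label(W, b, row):
    probs = predict_proba(W, b, row)
    return CLASSES[probs.index(max(probs))]


def train_softmax(X, y, n_classes, lr=0.1, epochs=100):
    """Fit softmax regression with batch gradient descent."""
    n, p = len(X), len(X[0])
    W = [[0.0] * p for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * p for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for row, label in zip(X, y):
            probs = predict_proba(W, b, row)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                grad_b[k] += err
                for j in range(p):
                    grad_W[k][j] += err * row[j]
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / n
            W[k] = [w - lr * g / n for w, g in zip(W[k], grad_W[k])]
    return W, b


X, y = simulate_samples(n_samples=300, n_genes=12, genes_per_class=3, seed=1)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]  # stand-in for unseen samples

W, b = train_softmax(X_train, y_train, n_classes=len(CLASSES))
correct = sum(predict_label(W, b, row) == CLASSES[label]
              for row, label in zip(X_test, y_test))
multiclass_accuracy = correct / len(y_test)
print(f"held-out multiclass accuracy: {multiclass_accuracy:.2f}")
```

Predicting a probability per class, rather than a single Ras-active score, lets one model separate KRAS-like from NRAS-like expression patterns while still reporting a wild-type call.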

We describe this approach and results in a recent preprint (Lin et al. 2019). The source code is publicly available at https://github.com/greenelab/multiple-myeloma-classifier.

We initially developed this approach in collaboration with Yolanda Sanchez's lab at Dartmouth to detect glioblastoma patients with NF1 loss of function. The Sanchez lab previously discovered a small molecule that is effective in cell lines with NF1 loss of function. However, loss of NF1 function cannot always be detected from DNA sequence data alone.

We described a machine learning approach to detect NF1 loss of function in glioblastoma patient-derived xenograft (PDX) models (right).

The approach and results are presented in Way et al. 2017. The source code is publicly available at https://github.com/greenelab/nf1_inactivation.