Identifying pan-cancer subgroups for radiogenomic analysis
Radiogenomic analysis is an emerging framework that seeks to correlate genomic data with imaging features extracted from CT, MRI, and other clinical imaging modalities. The goal of this study is to leverage characteristics shared across cancers to identify pan-cancer features that can be used for downstream radiogenomic analysis. Elucidating the relationships between multi-omic data across cancer types requires integrating a functional understanding of the common traits that govern cancer cell behavior into the preliminary genomic analysis. To accomplish this, we first used gene ontologies to construct a custom gene list for each of the known hallmarks of cancer, a set of well-documented cancer attributes. In addition to these gene sets, we gathered RNA-seq data for a total of 2463 samples from the TCGA database for three cancer types: breast, prostate, and brain. After data preparation, we analyzed the data with a gene set enrichment method known as Gene Set Variation Analysis (GSVA), a nonparametric, unsupervised method that estimates the variation of pathway activity over a sample population. The GSVA results revealed notable patterns across cancer types and hallmarks. Integrating the GSVA scores with clinical stage and grade data for each cancer type also revealed a positive correlation between higher grade and sustained proliferative signaling. These patterns appear to be pan-cancer in nature, which may enable us to incorporate them into a radiogenomics framework in the near future.
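To make the idea of a per-sample gene set score concrete, the following is a minimal sketch of a rank-based enrichment score in the spirit of GSVA. It is not the GSVA implementation used in this study: it substitutes a plain z-score for the kernel-based expression estimate, uses an unweighted running sum, and all function names and the toy expression values are hypothetical.

```python
from statistics import mean, pstdev


def standardize_genes(expr):
    """Z-score each gene across samples (a stand-in for GSVA's kernel step).

    expr maps gene name -> list of expression values, one per sample.
    """
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z


def enrichment_score(sample_z, gene_set):
    """Unweighted running-sum score for one sample.

    Genes are ranked from highest to lowest standardized expression. The
    running sum steps up at genes in the set and down at genes outside it;
    the score is the largest positive deviation plus the largest negative
    deviation (which is <= 0).
    """
    ranked = sorted(sample_z, key=sample_z.get, reverse=True)
    hits = [gene in gene_set for gene in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, max_pos, max_neg = 0.0, 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        max_pos = max(max_pos, running)
        max_neg = min(max_neg, running)
    return max_pos + max_neg


def score_samples(expr, gene_set):
    """Return one enrichment score per sample for a single gene set."""
    z = standardize_genes(expr)
    n_samples = len(next(iter(z.values())))
    return [
        enrichment_score({gene: z[gene][j] for gene in z}, gene_set)
        for j in range(n_samples)
    ]


# Hypothetical toy data: 4 genes, 2 samples; genes A and B form the set.
toy_expr = {"A": [10, 1], "B": [9, 2], "C": [1, 9], "D": [2, 10]}
toy_scores = score_samples(toy_expr, {"A", "B"})
```

In this toy case the first sample, where the set's genes are relatively highly expressed, receives a positive score and the second a negative one. In practice each hallmark gene set would be scored against every sample, and the resulting scores could then be compared with clinical stage and grade.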