Integrating Mutations with RNA Splice Junctions to Predict Novel Isoforms in Multiple Myeloma
Extensive analysis goes into understanding the origin of genetic diseases, yet intronic and synonymous mutations are currently regarded as non-damaging and dismissed in mainstream genomic research practice. The purpose of this research project is to validate the integration of splice-site and synonymous mutations with RNA splice junctions in order to predict whether a mutation presumed to be non-damaging is leading to loss of gene function. The Multiple Myeloma Research Foundation (MMRF) cohort provided a total of 1045 exome samples, amounting to ~47GB of data to be processed and analyzed. This dataset is composed of individual mutations (2GB) and splicing events (45GB). To represent this large cohort efficiently, we use MongoDB, a NoSQL database. The main approach to researching these variants (mutations) involves creating a splice-variant database to see which variants are causing alternative splicing. This database lists all the variants in each patient sample, organized by patient and by gene, along with any alternative splicing events caused by a specific type of variant. The database is pre-filtered for splice variants, so finding novel splicing events only requires filtering for records that contain at least one splicing event. So far, this approach appears promising, as many novel splicing events have been captured. For instance, a splice-donor mutation in the TRAF3 tumor-suppressor gene has been found to cause intron retention in multiple patients, and could be causing proteins to lose functionality. Our system can be easily adapted to any large cohort and is not limited to Multiple Myeloma. Going forward, the splice-variant database approach should undergo more testing before complete integration into general genomic analysis.
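The filtering step described above (keep only records that carry at least one splicing event) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual schema or code: the one-document-per-patient-per-gene layout, the field names (`patient_id`, `gene`, `variants`, `splicing_events`), and the sample values are all hypothetical placeholders. The sketch runs in memory so it needs no database server; the `MONGO_FILTER` dictionary shows how the same condition might be written as a MongoDB query if the events were stored as an array field.

```python
from typing import Any, Dict, List

# Hypothetical record layout: one document per patient per gene.
# All field names and values are placeholders, not real MMRF data.
records: List[Dict[str, Any]] = [
    {"patient_id": "P001", "gene": "GENE_A",
     "variants": [{"type": "splice_donor"}],
     "splicing_events": [{"type": "intron_retention"}]},
    {"patient_id": "P002", "gene": "GENE_A",
     "variants": [{"type": "synonymous"}],
     "splicing_events": []},
    {"patient_id": "P003", "gene": "GENE_B",
     "variants": [{"type": "splice_acceptor"}],
     "splicing_events": [{"type": "exon_skipping"}]},
]

# Equivalent MongoDB filter, assuming "splicing_events" is an array field:
# it matches documents whose array has an element at index 0 (is non-empty).
# With PyMongo this could be passed as collection.find(MONGO_FILTER).
MONGO_FILTER = {"splicing_events.0": {"$exists": True}}


def has_splicing_event(record: Dict[str, Any]) -> bool:
    """Return True if the record lists at least one splicing event."""
    return len(record.get("splicing_events", [])) > 0


def records_with_splicing(recs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only records that contain at least one splicing event."""
    return [r for r in recs if has_splicing_event(r)]


def patients_by_gene(recs: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group patient IDs by gene, preserving input order."""
    grouped: Dict[str, List[str]] = {}
    for r in recs:
        grouped.setdefault(r["gene"], []).append(r["patient_id"])
    return grouped


if __name__ == "__main__":
    hits = records_with_splicing(records)
    for gene, patients in patients_by_gene(hits).items():
        print(gene, patients)
```

Grouping the filtered records by gene makes it easy to spot genes in which several patients share a splicing event, which is the kind of pattern the TRAF3 observation represents.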