Siddharth Rawal
Helios Scholar
School: Arizona State University
Hometown: Chandler, Arizona
Mentor: William Hendricks, Ph.D.

Automating the Detection of Structural and Copy Number Variations in Canine Cancer Genomic Data

Recent advances in high-throughput sequencing technologies have made it possible to detect potential disease-causing variants in tumor samples on a large scale using standard workflows, or pipelines. A cancer genomics pipeline consists of tools that detect three major types of aberrations: single nucleotide variations (SNVs), structural variations (SVs), and copy number variations (CNVs). These tools are constantly being developed, updated, and tested for the analysis of human cancer samples. For cancers in model organisms such as canines, however, no standardized pipeline exists because few tools are canine-compatible. The Canis pipeline, used at TGen to analyze canine cancer genomic data, was reconstructed from the human pipeline. It processes raw sequencing data but detects only SNVs, because the CNV and SV detection tools in TGen’s cancer genomics pipeline are hard-coded for human data and cannot be used for canine data. Delly, a species-agnostic SV caller, and a modified version of tCoNuT, a CNV detection tool developed in-house, have been tested with canine cancer samples and are now being used outside the pipeline. Running these tools separately on large datasets requires significant time and manual effort. We have therefore created a wrapper script that automates SV and CNV calling for a batch of canine cancer genomic data. The script runs the two tools, annotates the resulting variants, and organizes the output for every canine patient sample in a dataset. Because it analyzes multiple patient samples simultaneously, it reduces hands-on time. The script was tested on a batch of eight whole-exome-sequenced canine cancer samples and used a cumulative total of approximately 11 hours to generate results for all samples. As a next step, we are investigating tools for calling gene fusions from transcriptomic data, a feature that is currently absent from the Canis pipeline.
Filling these gaps in the canine genomic and transcriptomic pipelines will enable us to study canine cancers more completely and more efficiently, thereby helping to identify specific aberrations that may play a role in the development and progression of the disease.
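
The batch behavior described above (run an SV caller and a CNV caller for every sample, several samples at once, with output organized per sample) can be illustrated with a minimal Python sketch. This is not the actual TGen wrapper script, whose implementation is not shown here. The function names, the directory layout, and the `placeholder_tools` commands are all assumptions made for illustration. The placeholder commands only print a message and stand in for the real Delly and tCoNuT invocations, and the annotation step is omitted.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def placeholder_tools():
    """Hypothetical stand-ins for the SV and CNV callers.

    Each entry maps a tool label to a function that builds a command line.
    The commands only print a message; a real wrapper would build the
    actual Delly and tCoNuT command lines here.
    """
    def fake(label):
        return lambda sample, out_dir: [
            sys.executable, "-c", f"print('{label} placeholder for {sample}')"
        ]
    return {"sv": fake("SV"), "cnv": fake("CNV")}


def plan_jobs(samples, out_root, tools):
    """Return one (sample, tool, out_dir, command) tuple per sample and tool."""
    jobs = []
    for sample in samples:
        for tool, build_command in tools.items():
            # Assumed layout: <out_root>/<sample>/<tool>/
            out_dir = Path(out_root) / sample / tool
            jobs.append((sample, tool, out_dir, build_command(sample, out_dir)))
    return jobs


def run_job(job):
    """Run one command and save its combined output in the job's directory."""
    sample, tool, out_dir, command = job
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(command, capture_output=True, text=True)
    (out_dir / "run.log").write_text(result.stdout + result.stderr)
    return sample, tool, result.returncode


def run_batch(samples, out_root, tools, max_workers=4):
    """Run every planned job, up to max_workers at a time."""
    jobs = plan_jobs(samples, out_root, tools)
    # Threads are sufficient because each job waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, jobs))
```

Calling `run_batch(["dogA", "dogB"], "results", placeholder_tools())` with these hypothetical sample names would create `results/dogA/sv/run.log`, `results/dogA/cnv/run.log`, and so on, and return one `(sample, tool, return code)` tuple per job. Returning the exit codes rather than raising on the first failure lets one failed sample be rerun without discarding the rest of the batch.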