Megan Johnson
Helios Scholar
School: Gary K. Herberger Young Scholars Academy
Hometown: Phoenix, Arizona
Mentor: Rebecca Halperin, Ph.D.

Optimizing LumosVar performance with preprocessing tools

Somatic variants are acquired genetic mutations that can significantly impact cell function. Cancer researchers have identified many somatic mutations, some of which directly cause cancer or resistance to certain therapies. This information is now being introduced in clinical settings to tailor cancer treatment to the individual. The ability to accurately identify somatic variants is therefore essential for the progression of genomics research and for making individualized medicine a reality. Currently, variants are identified using software pipelines that generally take sequencing data, align it, run preprocessing tools, and then pass the output to a variant caller, which reports the locations on the genome where it believes there are variants. Many preprocessing tools are available; they are intended to improve caller performance by adjusting the aligned sequence data before it is run through the caller. LumosVar is a variant caller with unique features such as multi-sample and tumor-only calling. However, there is no standard evaluation of its performance with matched-pair calling, or of the extent to which preprocessing methods affect it. To test this, genomic data was run through two preprocessing tools, GATK Indel Realigner and GATK Base Recalibrator, and then run through LumosVar. Other caller-specific variables were adjusted and tested as well, and the output was compared with the COLO829 somatic variant truth set. Both preprocessing tools improved caller performance, with LumosVar calling 43 more variants in the Base Recalibration data and four additional variants in the Indel Realignment data compared to the control. In addition, varying the number of unmatched controls had minimal impact, although, surprisingly, using only five controls improved performance.
These results indicate that Base Recalibrator is the more effective of the two preprocessing tools, and that LumosVar can be run with fewer controls, reducing run time without sacrificing accuracy.
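The comparison of caller output against a truth set can be illustrated with a minimal sketch. This is not the evaluation code used in the project. It assumes both the calls and the truth set are available as VCF-style tab-separated records, and that a call counts as correct only when chromosome, position, reference allele, and alternate allele all match exactly. The names `Variant`, `parse_vcf_lines`, and `compare_to_truth` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """A variant keyed by site and alleles (hypothetical helper type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect variants from VCF-style lines (assumed tab-separated)."""
    variants: Set[Variant] = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        # A record may list several comma-separated alternate alleles.
        for alt in alts.split(","):
            variants.add(Variant(chrom, int(pos), ref, alt))
    return variants


def compare_to_truth(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, float]:
    """Count exact matches against the truth set and derive summary rates."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy records for illustration only; these are not COLO829 data.
truth_lines = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tT",
    "chr1\t200\t.\tG\tC",
    "chr2\t300\t.\tT\tG",
]
call_lines = [
    "chr1\t100\t.\tA\tT",
    "chr2\t300\t.\tT\tG",
    "chr3\t400\t.\tC\tA",
]

result = compare_to_truth(parse_vcf_lines(call_lines),
                          parse_vcf_lines(truth_lines))
print(result)
```

Running the same comparison on the output of each preprocessing configuration, and on the unprocessed control, would let the true-positive counts be compared side by side. Exact allele matching is a simplifying assumption; indels in particular may need normalization before they can be compared reliably.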