Optimizing tools for genomics data: enhancing the Integrative Genomics Viewer with Java functionality for next-generation sequencing
The emergence of next-generation sequencing (NGS) has driven the development of genomic tools that visualize annotations across the genome. One such tool is the Integrative Genomics Viewer (IGV), developed by the Broad Institute, which displays genomic data and provides options for grouping, sorting, and filtering bioinformatics datasets.
However, some IGV features remain limited and unrefined. For example, users must manually load files and scale tracks in the IGV application for each locus and each sample, which makes visualizing multiple loci or multiple samples inefficient. Additionally, IGV requires a batch file that accepts only one specific input format, which limits the range of input files it can handle. To address these limitations, this project sought to enhance and optimize the current features of IGV by 1) accommodating data inputs with varying syntax and 2) automating snapshots of visualized genomic data through a graphical user interface (GUI) with greater customizability and accessibility. A socket architecture class was created to integrate IGV’s options into a customizable Java program, and a file input class was designed to read varying comma-separated values (CSV) file inputs. To prevent IGV from being overloaded with too many Binary Alignment Map (BAM) files, a file reader class was created to process up to 10 BAMs at a time for optimized data viewing on a standard monitor. Additional classes were created to increase user customizability, including a method that condenses the IGV-generated snapshots into a single PDF or Excel-formatted report.
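The CSV-to-snapshot step described above can be illustrated with a minimal Java sketch. It is not the project's actual code: the class name `IgvBatchSketch`, the method `commandsForRow`, and the assumed row layout (`locus,bam1,bam2,...`) are hypothetical, and silently ignoring BAMs beyond the tenth is only one possible way to apply the limit. The emitted strings (`new`, `snapshotDirectory`, `load`, `goto`, `snapshot`) are standard IGV batch commands, which can be saved to a batch file or sent line by line to IGV's command port (60151 by default).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns one CSV row into IGV batch commands.
class IgvBatchSketch {
    // Cap taken from the text: at most 10 BAM files per view.
    static final int MAX_BAMS = 10;

    // Assumed row layout (not from the source): locus,bam1,bam2,...
    static List<String> commandsForRow(String csvRow, String snapshotDir) {
        String[] fields = csvRow.trim().split("\\s*,\\s*");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected a locus and at least one BAM: " + csvRow);
        }
        String locus = fields[0];
        // Assumption: BAMs beyond the cap are ignored rather than rejected.
        int bamCount = Math.min(fields.length - 1, MAX_BAMS);

        List<String> commands = new ArrayList<>();
        commands.add("new");                                // start a fresh session
        commands.add("snapshotDirectory " + snapshotDir);   // where images are written
        for (int i = 1; i <= bamCount; i++) {
            commands.add("load " + fields[i]);              // load each BAM track
        }
        commands.add("goto " + locus);                      // navigate to the locus
        // Replace ':' so the locus can serve as a file name.
        commands.add("snapshot " + locus.replace(':', '_') + ".png");
        return commands;
    }

    public static void main(String[] args) {
        for (String command : commandsForRow("chr1:1000-2000, a.bam, b.bam", "snapshots")) {
            System.out.println(command);
        }
    }
}
```

Keeping command generation separate from the transport means the same list could be written to a batch file or streamed over a socket, and different CSV layouts would only require a different parsing step.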
Because it can generate many reports from various file inputs, the optimized tool offers several advantages: automation for greater speed, less time spent manually verifying NGS data, customizability for easier interaction with IGV from a Java program, and accommodation of various input file formats. Lastly, the GUI pairs the visualization capabilities of IGV with the versatility of the Java platform, paving the way for more expedient genomic data analysis for biologists and bioinformaticians alike.