Optimizing TGen’s molecular barcoding to include unique molecular identifiers
Current next-generation sequencing technology uses molecular barcoding during library preparation to associate data with the correct sample. These barcode sequences, or indexes, sit within an adapter molecule, which links DNA fragments to the sequencing flow cell, a glass slide where the sequencing chemistry is performed. Some barcodes also include a unique molecular identifier (UMI), which aids in the bioinformatic analysis of sequencing data. This additional sequence may reduce library preparation efficiency because it creates a longer adapter; however, it can save useful data that would otherwise be discarded. To investigate which option is better, we set out to compare data from DNA libraries made with and without barcodes containing a UMI. We designed adapters containing 4 different indexes and UMIs with over 2,000 different sequence options. The maximum number of possible sequences for an 8-nucleotide UMI is 4^8 = 65,536; however, using the full set brings a greater chance of the same nucleotide appearing more than twice in a row, which increases the chance of a sequencing error. Libraries were created for more than 8 batches of samples using genomic DNA. Ideally, a DNA library has an efficiency of about 45-50%. We expect the adapters containing the UMI to yield lower library efficiencies than the in-house, optimized, short adapters. Comparing sequencing results will determine whether the greater data retention gained by using UMIs outweighs the benefit of higher library efficiency.
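The UMI sequence-space arithmetic above can be checked with a short sketch. This is an illustration in Python, not the design procedure used in this work: the names `longest_run` and `count_umis` and the `max_run` parameter are hypothetical, and the only filter applied is the homopolymer-run rule mentioned in the text. The design that produced the 2,000-plus options presumably applied further criteria that are not described here, so the filtered count from this sketch is not expected to match that figure.

```python
from itertools import groupby, product

NUCLEOTIDES = "ACGT"


def longest_run(seq):
    """Return the length of the longest run of one repeated nucleotide."""
    return max(len(list(group)) for _, group in groupby(seq))


def count_umis(length=8, max_run=2):
    """Count all UMIs of a given length and those passing the run filter.

    Assumption: a UMI is acceptable when no nucleotide appears more than
    `max_run` times in a row. No other design constraint is modelled.
    """
    total = 0
    allowed = 0
    for bases in product(NUCLEOTIDES, repeat=length):
        total += 1
        if longest_run(bases) <= max_run:
            allowed += 1
    return total, allowed


if __name__ == "__main__":
    total, allowed = count_umis(length=8, max_run=2)
    print(f"All 8-nucleotide UMIs: {total}")
    print(f"UMIs with no nucleotide more than twice in a row: {allowed}")
```

Lowering `max_run` to 1 excludes any repeated adjacent nucleotide, which shows how quickly stricter run rules shrink the usable set.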