Helios Scholar
School: Chaparral High School
Hometown: Paradise Valley, Arizona
Mentored by: Nick Banovich, Ph.D.

Automatic cell type annotations in single cell RNA sequencing

Single cell RNA sequencing (scRNAseq) is a powerful method for understanding transcriptomic differences within heterogeneous tissues. The aim of this project is to establish a workflow that facilitates cell type annotation in scRNAseq. In scRNAseq analysis, the most important step is to accurately annotate the clusters by cell type using known cell type specific markers. However, this process requires several hours of careful examination and has to be repeated each time new samples are integrated into the data set. To solve this problem, we compared two tools associated with Monocle and Seurat, the two most popular packages used in scRNAseq analysis, to determine which performs better. Garnett, the package associated with Monocle, uses a marker file listing each cell type and its corresponding marker genes. Seurat's function TransferAnchors uses a manually annotated reference file to assign cell type annotations to a query data set based on transcriptional correspondence. We ran both tools on the same data set of roughly 25,000 cells at three different resolutions (4, 11, and 31 clusters) to obtain an adequate range of data for comparison. For Garnett, the number of unknown cells increased as the resolution increased: 64% of the cells were assigned as unknown when testing 31 markers for 31 distinct cell types. Garnett also labeled some cell types incorrectly. In contrast, TransferAnchors outperformed Garnett, producing consistent results and successfully assigning all cell types at all three resolutions. Although Garnett did not successfully annotate the cell types, it may still be useful as a starting point for new projects, since it needs only a marker file rather than an annotated reference. Because TransferAnchors requires a manually annotated file to work from, the suggested workflow is to manually annotate the first batch of samples and then use TransferAnchors when analyzing additional samples.
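The comparison described above rests on two simple summaries: the share of cells a tool leaves as unknown, and how often its labels agree with the manual annotation. The following Python sketch illustrates one way these could be computed. It is not the project's actual analysis code; the function names, the "Unknown" placeholder label, and the toy labels are all assumptions made for illustration.

```python
def unknown_fraction(labels, unknown="Unknown"):
    """Return the fraction of cells whose predicted label is the unknown placeholder."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label == unknown) / len(labels)


def agreement(predicted, reference, unknown="Unknown"):
    """Return the fraction of cells whose predicted label matches the manual label.

    Cells predicted as unknown are counted as disagreements.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("labels must not be empty")
    matches = sum(
        1 for p, r in zip(predicted, reference) if p == r and p != unknown
    )
    return matches / len(predicted)


# Hypothetical toy labels for five cells (not data from the study).
manual = ["T cell", "T cell", "B cell", "Macrophage", "B cell"]
tool_a = ["T cell", "Unknown", "Unknown", "Macrophage", "T cell"]
tool_b = ["T cell", "T cell", "B cell", "Macrophage", "B cell"]

print(unknown_fraction(tool_a))   # 0.4
print(agreement(tool_a, manual))  # 0.4
print(agreement(tool_b, manual))  # 1.0
```

Computing both summaries at each clustering resolution would show whether a tool's unknown share grows and its agreement drops as the number of cell types increases.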