Dimensionality reduction in single-cell RNA-seq analysis
Single-cell mRNA sequencing (scRNA-seq) is a relatively new tool in genomics research. It measures gene expression in the individual cells of a sample, which makes it well suited to analyzing cell-to-cell differences within a complex tissue such as a tumor. scRNA-seq data are sparse because most genes in any given cell are not sampled, and this sparsity presents distinct statistical challenges for downstream analyses. The goal of this experiment was to implement a recently introduced dimensionality-reduction algorithm for scRNA-seq and compare it with the dimensionality-reduction algorithms commonly used in scRNA-seq. The data for this experiment were peripheral blood mononuclear cells (PBMCs) taken from one healthy individual. A healthy donor was important because we did not expect to find anomalies in the counts for each cell type. PBMCs are also a good benchmark for scRNA-seq analysis because they are widely studied and known to contain eight to nine distinct cell types. After sequencing, we analyzed the scRNA-seq data with three dimensionality-reduction algorithms: principal component analysis (PCA), t-distributed stochastic neighbor embedding (tSNE), and diffusion component analysis (DC). We visualized the PBMC data with each algorithm and compared the resulting plots. Each algorithm has distinct strengths and weaknesses for latent-space representation; for example, tSNE can produce apparent clusters of cells even when no true clusters exist in the data. We found that DC can reveal distinct cellular trajectories, which would be particularly useful in cancer research for tracking gene expression in a tumor over the course of the disease. Furthermore, this new implementation of DC can be applied to data from past scRNA-seq experiments, possibly uncovering previously missed genetic relationships.
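To make the contrast between a linear method (PCA) and a diffusion-based method concrete, the following is a minimal NumPy sketch under stated assumptions. It is not the implementation used in this experiment, which is not described here. The toy count matrix, the library-size plus `log1p` normalization, the Gaussian kernel with a median-distance bandwidth, and the function names `log_normalize`, `pca`, and `diffusion_components` are all hypothetical choices for illustration; tSNE is omitted.

```python
import numpy as np

def log_normalize(counts, scale=1e4):
    """Library-size normalize each cell (row), then apply log1p (assumed preprocessing)."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale)

def pca(x, n_components=2):
    """PCA via SVD of the column-centered matrix; returns cell coordinates."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def diffusion_components(x, n_components=2, sigma=None):
    """Hypothetical minimal diffusion map: Gaussian kernel -> Markov matrix -> eigenvectors."""
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    if sigma is None:
        # Median-distance bandwidth heuristic (an assumption, not a tuned choice).
        sigma = np.sqrt(np.median(sq_dists[sq_dists > 0]))
    kernel = np.exp(-sq_dists / (2.0 * sigma ** 2))
    d = kernel.sum(axis=1)
    # Symmetric conjugate D^-1/2 K D^-1/2 of the row-stochastic P = D^-1 K,
    # so the numerically stable symmetric solver eigh can be used.
    sym = kernel / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Map back to right eigenvectors of P: psi = D^-1/2 phi.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1) and scale by eigenvalues.
    return psi[:, 1:n_components + 1] * eigvals[1:n_components + 1]

# Hypothetical toy data: 200 cells, 50 genes, one continuous trajectory t.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 200))
n_genes = 50
up = np.arange(n_genes) < n_genes // 2  # half the genes rise along t, half fall
rates = np.where(up, 5.0 * t[:, None], 5.0 * (1.0 - t[:, None])) + 0.1
counts = rng.poisson(rates).astype(float)  # sparse Poisson counts with many zeros

x = log_normalize(counts)
pcs = pca(x)
dcs = diffusion_components(x)
print("|corr(PC1, t)| =", abs(np.corrcoef(pcs[:, 0], t)[0, 1]))
print("|corr(DC1, t)| =", abs(np.corrcoef(dcs[:, 0], t)[0, 1]))
```

Eigenvector signs are arbitrary, so components are compared by absolute correlation. Working with the symmetric conjugate rather than the non-symmetric Markov matrix is a common design choice because its eigenvalues are real and `np.linalg.eigh` returns orthonormal eigenvectors.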