Detecting cell type-specific expression quantitative trait loci using single-cell RNA sequencing of lung tissue: comparison across genotyping and imputation strategies
Cell type-specific expression quantitative trait loci (ct-eQTLs) reveal cellular mechanisms of disease by linking gene expression to genotype. Genotypes are typically called from whole-genome sequencing (WGS) data and then imputed to fill in missing genotypes and correct inaccurate ones. However, it is unclear whether genotype calling and imputation from single-cell RNA-sequencing (scRNA-seq) data yield genotypes accurate enough for ct-eQTL mapping. Here, we compared genotypes of samples from interstitial lung disease (ILD) patients and controls across four datasets: low-pass WGS and scRNA-seq data, each with and without imputation. Imputation increased the number of genotyped sites both across and within samples, including several sites known to be associated with ILD. Without imputation, the variable sites recovered by direct genotyping of WGS and scRNA-seq data showed little overlap (10.5%), whereas imputed variable sites were largely shared (56.4%) between sequencing platforms. All datasets were highly concordant when calling the reference genotype. However, concordance was lower for non-reference genotypes, particularly between imputed WGS and scRNA-seq data, suggesting that imputing from scRNA-seq may reduce genotyping accuracy. As a future direction, we will use these four datasets to determine whether genotypes called from imputed scRNA-seq are sufficient to identify ILD-associated ct-eQTLs. If they are, this would open new avenues for ct-eQTL analysis of the many publicly available scRNA-seq datasets that lack corresponding WGS data.
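
The two comparison metrics reported above, site overlap and genotype concordance split by reference versus non-reference calls, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names, the toy genotypes, the definition of overlap as intersection over union, the definition of a variable site as one carrying a non-reference allele in a single sample, and the use of WGS calls as the comparison baseline are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]   # (chromosome, position)
Calls = Dict[Site, int]  # site -> alternate-allele count (0, 1, or 2)


def variable_sites(calls: Calls) -> Set[Site]:
    """Sites carrying at least one non-reference allele (assumed definition)."""
    return {site for site, gt in calls.items() if gt > 0}


def site_overlap(a: Set[Site], b: Set[Site]) -> float:
    """Shared fraction of sites, computed as |A & B| / |A | B| (assumed definition)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concordance_by_class(baseline: Calls, test: Calls) -> Dict[str, float]:
    """Fraction of matching genotypes at sites called in both datasets,
    split by whether the baseline genotype is reference (0) or non-reference."""
    agree = {"ref": 0, "nonref": 0}
    total = {"ref": 0, "nonref": 0}
    for site in baseline.keys() & test.keys():
        cls = "ref" if baseline[site] == 0 else "nonref"
        total[cls] += 1
        agree[cls] += int(baseline[site] == test[site])
    return {c: (agree[c] / total[c]) if total[c] else float("nan") for c in total}


# Hypothetical toy genotypes for one sample; not data from this study.
wgs = {("chr1", 100): 0, ("chr1", 200): 1, ("chr2", 300): 2, ("chr2", 400): 0}
scrna = {("chr1", 100): 0, ("chr1", 200): 0, ("chr2", 300): 2, ("chr3", 500): 1}

overlap = site_overlap(variable_sites(wgs), variable_sites(scrna))
concordance = concordance_by_class(wgs, scrna)
print(f"variable-site overlap: {overlap:.3f}")
print(f"concordance by class: {concordance}")
```

Splitting concordance by genotype class matters because reference calls dominate most call sets, so a single pooled concordance value can look high even when non-reference genotypes, the ones that drive eQTL signal, agree poorly.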