Seurat analysis report of BD Rhapsody Targeted RNA-Seq data


Demultiplexing cells by cell hashing

For assignment of each tag to each cell barcode, read counts of each tag in each valid cell barcode, which defined by the cDNA matrix, were extracted from tag/cell barcode expression matrix. Unassigned cell barcodes were labeled as “not-detected” cells. Then, sum of the total read counts of each tags were normalized to 10M reads, and log2 fold-change between first most counted tags and second most counted tags within each cell barcode. Doublet cells (double-positive cells of any pair of Tags (identified by flowDensity package) and log2 fold-change between first and second most counted tags < 0.70479 were identified as doublets.

Finally, remained cell barcodes were assigned to the first most counted tags. Each tag expression in each cell barcode were log2(x+1)-transformed, z-scaled by each cell barcode, and visualized by using pheatmap package in R version 4.2.1 (2022-06-23).


Figure 1: Heatmap representation of tag expression and assignment in each cell barcode

Figure 2: Scatter plot of each tag expression and doublet identification

File locations

Demultiplexed plot was stored at ./result/AokitargetBD_results/plots/AokitargetBD_demultiplex.png. Demultiplexed expression matrix was stored at ./result/AokitargetBD_results/matrix/matrix_inflection_demulti_AokitargetBD.txt.gz.



Seurat analysis

Perform Seurat (version 4.1.0) analysis.
- https://satijalab.org/seurat/
Please see Seurat vignettes (PBMC3K tutorial) for detail explanation of the each step of Seurat analysis.
- https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
Entire Seurat object was saved by using qs package with .qs extension. Please use qs package to read .qs file with R.

We first calculating metrics (percent.mito, percent.ribosomal protein, percent.ribosomal RNA) and filtered out mitochondrial gene-high cells (over 0.25), doublets, and not-tag-assigned cells.
Because BD Rhapsody TAS-Seq data is non-UMI data (raw count data), we perform global normalization to 1M tags (according to the Tabula Muris (Nature 2019) parameter settings) and perform scaling by using Seurat ScaleData function with regressing out library size (total raw read count of each cell) as a confounding factor.


Basic plots of statistics

stats1

Figure 2. Density heatmap (pseudocolor) plot of statistics

stats2

Figure 3. Ridge plot, separate by Tags

File locations

Plot files were stored at ./result/AokitargetBD_results/Seurat/Seurat_plots/.


PCA and Jackstraw analysis

Next, PCA analysis was performed against all detected genes, because the data were targeted RNA-seq.

Figure 3-1. Jackstraw plot

1:11 PCs were selected (p<1e-05).

Jackstraw plot file was stored at ./result/AokitargetBD_results/Seurat/Seurat_plots/AokitargetBD_Jackstraw.png.


Clustering

Cell clustering was performed by using FindNeighbors and FindClusters function in Seurat v4.1.0 in R version 4.2.1 (2022-06-23).
1:11 PCs were used for clustering analysis.
Clustering resolution parameters were changed sequentially and evaluate clustering stability by using silhouette score against FIt-SNE space.

Seurat clustering silhouette score plot

We assumed that resolution 0.6 was optimal because mean silhouette score of resolution 0.6 was maximal among calculated resolutions.

Figure 4. Boxplot of silhouette scores of clustering in each resolution parameter.

Seurat clustering visualization in FIt-SNE plot

We use python implementation of FIt-SNE 1.1.1 (George C. Linderman et al Nat Methods 2019) through reticulate package in R version 4.2.1 (2022-06-23) for visualization of clustering result, because it is better than commonly-used UMAP method in terms of calculation speed and visualization. Tag assignment and Clustering results of each resolution parameter were shown separately in tabs.

Figure 5. FIt-SNE visualization of Seurat clustering result

optimal clustering result