09 — Single-Cell RNA-seq
Scale transcriptome analysis to individual cells using droplet-based single-cell RNA sequencing (scRNA-seq). This workflow demonstrates a high-throughput pipeline for processing thousands of cells per sample, including barcode demultiplexing, quantification, and downstream clustering.
Concepts Covered
- Droplet-based scRNA-seq processing (10x Genomics style)
- Custom rule templates for repetitive preprocessing
- High-concurrency scatter across cell barcodes
- Resource-intensive alignment with splice-aware aligners
- Integration with specialized scRNA-seq R/Python environments
Pipeline Overview
graph TD
A[cellranger_count] --> B[clustering_analysis]
B --> C[generate_sc_report]
Steps:
- Quantification — Align reads to the transcriptome and count UMIs per cell barcode (e.g., Cell Ranger)
- Analysis — Quality control, normalization, and cell clustering (e.g., Seurat/Scanpy)
- Inference — Developmental trajectory and cell type identification (not implemented in this example; the workflow definition below has no corresponding rule)
- Report — Generate an interactive single-cell analysis report
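The quantification step can be illustrated with a toy version of UMI counting. This is a minimal sketch, not Cell Ranger's algorithm (which also corrects sequencing errors in barcodes and UMIs). It assumes each aligned read has already been reduced to a (cell barcode, UMI, gene) triple, and the read values are made up for illustration.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (barcode, umi, gene) triples. PCR duplicates
    share barcode, UMI, and gene, so they collapse into a single molecule.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: the first two are PCR duplicates of one molecule.
reads = [
    ("AAACCTG", "TTGCA", "CD3E"),
    ("AAACCTG", "TTGCA", "CD3E"),
    ("AAACCTG", "GGATC", "CD3E"),
    ("AAACCTG", "CCTAG", "MS4A1"),
    ("TTTGGTA", "TTGCA", "CD3E"),
]

counts = count_umis(reads)
for (barcode, gene), n in sorted(counts.items()):
    print(barcode, gene, n)
```

Each entry of the resulting dictionary corresponds to one nonzero cell of the cells × genes count matrix that the analysis step consumes.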
Workflow Definition
# examples/gallery/09_single_cell_rnaseq.oxoflow
[workflow]
name = "sc-rnaseq-pipeline"
version = "1.0.0"
description = "Single-cell RNA-seq pipeline: CellRanger + Seurat"
author = "oxo-flow examples"
[config]
reference = "/data/references/GRCh38/cellranger_index"
samples = "sc_samples.csv"
[defaults]
threads = 8
memory = "32G"
[[rules]]
name = "cellranger_count"
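# Cell Ranger only detects FASTQs that follow the bcl2fastq naming scheme
# (<sample>_S<n>_L<lane>_<read>_001.fastq.gz); adjust S1/L001 to match your files.
input = ["raw/{sample}_S1_L001_R1_001.fastq.gz", "raw/{sample}_S1_L001_R2_001.fastq.gz"]
output = ["counts/{sample}/outs/filtered_feature_bc_matrix.h5"]
threads = 16
memory = "64G"
description = "scRNA-seq quantification with CellRanger"
# cellranger writes to ./<id>/ in the working directory, so it is run from
# counts/ to make the matrix land at counts/{sample}/outs/.
# --localmem is given in GB and kept below the 64G allocation for headroom.
shell = """
mkdir -p counts && cd counts && \
cellranger count --id={sample} \
--fastqs=../raw/ \
--sample={sample} \
--transcriptome={config.reference} \
--localcores={threads} \
--localmem=60
"""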
[rules.environment]
docker = "10xgenomics/cellranger:7.1.0"
[[rules]]
name = "clustering_analysis"
input = ["counts/{sample}/outs/filtered_feature_bc_matrix.h5"]
output = ["analysis/{sample}/seurat_object.rds", "analysis/{sample}/tsne_plot.png"]
threads = 4
memory = "16G"
description = "Cell clustering and visualization with Seurat"
shell = "Rscript scripts/seurat_analysis.R --input {input[0]} --output-dir analysis/{sample}/"
[rules.environment]
conda = "envs/seurat.yaml"
[[rules]]
name = "generate_sc_report"
input = ["analysis/{sample}/seurat_object.rds", "analysis/{sample}/tsne_plot.png"]
output = ["results/{sample}.sc_report.html"]
description = "Generate single-cell analysis report"
shell = "Rscript -e \"rmarkdown::render('templates/sc_report.Rmd', output_file='{output[0]}')\""
[rules.environment]
conda = "envs/rmarkdown.yaml"
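The `{sample}` placeholder in the input and output patterns stands for one sample name, so each rule describes a family of jobs. How oxo-flow resolves the placeholder internally is not shown here. The sketch below only illustrates the idea with Python's `str.format`, and the `expand` helper and the sample names are hypothetical.

```python
def expand(patterns, sample):
    """Substitute one sample name into each path pattern."""
    return [pattern.format(sample=sample) for pattern in patterns]


# Patterns copied from the clustering_analysis rule.
inputs = ["counts/{sample}/outs/filtered_feature_bc_matrix.h5"]
outputs = [
    "analysis/{sample}/seurat_object.rds",
    "analysis/{sample}/tsne_plot.png",
]

# Hypothetical sample names; real ones would come from the sample sheet.
samples = ["pbmc_donor1", "pbmc_donor2"]

# One concrete (inputs, outputs) pair per sample.
jobs = {s: (expand(inputs, s), expand(outputs, s)) for s in samples}

for sample, (ins, outs) in jobs.items():
    print(sample, ins, "->", outs)
```

Because the jobs for different samples share no files, a scheduler is free to run them concurrently, subject to the thread and memory limits declared on each rule.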
Scientific Context
Why Single-Cell?
Traditional "bulk" RNA-seq measures the average expression across thousands of cells, masking biological heterogeneity. scRNA-seq reveals:
- Cellular Heterogeneity — Identify rare cell types and sub-populations
- Dynamic Processes — Trace cell differentiation and state transitions
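- Tissue Composition — Resolve the cell-type makeup of a tissue; mapping those types back onto tissue architecture additionally requires spatial transcriptomics data, since dissociation discards cell positions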
Computational Challenges
scRNA-seq workflows are significantly more resource-intensive than bulk RNA-seq:
- Memory Pressure — Alignment to large transcriptomes and UMI counting can require 64GB+ of RAM.
- Sparse Data — Downstream analysis handles sparse matrices with millions of entries (cells × genes).
- Environment Management — Often requires complex combinations of R (Seurat) and Python (Scanpy) tools.
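A back-of-the-envelope calculation shows why the count matrix is kept sparse. The numbers are illustrative assumptions rather than measurements: 10,000 cells, 30,000 genes, about 1,500 genes detected per cell, 4-byte values and indices, and a compressed sparse row (CSR) layout.

```python
def dense_bytes(n_cells, n_genes, value_bytes=4):
    """Memory for a dense cells x genes matrix."""
    return n_cells * n_genes * value_bytes


def csr_bytes(n_cells, genes_per_cell, value_bytes=4, index_bytes=4):
    """Memory for the same matrix in CSR form.

    CSR stores one value and one column index per nonzero entry,
    plus a row-pointer array with n_cells + 1 entries.
    """
    nnz = n_cells * genes_per_cell
    return nnz * (value_bytes + index_bytes) + (n_cells + 1) * index_bytes


# Assumed, illustrative dataset dimensions.
n_cells, n_genes, genes_per_cell = 10_000, 30_000, 1_500

dense = dense_bytes(n_cells, n_genes)
sparse = csr_bytes(n_cells, genes_per_cell)

print(f"dense:  {dense / 1e9:.2f} GB")
print(f"sparse: {sparse / 1e9:.2f} GB")
print(f"ratio:  {dense / sparse:.1f}x")
```

Under these assumptions the sparse form is roughly ten times smaller, and the gap widens as the fraction of detected genes per cell drops.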
Running the Workflow
Validate
$ oxo-flow validate examples/gallery/09_single_cell_rnaseq.oxoflow
✓ examples/gallery/09_single_cell_rnaseq.oxoflow — 3 rules, 2 dependencies
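The reported summary can be reproduced by hand from the rule definitions. The sketch below assumes a dependency exists whenever an output pattern of one rule is listed as an input pattern of another. That is an assumption about how the count arises, not a description of oxo-flow's validator, and the `dependencies` helper is hypothetical.

```python
# Input/output patterns taken from the workflow definition.
rules = {
    "cellranger_count": {
        "input": [],  # raw FASTQs omitted: no rule produces them
        "output": ["counts/{sample}/outs/filtered_feature_bc_matrix.h5"],
    },
    "clustering_analysis": {
        "input": ["counts/{sample}/outs/filtered_feature_bc_matrix.h5"],
        "output": [
            "analysis/{sample}/seurat_object.rds",
            "analysis/{sample}/tsne_plot.png",
        ],
    },
    "generate_sc_report": {
        "input": [
            "analysis/{sample}/seurat_object.rds",
            "analysis/{sample}/tsne_plot.png",
        ],
        "output": ["results/{sample}.sc_report.html"],
    },
}


def dependencies(rules):
    """Return sorted (producer, consumer) pairs linked by a shared pattern."""
    producers = {
        out: name for name, rule in rules.items() for out in rule["output"]
    }
    edges = set()
    for name, rule in rules.items():
        for pattern in rule["input"]:
            producer = producers.get(pattern)
            if producer is not None and producer != name:
                edges.add((producer, name))
    return sorted(edges)


edges = dependencies(rules)
print(f"{len(rules)} rules, {len(edges)} dependencies")
```

Using a set means the report rule counts as one dependency on `clustering_analysis` even though it consumes two of that rule's outputs.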
Further Reading
- RNA-seq Quantification — Standard bulk RNA-seq pipeline
- Resource Management — How to handle memory-intensive steps on clusters
- Environment Backends — Using Docker and Conda together