Cell Ranger: the 10x Genomics single-cell data processing pipeline

In one sentence

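Cell Ranger (v10.0) is the official 10x Genomics data processing suite. It converts raw sequencing FASTQ files into a single-cell gene-by-cell count matrix, in effect translating unreadable sequencer output into a standard format that records which genes each cell expresses and at what level.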

Installation and setup

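# Download Cell Ranger v10.0 (requires registering on the 10x Genomics website)
# Download page: https://www.10xgenomics.com/support/software/cell-ranger/downloads
# Note: the download page issues a signed, time-limited link; if the bare URL
# below is rejected, copy the full wget/curl command shown on that page instead.
wget https://cf.10xgenomics.com/releases/cell-exp/cellranger-10.0.0.tar.gz
tar -xzvf cellranger-10.0.0.tar.gz    # unpack

# Add Cell Ranger to PATH (put this line in ~/.bashrc to make it permanent)
export PATH=/path/to/cellranger-10.0.0:$PATH

# Verify the installation
cellranger --version    # should report cellranger-10.0.0

# Download a reference (10x prebuilt references include the STAR index)
# Human (GRCh38)
wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz
tar -xzvf refdata-gex-GRCh38-2024-A.tar.gz

# Mouse (mm10)
wget https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-mm10-2020-A.tar.gz
tar -xzvf refdata-gex-mm10-2020-A.tar.gz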
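
Before the first run it can help to confirm that a reference tarball unpacked into something usable. The sketch below assumes the usual layout of a 10x prebuilt reference (a `reference.json` file plus `fasta/`, `genes/`, and `star/` subdirectories); that layout is an assumption to verify against your own download, and the function name `check_ref_dir` is hypothetical, not part of Cell Ranger.

```shell
# Hypothetical helper: check that a directory looks like a 10x prebuilt reference.
# Assumed layout: reference.json, fasta/, genes/, star/.
check_ref_dir() {
    local ref_dir=$1 missing=0 entry
    for entry in reference.json fasta genes star; do
        if [ ! -e "$ref_dir/$entry" ]; then
            echo "missing: $ref_dir/$entry" >&2
            missing=1
        fi
    done
    # Exit status 0 means every expected entry was found.
    return "$missing"
}
```

A typical call would be `check_ref_dir /path/to/refdata-gex-GRCh38-2024-A && echo "reference looks complete"`.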

Core usage

cellranger count: single-sample gene expression analysis (most common)

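# Standard single-cell 3' Gene Expression analysis
#   --id             run ID (also the output directory name)
#   --transcriptome  reference directory
#   --fastqs         directory containing the FASTQ files
#   --sample         sample name (the FASTQ file name prefix)
#   --create-bam     required since Cell Ranger 8.0 (true writes a BAM, false skips it)
#   --localcores     number of local CPU cores
#   --localmem       local memory limit (GB)
cellranger count \
    --id=sample_3p \
    --transcriptome=/path/to/refdata-gex-GRCh38-2024-A \
    --fastqs=/path/to/fastq_dir \
    --sample=SampleName \
    --create-bam=true \
    --localcores=16 \
    --localmem=64

# Note: v10.0 has deprecated cellranger mkfastq; use Illumina BCL Convert instead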
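
When several samples share one reference and one FASTQ directory, it can be safer to print the `cellranger count` commands first and review them before launching anything. The following is a minimal dry-run sketch; `print_count_cmds` is a hypothetical helper, and it assumes paths and sample names contain no whitespace.

```shell
# Hypothetical helper: print (not run) one cellranger count command per sample.
# Usage: print_count_cmds REF_DIR FASTQ_DIR SAMPLE [SAMPLE...]
# Assumes no whitespace in paths or sample names.
print_count_cmds() {
    local ref=$1 fastq_dir=$2 sample
    shift 2
    for sample in "$@"; do
        printf 'cellranger count --id=%s_run --transcriptome=%s --fastqs=%s --sample=%s --create-bam=true\n' \
            "$sample" "$ref" "$fastq_dir" "$sample"
    done
}
```

For example, `print_count_cmds /path/to/refdata-gex-GRCh38-2024-A /path/to/fastq_dir SampleA SampleB > run_all.sh` writes two commands to a script that can be inspected and then executed.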

cellranger multi: combined multi-library analysis (recommended for new data)

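# v10.0 recommends multi over count (more flexible, supports several library types)
# First create the config file multi_config.csv.
# create-bam is required under [gene-expression] since Cell Ranger 8.0.
# An Antibody Capture library also needs a [feature] section pointing to a
# feature reference CSV; the path below is a placeholder.

cat > multi_config.csv << 'EOF'
[gene-expression]
reference,/path/to/refdata-gex-GRCh38-2024-A
chemistry,auto
expect-cells,5000
create-bam,true

[feature]
reference,/path/to/feature_reference.csv

[libraries]
fastq_id,fastqs,feature_types
Sample_GEX,/path/to/fastqs,Gene Expression
Sample_FB,/path/to/fb_fastqs,Antibody Capture
EOF

# Run multi
cellranger multi \
    --id=multi_run \
    --csv=multi_config.csv \
    --localcores=16 \
    --localmem=128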
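
A malformed config file is a common reason for `cellranger multi` to stop early, so a quick structural check before launching can save a queue slot. The sketch below only confirms that the `[gene-expression]` and `[libraries]` sections exist and that the libraries table starts with the `fastq_id,fastqs,feature_types` header used in this document; `check_multi_config` is a hypothetical helper and does not replace Cell Ranger's own validation.

```shell
# Hypothetical pre-flight check for a multi config CSV.
# It checks section headers and the libraries column header only.
check_multi_config() {
    local cfg=$1 header
    grep -qx '\[gene-expression\]' "$cfg" || { echo "missing [gene-expression]" >&2; return 1; }
    grep -qx '\[libraries\]' "$cfg" || { echo "missing [libraries]" >&2; return 1; }
    # The line right after [libraries] should be the column header.
    header=$(awk 'prev == "[libraries]" { print; exit } { prev = $0 }' "$cfg")
    [ "$header" = "fastq_id,fastqs,feature_types" ] || { echo "unexpected libraries header: $header" >&2; return 1; }
}
```

Running `check_multi_config multi_config.csv && cellranger multi --id=multi_run --csv=multi_config.csv` launches the pipeline only when the basic structure is in place.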

cellranger aggr: multi-sample aggregation (combine several runs)

# Create the aggregation CSV
cat > aggr_config.csv << 'EOF'
sample_id,molecule_h5
sample1,/path/to/sample1/outs/molecule_info.h5
sample2,/path/to/sample2/outs/molecule_info.h5
sample3,/path/to/sample3/outs/molecule_info.h5
EOF

# Run the aggregation (combines the samples into a single matrix)
cellranger aggr \
    --id=combined_project \
    --csv=aggr_config.csv \
    --normalize=mapped              # normalization mode: mapped (default) or none
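
Writing the aggregation CSV by hand gets error-prone as the number of runs grows. The sketch below generates it from a list of `cellranger count` run directories; it assumes each directory is named after its sample and contains `outs/molecule_info.h5`, and `make_aggr_csv` is a hypothetical helper name.

```shell
# Hypothetical helper: write an aggr CSV from a list of count run directories.
# Assumes each run directory is named after its sample; runs without
# outs/molecule_info.h5 are skipped with a warning on stderr.
make_aggr_csv() {
    local run_dir h5
    echo "sample_id,molecule_h5"
    for run_dir in "$@"; do
        h5="$run_dir/outs/molecule_info.h5"
        if [ -f "$h5" ]; then
            printf '%s,%s\n' "$(basename "$run_dir")" "$h5"
        else
            echo "warning: $h5 not found, skipping" >&2
        fi
    done
}
```

For example, `make_aggr_csv /path/to/sample1 /path/to/sample2 > aggr_config.csv`. Passing absolute run paths keeps the resulting CSV valid regardless of where `cellranger aggr` is started.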

Parameter reference

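Command	Parameter	Description
`count`	`--transcriptome`	Reference directory (for `multi`, set `reference` in the config CSV instead)
`count`	`--sample`	Sample name (matches the FASTQ file name prefix)
`count`	`--fastqs`	FASTQ directory; separate multiple paths with commas (for `multi`, use the `fastqs` column under `[libraries]`)
`count`	`--create-bam`	Whether to write a BAM file (`true`/`false`); required since Cell Ranger 8.0
`count`	`--expect-cells`	Expected number of cells (affects cell calling)
`count`	`--include-introns`	Count intronic reads (on by default; captures unspliced RNA)
`count/multi`	`--localcores`	Number of local CPU cores
`count/multi`	`--localmem`	Local memory limit (GB)
`aggr`	`--normalize`	Depth normalization: `mapped` (equalize mapped reads per cell across samples) or `none`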

Worked example

# Complete single-cell 3' analysis workflow

# Input layout (10x FASTQ naming convention)
# PBMC_3p_S1_L001_R1_001.fastq.gz  - Read 1 (barcode + UMI)
# PBMC_3p_S1_L001_R2_001.fastq.gz  - Read 2 (cDNA)

REF="/data/refdata-gex-GRCh38-2024-A"
FASTQ_DIR="/data/raw_fastqs"
SAMPLE="PBMC_3p"

# 1. Run cellranger count (--expect-cells=5000: about 5,000 cells expected)
cellranger count \
    --id="${SAMPLE}_run" \
    --transcriptome="$REF" \
    --fastqs="$FASTQ_DIR" \
    --sample="$SAMPLE" \
    --create-bam=true \
    --expect-cells=5000 \
    --localcores=24 \
    --localmem=128

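# 2. Inspect the output directory
ls "${SAMPLE}_run/outs/"
# web_summary.html             - QC report (important: look at this first)
# metrics_summary.csv          - key metrics as CSV
# filtered_feature_bc_matrix/  - filtered cell matrix (input for downstream Seurat analysis)
#   barcodes.tsv.gz            - cell barcodes
#   features.tsv.gz            - genes (features)
#   matrix.mtx.gz              - sparse count matrix
# raw_feature_bc_matrix/       - unfiltered matrix (includes empty droplets)
# molecule_info.h5             - per-molecule information (needed by aggr)
# possorted_genome_bam.bam     - alignment BAM (written only when --create-bam=true)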

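# 3. Check the key QC metrics (the same values appear in web_summary.html)
cat "${SAMPLE}_run/outs/metrics_summary.csv"
# Look at:
# - Estimated Number of Cells: number of cells detected
# - Mean Reads per Cell: average reads per cell (>25,000 is generally good)
# - Median Genes per Cell: genes detected per cell (typically 1,500-3,000 for PBMC)
# - Sequencing Saturation: >70% suggests sequencing depth is sufficient
# - Reads Mapped to Genome: genome mapping rate (>90% is good)

# 4. Load the matrix into R/Seurat for downstream analysis (see document 321)
# Matrix path: ${SAMPLE}_run/outs/filtered_feature_bc_matrix/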
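
Reading `metrics_summary.csv` by eye works for one sample, but a scripted check scales better. The sketch below assumes the file holds one header row and one value row, and that values may be quoted and contain thousands separators (for example `"5,123"`), so a plain `-F','` split would be unreliable. `get_metric` and `pct_at_least` are hypothetical helper names, and the 70% threshold is the rule of thumb quoted in this document.

```shell
# Hypothetical helper: print the value of one metric from metrics_summary.csv.
# Fields are split with a small quote-aware loop because values such as
# "5,123" contain commas. Exit status is non-zero if the metric is absent.
get_metric() {
    awk -v want="$2" '
        function split_csv(line, out,    n, i, ch, in_q, field) {
            n = 0; field = ""; in_q = 0
            for (i = 1; i <= length(line); i++) {
                ch = substr(line, i, 1)
                if (ch == "\"") in_q = !in_q
                else if (ch == "," && !in_q) { out[++n] = field; field = "" }
                else field = field ch
            }
            out[++n] = field
            return n
        }
        { sub(/\r$/, "") }                      # tolerate Windows line endings
        NR == 1 { ncol = split_csv($0, header) }
        NR == 2 {
            split_csv($0, value)
            for (c = 1; c <= ncol; c++)
                if (header[c] == want) { print value[c]; found = 1 }
        }
        END { exit !found }
    ' "$1"
}

# Compare a percentage such as "72.5%" with a minimum; status 0 means pass.
pct_at_least() {
    awk -v v="${1%"%"}" -v min="$2" 'BEGIN { exit !(v + 0 >= min + 0) }'
}
```

A possible use is `sat=$(get_metric PBMC_3p_run/outs/metrics_summary.csv "Sequencing Saturation") && pct_at_least "$sat" 70 || echo "saturation below 70%"`.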

Common errors and fixes

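Error 1: `No input FASTQs were found for the given parameters`. Cause: the `--sample` value does not match the FASTQ file name prefix. Fix: list the files with `ls $FASTQ_DIR/*.fastq.gz | head` and set `--sample` to the part of the file name that precedes `_S1_`.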
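
To find valid `--sample` values without reading file names by hand, the prefixes can be extracted directly. This sketch assumes the standard 10x/Illumina naming pattern `<sample>_S<n>_L<lane>_<R|I><n>_001.fastq.gz`; files that do not follow it are ignored, and `list_sample_prefixes` is a hypothetical helper name.

```shell
# Hypothetical helper: list candidate --sample values in a FASTQ directory by
# stripping the _S<n>_L<lane>_<read>_001.fastq.gz suffix from each file name.
list_sample_prefixes() {
    local f
    for f in "$1"/*.fastq.gz; do
        [ -e "$f" ] || continue          # the glob matched nothing
        basename "$f"
    done | sed -n -E 's/_S[0-9]+_L[0-9]+_[RI][0-9]_001\.fastq\.gz$//p' | sort -u
}
```

Calling `list_sample_prefixes "$FASTQ_DIR"` prints one prefix per line; any of them can be passed to `--sample`.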

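Error 2: `Memory: 128 GB required, only 64 GB available`. Cause: not enough memory. Fix: raise `--localmem` if the machine has more RAM to give, or move the run to a node with more memory. Adding `--disable-ui` (pipestance mode) may trim some overhead, but it is no substitute for sufficient RAM.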
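
Picking a `--localmem` value that fits the machine avoids both this error and over-commitment. The sketch below derives a suggestion from total RAM, leaving 10% headroom; the 90% fraction is an assumption chosen for illustration, `suggest_localmem_gb` is a hypothetical helper, and the `/proc/meminfo` lookup applies to Linux only.

```shell
# Hypothetical helper: suggest a --localmem value (GB) as 90% of total RAM.
# Input is total memory in kB, the unit used by MemTotal in /proc/meminfo.
suggest_localmem_gb() {
    echo $(( $1 * 9 / 10 / 1048576 ))
}

# On Linux the input can be read like this (skipped on other systems).
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "suggested --localmem=$(suggest_localmem_gb "$total_kb")"
fi
```

On a shared node, the scheduler's memory allocation rather than the physical total is the number to feed into the function.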

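Error 3: the web summary shows very low Sequencing Saturation (<30%). Cause: insufficient sequencing depth; at least 20,000 reads per cell is usually needed. Fix: sequence the library more deeply (top-up sequencing), or re-run the analysis with a lower `--expect-cells`.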
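
How much top-up sequencing to request follows from simple arithmetic: total reads needed equal the number of cells times the target reads per cell. The sketch below works that out; the helper names are hypothetical, and the estimate ignores reads lost to empty droplets or invalid barcodes, so treat the result as a lower bound.

```shell
# Total reads needed = number of cells x target reads per cell.
required_reads() {
    echo $(( $1 * $2 ))
}

# Additional reads needed to reach a target depth, given the current
# mean reads per cell (never negative).
additional_reads() {
    local cells=$1 current=$2 target=$3
    if [ "$current" -ge "$target" ]; then
        echo 0
    else
        echo $(( cells * (target - current) ))
    fi
}
```

With 5,000 cells and a 20,000 reads-per-cell target, `required_reads 5000 20000` gives 100000000; if the current depth is 12,000 reads per cell, `additional_reads 5000 12000 20000` gives 40000000.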

Cheat sheet

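Item	Description
`cellranger count`	Single-sample gene expression analysis
`cellranger multi`	Combined multi-library analysis (recommended in v10)
`cellranger aggr`	Multi-sample aggregation (merges matrices)
`--transcriptome`	Directory of the 10x prebuilt reference
`--expect-cells`	Expected number of cells (affects cell calling)
`filtered_feature_bc_matrix/`	Input directory for Seurat
`web_summary.html`	QC report (check this first)
`metrics_summary.csv`	Key metrics (cell count, genes per cell, and so on)
Sequencing Saturation >70%	Rule of thumb for sufficient sequencing depth
Reads Mapped >90%	Rule of thumb for good mapping quality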