Nextflow Workflow Management

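Overview: Nextflow is one of the most widely used workflow managers in bioinformatics. Pipelines are written in its DSL2 syntax, tasks can run in Docker or Singularity containers on a local machine, an HPC cluster, or the cloud, and the nf-core community maintains 100+ ready-made, standardized pipelines.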

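Core Concepts

Concept	Plain-language explanation
Process	A single, independent analysis step (e.g. alignment, quality control)
Channel	The pipe that carries data from one process to the next
DSL2	The current default syntax version; it adds support for reusable modules
Workflow	The block that wires processes together
nf-core	A community that maintains standardized bioinformatics pipelines (100+)
Executor	Where tasks run (local machine, SLURM, AWS, Kubernetes)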

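Installation and Setup

# Install Nextflow (recent releases need Java 17+; older ones accepted Java 11)
curl -s https://get.nextflow.io | bash           # download the launcher
chmod +x nextflow                                 # make it executable
sudo mv nextflow /usr/local/bin/                  # move it onto PATH
nextflow -version                                 # verify the install

# Or install with Conda
mamba install -c conda-forge -c bioconda nextflow # Conda install

# Install the nf-core tools
pip install nf-core                               # nf-core CLI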
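
Nextflow needs a supported Java runtime, so it can help to check the installed version before running the installer. The following is a minimal sketch and not part of Nextflow: `java_major`, `java_at_least`, and `detect_java_version` are hypothetical helper names, and the minimum you pass in should be whatever your Nextflow release documents.

```shell
# Extract the major Java version from a version string such as
# "17.0.9" (modern scheme) or "1.8.0_292" (legacy scheme, meaning Java 8).
java_major() {
    local version="$1"
    version="${version#1.}"        # legacy versions start with "1."; drop that prefix
    version="${version%%.*}"       # keep everything before the first dot
    echo "${version%%[!0-9]*}"     # drop any non-numeric suffix such as "-ea"
}

# Succeed when the given version string is at least the given major version.
# Usage: java_at_least <min_major> <version_string>
java_at_least() {
    local min="$1"
    local major
    major="$(java_major "$2")"
    [ -n "$major" ] && [ "$major" -ge "$min" ]
}

# Print the version string reported by the local JVM (java writes it to stderr).
detect_java_version() {
    java -version 2>&1 | grep -m 1 'version' | sed -E 's/.*"([^"]+)".*/\1/'
}
```

A typical call would be `java_at_least <min> "$(detect_java_version)" || echo "Java is too old"`, with `<min>` taken from the Nextflow installation docs.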

Basic Usage

// main.nf: a simple FastQC + MultiQC pipeline
nextflow.enable.dsl = 2

// Parameters
params.reads = "data/*_{1,2}.fastq.gz"            // input files (paired-end glob)
params.outdir = "results"                          // output directory

// Process: run FastQC
process FASTQC {
    tag "$sample_id"                               // label shown in the log
    publishDir "${params.outdir}/fastqc"           // where results are published
    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'  // container image; confirm the tag exists in your registry

    input:
    tuple val(sample_id), path(reads)              // input: sample ID + read files

    output:
    path "*.html", emit: html                      // output: HTML reports
    path "*.zip", emit: zip                        // output: ZIP data

    script:
    """
    fastqc -t ${task.cpus} ${reads}
    """
}

// Process: run MultiQC
process MULTIQC {
    publishDir "${params.outdir}/multiqc"
    container 'multiqc/multiqc:latest'             // pin a version tag for reproducible runs

    input:
    path '*'                                       // input: all FastQC results

    output:
    path 'multiqc_report.html'                     // output: the summary report

    script:
    """
    multiqc .
    """
}

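// Workflow: connect the processes
workflow {
    // Create the channel; each item is [sample_id, [read1, read2]]
    reads_ch = Channel.fromFilePairs(params.reads, checkIfExists: true)
    FASTQC(reads_ch)                               // run FastQC once per sample
    MULTIQC(FASTQC.out.zip.collect())              // gather all ZIPs into one MultiQC run
}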

# Run the pipeline
nextflow run main.nf                               # run locally (tools must be on PATH)
nextflow run main.nf -with-docker                  # run tasks in Docker
nextflow run main.nf -with-singularity             # run tasks in Singularity
nextflow run main.nf -resume                       # resume from cached results (important!)

# Run an nf-core pipeline
nextflow run nf-core/rnaseq -r 3.14.0 \
  --input samplesheet.csv \
  --genome GRCh38 \
  --outdir results \
  -profile docker                                  # standard RNA-seq pipeline
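
The `--input samplesheet.csv` argument points at a CSV that lists each sample and its FASTQ files. As a rough sketch, the helper below builds one from a directory of paired-end reads. It assumes files named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz` (the same convention as the `params.reads` glob) and a four-column layout of `sample,fastq_1,fastq_2,strandedness`. That layout is an assumption about nf-core/rnaseq, so check the usage docs of the release you run. `make_samplesheet` is a hypothetical name.

```shell
# Write a samplesheet for paired-end reads to stdout.
# Usage: make_samplesheet <reads_dir> [strandedness] > samplesheet.csv
make_samplesheet() {
    local reads_dir="$1"
    local strandedness="${2:-auto}"   # assumed default; adjust to your library prep
    local r1
    local r2
    local sample

    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$reads_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue      # the glob matched nothing
        r2="${r1%_1.fastq.gz}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no mate for $r1, skipping" >&2
            continue
        fi
        sample="$(basename "$r1" _1.fastq.gz)"
        echo "${sample},${r1},${r2},${strandedness}"
    done
}
```

The paths are written exactly as given in `<reads_dir>`, so passing an absolute directory keeps the samplesheet valid no matter where the pipeline is launched from.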

Configuration File

// nextflow.config: run-time configuration
params {
    reads = "data/*_{1,2}.fastq.gz"
    outdir = "results"
    genome = "GRCh38"
}

process {
    cpus = 4                                       // default CPUs per task
    memory = '8 GB'                                // default memory
    time = '2h'                                    // default wall time

    withName: 'FASTQC' {                           // per-process override
        cpus = 2
        memory = '4 GB'
    }
}

// SLURM cluster profile
profiles {
    slurm {
        process.executor = 'slurm'                 // submit tasks through SLURM
        process.queue = 'normal'                   // queue (partition) name
        singularity.enabled = true                 // run tasks in Singularity
    }
}

Advanced Usage

DSL2 Modules

// modules/fastqc.nf: a standalone module
process FASTQC {
    tag "$sample_id"
    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'  // confirm the tag exists in your registry

    input:
    tuple val(sample_id), path(reads)

    output:
    path "*.html", emit: html
    path "*.zip", emit: zip

    script:
    """
    fastqc -t ${task.cpus} ${reads}
    """
}

// main.nf: import the modules
include { FASTQC } from './modules/fastqc'         // resolves to modules/fastqc.nf
include { MULTIQC } from './modules/multiqc'

workflow {
    reads_ch = Channel.fromFilePairs(params.reads)
    FASTQC(reads_ch)
    MULTIQC(FASTQC.out.zip.collect())
}
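
An `include` only works if the module file exists, and here `main.nf` imports `./modules/multiqc`, so a `modules/multiqc.nf` has to exist next to `modules/fastqc.nf`. The sketch below is a quick pre-flight check under simplifying assumptions: it handles only relative paths without spaces, and it accepts a path if `<path>`, `<path>.nf`, or `<path>/main.nf` exists. `check_includes` is a hypothetical helper, not a Nextflow command.

```shell
# Report include paths in a Nextflow script that do not resolve to a file.
# Usage: check_includes main.nf   (returns non-zero if any path is unresolved)
check_includes() {
    local script="$1"
    local base
    local paths
    local rel
    local target
    local missing=0

    base="$(dirname "$script")"
    # Pull the quoted path out of lines like: include { FASTQC } from './modules/fastqc'
    paths="$(sed -n -E "s/^[[:space:]]*include[[:space:]].*from[[:space:]]+['\"]([^'\"]+)['\"].*/\1/p" "$script")"

    for rel in $paths; do
        target="$base/${rel#./}"
        if [ -f "$target" ] || [ -f "$target.nf" ] || [ -f "$target/main.nf" ]; then
            continue
        fi
        echo "unresolved include: $rel" >&2
        missing=1
    done
    return "$missing"
}
```

Running `check_includes main.nf` before `nextflow run` reports a missing module file early. Includes that are not file paths, such as plugin includes, would be flagged as unresolved by this simple check.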

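Common Errors

Error message (abbreviated)	Cause	Fix
`No such file`	The input path is wrong	Check the glob pattern and the path
`Process terminated with error`	A task's command failed	Read `.command.err` in the task's work directory
`Unable to acquire lock`	Another Nextflow instance holds the session, or an interrupted run left a stale lock	Stop the other run; if none is running, remove the stale `.nextflow/cache/<session-id>/db/LOCK` file
`Missing container`	The image has not been pulled	Run `docker pull` or `apptainer pull` first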
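
For a failed task, most of the evidence sits in its work directory: `.exitcode`, `.command.sh` (the script that was run), and `.command.err` (its stderr). The following sketch prints all three for a task given the short hash that Nextflow shows in its log. `show_task` is a hypothetical helper, and it assumes the default `work/<xx>/<rest-of-hash>` layout.

```shell
# Print the key diagnostic files of one task directory.
# Usage: show_task <work_dir> <hash_prefix>    e.g. show_task work 4f/9a1c2b
show_task() {
    local work_dir="$1"
    local prefix="$2"
    local task_dir=""
    local d
    local f

    # Expand the short hash to the full directory name (first match wins).
    for d in "$work_dir/$prefix"*/; do
        if [ -d "$d" ]; then
            task_dir="${d%/}"
            break
        fi
    done
    if [ -z "$task_dir" ]; then
        echo "no task directory matches $work_dir/$prefix" >&2
        return 1
    fi

    for f in .exitcode .command.sh .command.err; do
        echo "== $task_dir/$f =="
        if [ -f "$task_dir/$f" ]; then
            cat "$task_dir/$f"
        fi
        echo
    done
}
```

The hash prefix comes from the `[4f/9a1c2b]`-style tag on each task line in the run output, or from `nextflow log <run_name> -f hash,name,status`.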

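Cheat Sheet

# === Run ===
nextflow run main.nf                    # run locally
nextflow run main.nf -resume            # resume a previous run
nextflow run main.nf -with-docker       # Docker mode
nextflow run main.nf -with-singularity  # Singularity mode
nextflow run main.nf -profile slurm     # SLURM cluster

# === nf-core ===
# nf-core/tools 3.x groups these under "nf-core pipelines";
# older releases use "nf-core list" and "nf-core launch".
nf-core pipelines list                  # list all pipelines
nf-core pipelines launch nf-core/rnaseq # interactive launch
# For the pipelines below, add -r <version>, -profile, --input and --outdir
# as in the rnaseq example.
nextflow run nf-core/sarek              # variant calling
nextflow run nf-core/ampliseq           # amplicon analysis
nextflow run nf-core/mag                # metagenome assembly

# === Debug ===
nextflow log                            # list past runs
nextflow log <run_name> -f hash,name,status  # per-task details
cat work/<hash>/.command.err            # read a task's stderr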

References: Nextflow documentation | nf-core | Updated 2026