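Introduction to Bioinformatics R Package Development¶

Overview in One Sentence¶

R package development is the core skill for making bioinformatics analysis code standardized, reusable, and shareable. It covers the devtools workflow, roxygen2 documentation, testthat unit tests, pkgdown site building, and the full submission process to CRAN or Bioconductor.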

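Key Concepts¶

Concept	Description
devtools	Core workflow tool for R package development; ties together create, build, check, and install
usethis	Automates package setup (LICENSE, README, CI, and so on)
roxygen2	Generates function documentation (.Rd files) from code comments
testthat	Unit testing framework for R, used to verify that code behaves correctly
pkgdown	Builds a website from the package documentation
NAMESPACE	Controls which functions are exported and which are imported
DESCRIPTION	Package metadata file (dependencies, version, authors, and so on)
vignettes	Long-form tutorials (R Markdown format)
CRAN	General-purpose R package repository with strict submission standards
Bioconductor	Bioinformatics R package repository with additional requirements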

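Step-by-Step Guide¶

Step 1: Create the Package Skeleton¶

In plain terms: An R package is essentially a folder with a specific directory structure. usethis::create_package() generates that structure in one call, including the required DESCRIPTION, NAMESPACE, and R/ directory.

Technical details:
- Required files: DESCRIPTION, NAMESPACE, R/ (code directory)
- Recommended files: README.md, LICENSE, tests/, vignettes/, man/
- Package name rules: only letters, digits, and dots; must start with a letter; must not end with a dot
- Using RStudio's Package project type is recommended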
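The naming rules above can be checked mechanically. The following is a minimal sketch: is_valid_pkg_name() is a hypothetical helper (it is not part of usethis or devtools), and the two-character minimum that the pattern enforces comes from Writing R Extensions rather than from the list above.

```r
# Hypothetical helper: TRUE when `name` follows the package naming rules.
# Pattern: a letter first, then letters/digits/dots, and a final character
# that is a letter or digit (so the name cannot end with a dot).
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

is_valid_pkg_name("BioAnalyzer")   # TRUE
is_valid_pkg_name("bio_analyzer")  # FALSE: underscores are not allowed
is_valid_pkg_name("2fast")         # FALSE: must start with a letter
is_valid_pkg_name("pkg.")          # FALSE: must not end with a dot
```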

Code example:

# Install the development tools
install.packages(c("devtools", "usethis", "roxygen2", "testthat", "pkgdown"))

# Create the package (at the desired path)
usethis::create_package("~/projects/BioAnalyzer")

# Directory structure after creation:
# BioAnalyzer/
# ├── DESCRIPTION
# ├── NAMESPACE
# ├── R/
# ├── BioAnalyzer.Rproj
# └── .Rbuildignore

# Basic setup
usethis::use_mit_license()                    # Add the MIT license
usethis::use_readme_rmd()                      # README.Rmd template
usethis::use_news_md()                         # NEWS.md changelog
usethis::use_git()                             # Initialize git
usethis::use_github()                          # Connect a GitHub repository

# Fill in the package description (edit the DESCRIPTION file by hand)
# or use:
usethis::use_description(fields = list(
  Title = "Bioinformatics Analysis Toolkit",
  Description = "A comprehensive toolkit for common bioinformatics analyses including differential expression, enrichment analysis, and visualization.",
  `Authors@R` = 'person("Your", "Name", email = "you@email.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-2345-6789"))'
))

Example DESCRIPTION file:

Package: BioAnalyzer
Title: Bioinformatics Analysis Toolkit
Version: 0.1.0
Authors@R: 
    person("Your", "Name", email = "you@email.com", 
           role = c("aut", "cre"),
           comment = c(ORCID = "0000-0001-2345-6789"))
Description: A comprehensive toolkit for common bioinformatics analyses
    including differential expression, enrichment analysis, and visualization.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Depends: R (>= 4.1.0)
Imports:
    ggplot2 (>= 3.4.0),
    dplyr,
    stats
Suggests:
    testthat (>= 3.0.0),
    knitr,
    rmarkdown
VignetteBuilder: knitr
URL: https://github.com/yourname/BioAnalyzer
BugReports: https://github.com/yourname/BioAnalyzer/issues
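A DESCRIPTION file is plain text in Debian control format, so base R can read it with read.dcf(). The sketch below writes a shortened copy of the example to a temporary file (an assumption made so the snippet is self-contained) and extracts the version and the names of the imported packages.

```r
# Write a shortened DESCRIPTION to a temporary file for illustration.
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: BioAnalyzer",
  "Version: 0.1.0",
  "Imports:",
  "    ggplot2 (>= 3.4.0),",
  "    dplyr,",
  "    stats"
), desc_path)

# read.dcf() returns a character matrix with one column per field.
desc <- read.dcf(desc_path)
pkg_version <- unname(desc[1, "Version"])

# Split the Imports field on commas, trim whitespace,
# then drop version constraints such as "(>= 3.4.0)".
imports <- trimws(strsplit(desc[1, "Imports"], ",")[[1]])
dep_names <- sub("\\s*\\(.*\\)$", "", imports)

pkg_version  # "0.1.0"
dep_names    # "ggplot2" "dplyr" "stats"
```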

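Step 2: Write Functions and Documentation (roxygen2)¶

In plain terms: Write each piece of analysis as an R function and put it in the R/ directory. In front of every function, write its documentation in a special comment format (lines starting with #'). roxygen2 turns these comments into help pages and into the NAMESPACE file.

Technical details:
- roxygen2 tags: @param (arguments), @return (return value), @export (export), @examples (examples)
- @importFrom imports functions from other packages
- devtools::document() generates the man/*.Rd files and NAMESPACE
- Markdown syntax is supported (set Roxygen: list(markdown = TRUE) in DESCRIPTION)

Code example:

# File: R/differential_expression.R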

#' Perform Differential Expression Analysis
#'
#' Run DESeq2 differential expression analysis on a count matrix
#' with given sample conditions.
#'
#' @param count_matrix A numeric matrix of raw counts (genes x samples).
#'   Row names should be gene IDs, column names should be sample IDs.
#' @param conditions A character vector of condition labels for each sample.
#'   Must have the same length as `ncol(count_matrix)`.
#' @param reference The reference level for comparison (default: first level).
#' @param padj_cutoff Adjusted p-value cutoff for significance (default: 0.05).
#' @param lfc_cutoff Log2 fold change cutoff (default: 1).
#'
#' @return A data.frame with columns:
#'   \describe{
#'     \item{gene}{Gene identifier}
#'     \item{log2FoldChange}{Log2 fold change}
#'     \item{padj}{Adjusted p-value}
#'     \item{significant}{Logical, whether the gene is significant}
#'   }
#'
#' @export
#'
#' @examples
#' # Create example data
#' set.seed(42)
#' counts <- matrix(rpois(1000, lambda = 10), nrow = 100, ncol = 10)
#' rownames(counts) <- paste0("Gene", 1:100)
#' colnames(counts) <- paste0("Sample", 1:10)
#' conditions <- rep(c("Control", "Treatment"), each = 5)
#'
#' # Run analysis
#' results <- run_deseq2(counts, conditions)
#' head(results)
#'
#' @importFrom stats relevel
run_deseq2 <- function(count_matrix,
                        conditions,
                        reference = NULL,
                        padj_cutoff = 0.05,
                        lfc_cutoff = 1) {
  # Validate inputs
  if (!is.matrix(count_matrix) && !is.data.frame(count_matrix)) {
    stop("`count_matrix` must be a matrix or data.frame")
  }
  if (length(conditions) != ncol(count_matrix)) {
    stop("`conditions` length must equal number of columns in `count_matrix`")
  }
  if (!requireNamespace("DESeq2", quietly = TRUE)) {
    stop("Package 'DESeq2' is required. Install with BiocManager::install('DESeq2')")
  }

  # Build the DESeqDataSet
  col_data <- data.frame(condition = factor(conditions))
  if (!is.null(reference)) {
    col_data$condition <- relevel(col_data$condition, ref = reference)
  }

  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = round(count_matrix),
    colData = col_data,
    design = ~ condition
  )

  # Run DESeq2
  dds <- DESeq2::DESeq(dds)
  res <- DESeq2::results(dds)
  res_df <- as.data.frame(res)

  # Format the output
  result <- data.frame(
    gene = rownames(res_df),
    log2FoldChange = res_df$log2FoldChange,
    padj = res_df$padj,
    significant = !is.na(res_df$padj) &
      res_df$padj < padj_cutoff &
      abs(res_df$log2FoldChange) > lfc_cutoff,
    stringsAsFactors = FALSE
  )

  return(result)
}


#' Plot Volcano
#'
#' Create a volcano plot from differential expression results.
#'
#' @param de_results A data.frame from [run_deseq2()].
#' @param padj_cutoff Adjusted p-value cutoff line (default: 0.05).
#' @param lfc_cutoff Log2 fold change cutoff lines (default: 1).
#' @param top_n Number of top genes to label (default: 10).
#'
#' @return A ggplot2 object.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' results <- run_deseq2(counts, conditions)
#' plot_volcano(results)
#' }
#'
#' @importFrom ggplot2 ggplot aes geom_point theme_minimal labs geom_hline geom_vline
plot_volcano <- function(de_results,
                          padj_cutoff = 0.05,
                          lfc_cutoff = 1,
                          top_n = 10) {
  # Validate inputs
  required_cols <- c("gene", "log2FoldChange", "padj", "significant")
  if (!all(required_cols %in% names(de_results))) {
    stop("de_results must contain columns: ", paste(required_cols, collapse = ", "))
  }

  de_results$neg_log10_padj <- -log10(de_results$padj)

  p <- ggplot2::ggplot(de_results,
    ggplot2::aes(x = log2FoldChange, y = neg_log10_padj, color = significant)) +
    ggplot2::geom_point(alpha = 0.6, size = 1) +
    ggplot2::geom_hline(yintercept = -log10(padj_cutoff), linetype = "dashed") +
    ggplot2::geom_vline(xintercept = c(-lfc_cutoff, lfc_cutoff), linetype = "dashed") +
    ggplot2::scale_color_manual(values = c("FALSE" = "grey60", "TRUE" = "red")) +
    ggplot2::labs(x = "Log2 Fold Change", y = "-Log10 Adjusted P-value") +
    ggplot2::theme_minimal()

  return(p)
}

# Generate the documentation
devtools::document()  # Writes man/*.Rd and updates NAMESPACE

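Step 3: Manage Dependencies¶

In plain terms: Every other R package your package uses must be declared in DESCRIPTION. Imports lists packages that must be installed; Suggests lists optional ones (such as packages used only for testing).

Code example:

# Add dependencies
usethis::use_package("ggplot2", type = "Imports")     # Required dependency
usethis::use_package("dplyr", type = "Imports")
usethis::use_package("DESeq2", type = "Suggests")     # Optional dependency
usethis::use_package("testthat", type = "Suggests")
usethis::use_package("knitr", type = "Suggests")

# Correct ways to use functions from other packages inside your package
# Option 1: declare @importFrom in the function's documentation
#' @importFrom dplyr filter mutate

# Option 2: call pkg::fun() directly (recommended, more explicit)
# dplyr::filter(df, condition == "A")

# Option 3: import every function of a package (not recommended, pollutes the namespace)
#' @import ggplot2

# Set up the pipe operator
usethis::use_pipe()  # Adds %>% support

# Example data shipped with the package (visible to users)
# Create the data
example_counts <- matrix(rpois(500, 10), nrow = 50, ncol = 10)
rownames(example_counts) <- paste0("Gene", 1:50)
colnames(example_counts) <- paste0("Sample", 1:10)
usethis::use_data(example_counts)  # Saved under data/

# Internal data (not exposed to users)
internal_params <- list(default_colors = c("blue", "red", "green"))
usethis::use_data(internal_params, internal = TRUE)  # Saved to R/sysdata.rda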

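Step 4: Unit Tests (testthat)¶

In plain terms: Write test code that verifies your functions behave correctly. Run the tests after every change to make sure nothing that used to work has been broken (regression testing).

Technical details:
- usethis::use_testthat() sets up the test infrastructure
- Test files live in tests/testthat/
- File naming: test-<feature>.R
- Core functions: test_that(), expect_equal(), expect_error(), and others
- devtools::test() runs all tests

Code example:

# Set up testing
usethis::use_testthat(edition = 3)  # Use testthat 3rd edition

# Create a test file for a function
usethis::use_test("differential_expression")
# Creates tests/testthat/test-differential_expression.R

# === Write the tests ===
# File: tests/testthat/test-differential_expression.R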

test_that("run_deseq2 returns correct structure", {
  skip_if_not_installed("DESeq2")

  # Prepare test data
  set.seed(42)
  counts <- matrix(rpois(500, lambda = 10), nrow = 50, ncol = 10)
  rownames(counts) <- paste0("Gene", 1:50)
  colnames(counts) <- paste0("Sample", 1:10)
  conditions <- rep(c("A", "B"), each = 5)

  # Run the function
  result <- run_deseq2(counts, conditions)

  # Check the structure of the return value
  expect_s3_class(result, "data.frame")
  expect_named(result, c("gene", "log2FoldChange", "padj", "significant"))
  expect_equal(nrow(result), 50)
  expect_type(result$significant, "logical")
})

test_that("run_deseq2 validates input", {
  # Wrong input type
  expect_error(run_deseq2("not a matrix", c("A", "B")))

  # Length of conditions does not match
  counts <- matrix(1:20, nrow = 5, ncol = 4)
  expect_error(run_deseq2(counts, c("A", "B")),
               "length must equal")
})

test_that("plot_volcano returns ggplot object", {
  de_results <- data.frame(
    gene = paste0("Gene", 1:10),
    log2FoldChange = rnorm(10),
    padj = runif(10, 0, 0.1),
    significant = c(rep(TRUE, 3), rep(FALSE, 7))
  )

  p <- plot_volcano(de_results)
  expect_s3_class(p, "ggplot")
})

test_that("plot_volcano validates input columns", {
  bad_df <- data.frame(x = 1:5, y = 1:5)
  expect_error(plot_volcano(bad_df), "must contain columns")
})

# Run the tests
devtools::test()

# Check test coverage
# install.packages("covr")
cov <- covr::package_coverage()
covr::report(cov)  # Generate an HTML coverage report

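Step 5: Write a Vignette¶

In plain terms: A vignette is the package's tutorial. It is written in R Markdown and walks through typical use cases and a complete workflow. Users can open it with browseVignettes("YourPackageName").

Code example:

# Create a vignette
usethis::use_vignette("introduction", title = "Introduction to BioAnalyzer")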

---
title: "Introduction to BioAnalyzer"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to BioAnalyzer}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Quick Start

BioAnalyzer provides easy-to-use functions for common bioinformatics analyses.

### Differential Expression Analysis

```{r}
library(BioAnalyzer)

# Load example data
data(example_counts)

# Define conditions
conditions <- rep(c("Control", "Treatment"), each = 5)

# Run DESeq2 analysis
results <- run_deseq2(example_counts, conditions)
head(results)
```

### Visualization

```{r}
plot_volcano(results, top_n = 5)
```

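Step 6: Build, Check, and Release¶

In plain terms: Once the code and tests are written, run R CMD check and make sure the package produces no ERRORs or WARNINGs, and ideally no NOTEs. CRAN's standards are strict: a submission needs zero ERRORs and zero WARNINGs, and any remaining NOTE should be removed or explained. Bioconductor adds its own coding guidelines on top.

Code example: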
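# === Common commands during the development cycle ===
devtools::load_all()      # Load the package quickly (no install needed)
devtools::document()      # Update the documentation
devtools::test()          # Run the tests
devtools::check()         # Full check (runs R CMD check)

# === Build and install ===
devtools::build()         # Build the .tar.gz source package
devtools::install()       # Install locally

# === Full checks before a CRAN submission ===
# Local check
devtools::check(cran = TRUE)  # Check with CRAN settings

# R-hub checks (multiple platforms)
# install.packages("rhub")
rhub::check_for_cran()    # rhub v1 API; rhub >= 2.0 uses rhub::rhub_setup() and rhub::rhub_check()

# win-builder checks (Windows)
devtools::check_win_devel()
devtools::check_win_release()

# === Submit to CRAN ===
devtools::submit_cran()

# === pkgdown site ===
usethis::use_pkgdown()
pkgdown::build_site()

# Automatic deployment (GitHub Actions)
usethis::use_pkgdown_github_pages()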

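Step 7: Bioconductor Submission Requirements¶

In plain terms: Bioconductor is the dedicated repository for bioinformatics R packages and asks for more than CRAN does. The package must pass BiocCheck, include at least one vignette, and follow Bioconductor's version numbering. Reusing existing Bioconductor S4 classes is strongly encouraged where they fit, and BiocStyle is the recommended vignette style.

Technical details:
- Version numbers have the form x.y.z: y is odd in the devel branch and even in a release; new submissions start at 0.99.0
- The package must pass BiocCheck::BiocCheck()
- A complete vignette is required (help pages alone are not enough)
- Data packages and software packages are submitted separately
- Submission happens by opening a GitHub issue (CRAN uses a web submission form instead)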
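The odd/even rule applies to the middle component of the version. As a small illustration, the sketch below classifies a version string by the parity of y; bioc_branch() is a hypothetical helper written for this example, not a Bioconductor function.

```r
# Hypothetical helper: report which Bioconductor branch a version belongs to,
# based on the parity of the middle (y) component of x.y.z.
bioc_branch <- function(version) {
  parts <- unlist(package_version(version))  # integer vector c(x, y, z)
  if (parts[2] %% 2 == 0) "release" else "devel"
}

bioc_branch("0.99.2")  # "devel"   (99 is odd; used during initial review)
bioc_branch("1.0.0")   # "release" (0 is even)
bioc_branch("1.1.3")   # "devel"   (1 is odd)
```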

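Code example:

# Install BiocCheck
BiocManager::install("BiocCheck")

# Run the Bioconductor-specific checks
BiocCheck::BiocCheck("BioAnalyzer_0.1.0.tar.gz")

# Additional DESCRIPTION requirement for Bioconductor
# biocViews: categories must be specified
# For example:
# biocViews: DifferentialExpression, Visualization, RNASeq

# Version numbering
# New submission (devel): 0.99.0 → 0.99.1 → 0.99.2 ... (bumped during review)
# After the first release: 1.0.0

# Submission process:
# 1. Make sure BiocCheck reports zero ERRORs and WARNINGs
# 2. Open a new issue at https://github.com/Bioconductor/Contributions
# 3. Fill in the submission template
# 4. Wait for reviewer feedback and revise
# 5. Once accepted, the package enters the next release cycle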
# Bioconductor-style S4 class definition
setClass("BioResult",
  slots = c(
    genes = "character",
    log2fc = "numeric",
    padj = "numeric",
    metadata = "list"
  ),
  validity = function(object) {
    errors <- character()
    if (length(object@genes) != length(object@log2fc)) {
      errors <- c(errors, "genes and log2fc must have same length")
    }
    if (length(object@genes) != length(object@padj)) {
      errors <- c(errors, "genes and padj must have same length")
    }
    if (length(errors) == 0) TRUE else errors
  }
)

# Method for the existing summary() generic.
# Base R already defines summary(), so setMethod() derives the S4 generic
# from it; redefining the generic with setGeneric() is not needed.
setMethod("summary", "BioResult", function(object, ...) {
  cat("BioResult with", length(object@genes), "genes\n")
  cat("Significant:", sum(object@padj < 0.05, na.rm = TRUE), "\n")
  invisible(object)
})
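To see what a validity function buys you, the sketch below defines a cut-down class and constructs one valid and one invalid object. DemoResult and its slots are assumptions made for illustration; they are a simplified stand-in for the class above, not part of any Bioconductor package.

```r
library(methods)

# Simplified stand-in class with a validity check on slot lengths.
setClass("DemoResult",
  slots = c(genes = "character", log2fc = "numeric"),
  validity = function(object) {
    if (length(object@genes) != length(object@log2fc)) {
      "genes and log2fc must have same length"
    } else {
      TRUE
    }
  }
)

# length() is an existing generic, so a method can be attached directly.
setMethod("length", "DemoResult", function(x) length(x@genes))

# A consistent object is created without complaint.
ok <- new("DemoResult", genes = c("GeneA", "GeneB"), log2fc = c(1.5, -2.0))
length(ok)  # 2

# new() runs the validity function and raises an error for inconsistent slots.
bad <- tryCatch(
  new("DemoResult", genes = c("GeneA", "GeneB"), log2fc = 1.5),
  error = function(e) conditionMessage(e)
)
print(bad)  # the error message includes the string returned by validity
```

Because the check runs inside new(), malformed objects never reach downstream methods, which is the main practical argument for S4 in shared infrastructure.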

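Step 8: CI/CD with GitHub Actions¶

In plain terms: Set up automation so that every push to GitHub runs the tests and R CMD check, keeping the code in a working state at all times. The pkgdown site can be built automatically as well.

Code example:

# Set up GitHub Actions
usethis::use_github_action("check-standard")    # R CMD check
usethis::use_github_action("test-coverage")     # Test coverage
usethis::use_github_action("pkgdown")           # Build the site automatically

# Add badges to the README
usethis::use_github_actions_badge("R-CMD-check")

# .github/workflows/R-CMD-check.yaml (generated automatically)
# Key parts: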
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}
    strategy:
      matrix:
        config:
          - {os: macos-latest, r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'devel'}

Hands-On Commands (Copy and Paste)¶

# ===== Complete steps for creating a bioinformatics R package from scratch =====

# 1. Create the package
usethis::create_package("~/BioAnalyzer")

# 2. Basic setup
usethis::use_mit_license()
usethis::use_readme_rmd()
usethis::use_news_md()
usethis::use_git()
usethis::use_testthat(edition = 3)
usethis::use_pipe()

# 3. Add dependencies
usethis::use_package("ggplot2")
usethis::use_package("dplyr")
usethis::use_package("DESeq2", type = "Suggests")

# 4. Write functions (creates .R files under R/)
usethis::use_r("differential_expression")
usethis::use_r("visualization")
usethis::use_r("utils")

# 5. Write tests
usethis::use_test("differential_expression")
usethis::use_test("visualization")

# 6. Write a vignette
usethis::use_vignette("introduction")

# 7. Generate the documentation
devtools::document()

# 8. Development loop
devtools::load_all()    # Load
devtools::test()        # Test
devtools::check()       # Check

# 9. Build the website
usethis::use_pkgdown()
pkgdown::build_site()

# 10. CI/CD
usethis::use_github_action("check-standard")
usethis::use_github_action("pkgdown")

# 11. Final check before submission
devtools::check(cran = TRUE)
# BiocCheck::BiocCheck("BioAnalyzer_0.1.0.tar.gz")  # If submitting to Bioconductor

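Common Interview Questions¶

Q1: What does a package's NAMESPACE file do?¶

A: NAMESPACE defines the package's API boundary: (1) export() declares which functions are visible to users (the public API); (2) import() and importFrom() declare which functions are imported from other packages for internal use. This resolves name clashes: when two packages define a function with the same name, NAMESPACE makes explicit which one is used. Functions that are not exported are internal; users should not call them directly, although they remain reachable through :::.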
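The export boundary can be inspected for any installed package. The sketch below uses the stats package, which ships with R, purely as a convenient example; the same calls work for your own package once it is loaded.

```r
# Names the stats package exports (its public API).
exported <- getNamespaceExports("stats")

# Every object defined inside the stats namespace, exported or not.
all_objects <- ls(asNamespace("stats"), all.names = TRUE)

# Objects that exist in the namespace but are not exported are internal:
# they are reachable with stats:::name but not with stats::name.
internal <- setdiff(all_objects, exported)

"median" %in% exported    # TRUE: median() is part of the public API
length(internal) > 0      # TRUE: stats also contains internal helpers
stats::median(c(1, 3, 5)) # 3, called through the explicit pkg::fun() form
```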

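Q2: What is the difference between Imports and Depends?¶

A: Depends: loading your package also attaches the dependency to the search path, so users can call the dependency's functions directly. Imports: the dependency is loaded but not attached; your package uses it through NAMESPACE imports or pkg::fun(). Current best practice is to use Imports almost always, unless your package extends another package and users need both at the same time. Depends is mainly kept for declaring the minimum R version: Depends: R (>= 4.1.0).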

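Q3: What do roxygen2's @export and @importFrom do?¶

A: @export adds the function to the export list in NAMESPACE, so users can call it directly after library(yourpackage). Functions without @export are internal. @importFrom pkg func declares in NAMESPACE that a specific function is imported from another package, so your code can write func() instead of pkg::func(). Best practice: an explicit pkg::func() call is usually clearer than @importFrom.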

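Q4: What changed in testthat 3rd edition?¶

A: The main changes: (1) expect_equal() and expect_identical() compare with the waldo package, which produces more informative diffs; (2) snapshot testing (expect_snapshot()) for printed output is supported only in the 3rd edition; (3) context() is deprecated, along with older expectations such as expect_is() and expect_that(); (4) setup() and teardown() are deprecated in favor of test fixtures built with withr::local_*() or withr::defer(); (5) messages are no longer silenced automatically, and warnings that an expectation does not match bubble up instead of being swallowed. To enable it, add Config/testthat/edition: 3 to DESCRIPTION.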

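Q5: How do you handle large datasets in a package?¶

A: (1) Small data (<1 MB): store it under data/ with usethis::use_data(); (2) medium-sized data: create a separate data package (for example BioAnalyzerData) and host it through Bioconductor ExperimentHub; (3) internal data: usethis::use_data(internal = TRUE) stores it in R/sysdata.rda; (4) raw data: place it in inst/extdata/ and access it with system.file(); (5) large datasets: provide a download function instead of bundling the data.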

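Q6: What are the core differences between CRAN and Bioconductor submission?¶

A: CRAN: (1) submission goes through the CRAN web form, which devtools::submit_cran() can drive for you; (2) the main gate is R CMD check with zero ERRORs and WARNINGs; (3) review time varies, from a few days to several weeks; (4) CRAN policy asks maintainers not to submit updates more often than every 1-2 months. Bioconductor: (1) submission through a GitHub issue; (2) BiocCheck must pass in addition; (3) reuse of existing Bioconductor S4 classes and methods is strongly encouraged; (4) a vignette is mandatory; (5) version numbers are tied to the Bioconductor release cycle; (6) a human reviewer reads the code; (7) a release happens every six months.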

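Q7: How do you make sure the package works on different platforms?¶

A: (1) Use GitHub Actions to run R CMD check automatically on Linux, macOS, and Windows; (2) use R-hub for multi-platform checks (rhub::check_for_cran() in rhub v1, rhub::rhub_check() in rhub v2); (3) avoid platform-specific code such as hard-coded Windows path separators; (4) use .Platform and .Machine for platform adaptation; (5) use skip_on_os() in tests to skip platform-specific cases.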
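Points (3) and (4) can be illustrated with base R alone. In the sketch below the path components and the n_cores variable are hypothetical; the only facts relied on are that file.path() joins with "/" on every platform and that .Platform$OS.type is either "unix" or "windows".

```r
# Build paths from components instead of hard-coding a separator.
# file.path() always joins with "/", which R accepts on Windows as well.
input_path <- file.path("data", "counts", "sample1.tsv")
input_path  # "data/counts/sample1.tsv"

# Branch on the platform only where behaviour really differs.
os_type <- .Platform$OS.type  # "unix" (Linux, macOS) or "windows"

# Example: fork-based parallelism (parallel::mclapply) cannot use more than
# one core on Windows, so fall back to a single worker there.
n_cores <- if (os_type == "windows") 1L else 2L

# .Machine exposes numeric limits of the current platform.
max_int <- .Machine$integer.max
```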

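Q8: What are the best practices for R package version numbers?¶

A: Follow Semantic Versioning: MAJOR.MINOR.PATCH. MAJOR: incompatible API changes (such as removing a function or changing what an argument means). MINOR: new functionality that is backward compatible. PATCH: backward-compatible bug fixes. Development versions add a fourth component: 0.1.0.9000. A first CRAN submission is usually 0.1.0 or 1.0.0.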
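Version strings should be compared with package_version() rather than as plain text, because each component is compared numerically. A short sketch using the version numbers mentioned above:

```r
released    <- package_version("0.1.0")
development <- package_version("0.1.0.9000")
next_minor  <- package_version("0.2.0")

# The .9000 development version sorts after the release it builds on
# and before the next planned release.
development > released    # TRUE
development < next_minor  # TRUE

# Components are compared as numbers, so 0.10.0 is newer than 0.9.0
# even though "0.10.0" < "0.9.0" as character strings.
package_version("0.10.0") > package_version("0.9.0")  # TRUE
```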

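Q9: How do you handle backward compatibility?¶

A: (1) Mark deprecated functions with .Deprecated("new_func") so users get a warning; (2) deprecate arguments with lifecycle::deprecate_warn(); (3) use the lifecycle package to manage function life cycles; (4) document breaking changes clearly in NEWS.md; (5) follow SemVer and make incompatible changes only in MAJOR versions.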
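A minimal sketch of point (1) using base R only. The function names old_mean() and new_mean() are hypothetical; the pattern is to keep the old function working while it warns and forwards to its replacement.

```r
# Replacement function (hypothetical).
new_mean <- function(x) mean(x, na.rm = TRUE)

# Old function kept for compatibility: warn, then delegate.
old_mean <- function(x) {
  .Deprecated("new_mean")
  new_mean(x)
}

# Capture the deprecation warning text.
msg <- tryCatch(
  old_mean(c(1, NA, 3)),
  warning = function(w) conditionMessage(w)
)

# The old function still returns the correct result.
res <- suppressWarnings(old_mean(c(1, NA, 3)))
res  # 2
```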

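Common Pitfalls¶

1. Calling library() or require() inside functions¶

Problem: Attaching other packages inside package functions changes the user's search path and can cause name clashes. Correct approach: call functions explicitly as pkg::func(), or import them through importFrom in NAMESPACE.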

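2. Using absolute paths in examples or tests¶

Problem: Paths from your own machine do not exist on the CRAN check machines, so the check fails. Correct approach: use system.file("extdata", "file.csv", package = "YourPkg") or tempdir().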
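Both alternatives are shown in the sketch below. It looks up a file that ships with the stats package (chosen only because it is always installed) and writes scratch output to a temporary file rather than to a fixed location.

```r
# Locate a file inside an installed package; returns "" if it is missing.
desc_file <- system.file("DESCRIPTION", package = "stats")
nzchar(desc_file)  # TRUE: the file was found

# Write scratch output to a session-specific temporary file.
out_file <- tempfile(fileext = ".csv")
de_table <- data.frame(gene = c("GeneA", "GeneB"), padj = c(0.01, 0.20))
write.csv(de_table, out_file, row.names = FALSE)

# Read it back to confirm the round trip.
round_trip <- read.csv(out_file)
nrow(round_trip)  # 2
```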

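3. Not specifying dependency versions in DESCRIPTION¶

Problem: If your package relies on a feature that does not exist in older versions of a dependency, users with the older version installed will hit errors. Correct approach: declare a minimum version, for example ggplot2 (>= 3.4.0).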

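4. @examples in roxygen2 documentation take too long to run¶

Problem: CRAN expects all examples to finish in a reasonable time (roughly under 5 seconds each). Correct approach: use small simulated data, wrap slow examples in \donttest{}, and reserve \dontrun{} for examples that genuinely cannot run on a check machine.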

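5. Not handling the case where a Suggests package is unavailable¶

Problem: Users may not have a package from Suggests installed, so calling it directly fails. Correct approach: check in code: if (!requireNamespace("DESeq2", quietly = TRUE)) stop("Install DESeq2").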
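When several functions depend on optional packages, the check is worth factoring into one place. The sketch below shows one way to do that; check_suggested() is a hypothetical internal helper, not a function from usethis or devtools.

```r
# Hypothetical internal helper: stop with a clear message when an
# optional (Suggests) package is not installed.
check_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required for this function. ",
      "Please install it first.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# An installed package passes silently.
check_suggested("stats")

# A missing package produces an informative error.
err <- tryCatch(
  check_suggested("notARealPackage12345"),
  error = function(e) conditionMessage(e)
)
err
```

Calling the helper at the top of each function that needs the optional package keeps the error message consistent across the package.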

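6. Tests that depend on the network or external resources¶

Problem: Network access on CRAN check machines may be missing or unreliable, so tests that depend on external APIs or file downloads can fail. Correct approach: use skip_if_offline() or skip_on_cran(), or mock the external dependency; bundle test data under tests/testthat/fixtures/.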

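7. Ignoring NOTEs from R CMD check¶

Problem: NOTEs are often treated as unimportant, but CRAN may reject a package because of them. Correct approach: remove every NOTE you can. A common one is "no visible binding for global variable", triggered by column names used in dplyr pipelines; fix it with utils::globalVariables() or the .data$column pronoun.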

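8. Submitting to Bioconductor without a vignette¶

Problem: Bioconductor requires at least one vignette; a submission without one is rejected outright. Correct approach: write at least one introductory vignette that shows a complete workflow for the core functionality.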

Additional Notes¶

Development Workflow Cheat Sheet¶

# Everyday development loop
devtools::load_all()     # Ctrl+Shift+L (RStudio)
devtools::document()     # Ctrl+Shift+D
devtools::test()         # Ctrl+Shift+T
devtools::check()        # Ctrl+Shift+E

# Before release
devtools::spell_check()  # Spell check
urlchecker::url_check()  # URL validity
devtools::check_win_devel()  # Windows check
devtools::submit_cran()  # Submit

Project Structure Template¶

BioAnalyzer/
├── DESCRIPTION
├── NAMESPACE
├── LICENSE
├── README.Rmd
├── NEWS.md
├── R/
│   ├── BioAnalyzer-package.R  # Package-level documentation
│   ├── differential_expression.R
│   ├── visualization.R
│   ├── utils.R
│   └── data.R               # Dataset documentation
├── man/                      # Generated automatically (do not edit by hand)
├── tests/
│   ├── testthat.R
│   └── testthat/
│       ├── test-differential_expression.R
│       └── test-visualization.R
├── vignettes/
│   └── introduction.Rmd
├── data/                     # .rda data files
├── inst/
│   └── extdata/              # External data files
├── .Rbuildignore
├── .gitignore
└── BioAnalyzer.Rproj

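Recommended Learning Resources¶

Resource	Description
R Packages (2e)	The authoritative book on R package development by Hadley Wickham and Jennifer Bryan (https://r-pkgs.org)
Writing R Extensions	The official R manual (the most complete, but hard to read)
Bioconductor developer guide	https://contributions.bioconductor.org
usethis documentation	https://usethis.r-lib.org
devtools documentation	https://devtools.r-lib.org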
