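Using Jupyter Notebook Efficiently

1. In One Sentence

Jupyter is an interactive notebook that shows results as you write code: code in one cell, notes in another, charts in a third. It is an excellent companion for data analysis and exploratory bioinformatics.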

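2. What Is Jupyter

Plain-language explanation

A regular Python script is like writing a letter: you write everything, then send it (run it) all at once. Want to see the result of one part along the way? You can't; you have to rerun the whole thing.

Jupyter is like a WeChat conversation: you send one message (write a cell) and get an immediate reply (the output). You can scroll back through earlier messages (previous cell outputs stay on screen), and you can edit one message and resend it (rerun just that cell).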
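To make the chat analogy concrete, here is a minimal sketch of the idea behind it (a toy simplification, not how Jupyter is implemented): every cell runs in one persistent namespace, so rerunning a single cell reuses the state left by earlier cells. `MiniKernel` is a hypothetical class for illustration; a real kernel such as ipykernel talks to the browser over a messaging protocol.

```python
class MiniKernel:
    """Hypothetical toy 'kernel': runs cell sources in one shared namespace."""

    def __init__(self):
        self.namespace = {}       # state that survives between cells
        self.execution_count = 0  # plays the role of the In [n] counter

    def run_cell(self, source):
        self.execution_count += 1
        exec(source, self.namespace)  # each cell sees variables from earlier cells
        return self.execution_count


kernel = MiniKernel()
kernel.run_cell("data = [3, 1, 2]")                     # cell 1: e.g. a slow data load
kernel.run_cell("result = sorted(data)")                # cell 2: analysis
kernel.run_cell("result = sorted(data, reverse=True)")  # edit cell 2 and rerun only it
print(kernel.namespace["result"], kernel.execution_count)  # [3, 2, 1] 3
```

Cell 1 never had to run again: `data` was still in memory when the edited cell 2 ran.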

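Good fit

Data exploration and visualization (inspect the data while tuning parameters)
Analysis reports (code, charts, and explanatory text in one document)
Teaching and demos (interactive presentation)
Rapid prototyping (trying algorithms, tuning parameters)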

Poor fit

Large-scale software development (use VSCode/PyCharm)
Production deployment (use .py scripts)
Core code that needs version control (.ipynb files produce noisy git diffs)
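Why is .ipynb awkward for git? A notebook file is JSON (nbformat 4) that stores outputs and execution counts next to the code. The sketch below builds a minimal, hand-written notebook dict (simplified: real files carry more metadata) and diffs two versions in which only the execution count and printed output changed. The code is identical, yet a diff still reports changes.

```python
import difflib
import json


def make_notebook(execution_count, output_text):
    """Build a minimal nbformat-4-style notebook dict (simplified for illustration)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "id": "cell-1",
                "metadata": {},
                "outputs": [
                    {"name": "stdout", "output_type": "stream", "text": [output_text]}
                ],
                "source": ["import random\n", "print(random.random())"],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


# Same code, run twice: only the execution counter and the printed output differ
before = json.dumps(make_notebook(1, "0.123\n"), indent=1).splitlines()
after = json.dumps(make_notebook(7, "0.987\n"), indent=1).splitlines()

# Keep only the changed lines, skipping the ---/+++ file headers
changed = [
    line
    for line in difflib.unified_diff(before, after, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(changed))
```

This is the noise that tools such as nbstripout and jupytext (section 7) are meant to remove.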

3. Installation and Configuration

3.1 JupyterLab vs Notebook

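Feature	JupyterLab (recommended)	Jupyter Notebook (classic)
Interface	Full IDE-style layout with tabs	One document per page, minimal
File management	File browser in the left sidebar	Go back to the file list page
Terminal	Built-in terminal	Available, opens in a separate browser tab
Extensions	Current extension system	Shares JupyterLab's system since v7 (old nbextensions need Notebook 6/nbclassic)
Version	v4.5.7 (latest at time of writing)	v7.5.6
Recommendation	Default choice for new users	For those who want a minimal interface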

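3.2 Installation

# Option 1: pip (recommended)
pip install jupyterlab              # Install JupyterLab (does not include the classic notebook package)
pip install notebook                # Optional: provides the `jupyter notebook` command

# Option 2: conda
conda install -c conda-forge jupyterlab

# Launch
jupyter lab                         # Start JupyterLab (opens the browser automatically)
jupyter notebook                    # Start the classic Notebook (requires the notebook package)

# Use a specific port and don't open a browser
jupyter lab --port 8889 --no-browser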

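3.3 Remote Server Setup

# Scenario: Jupyter runs on the server, you use a browser on your local machine

# On the server: generate a config file
jupyter lab --generate-config       # Creates ~/.jupyter/jupyter_lab_config.py

# Set a password
jupyter server password             # Stores a hashed password (in ~/.jupyter/jupyter_server_config.json)

# Edit the config (~/.jupyter/jupyter_lab_config.py); only needed for direct access without an SSH tunnel
# c.ServerApp.ip = '0.0.0.0'       # Listen on all network interfaces
# c.ServerApp.port = 8888           # Port
# c.ServerApp.open_browser = False  # Don't open a browser
# c.ServerApp.allow_remote_access = True  # Allow remote access

# Launch (use screen/tmux so it keeps running after you log out)
screen -S jupyter                   # Create a screen session
jupyter lab --no-browser            # Start JupyterLab
# Detach with Ctrl+A, then D (Jupyter keeps running); reattach later with: screen -r jupyter

# From your local machine, connect through an SSH tunnel (safer)
ssh -L 8888:localhost:8888 user@server  # Forward local port 8888 to port 8888 on the server
# Then open http://localhost:8888 in your browser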
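Once the tunnel is up, something should be listening on the local port. This standard-library sketch (host and port assumed to match the tunnel command above) checks that before you open the browser:

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unresolvable host, ...
        return False


print("tunnel is up" if port_is_open() else "nothing is listening on localhost:8888")
```

A `False` here usually means the `ssh -L` session has dropped or the server-side Jupyter is not running.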

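3.4 Managing Multiple Kernels

# Kernel = the engine behind Jupyter that actually executes your code
# A single JupyterLab instance can use several different conda environments

# Install ipykernel
conda activate bioinfo              # Activate the target environment
pip install ipykernel               # Install the kernel package

# Register the kernel (run inside the environment it should point to)
python -m ipykernel install --user --name bioinfo --display-name "Bioinfo (Python 3.10)"
# Repeat for each environment, e.g. after `conda activate ml_env`:
python -m ipykernel install --user --name ml_env --display-name "T2D ML (Python 3.9)"

# List registered kernels
jupyter kernelspec list             # List all kernels

# Remove a kernel
jupyter kernelspec uninstall bioinfo  # Remove the named kernel

# Install the R kernel (run these in R)
# install.packages('IRkernel')
# IRkernel::installspec()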
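What does ipykernel install actually register? A kernelspec is a directory containing a kernel.json file (for --user installs on Linux, under ~/.local/share/jupyter/kernels/), and jupyter kernelspec list shows these directories. The sketch below writes a minimal kernelspec by hand into a temporary directory purely to show the file's shape; the helper `write_kernelspec` is hypothetical, and in practice `python -m ipykernel install` writes this file (plus logos) to the right place for you.

```python
import json
import sys
import tempfile
from pathlib import Path


def write_kernelspec(kernels_dir, name, display_name, python=sys.executable):
    """Write a minimal kernel.json in the layout ipykernel uses (illustration only)."""
    spec_dir = Path(kernels_dir) / name  # the directory name is the kernel's --name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec = {
        # Command Jupyter runs to start the kernel; {connection_file} is filled in at start-up
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,  # label shown in the kernel picker
        "language": "python",
    }
    path = spec_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=1))
    return path


with tempfile.TemporaryDirectory() as tmp:
    kernel_json = write_kernelspec(tmp, "bioinfo", "Bioinfo (Python 3.10)")
    print(kernel_json.read_text())
```

The argv entry is why a kernel keeps using the environment it was registered from, regardless of which environment launched JupyterLab.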

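4. Productivity Tips

4.1 Magic Commands

# Magic commands are IPython shortcuts available in Jupyter, prefixed with % or %%
# % is a line magic (one line); %% is a cell magic (applies to the whole cell)

# ===== Timing =====
%timeit sum(range(1000))           # Runs many times and reports mean ± std: benchmark one line
%%timeit                            # Times the whole cell (must be the first line of its own cell)
x = [i**2 for i in range(1000)]

%time result = my_function()        # Runs once and reports the actual elapsed time

# ===== Run external scripts =====
%run scripts/my_analysis.py         # Run an external Python script (its variables remain in the notebook)
%run -t scripts/process.py          # Run and report timing

# ===== Load code =====
%load scripts/utils.py              # Load a file's contents into the current cell (handy for editing/debugging)

# ===== Shell commands =====
!ls -la data/                       # Run a shell command (prefix with !)
!pip install seaborn                # Install a package (%pip install targets the kernel's environment, which is safer)
!wc -l data/*.fastq                 # Count lines

# Capture shell output in a variable
files = !ls data/*.csv              # Stores the output as a list of lines
print(files)

# ===== Environment info =====
%who                                # List all variables in the namespace
%who str                            # List only string variables
%whos                               # Detailed view (type, value)
%reset                              # Delete all variables (asks for confirmation; use with care!)

# ===== Debugging =====
%debug                              # Enter the debugger after an exception
%pdb on                             # Enter the debugger automatically on every exception

# ===== Other useful magics =====
%pwd                                # Current working directory
%cd /path/to/dir                    # Change directory
%env                                # Show environment variables
%history                            # Show input history
%matplotlib inline                  # Show plots inline (the default in current Jupyter)
%matplotlib widget                  # Interactive plots (requires the ipympl package)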
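%timeit is built on Python's standard timeit module. The sketch below approximates it in plain Python: choose a loop count automatically, repeat the measurement several times, and report mean ± standard deviation per loop. `rough_timeit` is a hypothetical helper; IPython's exact defaults and output format may differ.

```python
import statistics
import timeit


def rough_timeit(stmt, repeat=7):
    """Approximate %timeit: auto-choose loops, repeat, return per-loop seconds."""
    timer = timeit.Timer(stmt)
    number, _ = timer.autorange()  # loop count that takes at least 0.2 s in total
    totals = timer.repeat(repeat=repeat, number=number)
    per_loop = [t / number for t in totals]
    return statistics.mean(per_loop), statistics.stdev(per_loop), number


mean, std, loops = rough_timeit("sum(range(1000))")
print(f"{mean * 1e6:.2f} µs ± {std * 1e6:.2f} µs per loop ({len(range(7))} runs, {loops} loops each)")
```

Reporting several repeats rather than one run is what makes %timeit more trustworthy than %time for short snippets.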

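4.2 Automatic Reloading (autoreload)

# Problem: after importing your own module in a notebook, edits to it only take effect after restarting the kernel
# Solution: the autoreload extension reloads modules automatically after you edit them

%load_ext autoreload                # Load the autoreload extension
%autoreload 2                       # Mode 2: reload all modules before executing each cell

# Now edits to scripts/utils.py take effect without restarting the kernel
import scripts.utils as utils       # After editing utils.py, the next call picks up the changes
utils.process_data(df)              # Runs the latest version of the code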
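What autoreload automates is, roughly, calling importlib.reload on modules whose source changed. The sketch below does that by hand with a throwaway module; the module name `utils_demo` and its function are hypothetical, and `sys.dont_write_bytecode` is set so that the edited source, not a cached .pyc, is what gets reloaded.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid stale .pyc files in this quick demo

tmp = Path(tempfile.mkdtemp())
module_file = tmp / "utils_demo.py"  # hypothetical module for illustration
module_file.write_text("def process_data():\n    return 'v1'\n")
sys.path.insert(0, str(tmp))

import utils_demo  # first import

print(utils_demo.process_data())  # v1

# "Edit" the file, then reload it: this is the step %autoreload does for you
module_file.write_text("def process_data():\n    return 'version 2'\n")
importlib.invalidate_caches()
importlib.reload(utils_demo)
print(utils_demo.process_data())  # version 2
```

In a notebook you would simply use %autoreload 2; the manual version is mainly useful in plain scripts or for understanding what the extension does.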

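4.3 Variable Explorer

# JupyterLab has no Variable Inspector out of the box; install the extension:
# pip install lckr_jupyterlab_variableinspector
# (JupyterLab's built-in debugger also shows a Variables panel when the kernel supports debugging, e.g. ipykernel >= 6)

# Inspect variables manually
%whos                                # All variables (name, type, value)

# Quick DataFrame exploration
df.info()                           # Columns, dtypes, non-null counts
df.describe()                       # Summary statistics for numeric columns
df.head()                           # First 5 rows
df.shape                            # (rows, columns)
df.memory_usage(deep=True).sum()    # Memory usage in bytes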
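%whos lists the names in the notebook's namespace with their types. A rough plain-Python approximation (a sketch, not IPython's implementation; `whos` is a hypothetical helper) walks a namespace dict such as `globals()`, skipping private names:

```python
def whos(namespace, width=40):
    """Return (name, type name, truncated repr) rows, roughly like IPython's %whos."""
    rows = []
    for name, value in sorted(namespace.items()):
        if name.startswith("_"):
            continue  # skip private/internal names
        rows.append((name, type(value).__name__, repr(value)[:width]))
    return rows


example = {"sample_id": "sampleA", "threshold": 0.01, "abundances": [0.2, 0.5, 0.3]}
for name, kind, text in whos(example):
    print(f"{name:12s} {kind:8s} {text}")
```

Inside a notebook, `whos(globals())` gives the same kind of overview of the live session.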

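4.4 Keyboard Shortcuts (JupyterLab)

# Command mode (press Esc; the cell is selected but has no text cursor)
A            - Insert a cell above
B            - Insert a cell below
DD           - Delete the current cell (press D twice)
M            - Change to Markdown
Y            - Change to code
Z            - Undo cell deletion
Shift+Enter  - Run the current cell and move to the next
Ctrl+Enter   - Run the current cell and stay on it
Shift+M      - Merge the selected cells
C/V/X        - Copy/paste/cut a cell

# Edit mode (press Enter; a text cursor appears in the cell; on macOS use Cmd instead of Ctrl)
Tab          - Code completion
Shift+Tab    - Show the function's documentation
Ctrl+/       - Toggle comment
Ctrl+D       - Delete the current line
Ctrl+Shift+- - Split the cell at the cursor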

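5. Exporting with nbconvert

# nbconvert converts .ipynb files to other formats

# To HTML (most common; keeps all outputs and charts)
jupyter nbconvert --to html notebook.ipynb
# Output: notebook.html

# To PDF (requires LaTeX and pandoc)
jupyter nbconvert --to pdf notebook.ipynb
# Dependencies on Debian/Ubuntu: sudo apt install texlive-xetex texlive-fonts-recommended texlive-plain-generic pandoc

# To a Python script (outputs are dropped; Markdown cells become comments)
jupyter nbconvert --to script notebook.ipynb
# Output: notebook.py

# To Markdown
jupyter nbconvert --to markdown notebook.ipynb

# Execute first, then convert (guarantees up-to-date results)
jupyter nbconvert --to html --execute notebook.ipynb
# --execute runs the whole notebook before exporting

# Hide code and show only results (for reports)
jupyter nbconvert --to html --no-input notebook.ipynb
# --no-input omits the code of each code cell and keeps only its output

# Batch conversion
jupyter nbconvert --to html notebooks/*.ipynb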
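To see roughly what --to script does, here is a simplified standard-library stand-in (not nbconvert's actual code): read the notebook JSON, keep code cells as they are, turn Markdown cells into comments, and drop all outputs. The tiny notebook is made up for the demo, and `notebook_to_script` is a hypothetical helper; for real work, use jupyter nbconvert itself.

```python
import json


def notebook_to_script(nb):
    """Simplified --to script: code cells verbatim, Markdown as comments, no outputs."""
    chunks = []
    for cell in nb["cells"]:
        source = "".join(cell["source"])  # nbformat allows a list of lines or one string
        if cell["cell_type"] == "code":
            chunks.append(source)
        elif cell["cell_type"] == "markdown":
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


nb_json = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# QC report\n", "Read counts per sample"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["42\n"]}],
         "source": ["print(42)"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})
print(notebook_to_script(json.loads(nb_json)))
```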

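6. Jupyter for Bioinformatics

6.1 Analysis Report Template

# Suggested structure for a standard bioinformatics analysis notebook:

# Cell 1: Title and description (Markdown)
# # T2D Gut Microbiome Diversity Analysis Report
# - Date: 2026-05-03
# - Analyst: [user]
# - Data source: PRJNA123456

# Cell 2: Imports and configuration
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
%matplotlib inline
plt.rcParams['figure.dpi'] = 150     # High-resolution figures
plt.rcParams['font.size'] = 12

# Cell 3: Load data (assumes the first column holds taxon names and every other column is a sample)
df = pd.read_csv("results/abundance_table.tsv", sep='\t')
metadata = pd.read_csv("data/metadata.csv")
print(f"Samples: {df.shape[1]-1}, species: {df.shape[0]}")

# Cell 4: Data QC (Markdown + code + plots)
# Cell 5: Diversity analysis
# Cell 6: Differential analysis
# Cell 7: Visualization
# Cell 8: Conclusions (Markdown)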
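The `df.shape[1]-1` in Cell 3 relies on the table layout: the first column holds taxon names and every other column is a sample. A standard-library sketch with a made-up toy table (the layout is the assumption here, not a fixed file format) shows the same counting logic:

```python
import csv
import io

# Hypothetical abundance table: first column = taxon, remaining columns = samples
tsv = (
    "taxon\tS1\tS2\tS3\tS4\n"
    "Bacteroides\t0.30\t0.25\t0.40\t0.35\n"
    "Prevotella\t0.10\t0.20\t0.05\t0.15\n"
    "Akkermansia\t0.02\t0.01\t0.03\t0.02\n"
)

rows = list(csv.reader(io.StringIO(tsv), delimiter="\t"))
header, body = rows[0], rows[1:]
n_samples = len(header) - 1  # like df.shape[1] - 1: drop the taxon column
n_species = len(body)        # like df.shape[0]: one row per taxon
print(f"Samples: {n_samples}, species: {n_species}")  # Samples: 4, species: 3
```

With pandas, reading the table with `index_col=0` makes the taxon column the index, so `df.shape[1]` is then the sample count directly.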

6.2 Parameterized Execution with Papermill

# Papermill: treat a notebook like a function and run it in batches with different parameters

pip install papermill               # Install

# Mark the parameters cell in the notebook by giving it the tag "parameters"
# JupyterLab: select the cell, open the Property Inspector (gear icon) in the right sidebar, and add the tag

# Example parameterized notebook (analysis_template.ipynb)

# Cell 1 (tagged "parameters"):
sample_id = "default_sample"        # Default values, overridden at run time
threshold = 0.01
output_dir = "results"

# Cell 2+: analysis that uses these parameters
import pandas as pd
df = pd.read_csv(f"data/{sample_id}_abundance.csv")
df_filtered = df[df['relative_abundance'] > threshold]
df_filtered.to_csv(f"{output_dir}/{sample_id}_filtered.csv")

# Batch run from the command line (create the output/ directory first)
papermill analysis_template.ipynb output/sampleA.ipynb \
    -p sample_id "sampleA" \
    -p threshold 0.05

# Batch run from Python (e.g. in a script or another notebook)
import os
import papermill as pm

os.makedirs('output', exist_ok=True)
samples = ['sampleA', 'sampleB', 'sampleC']
for s in samples:
    pm.execute_notebook(
        'analysis_template.ipynb',
        f'output/{s}_report.ipynb',
        parameters={'sample_id': s, 'threshold': 0.01}
    )

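Conceptually, Papermill finds the cell tagged parameters and inserts a new cell right after it (tagged injected-parameters) that reassigns the variables, so your values override the defaults. The sketch below mimics that on a notebook dict in plain Python; `inject_parameters` is a hypothetical helper, and the real tool also executes the notebook, supports other kernel languages, and handles many more details.

```python
import copy


def inject_parameters(nb, params):
    """Return a copy of nb with an 'injected-parameters' cell after the 'parameters' cell."""
    nb = copy.deepcopy(nb)  # leave the template untouched
    cells = nb["cells"]
    # Index of the first cell tagged "parameters"; -1 if none (then inject at the top)
    idx = next(
        (i for i, c in enumerate(cells)
         if "parameters" in c.get("metadata", {}).get("tags", [])),
        -1,
    )
    source = "# Parameters\n" + "".join(f"{k} = {v!r}\n" for k, v in params.items())
    cells.insert(idx + 1, {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": source,
    })
    return nb


template = {"cells": [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "execution_count": None,
     "outputs": [], "source": 'sample_id = "default_sample"\nthreshold = 0.01\n'},
    {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
     "source": "print(sample_id, threshold)"},
], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

run = inject_parameters(template, {"sample_id": "sampleA", "threshold": 0.05})
ns = {}
for cell in run["cells"]:
    exec(cell["source"], ns)  # run cells top to bottom; prints: sampleA 0.05
```

Because the defaults stay in the template, the notebook still runs on its own, while every batch run records the values it actually used.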
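6.3 Reproducible Research Practices

# Record environment info at the top of the notebook (for reproducibility)
import sys
import numpy as np
import pandas as pd
print(f"Python: {sys.version}")
print(f"Pandas: {pd.__version__}")
print(f"NumPy: {np.__version__}")

# Or use the watermark extension (pip install watermark)
%load_ext watermark
%watermark -v -p pandas,numpy,scipy,matplotlib,scikit-learn
# Prints the Python/IPython versions, then one "package: version" line per package

# Fix random seeds
np.random.seed(42)                  # Seeds NumPy's legacy global generator
rng = np.random.default_rng(42)     # Preferred in new code: an explicit, seeded Generator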
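If you would rather not depend on the watermark extension, the standard library can record the same kind of information. This sketch (the package list is an assumption, and `environment_record` is a hypothetical helper) collects the Python version and package versions into a dict you could save next to your results, then shows that a fixed seed reproduces the same random draws:

```python
import platform
import random
from importlib.metadata import PackageNotFoundError, version


def environment_record(packages):
    """Python version plus the installed version of each package (None if missing)."""
    record = {"python": platform.python_version()}
    for name in packages:
        try:
            record[name] = version(name)
        except PackageNotFoundError:
            record[name] = None  # not installed in this environment
    return record


print(environment_record(["pandas", "numpy", "scipy", "matplotlib", "scikit-learn"]))

# Same seed -> same sequence, so random subsampling or train/test splits can be replayed
first = random.Random(42).sample(range(100), 5)
second = random.Random(42).sample(range(100), 5)
print(first == second)  # True
```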

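7. Recommended Extensions

JupyterLab extensions and related tools

Extension	Purpose	Install
jupyterlab-git	Git operations in a visual panel	`pip install jupyterlab-git`
jupytext	Two-way sync between notebooks and .py/.md files	`pip install jupytext`
nbstripout	Strips outputs automatically before git commits (a git filter, not a JupyterLab extension)	`pip install nbstripout`
jupyterlab-lsp	Code completion, go-to-definition, rename (needs a language server)	`pip install jupyterlab-lsp "python-lsp-server[all]"`
jupyterlab-execute-time	Shows each cell's execution time	`pip install jupyterlab-execute-time`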

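Top picks: jupytext + nbstripout

# jupytext: automatically pairs each .ipynb with a .py file (git-diff friendly)
pip install jupytext
# Configure per notebook (stored in the notebook metadata):
# jupytext --set-formats ipynb,py:percent notebook.ipynb
# or project-wide in a jupytext.toml at the repository root (user-wide: ~/.config/jupytext/jupytext.toml):
# formats = "ipynb,py:percent"

# nbstripout: strips outputs when you commit (smaller files, no diff noise)
pip install nbstripout
nbstripout --install                 # Install the git filter (per repository)
# From then on, git commit stores notebooks without outputs
# Your local files keep their outputs; git diff and git show see the clean version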
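The core of what nbstripout does to each committed notebook can be sketched in a few lines: clear every code cell's outputs and execution count. This is a simplified illustration (`strip_outputs` is a hypothetical helper; the real tool can also strip selected metadata and is configurable), operating on a notebook dict in memory:

```python
import json


def strip_outputs(nb):
    """Clear outputs and execution counts from all code cells (in place)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


raw = json.dumps({
    "cells": [{"cell_type": "code", "execution_count": 12, "metadata": {},
               "outputs": [{"output_type": "stream", "name": "stdout",
                            "text": ["big table...\n"]}],
               "source": ["df.head()"]}],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})
clean = strip_outputs(json.loads(raw))
print(json.dumps(clean["cells"][0], indent=1))
print(len(raw), "->", len(json.dumps(clean)), "characters")
```

The source survives untouched, so the committed file only changes when the code itself changes.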

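8. Multi-user Deployment with JupyterHub

# JupyterHub: a shared Jupyter service for a team or lab
# Each user gets their own single-user Jupyter server; administrators manage everything centrally

# Install
pip install jupyterhub
npm install -g configurable-http-proxy  # Dependency (requires Node.js)
pip install jupyterlab               # Needed for the single-user servers if they run in the same environment

# Generate a config file
jupyterhub --generate-config         # Creates jupyterhub_config.py

# Basic configuration
# c.JupyterHub.ip = '0.0.0.0'
# c.JupyterHub.port = 8000
# c.Authenticator.admin_users = {'admin'}
# c.Spawner.default_url = '/lab'     # Open JupyterLab by default

# Launch
jupyterhub                           # Start the service (the default login uses system accounts, so multi-user setups usually run it as root)

# Typical use cases:
# - A shared lab server with one account per student
# - Teaching, with a single standard Python environment
# - Corporate data analysis teams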

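9. Jupyter vs VSCode vs RStudio

Dimension	Jupyter	VSCode	RStudio
Focus	Interactive data exploration	General-purpose code editor	IDE for R
Languages	Python/R/Julia and more	Nearly all languages	Mainly R, also Python
Interactivity	Very strong (run cells step by step)	Moderate (needs extensions)	Strong (interactive Console)
Best for	Data analysis/reports/teaching	Software development/large projects	Statistical analysis/R projects
Version control	Poor (.ipynb is JSON)	Excellent	Fair
Debugging	Basic (%debug)	Powerful (breakpoints/variables)	Moderate
Code completion	Moderate	Excellent (Copilot)	Strong (R ecosystem)
Visualization	Inline plots, WYSIWYG	Needs a separate window	Plots pane on the right
Editing large files	Poor fit	Good fit	Poor fit
Deployment	Poor fit	Good fit	Poor fit
Bioinformatics use	Exploratory analysis/reports	Pipeline development	R packages (DESeq2, etc.)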

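Recommended workflow

Exploration: Jupyter (try things and see results, iterate quickly)
    ↓ once the approach is settled
Development: VSCode (write proper scripts/modules, with full debugging and git support)
    ↓ when R statistics are needed
Statistical analysis: RStudio (DESeq2/edgeR/ggplot2)
    ↓ finally
Reporting: Jupyter (charts + code + narrative, exported to HTML/PDF for your supervisor)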

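10. Interview Questions

Q1: How do you use Jupyter day to day?

"I mainly use JupyterLab for data exploration and analysis reports. During exploration, I write code and check results as I go: looking at abundance distributions, tuning plot parameters, quickly trying statistical tests. Once the analysis plan is settled, I move the core code into .py modules and use the notebook only to call them and present results. Finally, I export an HTML report with nbconvert for my supervisor."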

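Q2: What are Jupyter Notebook's drawbacks, and how do you work around them?

"The main problems: 1) Version control is awkward (.ipynb is a large JSON file). I use nbstripout to strip outputs before committing, or jupytext to pair the notebook with a .py file. 2) Hidden dependence on execution order. I make a habit of running cells top to bottom and verify with Restart & Run All before sharing. 3) Notebooks are a poor fit for large modules. I put the core logic in .py files and only call it from the notebook."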

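The hidden execution-order problem from point 2 is easy to reproduce. In this sketch, each "cell" is a string run with exec in one shared namespace (a stand-in for the kernel; `run_cells` and the cell contents are made up for illustration). Running cells out of order leaves a result that a clean top-to-bottom run cannot reproduce, which is exactly what Restart & Run All catches.

```python
def run_cells(cells, order):
    """Run cell sources in the given order in one shared namespace; return it."""
    namespace = {}
    for i in order:
        exec(cells[i], namespace)
    return namespace


cells = [
    "threshold = 0.05",                                        # cell 0
    "kept = [x for x in [0.01, 0.03, 0.2] if x > threshold]",  # cell 1
    "threshold = 0.02",                                        # cell 2: edited later, below cell 1
]

interactive = run_cells(cells, [0, 1, 2, 1])  # user reran cell 1 after running cell 2
fresh = run_cells(cells, [0, 1, 2])           # Restart & Run All: strictly top to bottom

print(interactive["kept"])  # [0.03, 0.2]
print(fresh["kept"])        # [0.2]
print("reproducible" if interactive["kept"] == fresh["kept"] else "hidden state!")
```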
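Q3: How do you use Jupyter remotely on a server?

"Two approaches: 1) SSH tunnel: start jupyter lab --no-browser on the server, forward the port from my machine with ssh -L 8888:localhost:8888 user@server, and open localhost:8888 in the browser. 2) Direct exposure: set ip='0.0.0.0' plus a password, though this is not recommended on the public internet. For team use, deploy JupyterHub with one account per person."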

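Q4: How do you make a Jupyter analysis reproducible?

"Several key practices: 1) record all package versions at the top with watermark; 2) fix random seeds; 3) run with Papermill parameters instead of editing values by hand; 4) use nbconvert --execute to confirm that a top-to-bottom run gives the same results; 5) pin the conda environment with environment.yml; 6) archive results as exported HTML."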

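Q5: How do you manage multiple conda environments in Jupyter?

"Register one kernel per conda environment. First conda activate the environment, then pip install ipykernel, then register it with python -m ipykernel install --user --name <env name> --display-name '<display name>'. After that, I can pick the kernel when creating a notebook in JupyterLab, or switch at any time via Kernel > Change Kernel."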

11. Cheat Sheet

# ===== Launch =====
jupyter lab                          # Start JupyterLab
jupyter lab --port 8889              # Use a specific port
jupyter lab --no-browser             # Don't open a browser
jupyter notebook                     # Start the classic Notebook

# ===== Kernel management =====
jupyter kernelspec list              # List all kernels
python -m ipykernel install --user --name ENV_NAME --display-name "NAME"
jupyter kernelspec uninstall ENV_NAME  # Remove a kernel (by its --name, not its display name)

# ===== Export =====
jupyter nbconvert --to html nb.ipynb        # To HTML
jupyter nbconvert --to pdf nb.ipynb         # To PDF
jupyter nbconvert --to script nb.ipynb      # To .py
jupyter nbconvert --execute --to html nb.ipynb  # Execute, then export
jupyter nbconvert --no-input --to html nb.ipynb # Hide code

# ===== Common magics =====
%timeit expr                         # Timing (averaged over many runs)
%time expr                           # Timing (single run)
%run script.py                       # Run a script
%load script.py                      # Load code into a cell
%who / %whos                         # Inspect variables
%matplotlib inline                   # Inline plots
%load_ext autoreload                 # Load autoreload
%autoreload 2                        # Enable automatic reloading
!command                             # Run a shell command
%debug                               # Post-mortem debugging

# ===== Extensions =====
pip install jupyterlab-git           # Git UI
pip install jupytext                 # ipynb <-> py sync
pip install nbstripout; nbstripout --install  # Strip outputs on commit
pip install papermill                # Parameterized execution

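12. Further Resources

Resource	Description
JupyterLab documentation	Official docs (v4.5.7)
Jupyter Notebook documentation	Docs for the classic Notebook
nbviewer	View .ipynb files online
Google Colab	Cloud-hosted Jupyter with free GPUs
Papermill	Parameterized batch execution
Binder	Turns a GitHub repository into runnable notebooks
jupytext documentation	Best practices for notebook version control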

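Jupyter is an exploration tool, not a production tool. Best practice: explore in Jupyter → extract the code into .py files → verify in CI/CD → deploy. Don't put all your code in notebooks.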