
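PyTorch: Getting Started and Hands-On Practice

In one sentence: PyTorch is a deep learning framework developed by Facebook (now Meta). It works like NumPy but adds GPU acceleration and automatic differentiation, and it is one of the most widely used deep learning tools in both academia and industry.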


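1. What Is PyTorch

Plain-language explanation: if NumPy is an ordinary screwdriver, PyTorch is a power screwdriver. It does the same job (matrix operations), but much faster (GPU acceleration), and it has a built-in "slope calculator" (automatic differentiation), so you can train neural networks without deriving gradient formulas by hand.

Key features:
- Dynamic computation graph: the graph is built as the code runs, so it feels like writing ordinary Python (TensorFlow 1.x required defining the graph first and running it afterwards)
- Pythonic: the API follows Python conventions and is easy to debug
- GPU acceleration: a single .cuda() (or .to('cuda')) call moves computation from the CPU to the GPU
- Rich ecosystem: torchvision (images), torchaudio (audio), HuggingFace (NLP and large bioinformatics models)

Version at the time of writing: PyTorch 2.11.0, with builds for CUDA 12.6 / 12.8 / 13.0 (check pytorch.org for the latest release)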


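2. Six Core Concepts (in Plain Language)

2.1 Tensor

Plain language: a "multi-dimensional array". 1-D = vector, 2-D = matrix, 3-D = a cube of data, and so on.
Analogy: an Excel sheet is a 2-D tensor; a stack of sheets is a 3-D tensor.
import torch

# Scalar (0-D)
scalar = torch.tensor(3.14)

# Vector (1-D), e.g. the 3 features of one sample
vector = torch.tensor([1.0, 2.0, 3.0])

# Matrix (2-D), e.g. 2 samples × 3 features
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])

# 3-D tensor, e.g. a batch of images (batch × height × width)
tensor_3d = torch.randn(32, 28, 28)  # 32 grayscale images of 28×28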
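A quick way to check what kind of tensor you have is to print its ndim, shape and dtype. The sketch below reuses the four tensors from the example above. Note that integer literals produce torch.int64, while Python floats produce the default torch.float32.

```python
import torch

scalar = torch.tensor(3.14)
vector = torch.tensor([1.0, 2.0, 3.0])
matrix = torch.tensor([[1, 2, 3], [4, 5, 6]])
tensor_3d = torch.randn(32, 28, 28)

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor_3d", tensor_3d)]:
    # ndim = number of dimensions, shape = size along each dimension
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}, dtype={t.dtype}")
```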

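2.2 Autograd (Automatic Differentiation)

Plain language: you tell PyTorch "compute gradients for this", and it automatically works out which direction each parameter should move.
Analogy: GPS navigation. You only set the destination, and it works out whether to turn left or right.
import torch

# requires_grad=True tells PyTorch: "track this variable so its gradient can be computed"
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x  # y = x² + 3x

y.backward()         # automatic differentiation: dy/dx = 2x + 3 = 7 at x = 2
print(x.grad)        # tensor(7.)  ← computed by PyTorch automatically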
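The same mechanism works for several parameters at once. The toy sketch below (made-up numbers, not from the original) computes the gradient of a squared error (w·x + b - y)² with respect to w and b, and compares it with the hand-derived formulas 2(w·x + b - y)·x and 2(w·x + b - y).

```python
import torch

x, y = torch.tensor(2.0), torch.tensor(10.0)       # one data point
w = torch.tensor(1.5, requires_grad=True)          # parameters to learn
b = torch.tensor(0.5, requires_grad=True)

loss = (w * x + b - y) ** 2                        # squared error
loss.backward()                                    # fills w.grad and b.grad

residual = (w * x + b - y).item()                  # 1.5*2 + 0.5 - 10 = -6.5
print(w.grad.item(), 2 * residual * x.item())      # both -26.0
print(b.grad.item(), 2 * residual)                 # both -13.0
```

Keep in mind that .backward() adds into .grad rather than overwriting it, which is why training loops call optimizer.zero_grad() before each backward pass.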

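2.3 nn.Module (Neural Network Module)

Plain language: the building blocks. Each Module is a single layer or an entire network.
Analogy: LEGO. Small bricks form larger pieces, and larger pieces form the castle.
import torch
import torch.nn as nn

# Define a simple network by subclassing nn.Module
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 32)   # 10-dim input → 32-dim output
        self.layer2 = nn.Linear(32, 1)    # 32-dim → 1-dim (the prediction)

    def forward(self, x):                  # forward pass: how data flows through the network
        x = torch.relu(self.layer1(x))    # first layer + activation function
        x = self.layer2(x)                # output of the second layer
        return x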
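To see the module in action, the sketch below repeats SimpleNet in compact form so it runs on its own, passes a batch of 4 random samples through it, and counts the parameters that nn.Module registered automatically (10×32 + 32 in layer1, 32×1 + 1 in layer2).

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):                        # same network as above, written compactly
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 32)
        self.layer2 = nn.Linear(32, 1)

    def forward(self, x):
        return self.layer2(torch.relu(self.layer1(x)))

model = SimpleNet()
batch = torch.randn(4, 10)                         # 4 samples, 10 features each
out = model(batch)                                 # calling the model runs forward()
print(out.shape)                                   # torch.Size([4, 1])

# nn.Module registers every sub-layer's weight and bias automatically
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                                    # 10*32+32 + 32*1+1 = 385
```

Call the model (model(batch)) rather than model.forward(batch) directly, so that PyTorch's hooks still run.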

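2.4 DataLoader

Plain language: automatically splits a large dataset into small batches and feeds them to the model; it can also shuffle the order and load data in parallel worker processes.
Analogy: a conveyor-belt buffet. The food (data) arrives in front of you one plate at a time.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Suppose we have 100 samples with 10 features each
X = torch.randn(100, 10)
y = torch.randn(100, 1)

dataset = TensorDataset(X, y)           # wrap the tensors as a dataset
# (with num_workers > 0, scripts on Windows/macOS need an if __name__ == '__main__': guard)
loader = DataLoader(dataset,
                    batch_size=16,       # 16 samples per batch
                    shuffle=True,        # reshuffle at every epoch
                    num_workers=2)       # 2 worker subprocesses load data in parallel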
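What the loader yields can be checked by iterating over it once. This sketch uses num_workers=0 (an assumption, to keep it runnable anywhere) and shows that 100 samples at batch_size=16 give six full batches plus one batch of 4, unless drop_last=True is set.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(100, 10)
y = torch.randn(100, 1)
dataset = TensorDataset(X, y)

# num_workers=0 loads in the main process: simplest for small in-memory data
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)
batch_sizes = [xb.shape[0] for xb, yb in loader]
print(batch_sizes)          # [16, 16, 16, 16, 16, 16, 4] because 100 = 6*16 + 4

# drop_last=True discards the incomplete final batch
loader_full = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=True)
print(len(loader_full))     # 6
```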

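2.5 Optimizer

Plain language: uses the gradients (the direction) to update the model parameters (adjust the weights).
Analogy: a strategy for walking downhill. SGD walks straight down the local slope blindfolded; Adam is a smarter hiker with a compass and a memory of previous steps.
import torch.nn as nn
import torch.optim as optim

model = SimpleNet()
# Adam optimizer with learning rate (step size) 0.001
optimizer = optim.Adam(model.parameters(), lr=0.001)

# One training step, using one batch from the DataLoader above
X_batch, y_batch = next(iter(loader))
loss = nn.MSELoss()(model(X_batch), y_batch)

optimizer.zero_grad()   # 1. clear the gradients from the previous step
loss.backward()         # 2. compute gradients (backpropagation)
optimizer.step()        # 3. update the parameters (take one step)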
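To make "take one step" concrete: for plain SGD without momentum, optimizer.step() performs w <- w - lr * grad. The toy sketch below (a made-up two-element parameter, not part of the original) checks that by hand.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)           # plain SGD, no momentum

loss = (w ** 2).sum()                              # gradient is 2*w = [2., -4.]
optimizer.zero_grad()
loss.backward()
optimizer.step()                                   # w <- w - lr * grad

print(w.detach())                                  # approximately [0.8, -1.6]
```

Adam follows the same zero_grad / backward / step pattern, but its update also uses running averages of past gradients, so the step is not simply lr * grad.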

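2.6 Loss Function

Plain language: a score measuring how far the model's predictions are from the true values.
Analogy: exam grading, in reverse: the lower the loss, the closer the answers are to the answer key.
import torch.nn as nn

# Regression: mean squared error (MSE)
mse_loss = nn.MSELoss()

# Classification: cross-entropy loss (expects raw logits and integer class labels)
ce_loss = nn.CrossEntropyLoss()

# Compute the loss for one batch of the regression data above
X_batch, y_batch = next(iter(loader))
predictions = model(X_batch)              # model predictions
loss = mse_loss(predictions, y_batch)     # compare with the true values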
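A frequent mistake with nn.CrossEntropyLoss is passing it probabilities or one-hot labels. It applies log-softmax internally, so it expects raw scores (logits) of shape [batch, classes] and int64 class indices of shape [batch]. The toy sketch below (made-up logits) checks that it equals the hand-computed average of -log softmax at each target class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],           # 2 samples, 3 classes: raw scores, no softmax
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])                     # class indices (int64), not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, targets)

# The same value by hand: mean of -log softmax(logits)[target]
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()
print(loss.item(), manual.item())                  # the two numbers match
```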

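3. Installation and Setup

3.1 CPU Build (works on any computer)

# Option 1: pip (recommended for beginners)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Option 2: conda (note: the official pytorch conda channel stopped publishing new releases after PyTorch 2.5,
# so this installs an older version; for current releases, run the pip command above inside a conda environment)
conda install pytorch torchvision torchaudio cpuonly -c pytorch

3.2 CUDA GPU Build (NVIDIA graphics cards)

# First check which CUDA version your driver supports
nvidia-smi  # "CUDA Version" in the top-right corner is the highest CUDA version the driver supports

# CUDA 12.6 (recommended)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

# CUDA 12.8
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# conda: the official pytorch conda channel no longer receives new releases (the last was 2.5),
# so inside a conda environment use one of the pip commands above instead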

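3.3 Verify the Installation

import torch

print(torch.__version__)          # the installed version, e.g. 2.11.0 at the time of writing
print(torch.cuda.is_available())  # True means a GPU is usable

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # GPU model name

    # Quick GPU computation test
    x = torch.randn(1000, 1000).cuda()  # create a matrix on the GPU
    y = x @ x.T                          # matrix multiplication on the GPU
    print(y.shape)                        # torch.Size([1000, 1000])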

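3.4 Managing Environments with conda (recommended)

# Create an isolated environment to avoid package conflicts
conda create -n pytorch_env python=3.11
conda activate pytorch_env

# Install PyTorch (pip wheels, since the pytorch conda channel no longer gets new releases) + common packages
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install matplotlib scikit-learn pandas jupyter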

4. Hands-On Tutorials

4.1 Basic Tensor Operations

import torch

# ===== Creating tensors =====
zeros = torch.zeros(3, 4)        # 3×4 matrix of zeros
ones = torch.ones(2, 3)          # 2×3 matrix of ones
rand = torch.randn(5, 5)         # 5×5 samples from the standard normal distribution
arange = torch.arange(0, 10, 2)  # [0, 2, 4, 6, 8], an arithmetic sequence

# Converting from NumPy (shares memory, no data copy)
import numpy as np
np_arr = np.array([1, 2, 3])
tensor = torch.from_numpy(np_arr)  # NumPy → Tensor
back = tensor.numpy()              # Tensor → NumPy (also shares memory; CPU tensors only)

# ===== Basic arithmetic =====
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(a + b)            # element-wise addition: tensor([5., 7., 9.])
print(a * b)            # element-wise multiplication: tensor([ 4., 10., 18.])
print(torch.dot(a, b))  # dot product: tensor(32.)

# Matrix multiplication (two equivalent ways)
m1 = torch.randn(3, 4)
m2 = torch.randn(4, 5)
result = m1 @ m2               # way 1: the @ operator
result = torch.matmul(m1, m2)  # way 2: the function form

# ===== Shape operations =====
x = torch.randn(2, 3, 4)    # shape: [2, 3, 4]
x_reshaped = x.view(6, 4)   # reshape to [6, 4] (total number of elements unchanged)
x_flat = x.flatten()        # flatten to 1-D: [24]
x_t = x.permute(0, 2, 1)    # reorder dimensions: [2, 4, 3]

# ===== GPU operations =====
if torch.cuda.is_available():
    device = torch.device('cuda')       # define the device
    x_gpu = x.to(device)                # move to the GPU
    x_cpu = x_gpu.to('cpu')             # move back to the CPU
    # or, in short form
    x_gpu = x.cuda()
    x_cpu = x_gpu.cpu()
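Two details from the snippet above are worth seeing in action: from_numpy really does share memory, and view only works when the memory layout allows it. After permute the tensor is no longer contiguous, so view raises a RuntimeError while reshape (or .contiguous().view) succeeds. A minimal sketch:

```python
import numpy as np
import torch

# 1. from_numpy shares memory: changing the array changes the tensor
np_arr = np.array([1, 2, 3])
t = torch.from_numpy(np_arr)
np_arr[0] = 100
print(t)                          # the tensor now starts with 100 as well

# 2. permute only reorders strides, so the result is not contiguous in memory
x = torch.randn(2, 3, 4)
x_t = x.permute(0, 2, 1)          # shape [2, 4, 3]
print(x_t.is_contiguous())        # False

view_failed = False
try:
    x_t.view(2, 12)               # view needs a compatible memory layout
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)

y = x_t.reshape(2, 12)            # reshape copies the data when it has to
z = x_t.contiguous().view(2, 12)  # explicit alternative
print(torch.equal(y, z))          # True
```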

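4.2 Simple Linear Regression (from scratch)

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# ===== 1. Generate synthetic data =====
# True relationship: y = 3x + 2 + noise
torch.manual_seed(42)                         # fix the random seed for reproducibility
X = torch.linspace(0, 10, 100).unsqueeze(1)   # 100 points as a column vector [100, 1]
y = 3 * X + 2 + torch.randn(100, 1) * 0.5     # add noise

# ===== 2. Define the model =====
model = nn.Linear(1, 1)  # 1 input, 1 output (i.e. y = wx + b)

# ===== 3. Define the loss function and optimizer =====
loss_fn = nn.MSELoss()                                    # mean squared error
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # gradient descent (full batch here)

# ===== 4. Training loop =====
num_epochs = 1000  # the intercept b converges slowly at lr=0.01, so 100 epochs are not enough
losses = []        # record the loss of every epoch
for epoch in range(num_epochs):
    # Forward pass: model predictions
    y_pred = model(X)

    # Compute the loss
    loss = loss_fn(y_pred, y)
    losses.append(loss.item())

    # Backward pass + parameter update
    optimizer.zero_grad()  # clear gradients (important: otherwise they accumulate)
    loss.backward()        # compute gradients
    optimizer.step()       # update parameters

    if (epoch + 1) % 200 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# ===== 5. Inspect the learned parameters =====
w = model.weight.item()  # should be close to 3
b = model.bias.item()    # should be close to 2
print(f'Learned parameters: w={w:.4f}, b={b:.4f}')
print(f'True parameters:    w=3.0000, b=2.0000')

# Plot the recorded loss curve
plt.plot(losses)
plt.xlabel('epoch')
plt.ylabel('MSE loss')
plt.show()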
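Because linear regression with an MSE loss has a closed-form least-squares solution, you can sanity-check the trained w and b against it. The sketch below regenerates the same synthetic data and solves it with torch.linalg.lstsq; once gradient descent has converged, the two results should agree closely (neither will be exactly 3 and 2, because of the noise).

```python
import torch

torch.manual_seed(42)
X = torch.linspace(0, 10, 100).unsqueeze(1)
y = 3 * X + 2 + torch.randn(100, 1) * 0.5

# Design matrix with columns [x, 1], so that A @ [w, b]^T ≈ y
A = torch.cat([X, torch.ones_like(X)], dim=1)      # shape [100, 2]
solution = torch.linalg.lstsq(A, y).solution       # shape [2, 1]
w_ls, b_ls = solution[0].item(), solution[1].item()
print(f"least squares: w={w_ls:.4f}, b={b_ls:.4f}")
```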

4.3 Handwritten Digit Classification with a CNN (complete code)

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# ===== 1. Data preparation =====
# Preprocessing pipeline: convert to tensor + normalize
transform = transforms.Compose([
    transforms.ToTensor(),                      # image → tensor with values in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and standard deviation
])

# Download MNIST (handwritten digits 0-9, 28×28 grayscale images)
train_dataset = datasets.MNIST('./data', train=True,  download=True, transform=transform)
test_dataset  = datasets.MNIST('./data', train=False, download=True, transform=transform)

# Create the data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader  = DataLoader(test_dataset,  batch_size=1000, shuffle=False)

# ===== 2. Define the CNN model =====
class MnistCNN(nn.Module):
    """
    Convolutional network structure:
    input [1, 28, 28] → Conv1 → Pool → Conv2 → Pool → FC1 → FC2 → output [10]
    """
    def __init__(self):
        super().__init__()
        # Conv layer 1: 1 input channel (grayscale), 32 output channels, 3×3 kernel
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        # Conv layer 2: 32 input channels, 64 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # Pooling: 2×2 max pooling (halves height and width)
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layer: 64×7×7 → 128
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        # Output layer: 128 → 10 (ten digit classes)
        self.fc2 = nn.Linear(128, 10)
        # Dropout: randomly zeroes 25% of the activations during training (reduces overfitting)
        self.dropout = nn.Dropout(0.25)

    def forward(self, x):
        # x: [batch, 1, 28, 28]
        x = self.pool(torch.relu(self.conv1(x)))   # → [batch, 32, 14, 14]
        x = self.pool(torch.relu(self.conv2(x)))   # → [batch, 64, 7, 7]
        x = x.flatten(1)                            # → [batch, 64*7*7=3136]
        x = torch.relu(self.fc1(x))                # → [batch, 128]
        x = self.dropout(x)                         # Dropout
        x = self.fc2(x)                            # → [batch, 10]
        return x

# ===== 3. Initialization =====
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MnistCNN().to(device)              # move the model to the GPU/CPU
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()            # cross-entropy for multi-class classification

# ===== 4. Training function =====
def train_one_epoch(model, loader, optimizer, loss_fn, device):
    model.train()                           # switch to training mode (enables Dropout)
    total_loss = 0
    correct = 0

    for batch_X, batch_y in loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)

        optimizer.zero_grad()               # clear gradients
        output = model(batch_X)             # forward pass
        loss = loss_fn(output, batch_y)     # compute the loss
        loss.backward()                     # backward pass
        optimizer.step()                    # update parameters

        total_loss += loss.item()
        pred = output.argmax(dim=1)         # class with the highest score (logit)
        correct += (pred == batch_y).sum().item()

    avg_loss = total_loss / len(loader)
    accuracy = correct / len(loader.dataset)
    return avg_loss, accuracy

# ===== 5. Evaluation function =====
def evaluate(model, loader, loss_fn, device):
    model.eval()                            # switch to evaluation mode (disables Dropout)
    total_loss = 0
    correct = 0

    with torch.no_grad():                   # no gradients needed for evaluation (saves memory and time)
        for batch_X, batch_y in loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            output = model(batch_X)
            loss = loss_fn(output, batch_y)
            total_loss += loss.item()
            pred = output.argmax(dim=1)
            correct += (pred == batch_y).sum().item()

    avg_loss = total_loss / len(loader)
    accuracy = correct / len(loader.dataset)
    return avg_loss, accuracy

# ===== 6. Training loop =====
num_epochs = 10

for epoch in range(num_epochs):
    train_loss, train_acc = train_one_epoch(model, train_loader, optimizer, loss_fn, device)
    test_loss, test_acc = evaluate(model, test_loader, loss_fn, device)

    print(f'Epoch [{epoch+1}/{num_epochs}] '
          f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f} | '
          f'Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.4f}')

# Expected result: test accuracy typically above 99% after 10 epochs (exact numbers vary between runs)
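After training, you usually want the predicted digit and how confident the model is. The sketch below is a hypothetical predict helper (not part of the original code): it switches to eval mode, turns the logits into probabilities with softmax, and takes the maximum. A tiny stand-in model replaces MnistCNN so that the sketch runs on its own.

```python
import torch
import torch.nn as nn

def predict(model, images):
    """Hypothetical helper: return (predicted class, probability) for each image."""
    model.eval()                                   # disable Dropout for inference
    with torch.no_grad():
        logits = model(images)                     # [batch, num_classes]
        probs = torch.softmax(logits, dim=1)       # logits → probabilities summing to 1
        confidence, classes = probs.max(dim=1)     # best class and its probability
    return classes, confidence

# Stand-in model so the sketch is self-contained; swap in the trained MnistCNN
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
images = torch.randn(5, 1, 28, 28)                 # e.g. part of a batch from test_loader
classes, confidence = predict(toy_model, images)
print(classes, confidence)
```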

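4.4 Saving and Loading Models

# ===== Saving =====
# Option 1: save only the parameters (recommended: smaller files, more robust to code and version changes)
torch.save(model.state_dict(), 'mnist_cnn.pth')

# Option 2: save the whole model object (includes the structure, but is pickled against your exact class code and can break when it changes)
torch.save(model, 'mnist_cnn_full.pth')

# Save a checkpoint (so an interrupted training run can be resumed)
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': train_loss,
}, 'checkpoint.pth')

# ===== Loading =====
# Option 1: load the parameters
model_loaded = MnistCNN()                          # first build a model with the same structure
model_loaded.load_state_dict(torch.load('mnist_cnn.pth', map_location='cpu', weights_only=True))
model_loaded.eval()                                 # switch to inference mode

# Option 2: load the whole model (weights_only=False runs arbitrary pickled code: only use it on files you trust)
model_loaded = torch.load('mnist_cnn_full.pth', weights_only=False)

# Resume training
checkpoint = torch.load('checkpoint.pth', weights_only=True)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1   # the saved epoch already finished, so resume with the next one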
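A quick way to confirm that a state_dict round trip works is to compare outputs before and after loading. This sketch makes two simplifying assumptions: it saves to an in-memory io.BytesIO buffer instead of a .pth file, and it uses a small nn.Linear instead of MnistCNN. A freshly initialized model with the same architecture reproduces the original outputs exactly once the weights are loaded.

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
buffer = io.BytesIO()                              # in-memory "file" instead of a .pth path
torch.save(model.state_dict(), buffer)
buffer.seek(0)                                     # rewind before reading

restored = nn.Linear(4, 2)                         # same architecture, different random init
restored.load_state_dict(torch.load(buffer, weights_only=True))

x = torch.randn(3, 4)
print(torch.equal(model(x), restored(x)))          # True: identical weights give identical outputs
```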

5. Bioinformatics in Practice: A DNA Sequence Classifier

Use PyTorch to train a simple CNN that decides whether a DNA sequence is a promoter or a non-promoter.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# ===== 1. DNA sequence encoding =====
def one_hot_encode(seq):
    """
    Convert a DNA sequence to a one-hot encoding
    A=[1,0,0,0], T=[0,1,0,0], G=[0,0,1,0], C=[0,0,0,1]
    (any other character, e.g. N, becomes an all-zero column)
    Input: "ATGC" → output: a 4×4 matrix
    """
    mapping = {'A': 0, 'T': 1, 'G': 2, 'C': 3}
    encoded = np.zeros((4, len(seq)), dtype=np.float32)  # [4, seq_len]
    for i, base in enumerate(seq):
        if base in mapping:
            encoded[mapping[base], i] = 1.0
    return encoded
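The loop in one_hot_encode is easy to read but slow for millions of sequences. A hypothetical vectorized variant using NumPy broadcasting is sketched below; it keeps the same A, T, G, C row order and also maps unknown bases such as N to an all-zero column. Upper-casing the input is an extra assumption that the original function does not make.

```python
import numpy as np

BASES = np.array(list("ATGC"))                     # row order A, T, G, C, as in one_hot_encode

def one_hot_vectorized(seq):
    """Hypothetical vectorized variant: compare every position with each base at once."""
    chars = np.array(list(seq.upper()))            # shape [seq_len]
    # Broadcasting [4, 1] == [1, seq_len] gives a boolean [4, seq_len] matrix
    return (BASES[:, None] == chars[None, :]).astype(np.float32)

enc = one_hot_vectorized("ATGCN")
print(enc)
# The first four columns (A, T, G, C) form the 4×4 identity; N is an all-zero column
```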

# ===== 2. Generate synthetic data =====
# A real project should use real data (e.g. the EPDnew promoter database)
np.random.seed(42)
seq_length = 100  # each sequence is 100 bp

def generate_promoter(n=500):
    """Generate synthetic promoter sequences (each containing a TATA-box motif)"""
    seqs = []
    for _ in range(n):
        seq = list(np.random.choice(['A', 'T', 'G', 'C'], seq_length))
        # Insert a TATA box (a hallmark promoter motif) starting at a random position between 20 and 34
        tata = list('TATAAA')
        pos = np.random.randint(20, 35)
        seq[pos:pos+len(tata)] = tata
        seqs.append(''.join(seq))
    return seqs

def generate_non_promoter(n=500):
    """Generate random non-promoter sequences"""
    seqs = []
    for _ in range(n):
        seq = np.random.choice(['A', 'T', 'G', 'C'], seq_length)
        seqs.append(''.join(seq))
    return seqs

# Generate the data
promoters = generate_promoter(500)           # 500 promoters
non_promoters = generate_non_promoter(500)   # 500 non-promoters

# Encode + label
all_seqs = promoters + non_promoters
all_labels = [1] * 500 + [0] * 500           # 1 = promoter, 0 = non-promoter

X = np.array([one_hot_encode(seq) for seq in all_seqs])  # [1000, 4, 100]
y = np.array(all_labels)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Convert to PyTorch tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)   # [800, 4, 100]
y_train_t = torch.tensor(y_train, dtype=torch.long)      # [800]
X_test_t = torch.tensor(X_test, dtype=torch.float32)     # [200, 4, 100]
y_test_t = torch.tensor(y_test, dtype=torch.long)        # [200]

# Create the DataLoaders
train_dataset = TensorDataset(X_train_t, y_train_t)
test_dataset = TensorDataset(X_test_t, y_test_t)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# ===== 3. Define the CNN for DNA sequence classification =====
class DNACNN(nn.Module):
    """
    1D CNN for DNA sequence classification
    Input: [batch, 4, 100] (4 channels = A/T/G/C, length 100 bp)
    Output: [batch, 2] (binary: promoter vs non-promoter)
    """
    def __init__(self, seq_length=100):
        super().__init__()
        # 1D convolution: treat "4 channels × 100 positions" as a 1D signal
        self.conv1 = nn.Conv1d(4, 32, kernel_size=8, padding=4)   # captures short motifs; length 100 → 101
        self.conv2 = nn.Conv1d(32, 64, kernel_size=5, padding=2)  # captures motif combinations; length unchanged
        self.pool = nn.MaxPool1d(2)           # halves the length (rounding down)
        self.dropout = nn.Dropout(0.3)

        # Input size of the fully connected layer: 100 → conv1 → 101 → pool → 50 → conv2 → 50 → pool → 25
        self.fc1 = nn.Linear(64 * 25, 64)
        self.fc2 = nn.Linear(64, 2)           # two-class output

    def forward(self, x):
        # x: [batch, 4, 100]
        x = self.pool(torch.relu(self.conv1(x)))   # → [batch, 32, 50]
        x = self.pool(torch.relu(self.conv2(x)))   # → [batch, 64, 25]
        x = self.dropout(x)
        x = x.flatten(1)                            # → [batch, 1600]
        x = torch.relu(self.fc1(x))                # → [batch, 64]
        x = self.fc2(x)                            # → [batch, 2]
        return x
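The fc1 input size can be checked by hand with the output-length formula for Conv1d and MaxPool1d given in the PyTorch documentation: L_out = floor((L_in + 2·padding - dilation·(kernel_size - 1) - 1) / stride) + 1. The sketch below traces a 100 bp sequence through DNACNN's layers for conv1 padding values 3 and 4. With kernel_size=8 and padding=3 the final length is 24 (1536 features), which does not fit nn.Linear(64 * 25, 64); with padding=4 it is 25 (1600 features), which does.

```python
def conv1d_out_len(length, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of Conv1d / MaxPool1d, per the shape formula in the PyTorch docs."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def trace_dnacnn(seq_len, conv1_padding):
    """Follow a sequence through DNACNN: conv1 → pool → conv2 → pool."""
    length = conv1d_out_len(seq_len, kernel_size=8, padding=conv1_padding)  # conv1
    length = conv1d_out_len(length, kernel_size=2, stride=2)                 # MaxPool1d(2)
    length = conv1d_out_len(length, kernel_size=5, padding=2)                # conv2
    length = conv1d_out_len(length, kernel_size=2, stride=2)                 # MaxPool1d(2)
    return length

for pad in (3, 4):
    final_len = trace_dnacnn(100, pad)
    print(f"padding={pad}: final length {final_len}, fc1 in_features = {64 * final_len}")
# padding=3 → 24 (1536 features); padding=4 → 25 (1600 features)
```

The same helper works for the 2D MNIST model one axis at a time: a 3×3 kernel with padding=1 keeps 28 unchanged, and each 2×2 pool halves it (28 → 14 → 7).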

# ===== 4. Training =====
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = DNACNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    model.train()
    total_loss = 0
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)

        optimizer.zero_grad()
        output = model(batch_X)
        loss = loss_fn(output, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

    if (epoch + 1) % 5 == 0:
        # Evaluate
        model.eval()
        correct = 0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                batch_X, batch_y = batch_X.to(device), batch_y.to(device)
                pred = model(batch_X).argmax(dim=1)
                correct += (pred == batch_y).sum().item()
        acc = correct / len(test_dataset)
        print(f'Epoch [{epoch+1}/20] Loss: {total_loss/len(train_loader):.4f}, Test Acc: {acc:.4f}')

# ===== 5. Final evaluation =====
model.eval()
all_preds = []
with torch.no_grad():
    for batch_X, batch_y in test_loader:
        batch_X = batch_X.to(device)
        pred = model(batch_X).argmax(dim=1).cpu().numpy()
        all_preds.extend(pred)

print("\nClassification report:")
print(classification_report(y_test, all_preds, target_names=['non-promoter', 'promoter']))

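Further bioinformatics applications:
- Protein function prediction: one-hot encode amino acid sequences and classify them with a CNN/RNN
- Variant pathogenicity prediction: feed in the sequence context around a variant site and predict whether it is pathogenic
- Building on DNABERT: transfer learning from a pretrained DNA language model (via HuggingFace)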


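6. PyTorch vs TensorFlow Comparison

| Aspect | PyTorch | TensorFlow |
|---|---|---|
| Developer | Meta (Facebook) | Google |
| Computation graph | Dynamic (eager execution) | TF2 defaults to eager, also supports static graphs |
| Debugging | Plain Python print/pdb | Improved in TF2, but still less intuitive than PyTorch |
| Share of research papers | Roughly 75%+ | Roughly 20% |
| Production deployment | TorchServe / ONNX | TF Serving / TFLite (more mature) |
| Mobile | PyTorch Mobile / ExecuTorch | TFLite (more mature) |
| API style | Pythonic, object-oriented | Keras high-level API (concise but somewhat less flexible) |
| Ecosystem | HuggingFace / timm / PyG | TF Hub / TF-Addons |
| Learning curve | Friendlier for Python programmers | Keras is easy to start with; the lower levels are complex |
| Distributed training | DDP / FSDP | tf.distribute (more automated) |
| Bioinformatics | Mainstream (AlphaFold2 uses JAX, ESM uses PyTorch) | Many early tools; fewer today |

Conclusion: for deep learning in bioinformatics in 2025, PyTorch is the first choice (more papers, a strong HuggingFace ecosystem, easier debugging).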


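7. Memory Optimization Tips for 8 GB GPUs

When GPU memory runs short (common on 8 GB consumer cards such as the RTX 4060 or the 8 GB RTX 3060), use the following techniques: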

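7.1 Mixed-Precision Training (AMP)

from torch.amp import autocast, GradScaler

# Idea: run eligible operations (matrix multiplies, convolutions) in float16, which saves activation memory
# and is faster on Tensor Cores, while parameters and their updates stay in float32 (preserves accuracy)
scaler = GradScaler('cuda')  # gradient scaler: keeps small float16 gradients from underflowing to zero

for batch_X, batch_y in train_loader:   # assumes the model is already on the GPU (model.to('cuda'))
    optimizer.zero_grad()

    # Inside the autocast region, eligible ops run in float16 automatically
    with autocast(device_type='cuda'):
        output = model(batch_X.cuda())
        loss = loss_fn(output, batch_y.cuda())

    # Scale the loss before backpropagation; scaler.step unscales the gradients before updating
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips the update if the gradients contain inf/NaN
    scaler.update()          # adjusts the scale factor for the next iteration

# Typical effect: roughly 30-50% less GPU memory and 20-50% faster training (varies with the model and GPU)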
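You can watch autocast change dtypes without a GPU. The sketch below uses CPU autocast with bfloat16, which is an assumption: the section above targets CUDA with float16, whereas bfloat16 is the default lower-precision type for CPU autocast. It shows that the layer's weights stay float32 while the activation it produces is stored in lower precision.

```python
import torch
import torch.nn as nn

layer = nn.Linear(8, 4)                            # parameters are float32
x = torch.randn(2, 8)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = layer(x)                                 # linear runs in bfloat16 under autocast

print(layer.weight.dtype)                          # torch.float32: weights stay full precision
print(out.dtype)                                   # torch.bfloat16: activation kept at lower precision
```

bfloat16 has the same exponent range as float32, so it usually does not need a GradScaler; float16 does.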

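7.2 Gradient Accumulation

from torch.amp import autocast, GradScaler

# Idea: accumulate gradients over several small batches, which is equivalent to training with one large batch
accumulation_steps = 4  # update once every 4 steps = effective batch_size × 4
scaler = GradScaler('cuda')

optimizer.zero_grad()
for i, (batch_X, batch_y) in enumerate(train_loader):
    with autocast(device_type='cuda'):
        output = model(batch_X.cuda())
        loss = loss_fn(output, batch_y.cuda())
        loss = loss / accumulation_steps  # average the loss over the accumulated batches

    scaler.scale(loss).backward()  # gradients add up in .grad (no zero_grad here)

    if (i + 1) % accumulation_steps == 0:
        scaler.step(optimizer)     # update only after enough batches have accumulated
        scaler.update()
        optimizer.zero_grad()      # clear the gradients

# Effect: batch_size=16 + accumulation_steps=4 ≈ batch_size=64 (BatchNorm layers still see only 16 samples at a time)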
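The claim that accumulation matches a larger batch can be verified on the CPU in a few lines. This toy sketch (a small nn.Linear, no BatchNorm, no AMP) compares the gradient of one batch of 16 with the sum of four micro-batch gradients whose losses were each divided by accumulation_steps. With a mean-reduced loss and equal-sized micro-batches they match up to floating-point error.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(5, 1)
loss_fn = nn.MSELoss()                             # mean reduction over the batch
X, y = torch.randn(16, 5), torch.randn(16, 1)

# (a) one big batch of 16
model.zero_grad()
loss_fn(model(X), y).backward()
big_batch_grad = model.weight.grad.clone()

# (b) 4 micro-batches of 4, each loss divided by accumulation_steps
accumulation_steps = 4
model.zero_grad()
for X_mb, y_mb in zip(X.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    (loss_fn(model(X_mb), y_mb) / accumulation_steps).backward()  # .grad keeps adding up
accumulated_grad = model.weight.grad.clone()

print(torch.allclose(big_batch_grad, accumulated_grad, atol=1e-5))  # True
```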

7.3 Gradient Checkpointing

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class LargeModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.block3 = nn.Sequential(nn.Linear(1024, 10))

    def forward(self, x):
        # Wrap intermediate blocks in checkpoint: their activations are not stored during the forward pass
        # and are recomputed during the backward pass
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        x = self.block3(x)
        return x

# Typical effect: a large cut in activation memory (figures of 40-60% are common) at the cost of roughly
# 20-30% slower training (trading compute time for memory)
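Checkpointing changes only what is stored, not the math, so the gradients should be the same with and without it. The sketch below (a small made-up block, CPU only) compares the two.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
head = nn.Linear(32, 1)
x = torch.randn(8, 32)

def run(use_checkpoint):
    block.zero_grad()
    head.zero_grad()
    h = checkpoint(block, x, use_reentrant=False) if use_checkpoint else block(x)
    head(h).sum().backward()
    return block[0].weight.grad.clone()

grad_plain = run(use_checkpoint=False)
grad_ckpt = run(use_checkpoint=True)               # block's forward is re-run during backward
print(torch.allclose(grad_plain, grad_ckpt))       # True: same gradients, fewer stored activations
```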

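7.4 Other Practical Tips

# 1. Release tensors you no longer need as early as possible
del intermediate_tensor    # intermediate_tensor: placeholder for any large tensor you are done with
torch.cuda.empty_cache()   # returns cached blocks to the driver (memory held by live tensors is not freed)

# 2. Disable gradient tracking at inference time
with torch.no_grad():      # no activations are kept for backward, which saves a lot of memory
    predictions = model(x)

# 3. Load the model in a smaller data type
model = model.half()  # convert all parameters to float16 (inference only; inputs must also be .half())

# 4. Keep the dataset off the GPU
# Wrong (for large datasets): moving the whole dataset to the GPU
# X_all = X_all.cuda()  # occupies a lot of GPU memory!
# Right: move each batch over when it is needed
for batch_X, batch_y in loader:
    batch_X = batch_X.cuda()  # only the current batch occupies GPU memory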

8. Common Errors and Fixes

Error 1: CUDA out of memory

RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB

Cause: not enough GPU memory
Fix:

# 1. Reduce batch_size (the simplest fix)
train_loader = DataLoader(dataset, batch_size=16)  # e.g. from 64 down to 16

# 2. Use mixed precision (see 7.1)
# 3. Use gradient accumulation (see 7.2)
# 4. Clear the cache
torch.cuda.empty_cache()

Error 2: Expected all tensors on same device

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Cause: the model is on the GPU but the data is on the CPU (or vice versa)
Fix:

# Make sure the model and the data are on the same device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
X = X.to(device)
y = y.to(device)

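Error 3: size mismatch

RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x3136 and 2048x128)

Cause: the input size of a fully connected layer was computed incorrectly (here the flattened features have 3136 values, but the Linear layer expects 2048)
Fix:

# Run a dummy input through the layers before the Linear layer and print the size
model = MnistCNN()
x = torch.randn(1, 1, 28, 28)  # a simulated input
x = model.pool(torch.relu(model.conv1(x)))
x = model.pool(torch.relu(model.conv2(x)))
print(x.shape)             # torch.Size([1, 64, 7, 7])
print(x.flatten(1).shape)  # use this size (3136) as the Linear layer's in_features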
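An alternative that avoids computing in_features by hand is nn.LazyLinear, which infers it from the first batch it sees. The sketch below is a hypothetical MnistCNN variant built that way. A dry run with a dummy input materializes the layer; create the optimizer only after that dry run, because the lazy parameters have no shape before it.

```python
import torch
import torch.nn as nn

class MnistCNNLazy(nn.Module):                     # hypothetical variant of MnistCNN
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc1 = nn.LazyLinear(128)              # in_features inferred on the first forward pass
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.fc2(torch.relu(self.fc1(x)))

model = MnistCNNLazy()
out = model(torch.randn(2, 1, 28, 28))             # dry run materializes fc1
print(model.fc1.weight.shape)                      # torch.Size([128, 3136])
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # created after the dry run
```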

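Error 4: gradient computation has been modified by an inplace operation

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

Cause: an in-place operation (such as x += 1 or x.relu_()) overwrote a tensor that autograd had saved for the backward pass
Fix:

# Problematic: in-place operations
x += 1
x.relu_()

# Fix: out-of-place operations (create new tensors)
x = x + 1
x = torch.relu(x)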
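The error appears only when the in-place operation overwrites a value that autograd saved for the backward pass. In the sketch below, exp() saves its output, so modifying that output in place triggers the RuntimeError, while the out-of-place version computes the correct gradient e^x.

```python
import torch

x = torch.randn(3, requires_grad=True)

# exp() saves its output for backward; modifying that output in place breaks it
y = x.exp()
y += 1                                             # in-place: overwrites the saved result
raised = False
try:
    y.sum().backward()
except RuntimeError as err:
    raised = True
    print("RuntimeError:", str(err)[:70])

# Out-of-place version: the saved exp() result stays untouched
x.grad = None
y2 = x.exp() + 1
y2.sum().backward()
print(torch.allclose(x.grad, x.detach().exp()))    # True: d/dx (e^x + 1) = e^x
```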

Error 5: num_samples=0

ValueError: num_samples should be a positive integer value, but got num_samples=0

Cause: the dataset is empty or the data path is wrong
Fix:

# Check the dataset length
print(len(dataset))  # should be > 0

# Check that the data path is correct
import os
print(os.path.exists('./data/MNIST'))

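Error 6: version incompatibility

ImportError: cannot import name 'autocast' from 'torch.cuda.amp'

Cause: the AMP import path differs between versions: recent PyTorch 2.x releases recommend torch.amp (the torch.cuda.amp versions are deprecated), while older releases only provide torch.cuda.amp
Fix:

# Recent PyTorch 2.x (recommended)
from torch.amp import autocast, GradScaler

# PyTorch 1.x (legacy; here autocast() takes no device_type argument)
from torch.cuda.amp import autocast, GradScaler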


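9. Cheat Sheet

Common Tensor Operations

| Operation | Code | Notes |
|---|---|---|
| Zero matrix | torch.zeros(3, 4) | 3×4 of zeros |
| Random normal | torch.randn(3, 4) | standard normal distribution |
| From NumPy | torch.from_numpy(arr) | shares memory |
| Reshape | x.view(2, -1) or x.reshape(2, -1) | -1 is inferred automatically |
| Concatenate | torch.cat([a, b], dim=0) | joins along an existing dimension |
| Stack | torch.stack([a, b], dim=0) | stacks along a new dimension |
| Transpose | x.T or x.permute(1, 0) | swaps dimensions |
| Move to GPU | x.to('cuda') or x.cuda() | |
| To NumPy | x.cpu().detach().numpy() | must be moved back to the CPU first |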
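The difference between cat and stack, and how -1 is inferred, is easiest to see from the resulting shapes:

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)

cat = torch.cat([a, b], dim=0)      # joins along the existing dim 0
stack = torch.stack([a, b], dim=0)  # adds a new leading dim
print(cat.shape)                    # torch.Size([4, 3])
print(stack.shape)                  # torch.Size([2, 2, 3])

x = torch.arange(6).view(2, -1)     # -1 is inferred as 3
print(x.shape)                      # torch.Size([2, 3])
```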

Training Loop Template

for epoch in range(num_epochs):
    model.train()
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        optimizer.zero_grad()
        output = model(batch_X)
        loss = loss_fn(output, batch_y)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
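        for batch_X, batch_y in val_loader:   # val_loader: your validation DataLoader
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            val_loss = loss_fn(model(batch_X), batch_y)  # accumulate metrics here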

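Common Layers

| Layer | Code | Use |
|---|---|---|
| Fully connected | nn.Linear(in, out) | basic linear transform |
| 1D convolution | nn.Conv1d(in_ch, out_ch, kernel) | sequences (DNA / signals) |
| 2D convolution | nn.Conv2d(in_ch, out_ch, kernel) | images |
| LSTM | nn.LSTM(input, hidden, num_layers) | time series / sequences |
| Transformer | nn.TransformerEncoder(...) | NLP / proteins |
| BatchNorm | nn.BatchNorm1d(features) | faster convergence |
| Dropout | nn.Dropout(p=0.5) | reduces overfitting |
| MaxPool | nn.MaxPool1d(2) / nn.MaxPool2d(2) | downsampling |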

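Common Optimizers

| Optimizer | Code | Typical use |
|---|---|---|
| SGD | optim.SGD(params, lr=0.01, momentum=0.9) | large-scale training that needs careful tuning |
| Adam | optim.Adam(params, lr=0.001) | default first choice |
| AdamW | optim.AdamW(params, lr=0.001, weight_decay=0.01) | Transformers |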

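10. Further Resources

Official resources

  • PyTorch official tutorials: https://pytorch.org/tutorials/
  • PyTorch documentation: https://pytorch.org/docs/stable/
  • PyTorch forums: https://discuss.pytorch.org/

Bioinformatics + deep learning

  • DNABERT-2: a pretrained DNA language model (usable directly from HuggingFace)
  • ESM-2: Meta's protein language model (implemented in PyTorch)
  • scGPT: a pretrained model for single-cell transcriptomics
  • DeepVariant: Google's variant caller (TensorFlow-based; it can be exported via ONNX, but ONNX-to-PyTorch conversion relies on third-party tools)

Suggested learning path

  1. Get the linear regression and MNIST examples in this article running
  2. Try modifying the DNA classifier (different dataset, different architecture)
  3. Learn the HuggingFace Transformers library (using pretrained models)
  4. In practice: use DNABERT for sequence classification in your own project

Advanced topics

  • Distributed training: torch.nn.parallel.DistributedDataParallel
  • Model quantization: torch.quantization (shrinks the model for deployment)
  • ONNX export: torch.onnx.export() (cross-framework deployment)
  • TorchScript: torch.jit.script() (faster production inference)

Final reminder: the core of deep learning is not the framework but understanding the loop "data → model → loss → gradients → update". PyTorch is just a tool for running that loop efficiently. First get linear regression working and understand what every line does, then move on to more complex networks.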