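PyTorch: Getting Started and Hands-On Practice
In one sentence: PyTorch is a deep learning framework originally developed at Facebook (now Meta). It works like NumPy but adds GPU acceleration and automatic differentiation, and it is one of the most widely used deep learning tools in both academia and industry.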
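1. What Is PyTorch
Plain-language explanation: if NumPy is a manual screwdriver, PyTorch is a power screwdriver. It does the same job (matrix operations) but much faster (GPU acceleration), and it comes with a built-in "slope calculator" (automatic differentiation), so you can train neural networks without deriving gradients by hand.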
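Key features:
- Dynamic computation graph: operations run as the code executes, so it reads like ordinary Python (TensorFlow 1.x required building the graph first and running it afterwards)
- Pythonic: the API follows Python conventions and is easy to debug
- GPU acceleration: a single .cuda() or .to('cuda') call moves a tensor or model from the CPU to the GPU
- Rich ecosystem: torchvision (images), torchaudio (audio), HuggingFace (NLP and large models for bioinformatics)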
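To make the "dynamic graph" point concrete, here is a minimal sketch. The helper name `double_until_large` and the threshold of 10 are illustrative assumptions, not part of the original tutorial. An ordinary Python `while` loop decides at run time how many operations enter the graph, and autograd differentiates through whichever path was actually taken.

```python
import torch

def double_until_large(x, threshold=10.0):
    """Keep doubling x until its norm reaches the threshold (hypothetical helper)."""
    y = x
    steps = 0
    while y.norm() < threshold:   # plain Python control flow on tensor values
        y = y * 2
        steps += 1
    return y.sum(), steps

x = torch.tensor([1.0, 1.0], requires_grad=True)
out, steps = double_until_large(x)
out.backward()                    # differentiates through the loop as it actually ran
print(steps, x.grad)              # each gradient entry equals 2 ** steps
```

With a different input the loop would run a different number of times, and the gradient would change accordingly, with no graph to rebuild by hand.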
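Version targeted by this tutorial: PyTorch 2.11.0, with builds for CUDA 12.6/12.8/13.0. Release numbers and supported CUDA versions change often, so check the official install page (https://pytorch.org/get-started/locally/) for the current stable release.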
2. Six Core Concepts (in Plain Language)
2.1 Tensor
import torch
# Scalar (0-D)
scalar = torch.tensor(3.14)
# Vector (1-D), e.g. the 3 features of one sample
vector = torch.tensor([1.0, 2.0, 3.0])
# Matrix (2-D), e.g. 2 samples × 3 features
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])
# 3-D tensor, e.g. a batch of images (batch × height × width)
tensor_3d = torch.randn(32, 28, 28)  # 32 grayscale images of 28×28
2.2 AutoGrad (automatic differentiation)
# requires_grad=True tells PyTorch to track gradients for this variable
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x  # y = x² + 3x
y.backward()        # automatic differentiation: dy/dx = 2x + 3 = 7
print(x.grad)       # tensor(7.), computed by PyTorch
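One detail of `backward()` matters for every training loop: gradients are added to `.grad`, not overwritten. The sketch below reuses the same function to show the accumulation and a manual reset. The variable names `first`, `second`, and `after_reset` are only for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

(x ** 2 + 3 * x).backward()
first = x.grad.item()        # 2*2 + 3 = 7

(x ** 2 + 3 * x).backward()  # second backward pass without resetting
second = x.grad.item()       # 7 + 7 = 14: the new gradient was added to the old one

x.grad.zero_()               # manual reset; optimizer.zero_grad() resets all parameters
after_reset = x.grad.item()
print(first, second, after_reset)
```

This is why the training step clears gradients before each backward pass.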
2.3 nn.Module (neural network module)
import torch.nn as nn
# Define a simple network by subclassing nn.Module
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 32)  # 10 input features → 32 outputs
        self.layer2 = nn.Linear(32, 1)   # 32 → 1 (the prediction)
    def forward(self, x):                # forward pass: how data flows through the network
        x = torch.relu(self.layer1(x))   # first layer + activation
        x = self.layer2(x)               # second layer produces the output
        return x
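As a quick sanity check, the sketch below instantiates the same `SimpleNet`, pushes a random batch through it, and counts the trainable parameters. The batch size of 4 is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 32)
        self.layer2 = nn.Linear(32, 1)

    def forward(self, x):
        return self.layer2(torch.relu(self.layer1(x)))

model = SimpleNet()
batch = torch.randn(4, 10)       # 4 samples with 10 features each
out = model(batch)               # calling the module runs forward()

# Each Linear layer holds in*out weights plus out biases
n_params = sum(p.numel() for p in model.parameters())
print(out.shape, n_params)       # 10*32 + 32 + 32*1 + 1 parameters in total
```

Calling `model(batch)` rather than `model.forward(batch)` is the usual convention, because it also runs any registered hooks.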
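2.4 DataLoader
from torch.utils.data import DataLoader, TensorDataset
# Suppose we have 100 samples with 10 features each
X = torch.randn(100, 10)
y = torch.randn(100, 1)
dataset = TensorDataset(X, y)  # wrap the tensors as a dataset
loader = DataLoader(dataset,
                    batch_size=16,   # 16 samples per batch
                    shuffle=True,    # reshuffle at every epoch
                    num_workers=2)   # 2 worker processes load data in parallel
# On Windows and macOS, scripts that use num_workers > 0 need an
# `if __name__ == '__main__':` guard around the training code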
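The sketch below iterates over a loader built the same way and records the batch sizes, which shows what happens when the dataset size is not a multiple of `batch_size`. It uses the default `num_workers=0` to stay single-process, and `drop_last=True` is shown as one optional way to get uniform batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(100, 10)
y = torch.randn(100, 1)
dataset = TensorDataset(X, y)

loader = DataLoader(dataset, batch_size=16, shuffle=True)
batch_sizes = [xb.shape[0] for xb, yb in loader]
print(len(loader), batch_sizes)      # 100 = 6 * 16 + 4, so the last batch is smaller

# drop_last=True discards the incomplete final batch
loader_drop = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=True)
print(len(loader_drop))
```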
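2.5 Optimizer
import torch.optim as optim
model = SimpleNet()
# Adam optimizer with learning rate 0.001 (the step size)
optimizer = optim.Adam(model.parameters(), lr=0.001)
# One training step (X and y are the tensors from section 2.4):
loss = nn.MSELoss()(model(X), y)  # forward pass and loss
optimizer.zero_grad()  # 1. clear the gradients from the previous step
loss.backward()        # 2. compute gradients (backpropagation)
optimizer.step()       # 3. update the parameters (take one step)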
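2.6 Loss Function
# Regression: mean squared error (MSE)
mse_loss = nn.MSELoss()
# Classification: cross-entropy (expects raw logits and integer class labels)
ce_loss = nn.CrossEntropyLoss()
# Compute the loss for one batch from the loader in section 2.4
X_batch, y_batch = next(iter(loader))
predictions = model(X_batch)           # model predictions
loss = mse_loss(predictions, y_batch)  # compare with the targets (SimpleNet is a regressor, so use MSE)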
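A common source of confusion is what `nn.CrossEntropyLoss` expects as input. The sketch below uses made-up logits and labels to show that it takes raw scores (no softmax applied beforehand) and integer class indices, and that it matches a manual log-softmax computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)                # 5 samples, 3 classes, raw scores
targets = torch.tensor([0, 2, 1, 1, 0])   # integer class indices (int64)

builtin = nn.CrossEntropyLoss()(logits, targets)

# Manual version: log-softmax, pick the log-probability of the true class, average
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(5), targets].mean()

print(builtin.item(), manual.item())      # the two values agree
```

Because the softmax is built into the loss, the model's last layer should output logits directly.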
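3. Installation and Setup
3.1 CPU Build (works on any machine)
# Option 1: pip (recommended for beginners)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Option 2: conda. The official `pytorch` conda channel stopped receiving new
# releases after PyTorch 2.5, so this command installs an older version.
# For current releases, run the pip command above inside a conda environment.
conda install pytorch torchvision torchaudio cpuonly -c pytorch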
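3.2 CUDA GPU Build (NVIDIA GPUs)
# Check the driver first
nvidia-smi  # "CUDA Version" in the top-right corner is the highest CUDA version the driver supports
# CUDA 12.6 (recommended)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
# CUDA 12.8
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
# conda (legacy): the official `pytorch` channel stopped receiving releases after
# PyTorch 2.5, so this command cannot install a current version and may fail to
# resolve; prefer the pip commands above
conda install pytorch torchvision torchaudio pytorch-cuda=12.6 -c pytorch -c nvidia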
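3.3 Verifying the Installation
import torch
print(torch.__version__)          # should print 2.11.0 or later
print(torch.cuda.is_available())  # True means a GPU is usable
if torch.cuda.is_available():     # the calls below raise an error on CPU-only machines
    print(torch.cuda.get_device_name(0))  # GPU model name
    # Quick GPU compute test
    x = torch.randn(1000, 1000).cuda()    # create a matrix on the GPU
    y = x @ x.T                           # matrix multiplication on the GPU
    print(y.shape)                        # torch.Size([1000, 1000])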
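3.4 Managing Environments with conda (recommended)
# Create an isolated environment to avoid package conflicts
conda create -n pytorch_env python=3.11
conda activate pytorch_env
# Install PyTorch with pip inside the environment
# (the official conda channel stopped at PyTorch 2.5)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
# Common companion packages
pip install matplotlib scikit-learn pandas jupyter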
4. Hands-On Tutorials
4.1 Basic Tensor Operations
import torch
# ===== Creating tensors =====
zeros = torch.zeros(3, 4)        # 3×4 matrix of zeros
ones = torch.ones(2, 3)          # 2×3 matrix of ones
rand = torch.randn(5, 5)         # 5×5 samples from the standard normal distribution
arange = torch.arange(0, 10, 2)  # [0, 2, 4, 6, 8], evenly spaced
# Converting from NumPy (shares memory, no data is copied)
import numpy as np
np_arr = np.array([1, 2, 3])
tensor = torch.from_numpy(np_arr)  # NumPy → Tensor
back = tensor.numpy()              # Tensor → NumPy (CPU tensors only, also shares memory)
# ===== Basic arithmetic =====
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(a + b)            # element-wise addition: tensor([5., 7., 9.])
print(a * b)            # element-wise multiplication: tensor([ 4., 10., 18.])
print(torch.dot(a, b))  # dot product: tensor(32.)
# Matrix multiplication (two equivalent forms)
m1 = torch.randn(3, 4)
m2 = torch.randn(4, 5)
result = m1 @ m2               # form 1: the @ operator
result = torch.matmul(m1, m2)  # form 2: the function
# ===== Shape operations =====
x = torch.randn(2, 3, 4)     # shape: [2, 3, 4]
x_reshaped = x.view(6, 4)    # reshape to [6, 4] (total element count is unchanged)
x_flat = x.flatten()         # flatten to 1-D: [24]
x_t = x.permute(0, 2, 1)     # swap dimensions: [2, 4, 3]
# ===== GPU operations =====
if torch.cuda.is_available():
    device = torch.device('cuda')  # define the device
    x_gpu = x.to(device)           # move to the GPU
    x_cpu = x_gpu.to('cpu')        # move back to the CPU
    # or the shorthand
    x_gpu = x.cuda()
    x_cpu = x_gpu.cpu()
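Two behaviors from the operations above tend to surprise newcomers: `torch.from_numpy` shares memory with the source array, and `view` fails on non-contiguous tensors such as the result of `permute`. The following sketch illustrates both. Variable names such as `shared` and `view_failed` are only for this example.

```python
import numpy as np
import torch

# --- Shared memory vs. copy ---
np_arr = np.array([1, 2, 3])
t = torch.from_numpy(np_arr)     # shares the array's buffer
np_arr[0] = 100                  # a change to the array...
shared = t[0].item()             # ...is visible through the tensor

copied = torch.tensor(np_arr)    # torch.tensor() always copies the data
np_arr[1] = -1
independent = copied[1].item()   # the copy keeps the old value

# --- view vs. reshape on a permuted tensor ---
x = torch.randn(2, 3, 4)
x_t = x.permute(0, 2, 1)         # shape [2, 4, 3], no longer contiguous in memory
try:
    x_t.view(2, 12)              # view needs a compatible memory layout
    view_failed = False
except RuntimeError:
    view_failed = True
flat = x_t.reshape(2, 12)        # reshape copies when it has to
print(shared, independent, view_failed, flat.shape)
```

When in doubt, `reshape` is the safer choice; `x_t.contiguous().view(...)` is the explicit equivalent.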
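4.2 Simple Linear Regression (from scratch)
import torch
import torch.nn as nn
import matplotlib.pyplot as plt  # optional: plot `losses` after training
# ===== 1. Generate synthetic data =====
# True relationship: y = 3x + 2 + noise
torch.manual_seed(42)  # fix the random seed for reproducibility
X = torch.linspace(0, 10, 100).unsqueeze(1)  # 100 points as a column vector [100, 1]
y = 3 * X + 2 + torch.randn(100, 1) * 0.5    # add noise
# ===== 2. Define the model =====
model = nn.Linear(1, 1)  # 1 input, 1 output (i.e. y = wx + b)
# ===== 3. Define the loss function and optimizer =====
loss_fn = nn.MSELoss()  # mean squared error
# SGD optimizer; the whole dataset is one batch here, so this is plain gradient descent
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# ===== 4. Training loop =====
# The bias converges slowly with unscaled inputs in [0, 10], so 100 epochs
# are not enough for b to reach 2; 2000 epochs are used instead
num_epochs = 2000
losses = []  # loss value for each epoch
for epoch in range(num_epochs):
    # Forward pass: model prediction
    y_pred = model(X)
    # Compute the loss
    loss = loss_fn(y_pred, y)
    losses.append(loss.item())
    # Backward pass + parameter update
    optimizer.zero_grad()  # clear gradients (important: otherwise they accumulate)
    loss.backward()        # compute gradients
    optimizer.step()       # update parameters
    if (epoch + 1) % 400 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
# ===== 5. Inspect the learned parameters =====
w = model.weight.item()  # should be close to 3
b = model.bias.item()    # should be close to 2
print(f'Learned parameters: w={w:.4f}, b={b:.4f}')
print(f'True parameters:    w=3.0000, b=2.0000')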
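For linear regression there is also a closed-form least-squares solution, which gives a reference point for what gradient descent should converge to. The sketch below regenerates the same synthetic data and solves for the parameters with `torch.linalg.lstsq`. The names `A`, `w_ls`, and `b_ls` are illustrative.

```python
import torch

torch.manual_seed(42)
X = torch.linspace(0, 10, 100).unsqueeze(1)      # [100, 1]
y = 3 * X + 2 + torch.randn(100, 1) * 0.5

# Design matrix with a column of ones so the bias is fitted too
A = torch.cat([X, torch.ones_like(X)], dim=1)    # [100, 2]
solution = torch.linalg.lstsq(A, y).solution     # [2, 1]: slope and intercept
w_ls, b_ls = solution[0].item(), solution[1].item()
print(f'least squares: w={w_ls:.4f}, b={b_ls:.4f}')
```

If the trained `w` and `b` are still far from these values, the usual causes are too few epochs or a learning rate that is too small.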
4.3 Handwritten Digit Classification with a CNN (complete code)
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
# ===== 1. Data preparation =====
# Preprocessing: convert to Tensor, then normalize
transform = transforms.Compose([
    transforms.ToTensor(),                      # image → Tensor in [0, 1]
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and standard deviation
])
# Download MNIST (handwritten digits 0-9, 28×28 grayscale images)
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)
# Create the data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)
# ===== 2. Define the CNN model =====
class MnistCNN(nn.Module):
    """
    Convolutional network structure:
    input [1, 28, 28] → Conv1 → Conv2 → FC1 → FC2 → output [10]
    """
    def __init__(self):
        super().__init__()
        # Conv layer 1: 1 input channel (grayscale), 32 output channels, 3×3 kernel
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        # Conv layer 2: 32 input channels, 64 output channels
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        # Pooling: 2×2 max pooling (halves height and width)
        self.pool = nn.MaxPool2d(2, 2)
        # Fully connected layer: 64×7×7 → 128
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        # Output layer: 128 → 10 (one score per digit class)
        self.fc2 = nn.Linear(128, 10)
        # Dropout: randomly zero 25% of activations during training (reduces overfitting)
        self.dropout = nn.Dropout(0.25)
    def forward(self, x):
        # x: [batch, 1, 28, 28]
        x = self.pool(torch.relu(self.conv1(x)))  # → [batch, 32, 14, 14]
        x = self.pool(torch.relu(self.conv2(x)))  # → [batch, 64, 7, 7]
        x = x.flatten(1)                          # → [batch, 64*7*7 = 3136]
        x = torch.relu(self.fc1(x))               # → [batch, 128]
        x = self.dropout(x)                       # dropout
        x = self.fc2(x)                           # → [batch, 10] (raw logits)
        return x
# ===== 3. Initialization =====
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MnistCNN().to(device)    # move the model to the GPU or CPU
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()  # cross-entropy for multi-class classification
# ===== 4. Training function =====
def train_one_epoch(model, loader, optimizer, loss_fn, device):
    model.train()  # training mode (enables Dropout)
    total_loss = 0
    correct = 0
    for batch_X, batch_y in loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        optimizer.zero_grad()            # clear gradients
        output = model(batch_X)          # forward pass
        loss = loss_fn(output, batch_y)  # compute the loss
        loss.backward()                  # backward pass
        optimizer.step()                 # update parameters
        total_loss += loss.item()
        pred = output.argmax(dim=1)      # class with the highest logit
        correct += (pred == batch_y).sum().item()
    avg_loss = total_loss / len(loader)
    accuracy = correct / len(loader.dataset)
    return avg_loss, accuracy
# ===== 5. Evaluation function =====
def evaluate(model, loader, loss_fn, device):
    model.eval()  # evaluation mode (disables Dropout)
    total_loss = 0
    correct = 0
    with torch.no_grad():  # no gradients needed for evaluation (saves memory)
        for batch_X, batch_y in loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            output = model(batch_X)
            loss = loss_fn(output, batch_y)
            total_loss += loss.item()
            pred = output.argmax(dim=1)
            correct += (pred == batch_y).sum().item()
    avg_loss = total_loss / len(loader)
    accuracy = correct / len(loader.dataset)
    return avg_loss, accuracy
# ===== 6. Training loop =====
num_epochs = 10
for epoch in range(num_epochs):
    train_loss, train_acc = train_one_epoch(model, train_loader, optimizer, loss_fn, device)
    test_loss, test_acc = evaluate(model, test_loader, loss_fn, device)
    print(f'Epoch [{epoch+1}/{num_epochs}] '
          f'Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f} | '
          f'Test Loss: {test_loss:.4f}, Test Acc: {test_acc:.4f}')
# Expected: test accuracy of roughly 99% after 10 epochs (exact numbers vary from run to run)
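4.4 Saving and Loading Models
# ===== Saving =====
# Option 1: save only the parameters (recommended: small file, robust across versions)
torch.save(model.state_dict(), 'mnist_cnn.pth')
# Option 2: save the whole model object (pickles the class, so it can break when the code changes)
torch.save(model, 'mnist_cnn_full.pth')
# Save a checkpoint (lets you resume after an interruption)
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': train_loss,
}, 'checkpoint.pth')
# ===== Loading =====
# Option 1: load the parameters
model_loaded = MnistCNN().to(device)  # first build a model with the same architecture
model_loaded.load_state_dict(torch.load('mnist_cnn.pth', map_location=device, weights_only=True))
model_loaded.eval()                   # switch to inference mode
# Option 2: load the whole model (weights_only=False unpickles arbitrary objects; only use files you trust)
model_loaded = torch.load('mnist_cnn_full.pth', map_location=device, weights_only=False)
# Resume training
checkpoint = torch.load('checkpoint.pth', map_location=device, weights_only=True)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1  # continue with the epoch after the saved one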
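A round trip is an easy way to confirm that saving and loading a `state_dict` preserves the model exactly. The sketch below uses a tiny `nn.Linear` model and an in-memory `io.BytesIO` buffer in place of a file on disk. Both are stand-ins chosen for brevity, not part of the tutorial's MNIST code.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)

buffer = io.BytesIO()                    # stands in for a .pth file
torch.save(model.state_dict(), buffer)
buffer.seek(0)                           # rewind before reading

restored = nn.Linear(4, 2)               # same architecture, different random init
restored.load_state_dict(torch.load(buffer, weights_only=True))
restored.eval()

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)                              # identical outputs after the round trip
```

The same check works with a real file path, and it is worth running once whenever the checkpoint format changes.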
5. Bioinformatics in Practice: A DNA Sequence Classifier
Train a simple CNN in PyTorch to decide whether a DNA sequence is a promoter or a non-promoter.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# ===== 1. DNA sequence encoding =====
def one_hot_encode(seq):
    """
    One-hot encode a DNA sequence.
    A=[1,0,0,0], T=[0,1,0,0], G=[0,0,1,0], C=[0,0,0,1]
    Input: "ATGC" → output: 4×4 matrix (rows = bases, columns = positions)
    Characters outside A/T/G/C (e.g. N) stay as all-zero columns.
    """
    mapping = {'A': 0, 'T': 1, 'G': 2, 'C': 3}
    encoded = np.zeros((4, len(seq)), dtype=np.float32)  # [4, seq_len]
    for i, base in enumerate(seq):
        if base in mapping:
            encoded[mapping[base], i] = 1.0
    return encoded
# ===== 2. Generate synthetic data =====
# A real project should use real data (e.g. the EPDnew promoter database)
np.random.seed(42)
seq_length = 100  # each sequence is 100 bp long
def generate_promoter(n=500):
    """Generate synthetic promoter sequences (containing a TATA-box motif)."""
    seqs = []
    for _ in range(n):
        seq = list(np.random.choice(['A', 'T', 'G', 'C'], seq_length))
        # Insert a TATA box (a hallmark promoter motif) at a random start position in 20..34
        tata = list('TATAAA')
        pos = np.random.randint(20, 35)
        seq[pos:pos+len(tata)] = tata
        seqs.append(''.join(seq))
    return seqs
def generate_non_promoter(n=500):
    """Generate random non-promoter sequences."""
    seqs = []
    for _ in range(n):
        seq = np.random.choice(['A', 'T', 'G', 'C'], seq_length)
        seqs.append(''.join(seq))
    return seqs
# Generate the data
promoters = generate_promoter(500)          # 500 promoters
non_promoters = generate_non_promoter(500)  # 500 non-promoters
# Encoding + labels
all_seqs = promoters + non_promoters
all_labels = [1] * 500 + [0] * 500  # 1 = promoter, 0 = non-promoter
X = np.array([one_hot_encode(seq) for seq in all_seqs])  # [1000, 4, 100]
y = np.array(all_labels)
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Convert to PyTorch tensors
X_train_t = torch.tensor(X_train, dtype=torch.float32)  # [800, 4, 100]
y_train_t = torch.tensor(y_train, dtype=torch.long)     # [800]
X_test_t = torch.tensor(X_test, dtype=torch.float32)    # [200, 4, 100]
y_test_t = torch.tensor(y_test, dtype=torch.long)       # [200]
# Create the DataLoaders
train_dataset = TensorDataset(X_train_t, y_train_t)
test_dataset = TensorDataset(X_test_t, y_test_t)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
# ===== 3. Define the DNA sequence CNN =====
class DNACNN(nn.Module):
    """
    1D CNN for DNA sequence classification
    Input: [batch, 4, 100] (4 channels = A/T/G/C, length 100 bp)
    Output: [batch, 2] (binary: promoter vs. non-promoter)
    """
    def __init__(self, seq_length=100):
        super().__init__()
        # 1D convolutions: treat "4 channels × 100 positions" as a 1D signal
        # kernel_size=7 with padding=3 keeps the length at 100; an even kernel such as
        # 8 with padding=3 would shrink it to 99 and break the fc1 input size below
        self.conv1 = nn.Conv1d(4, 32, kernel_size=7, padding=3)   # captures short motifs
        self.conv2 = nn.Conv1d(32, 64, kernel_size=5, padding=2)  # captures motif combinations
        self.pool = nn.MaxPool1d(2)  # halves the length
        self.dropout = nn.Dropout(0.3)
        # Fully connected input size: 100 → pool → 50 → pool → 25
        self.fc1 = nn.Linear(64 * (seq_length // 4), 64)
        self.fc2 = nn.Linear(64, 2)  # binary classification output
    def forward(self, x):
        # x: [batch, 4, 100]
        x = self.pool(torch.relu(self.conv1(x)))  # → [batch, 32, 50]
        x = self.pool(torch.relu(self.conv2(x)))  # → [batch, 64, 25]
        x = self.dropout(x)
        x = x.flatten(1)                          # → [batch, 1600]
        x = torch.relu(self.fc1(x))               # → [batch, 64]
        x = self.fc2(x)                           # → [batch, 2]
        return x
# ===== 4. Training =====
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = DNACNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(20):
    model.train()
    total_loss = 0
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        optimizer.zero_grad()
        output = model(batch_X)
        loss = loss_fn(output, batch_y)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    if (epoch + 1) % 5 == 0:
        # Evaluate every 5 epochs
        model.eval()
        correct = 0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                batch_X, batch_y = batch_X.to(device), batch_y.to(device)
                pred = model(batch_X).argmax(dim=1)
                correct += (pred == batch_y).sum().item()
        acc = correct / len(test_dataset)
        print(f'Epoch [{epoch+1}/20] Loss: {total_loss/len(train_loader):.4f}, Test Acc: {acc:.4f}')
# ===== 5. Final evaluation =====
model.eval()
all_preds = []
with torch.no_grad():
    for batch_X, batch_y in test_loader:
        batch_X = batch_X.to(device)
        pred = model(batch_X).argmax(dim=1).cpu().numpy()
        all_preds.extend(pred)
print("\nClassification report:")
print(classification_report(y_test, all_preds, target_names=['non-promoter', 'promoter']))
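The input size of the first fully connected layer depends on how each convolution and pooling step changes the sequence length, and it is easy to be off by one. The sketch below applies the standard output-length formula for dilation 1 to a conv → pool → conv → pool stack like the one in this section. The helpers `conv_out_len` and `trace` are hypothetical, written only for this check.

```python
def conv_out_len(length, kernel_size, padding=0, stride=1):
    """Output length of Conv1d or MaxPool1d with dilation 1 (floor division, as in PyTorch)."""
    return (length + 2 * padding - kernel_size) // stride + 1

def trace(length, conv1_kernel, conv1_padding):
    """Follow the length through conv1 → pool → conv2 (kernel 5, padding 2) → pool."""
    length = conv_out_len(length, conv1_kernel, conv1_padding)  # conv1
    length = conv_out_len(length, 2, stride=2)                  # MaxPool1d(2)
    length = conv_out_len(length, 5, padding=2)                 # conv2
    length = conv_out_len(length, 2, stride=2)                  # MaxPool1d(2)
    return length

len_k8 = trace(100, conv1_kernel=8, conv1_padding=3)   # even kernel: 100 → 99 → 49 → 49 → 24
len_k7 = trace(100, conv1_kernel=7, conv1_padding=3)   # odd kernel:  100 → 100 → 50 → 50 → 25
print(len_k8, 64 * len_k8)   # fc1 would need 64 * 24 input features
print(len_k7, 64 * len_k7)   # fc1 needs 64 * 25 input features
```

A first-layer kernel of 8 with padding 3 therefore does not produce the 25 positions that `nn.Linear(64 * 25, 64)` expects, while a kernel of 7 with padding 3 does.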
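Further bioinformatics applications:
- Protein function prediction: one-hot encode amino acid sequences and classify them with a CNN or RNN
- Variant pathogenicity prediction: feed in the sequence context around a variant and predict whether it is pathogenic
- Combining with DNABERT: transfer learning from a pretrained DNA language model (via HuggingFace)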
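6. PyTorch vs. TensorFlow Comparison
| Dimension | PyTorch | TensorFlow |
|---|---|---|
| Developer | Meta (Facebook) | Google |
| Computation graph | Dynamic (eager execution) | Eager by default in TF2; static graphs also supported |
| Debugging | Plain Python print/pdb | Improved in TF2, but still less intuitive than PyTorch |
| Share in research papers | Majority (author's estimate: ~75%+) | Minority (author's estimate: ~20%) |
| Industrial deployment | TorchServe / ONNX | TF Serving / TFLite (more mature) |
| Mobile | PyTorch Mobile / ExecuTorch | TFLite (more mature) |
| API style | Pythonic, object-oriented | Keras high-level API (concise, slightly less flexible) |
| Ecosystem | HuggingFace / timm / PyG | TFHub / TF-Addons |
| Learning curve | Friendlier for Python programmers | Keras is easy to start with; the lower levels are complex |
| Distributed training | DDP / FSDP | tf.distribute (more automated) |
| Bioinformatics | Mainstream (ESM uses PyTorch; AlphaFold2 itself uses JAX) | Common in earlier tools, less so now |
Conclusion: for bioinformatics deep learning in 2025, PyTorch is the first choice (more papers, a strong HuggingFace ecosystem, easy debugging).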
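7. Memory Optimization Tips for 8 GB GPUs
When GPU memory runs short (common on 8 GB consumer cards such as the RTX 4060 or the 8 GB variant of the RTX 3060), the following techniques help: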
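7.1 Mixed-Precision Training (AMP)
from torch.amp import autocast, GradScaler
# Idea: run the forward/backward pass in float16 (activations take about half the memory)
# and keep the parameter update in float32 (preserves precision)
scaler = GradScaler()  # gradient scaler, guards against float16 underflow
for batch_X, batch_y in train_loader:
    optimizer.zero_grad()
    # Inside autocast, eligible operations run in float16 automatically
    with autocast(device_type='cuda'):
        output = model(batch_X.cuda())
        loss = loss_fn(output, batch_y.cuda())
    # Scale the loss before backpropagating
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales the gradients; skips the step if they contain inf/NaN
    scaler.update()
# Typical effect (author's estimate): 30-50% less memory and 20-50% faster training; actual gains depend on the model and GPU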
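To see why a gradient scaler is needed at all, the sketch below shows a small value underflowing to zero in float16 and surviving once it is multiplied by a scale factor first. The value 1e-8 is an arbitrary stand-in for a tiny gradient, and 65536 (2**16) is `GradScaler`'s default initial scale.

```python
import torch

tiny_grad = torch.tensor(1e-8)             # float32 value standing in for a very small gradient
as_fp16 = tiny_grad.half()                  # below float16's smallest positive value: becomes 0

scale = 65536.0                             # 2**16, GradScaler's default initial scale
scaled_fp16 = (tiny_grad * scale).half()    # the scaled value is representable in float16
recovered = scaled_fp16.float() / scale     # unscale in float32 before the optimizer step

print(as_fp16.item(), scaled_fp16.item(), recovered.item())
```

`GradScaler` applies the same idea to the loss, so every gradient in the backward pass is scaled up, and it unscales them before `optimizer.step()`.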
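7.2 Gradient Accumulation
# Idea: accumulate gradients over several small batches to emulate training with one large batch
accumulation_steps = 4  # update every 4 batches = effective batch_size × 4
optimizer.zero_grad()   # start from clean gradients
for i, (batch_X, batch_y) in enumerate(train_loader):
    output = model(batch_X.cuda())
    loss = loss_fn(output, batch_y.cuda())
    loss = loss / accumulation_steps  # average, so the summed gradient matches one large batch
    loss.backward()                   # gradients add up (no zero_grad here)
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()       # update only once enough gradients are accumulated
        optimizer.zero_grad()  # clear the gradients
# To combine this with AMP (7.1), wrap the forward pass in autocast and use
# scaler.scale(loss).backward(), scaler.step(optimizer), scaler.update()
# Effect: batch_size=16 with accumulation_steps=4 ≈ batch_size=64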
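The equivalence claimed above can be checked numerically. The sketch below uses a small `nn.Linear` model and random data (both chosen only for this check) and compares the gradient from one batch of 16 with the gradient accumulated over 4 micro-batches of 4, each loss divided by the number of steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
loss_fn = nn.MSELoss()                      # mean reduction, as in the tutorial
X, y = torch.randn(16, 3), torch.randn(16, 1)

# Reference: one backward pass over all 16 samples
model.zero_grad()
loss_fn(model(X), y).backward()
full_grad = model.weight.grad.clone()

# Accumulation: 4 micro-batches of 4 samples, each loss divided by 4
accumulation_steps = 4
model.zero_grad()
for xb, yb in zip(X.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    (loss_fn(model(xb), yb) / accumulation_steps).backward()
accum_grad = model.weight.grad.clone()

max_diff = (full_grad - accum_grad).abs().max().item()
print(max_diff)                             # differs only by floating-point rounding
```

The match is exact only when all micro-batches have the same size and the model has no batch-dependent layers; BatchNorm, for instance, still computes its statistics on each small batch.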
7.3 Gradient Checkpointing
from torch.utils.checkpoint import checkpoint
class LargeModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
        self.block3 = nn.Sequential(nn.Linear(1024, 10))
    def forward(self, x):
        # Wrap intermediate blocks in checkpoint: the forward pass does not store their
        # intermediate activations; they are recomputed during the backward pass
        x = checkpoint(self.block1, x, use_reentrant=False)
        x = checkpoint(self.block2, x, use_reentrant=False)
        x = self.block3(x)
        return x
# Typical effect (author's estimate): 40-60% less memory at the cost of 20-30% slower training (trading time for memory)
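Checkpointing changes when activations are computed, not what the gradients are. The sketch below runs a small block (sized arbitrarily for illustration) once normally and once through `checkpoint`, and confirms that the input gradients agree.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
x = torch.randn(4, 8, requires_grad=True)

# Ordinary forward/backward: activations are stored
block(x).sum().backward()
grad_plain = x.grad.clone()

# Checkpointed forward/backward: activations are recomputed during backward
x.grad = None
block.zero_grad()
checkpoint(block, x, use_reentrant=False).sum().backward()
grad_ckpt = x.grad.clone()

print(torch.allclose(grad_plain, grad_ckpt))
```

Blocks that contain randomness such as Dropout are also handled, since `checkpoint` restores the random state for the recomputation by default.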
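7.4 Other Practical Tips
# 1. Release tensors you no longer need
del intermediate_tensor   # placeholder name for any large tensor you are done with
torch.cuda.empty_cache()  # returns cached blocks to the driver; tensors still referenced are not freed
# 2. Disable gradient tracking during inference
with torch.no_grad():  # activations are not kept for backward, which can cut memory use substantially
    predictions = model(x)
# 3. Load the model in a smaller data type
model = model.half()  # convert everything to float16 (for inference; inputs must be float16 too)
# 4. Keep the dataset off the GPU
# Wrong: put the whole dataset on the GPU
# X_all = X_all.cuda()  # takes a lot of GPU memory!
# Right: move each batch over temporarily
for batch_X, batch_y in loader:
    batch_X = batch_X.cuda()  # only the current batch occupies GPU memory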
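8. Common Errors and Fixes
Error 1: CUDA out of memory
Cause: not enough GPU memory
Solution:
# 1. Reduce batch_size (the simplest fix)
train_loader = DataLoader(dataset, batch_size=16)  # changed from 64 to 16
# 2. Use mixed precision (see 7.1)
# 3. Use gradient accumulation (see 7.2)
# 4. Clear the cache
torch.cuda.empty_cache()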
Error 2: Expected all tensors on same device
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Cause: the model is on the GPU and the data is on the CPU (or the other way round)
Solution:
# Make sure the model and the data are on the same device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
X = X.to(device)
y = y.to(device)
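Error 3: size mismatch
Cause: the input size of a fully connected layer was computed incorrectly
Solution:
# Push a dummy input through the layers and print the intermediate shapes
model = MnistCNN()
x = torch.randn(1, 1, 28, 28)  # one fake input
x = model.pool(torch.relu(model.conv1(x)))
print(x.shape)  # use the shape before flattening to set in_features of the Linear layer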
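Error 4: gradient computation has been modified by an inplace operation
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
Cause: an in-place operation (such as x += 1 or x.relu_()) overwrote a value that autograd needs for the backward pass
Solution: replace the in-place operation with its out-of-place form, for example x = x + 1 instead of x += 1 and torch.relu(x) instead of x.relu_(). To locate the offending line, run the training step under torch.autograd.set_detect_anomaly(True).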
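A minimal reproduction makes the cause easier to recognize. In this sketch (an illustrative example, not taken from the original) `exp()` keeps its output for the backward pass, so modifying that output in place triggers the error, while the out-of-place version works.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Failing version: exp() saves its result for backward, then += overwrites it
y = x.exp()
y += 1
try:
    y.sum().backward()
    raised = False
except RuntimeError as err:
    raised = True
    message = str(err)

# Fixed version: y + 1 creates a new tensor and leaves the saved result intact
x.grad = None
y = x.exp()
y = y + 1
y.sum().backward()
print(raised, x.grad)          # the gradient of exp(x) + 1 is exp(x)
```

Not every in-place operation fails: the error appears only when the overwritten tensor is one that autograd saved for computing gradients.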
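Error 5: num_samples=0
Cause: the dataset is empty or the path is wrong
Solution: print len(dataset) before building the DataLoader, and check that the data path exists and contains files in the layout the dataset class expects.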
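Error 6: version incompatibility
Cause: the installed PyTorch version differs from the one the code was written for; some API paths changed in newer releases
Solution:
# Current form in recent PyTorch 2.x releases (recommended)
from torch.amp import autocast, GradScaler
# Older form (PyTorch 1.x and early 2.x; deprecated in newer releases)
from torch.cuda.amp import autocast, GradScaler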
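9. Cheat Sheet
Common Tensor Operations
| Operation | Code | Notes |
|---|---|---|
| Zero matrix | torch.zeros(3, 4) | 3×4, all zeros |
| Random normal | torch.randn(3, 4) | Standard normal distribution |
| From NumPy | torch.from_numpy(arr) | Shares memory |
| Reshape | x.view(2, -1) or x.reshape(2, -1) | -1 is inferred; view needs a contiguous tensor |
| Concatenate | torch.cat([a, b], dim=0) | Joins along an existing dimension |
| Stack | torch.stack([a, b], dim=0) | Joins along a new dimension |
| Transpose | x.T or x.permute(1, 0) | Swaps the dimensions of a 2-D tensor |
| Move to GPU | x.to('cuda') or x.cuda() | Requires a CUDA device |
| To NumPy | x.detach().cpu().numpy() | Must be moved back to the CPU first |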
Training Loop Template
for epoch in range(num_epochs):
    model.train()
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        optimizer.zero_grad()
        output = model(batch_X)
        loss = loss_fn(output, batch_y)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        pass  # validation goes here
Common Layers
| Layer | Code | Use |
|---|---|---|
| Fully connected | nn.Linear(in, out) | Basic linear transform |
| 1D convolution | nn.Conv1d(in_ch, out_ch, kernel) | Sequences (DNA/signals) |
| 2D convolution | nn.Conv2d(in_ch, out_ch, kernel) | Images |
| LSTM | nn.LSTM(input, hidden, num_layers) | Time series/sequences |
| Transformer | nn.TransformerEncoder(...) | NLP/proteins |
| BatchNorm | nn.BatchNorm1d(features) | Faster convergence |
| Dropout | nn.Dropout(p=0.5) | Reduces overfitting |
| MaxPool | nn.MaxPool1d(2) / nn.MaxPool2d(2) | Downsampling |
Common Optimizers
| Optimizer | Code | When to use |
|---|---|---|
| SGD | optim.SGD(params, lr=0.01, momentum=0.9) | Large-scale training that needs careful tuning |
| Adam | optim.Adam(params, lr=0.001) | Default first choice |
| AdamW | optim.AdamW(params, lr=0.001, weight_decay=0.01) | Transformers |
10. Further Resources
Official Resources
- PyTorch official tutorials: https://pytorch.org/tutorials/
- PyTorch documentation: https://pytorch.org/docs/stable/
- PyTorch forum: https://discuss.pytorch.org/
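Bioinformatics + Deep Learning
- DNABERT-2: pretrained DNA language model (usable directly from HuggingFace)
- ESM-2: Meta's protein language model (implemented in PyTorch)
- scGPT: pretrained model for single-cell transcriptomics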
- DeepVariant: Google's variant calling tool (implemented in TensorFlow)
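Recommended Learning Path
- Get the linear regression and MNIST examples in this tutorial running
- Try modifying the DNA classifier (swap the dataset, adjust the architecture)
- Learn the HuggingFace Transformers library (calling pretrained models)
- Practice: use DNABERT for sequence classification in your own project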
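Advanced Topics
- Distributed training: torch.nn.parallel.DistributedDataParallel
- Model quantization: torch.quantization (shrinks the model for deployment)
- ONNX export: torch.onnx.export() (cross-framework deployment)
- TorchScript: torch.jit.script() (speeding up production inference)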
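A final reminder: the core of deep learning is not the framework but understanding the loop "data → model → loss → gradients → update". PyTorch is just a tool that implements this loop efficiently. Get the linear regression running first and understand what every line does, then move on to more complex networks.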