735. 微生物组功能冗余分析¶

一句话概述：分析微生物群落中多少不同的菌种能执行相同的功能——就像一个公司里有多少人会同一项技能，冗余越高，群落越"抗打"。

核心知识点速查表¶

概念	白话解释	关键工具
功能冗余(FR)	多个不同物种执行相同功能的程度	自定义计算
功能唯一性	只有一个物种能执行的功能比例	FRI指标
功能β多样性	不同样本间功能组成的差异	Bray-Curtis
物种-功能解耦	物种组成变了但功能不变	Procrustes分析
保险效应	物种越多→功能冗余越高→群落越稳定	生态学理论
FRI	Functional Redundancy Index功能冗余指数	R计算

一、原理（白话版）¶

1.1 什么是功能冗余？¶

简单类比： - 一个班级有30个同学 - 其中15个人会Python，10个人会R，5个人会Java - 如果走了1个会Python的人，还有14个人能干Python的活 → 高冗余 - 如果走了那1个会Java的人，就没人能干Java的活了 → 低冗余（唯一性）

微生物群落也一样： - 如果10种不同的菌都能降解纤维素 → 去掉其中一种影响不大 - 如果只有1种菌能产生维生素B12 → 去掉它就没了

1.2 功能冗余为什么重要？¶

场景	冗余高	冗余低
抗生素干扰后	功能恢复快	某些功能可能丧失
饮食改变	群落功能稳定	功能波动大
菌群移植(FMT)	供体菌不一定能定植	关键功能菌容易缺失
疾病关联	难以找到"关键菌"	更容易找到功能标志物

1.3 计算公式¶

功能冗余指数(FRI)的核心思路：

FRI = 1 - (功能多样性 / 物种多样性)

解释：
- 如果物种很多但功能种类少 → FR高（很多菌做同样的事）
- 如果物种少但功能种类多 → FR低（每种菌做不同的事）

二、分析流程¶

2.1 数据准备¶

# ===== 计算功能冗余需要两个表 =====
# 1. 物种丰度表（taxonomy abundance）
# 2. 功能丰度表（function abundance，如KO/EC/pathway）

import pandas as pd  # 导入pandas
import numpy as np  # 导入numpy
from scipy.spatial.distance import braycurtis  # 导入Bray-Curtis距离
from scipy.stats import spearmanr  # 导入Spearman相关
from skbio.diversity import alpha_diversity  # 导入alpha多样性计算

# 读取物种丰度表（来自MetaPhlAn或Kraken2）
species_table = pd.read_csv("species_abundance.tsv", sep="\t", index_col=0)
# 行=样本，列=物种，值=相对丰度
print(f"样本数: {species_table.shape[0]}, 物种数: {species_table.shape[1]}")

# 读取功能丰度表（来自HUMAnN3的KO丰度）
function_table = pd.read_csv("ko_abundance.tsv", sep="\t", index_col=0)
# 行=样本，列=KO功能，值=丰度
print(f"样本数: {function_table.shape[0]}, 功能数: {function_table.shape[1]}")

# 确保样本一致
common_samples = species_table.index.intersection(function_table.index)
species_table = species_table.loc[common_samples]  # 只保留共有样本
function_table = function_table.loc[common_samples]
print(f"共有样本数: {len(common_samples)}")

2.2 计算功能冗余指数(FRI)¶

# ===== 方法一：基于Shannon多样性的FRI =====
def calculate_fri(species_df, function_df):
    """
    计算每个样本的功能冗余指数
    FRI = 1 - (H_function / H_species)
    H = Shannon多样性指数
    """
    fri_values = []  # 存储每个样本的FRI

    for sample in species_df.index:  # 遍历每个样本
        # 获取物种丰度（去掉丰度为0的）
        sp_abund = species_df.loc[sample]  # 该样本的物种丰度
        sp_abund = sp_abund[sp_abund > 0]  # 只保留存在的物种

        # 获取功能丰度
        fn_abund = function_df.loc[sample]  # 该样本的功能丰度
        fn_abund = fn_abund[fn_abund > 0]  # 只保留存在的功能

        # 计算Shannon多样性
        # H = -sum(pi * ln(pi))
        sp_rel = sp_abund / sp_abund.sum()  # 转为相对丰度
        fn_rel = fn_abund / fn_abund.sum()  # 转为相对丰度

        h_species = -np.sum(sp_rel * np.log(sp_rel))  # 物种Shannon多样性
        h_function = -np.sum(fn_rel * np.log(fn_rel))  # 功能Shannon多样性

        # 计算FRI
        if h_species > 0:  # 避免除以0
            fri = 1 - (h_function / h_species)  # FRI公式
        else:
            fri = np.nan  # 物种多样性为0则FRI无意义

        fri_values.append({
            "Sample": sample,
            "H_species": h_species,
            "H_function": h_function,
            "FRI": fri,
            "N_species": len(sp_abund),  # 物种丰富度
            "N_functions": len(fn_abund)  # 功能丰富度
        })

    return pd.DataFrame(fri_values)  # 返回DataFrame

# 计算FRI
fri_df = calculate_fri(species_table, function_table)
print(fri_df.describe())  # 打印统计摘要

2.3 物种-功能解耦分析¶

# ===== Procrustes分析：看物种组成变化和功能组成变化是否一致 =====
from sklearn.manifold import MDS  # 导入多维缩放
from scipy.spatial.distance import pdist, squareform  # 导入距离计算
import matplotlib.pyplot as plt  # 导入绑图

# 计算Bray-Curtis距离矩阵
sp_dist = squareform(pdist(species_table.values, metric="braycurtis"))  # 物种距离
fn_dist = squareform(pdist(function_table.values, metric="braycurtis"))  # 功能距离

# MDS降维到2D
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=42)
sp_coords = mds.fit_transform(sp_dist)  # 物种坐标
fn_coords = mds.fit_transform(fn_dist)  # 功能坐标

# Mantel检验：两个距离矩阵的相关性
from skbio.stats.distance import mantel  # 导入Mantel检验
from skbio import DistanceMatrix  # 导入距离矩阵类

sp_dm = DistanceMatrix(sp_dist, ids=species_table.index.tolist())  # 构建距离矩阵对象
fn_dm = DistanceMatrix(fn_dist, ids=function_table.index.tolist())

coeff, p_value, n = mantel(sp_dm, fn_dm, method="spearman")  # 运行Mantel检验
print(f"Mantel r = {coeff:.4f}, p = {p_value:.4f}")
# r值高(>0.7) → 物种和功能强耦合（功能冗余低）
# r值低(<0.3) → 物种和功能解耦（功能冗余高）

# 绘制Procrustes分析图
fig, axes = plt.subplots(1, 2, figsize=(14, 6))  # 创建2个子图

# 左图：物种PCoA
axes[0].scatter(sp_coords[:, 0], sp_coords[:, 1], c="steelblue", s=50)
axes[0].set_title("Taxonomic Composition (PCoA)")
axes[0].set_xlabel("MDS1")
axes[0].set_ylabel("MDS2")

# 右图：功能PCoA
axes[1].scatter(fn_coords[:, 0], fn_coords[:, 1], c="coral", s=50)
axes[1].set_title("Functional Composition (PCoA)")
axes[1].set_xlabel("MDS1")
axes[1].set_ylabel("MDS2")

plt.suptitle(f"Mantel r = {coeff:.3f}, p = {p_value:.4f}", fontsize=14)
plt.tight_layout()
plt.savefig("species_function_coupling.png", dpi=300)
plt.show()

2.4 功能冗余的分组比较¶

# ===== 比较健康vs疾病组的功能冗余 =====
import seaborn as sns  # 导入seaborn
from scipy.stats import mannwhitneyu  # 导入Mann-Whitney U检验

# 读取分组信息
metadata = pd.read_csv("metadata.tsv", sep="\t")  # 读取样本元数据
fri_df = fri_df.merge(metadata[["Sample", "Group"]], on="Sample")  # 合并分组信息

# Mann-Whitney U检验
healthy = fri_df[fri_df["Group"] == "Healthy"]["FRI"]  # 健康组FRI
disease = fri_df[fri_df["Group"] == "T2D"]["FRI"]  # 疾病组FRI
stat, pval = mannwhitneyu(healthy, disease)  # 非参数检验
print(f"Mann-Whitney U test: U={stat:.1f}, p={pval:.4f}")

# 绘制箱线图
plt.figure(figsize=(8, 6))
sns.boxplot(data=fri_df, x="Group", y="FRI", palette=["#2ecc71", "#e74c3c"])
sns.stripplot(data=fri_df, x="Group", y="FRI", color="black", alpha=0.5, size=4)
plt.title(f"Functional Redundancy: Healthy vs T2D\n(p = {pval:.4f})")
plt.ylabel("Functional Redundancy Index (FRI)")
plt.tight_layout()
plt.savefig("fri_comparison.png", dpi=300)
plt.show()

三、R语言实现（补充）¶

# ===== R语言版功能冗余分析 =====
library(vegan)     # 生态学分析包
library(ggplot2)   # 绑图包

# 读取数据
species <- read.table("species_abundance.tsv", header=TRUE, 
                      row.names=1, sep="\t")  # 物种丰度表
functions <- read.table("ko_abundance.tsv", header=TRUE,
                        row.names=1, sep="\t")  # 功能丰度表

# 计算Shannon多样性
h_sp <- diversity(species, index="shannon")  # 物种Shannon指数
h_fn <- diversity(functions, index="shannon")  # 功能Shannon指数

# 计算FRI
fri <- 1 - (h_fn / h_sp)  # 功能冗余指数

# Mantel检验
sp_dist <- vegdist(species, method="bray")  # 物种Bray-Curtis距离
fn_dist <- vegdist(functions, method="bray")  # 功能Bray-Curtis距离
mantel_result <- mantel(sp_dist, fn_dist, method="spearman", 
                        permutations=9999)  # Mantel检验，9999次置换
print(mantel_result)  # 打印结果

# Procrustes分析
sp_pcoa <- cmdscale(sp_dist, k=2)  # 物种PCoA
fn_pcoa <- cmdscale(fn_dist, k=2)  # 功能PCoA
proc <- procrustes(sp_pcoa, fn_pcoa)  # Procrustes旋转
protest_result <- protest(sp_pcoa, fn_pcoa, permutations=9999)  # 统计检验
print(protest_result)

四、常见报错与解决¶

报错信息	原因	解决方案
`FRI > 1 or < 0`	功能多样性大于物种多样性	检查数据单位是否一致，确认未用绝对丰度混用相对丰度
`Mantel: p = NA`	样本数太少	至少需要10个样本才能做Mantel检验
`Division by zero`	某样本物种多样性为0	过滤掉物种数<3的样本
`Procrustes: singular matrix`	降维后数据退化	增加PCoA维度(k=3)或检查是否有全0行
`skbio import error`	scikit-bio未安装	`pip install scikit-bio`

五、面试高频问题¶

Q1: 功能冗余和功能多样性的区别？¶

A: 功能多样性是群落能执行多少种不同功能；功能冗余是每种功能有多少个物种能做。高功能多样性≠高功能冗余。理想的健康群落是两者都高。

Q2: 功能冗余高就一定好吗？¶

A: 不一定。功能冗余高意味着群落对干扰的缓冲力强（保险效应），但也可能意味着资源竞争激烈。在肠道中，适度的功能冗余通常被认为是健康的标志。

Q3: 如何判断哪些功能是"关键但不冗余的"？¶

A: 计算每个功能的UniqueIndex（只有1-2个物种能执行的功能），这些功能的物种如果消失，该功能就会丧失。这些叫做"关键功能"或"瓶颈功能"。

六、速查表¶

# ===== 功能冗余分析速查 =====

# 输入数据：
# 1. 物种丰度表 (species_abundance.tsv)
# 2. 功能丰度表 (ko_abundance.tsv)

# 核心指标：
# FRI = 1 - (H_function / H_species)
# Mantel test: 物种-功能距离矩阵的相关性
# Procrustes: 物种-功能空间的匹配程度

# 解读：
# FRI高 → 功能冗余高 → 群落稳定
# FRI低 → 功能冗余低 → 群落脆弱
# Mantel r高 → 物种和功能强耦合（冗余低）
# Mantel r低 → 物种和功能解耦（冗余高）

# Python关键包：
pip install pandas numpy scipy scikit-bio matplotlib seaborn

# R关键包：
install.packages(c("vegan", "ggplot2"))