Non-coding RNA Target Gene Prediction

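In one sentence: non-coding RNAs (ncRNAs) are not translated into protein but still regulate gene expression. miRNAs act like "small scissors" that degrade mRNA or block its translation, lncRNAs act like "dispatchers" that switch genes on and off, and predicting their target genes is key to understanding gene regulatory networks.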

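Core concepts

| Concept | Plain-language explanation | Importance |
|---|---|---|
| miRNA | Small RNA of about 22 nt that silences target genes through base pairing | ⭐⭐⭐⭐⭐ |
| lncRNA | Long non-coding RNA (>200 nt) that acts through several regulatory mechanisms | ⭐⭐⭐⭐⭐ |
| Seed sequence | Nucleotides 2-8 of the miRNA, which pair with complementary sites in the target's 3'UTR | ⭐⭐⭐⭐⭐ |
| TargetScan | One of the most widely used miRNA target prediction tools | ⭐⭐⭐⭐⭐ |
| miRDB | Database of miRNA targets predicted by machine learning | ⭐⭐⭐⭐ |
| ceRNA hypothesis | lncRNAs "sponge" miRNAs and thereby regulate the miRNA's targets indirectly | ⭐⭐⭐⭐ |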

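1. How non-coding RNAs regulate genes

miRNA mechanism (the most common case):
  miRNA (~22 nt) + RISC complex → pairs with the 3'UTR of the target mRNA → mRNA degradation or translational repression

  miRNA:      3'- UUGAUAUGUUGGAUGAUGGAGU -5'  (let-7a, written 3'→5')
                                |||||||       seed pairing (miRNA positions 2-8)
  3'UTR:      5'-            ...CUACCUC... -3'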

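Seed match types (from most to least stringent):
  8mer    → pairing to miRNA positions 2-8 plus an A opposite position 1 (most stringent, most reliable)
  7mer-m8 → pairing to miRNA positions 2-8
  7mer-A1 → pairing to miRNA positions 2-7 plus an A opposite position 1
  6mer    → pairing to miRNA positions 2-7 only (most permissive, many false positives)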
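
To make the four site types concrete, the sketch below derives the target-site sequence for each type from a mature miRNA sequence. The helper names `reverse_complement_rna` and `canonical_sites` are illustrative, not part of any tool, and the sketch assumes the usual convention that the site is the reverse complement of the seed, with an A opposite miRNA position 1 where required.

```python
def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A-U, G-C pairing only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def canonical_sites(mirna_seq):
    """Return the target-site sequence (5'->3') for each canonical site type."""
    core = reverse_complement_rna(mirna_seq[1:7])  # pairs with miRNA positions 2-7
    m8 = reverse_complement_rna(mirna_seq[7])      # pairs with miRNA position 8
    return {
        "8mer": m8 + core + "A",   # positions 2-8 plus A opposite position 1
        "7mer-m8": m8 + core,      # positions 2-8
        "7mer-A1": core + "A",     # positions 2-7 plus A opposite position 1
        "6mer": core,              # positions 2-7 only
    }


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a, written 5'->3'
sites = canonical_sites(let7a)
for site_type, site_seq in sites.items():
    print(f"{site_type:8s} 5'-{site_seq}-3'")
```

For let-7a the 7mer-m8 site comes out as CUACCUC, the same site shown in the pairing diagram above.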

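lncRNA mechanisms (diverse):
  1. ceRNA (competing endogenous RNA) → sponges miRNAs
  2. Epigenetic regulation → recruits modifying complexes such as PRC2
  3. Transcriptional regulation → interacts with transcription factors
  4. mRNA stability regulation → stabilizes mRNA or promotes its degradation
  5. Translational regulation → affects translation by the ribosome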

2. miRNA target gene prediction

2.1 Online tools

#!/usr/bin/env python3
"""miRNA target gene prediction: combining several tools"""

import pandas as pd  # tabular data handling

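# ========== 1. TargetScan lookup ==========
def query_targetscan(mirna_name, species="human"):
    """Print the TargetScan results URL for a miRNA (no data is downloaded)."""
    # TargetScan site: https://www.targetscan.org/
    # Results must be downloaded manually (or scraped); this helper only
    # prints the URL, and a parsing routine is given further down.
    # Note: the `species` argument is currently unused.

    print(f"TargetScan query: {mirna_name}")
    print(f"Visit: https://www.targetscan.org/cgi-bin/targetscan/vert_80/targetscan.cgi?mirg={mirna_name}")
    print("Download the results, then parse them with the code below")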

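# ========== 2. miRDB lookup ==========
def query_mirdb(mirna_name):
    """Print how to look up miRDB target predictions for a miRNA."""

    # Search endpoint and form fields, kept for reference only:
    # this function does not send the request.
    url = "http://mirdb.org/cgi-bin/search.cgi"
    data = {
        "searchType": "miRNA",
        "searchBox": mirna_name,
        "species": "Human"
    }

    print(f"miRDB query: {mirna_name}")
    print(f"Visit: http://mirdb.org/ and search for {mirna_name}")
    print("A miRDB score > 80 indicates a high-confidence prediction")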

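# ========== 3. Parse a downloaded TargetScan file ==========
def parse_targetscan(filename):
    """Parse a tab-separated TargetScan prediction table."""

    df = pd.read_csv(filename, sep="\t")

    # Key columns (exact names can differ between TargetScan releases,
    # so check the header of your file):
    # Gene Symbol - target gene name
    # Cumulative weighted context++ score - overall score (more negative = more reliable)
    # Total num of conserved sites - number of conserved sites
    # Total num of poorly conserved sites - number of poorly conserved sites

    score_col = "Cumulative weighted context++ score"

    # Coerce to numeric so that non-numeric placeholder entries become NaN
    # and are excluded by the threshold filter
    df[score_col] = pd.to_numeric(df[score_col], errors="coerce")

    # Sort by context++ score, most negative first
    df_sorted = df.sort_values(score_col)

    # Keep high-confidence targets (score < -0.2)
    high_conf = df_sorted[df_sorted[score_col] < -0.2]

    print(f"High-confidence targets: {len(high_conf)}")
    return high_conf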

# ========== 4. Cross-checking several sources ==========
def cross_validate_targets(targetscan_genes, mirdb_genes, mirtarbase_genes):
    """Return genes supported by at least two of the three sources."""

    ts_set = set(targetscan_genes)
    mdb_set = set(mirdb_genes)
    mtb_set = set(mirtarbase_genes)  # miRTarBase holds validated, not predicted, targets

    # Pairwise intersections
    ts_mdb = ts_set & mdb_set
    ts_mtb = ts_set & mtb_set
    mdb_mtb = mdb_set & mtb_set

    # Genes present in all three sources
    all_three = ts_set & mdb_set & mtb_set

    # Genes present in at least two sources
    at_least_two = ts_mdb | ts_mtb | mdb_mtb

    print(f"TargetScan predictions: {len(ts_set)}")
    print(f"miRDB predictions: {len(mdb_set)}")
    print(f"miRTarBase validated: {len(mtb_set)}")
    print(f"In all three sources: {len(all_three)}")
    print(f"In at least two sources: {len(at_least_two)}")

    return at_least_two
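
The "at least two sources" rule can also be written as a vote count, which extends to any number of sources and keeps the support level for each gene. The following is a minimal sketch: `count_support` is a hypothetical helper, and the gene lists are made-up illustrative inputs, not real output from any of the tools.

```python
from collections import Counter


def count_support(**gene_lists):
    """Count how many sources list each gene (each source counts once per gene)."""
    support = Counter()
    for genes in gene_lists.values():
        support.update(set(genes))  # set() removes duplicates within one source
    return support


# Illustrative inputs only (not real tool output)
targetscan = ["PDCD4", "PTEN", "RECK", "SPRY2"]
mirdb = ["PDCD4", "RECK", "TIMP3"]
mirtarbase = ["PDCD4", "PTEN", "TIMP3", "BCL2"]

support = count_support(targetscan=targetscan, mirdb=mirdb, mirtarbase=mirtarbase)

# Keep genes backed by two or more sources
at_least_two = sorted(gene for gene, n in support.items() if n >= 2)

for gene in at_least_two:
    print(f"{gene}: supported by {support[gene]} source(s)")
```

Keeping the counts makes it possible to rank candidates by support rather than only filter them.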

2.2 Implementing seed matching yourself

#!/usr/bin/env python3
"""Seed-based miRNA target site search"""

def reverse_complement(seq):
    """Return the reverse complement of a sequence as RNA (T is read as U)."""
    complement = {"A": "U", "U": "A", "G": "C", "C": "G",
                  "a": "u", "u": "a", "g": "c", "c": "g",
                  "T": "A", "t": "a"}
    return "".join(complement.get(base, base) for base in reversed(seq))

def find_seed_matches(mirna_seq, utr_seq, mirna_name="miRNA", gene_name="gene"):
    """
    Search a 3'UTR for canonical seed match sites of a miRNA.
    Each site is reported once, under its most stringent type:
    8mer > 7mer-m8 > 7mer-A1 > 6mer.
    """
    # Work in upper-case RNA (DNA T is converted to U)
    mirna = mirna_seq.upper().replace("T", "U")
    utr_rna = utr_seq.upper().replace("T", "U")

    core = reverse_complement(mirna[1:7])   # site pairing with miRNA positions 2-7
    m8_base = reverse_complement(mirna[7])  # base pairing with miRNA position 8

    matches = []
    start = utr_rna.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr_rna[start - 1] == m8_base  # position 8 paired
        has_a1 = end < len(utr_rna) and utr_rna[end] == "A"   # A opposite position 1

        if has_m8 and has_a1:
            site_type = "8mer"
        elif has_m8:
            site_type = "7mer-m8"
        elif has_a1:
            site_type = "7mer-A1"
        else:
            site_type = "6mer"

        site_start = start - 1 if has_m8 else start
        site_end = end + 1 if has_a1 else end
        matches.append({
            "type": site_type,
            "position": site_start,  # 0-based offset in the UTR
            "matched_seq": utr_rna[site_start:site_end]
        })
        start = utr_rna.find(core, start + 1)  # continue, allowing overlapping sites

    if matches:
        print(f"{mirna_name} -> {gene_name}: found {len(matches)} seed match site(s)")
        for m in matches:
            print(f"  {m['type']} at position {m['position']}: {m['matched_seq']}")

    return matches

# ========== Usage example ==========
# hsa-miR-21-5p sequence
mir21 = "UAGCUUAUCAGACUGAUGUUGA"

# Toy 3'UTR fragment containing a miR-21 site (PDCD4 is a known miR-21 target)
pdcd4_utr = "AGCUUAUCAUUUUAUAUAAGCUA"

matches = find_seed_matches(mir21, pdcd4_utr, "hsa-miR-21-5p", "PDCD4")

3. lncRNA target gene prediction

#!/usr/bin/env python3
"""Approaches to lncRNA target gene prediction"""

# ========== Commonly used lncRNA target prediction tools ==========
lncrna_tools = {
    "ceRNA analysis": {
        "ENCORI/starBase": {
            "url": "https://rnasysu.com/encori/",
            "method": "CLIP-seq data mining",
            "description": "Integrates miRNA-mRNA and miRNA-lncRNA interaction data"
        },
        "miRcode": {
            "url": "http://www.mircode.org/",
            "method": "Seed sequence matching",
            "description": "Predicts miRNA binding sites on lncRNAs"
        },
        "LncBase v3": {
            "url": "https://diana.e-ce.uth.gr/lncbasev3",
            "method": "Experimental + predicted",
            "description": "lncRNA-miRNA interaction database"
        },
    },
    "RNA-RNA interaction": {
        "IntaRNA": {
            "url": "http://rna.informatik.uni-freiburg.de/IntaRNA/",
            "method": "Thermodynamic calculation",
            "description": "Predicts the free energy of RNA-RNA binding"
        },
        "RNAplex": {
            "url": "https://www.tbi.univie.ac.at/RNA/",
            "method": "Fast hybridization prediction",
            "description": "Part of the Vienna RNA package"
        },
    },
    "Co-expression analysis": {
        "WGCNA": {
            "url": "https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/",
            "method": "Weighted gene co-expression network",
            "description": "Finds gene modules co-expressed with a lncRNA"
        },
    }
}

# Print the tool list
for category, tools in lncrna_tools.items():
    print(f"\n{'='*50}")
    print(f"  {category}")
    print(f"{'='*50}")
    for name, info in tools.items():
        print(f"  {name}")
        print(f"    Method: {info['method']}")
        print(f"    URL: {info['url']}")
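
The co-expression route can be illustrated without any of the tools listed above. The sketch below ranks candidate targets by their Pearson correlation with a lncRNA across samples. It is a deliberately simplified stand-in for WGCNA, which builds a weighted network and modules rather than pairwise correlations. The expression values, gene names, and the 0.8 cutoff are all made-up assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression values across five samples
lncrna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
mrna_expr = {
    "GENE_A": [2.1, 3.9, 6.2, 8.0, 9.9],   # rises with the lncRNA
    "GENE_B": [5.0, 4.1, 3.2, 1.9, 1.1],   # falls as the lncRNA rises
    "GENE_C": [3.0, 3.1, 2.9, 3.0, 3.05],  # essentially flat
}

CUTOFF = 0.8  # arbitrary illustrative threshold

correlations = {gene: pearson(lncrna_expr, values) for gene, values in mrna_expr.items()}

# Positively co-expressed genes are candidate targets under this heuristic
candidates = sorted(gene for gene, r in correlations.items() if r >= CUTOFF)

for gene, r in correlations.items():
    print(f"{gene}: r = {r:.3f}")
print("Candidates:", candidates)
```

Correlation alone does not show a direct interaction, so candidates found this way still need sequence-based or experimental support.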

4. Building a ceRNA network

#!/usr/bin/env Rscript
# Build a competing endogenous RNA (ceRNA) network

library(igraph)

# ========== ceRNA hypothesis ==========
# A lncRNA sponges a miRNA and relieves the miRNA's repression of its targets
# lncRNA ---represses--→ miRNA ---represses--→ mRNA
# Net effect: the lncRNA indirectly raises mRNA expression

# ========== Build the ceRNA network ==========
# Three kinds of data are needed:
# 1. miRNA-mRNA interactions (TargetScan/miRDB)
# 2. miRNA-lncRNA interactions (miRcode/ENCORI)
# 3. Expression correlation as supporting evidence

# Example data
mirna_mrna <- data.frame(
    miRNA = c("miR-21", "miR-21", "miR-155", "miR-155", "miR-200a"),
    mRNA = c("PDCD4", "PTEN", "SOCS1", "TP53INP1", "ZEB1"),
    score = c(95, 88, 92, 85, 90)
)

mirna_lncrna <- data.frame(
    miRNA = c("miR-21", "miR-21", "miR-155", "miR-200a"),
    lncRNA = c("HOTAIR", "MALAT1", "NEAT1", "HOTAIR"),
    sites = c(3, 2, 4, 1)
)

# Build the edge list (each edge runs from a miRNA to a binding partner)
edges <- rbind(
    data.frame(from = mirna_mrna$miRNA, to = mirna_mrna$mRNA, type = "miRNA-mRNA"),
    data.frame(from = mirna_lncrna$miRNA, to = mirna_lncrna$lncRNA, type = "miRNA-lncRNA")
)

g <- graph_from_data_frame(edges, directed = TRUE)

# Label node classes. The attribute is named "node_type" because igraph uses
# the vertex attribute "type" for bipartite graphs. miRNAs are collected from
# both tables so that none is mislabelled as an mRNA.
all_mirnas <- union(mirna_mrna$miRNA, mirna_lncrna$miRNA)
V(g)$node_type <- ifelse(V(g)$name %in% all_mirnas, "miRNA",
                  ifelse(V(g)$name %in% mirna_lncrna$lncRNA, "lncRNA", "mRNA"))

# Visualization
colors <- c("miRNA" = "red", "lncRNA" = "blue", "mRNA" = "green")
V(g)$color <- colors[V(g)$node_type]

plot(g,
     vertex.size = 20,
     vertex.label.cex = 0.8,
     edge.arrow.size = 0.5,
     main = "ceRNA Network")
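
The network above links each miRNA to its partners, but the ceRNA hypothesis is really a statement about lncRNA-mRNA pairs that share a miRNA. As a rough sketch in Python, using the same example interactions, such pairs can be derived by joining the two tables on the miRNA. The function name `cerna_pairs` is hypothetical, and a real analysis would also require the expression correlation check listed as step 3.

```python
from collections import defaultdict

# Same example interactions as in the R script above
mirna_mrna = [
    ("miR-21", "PDCD4"), ("miR-21", "PTEN"),
    ("miR-155", "SOCS1"), ("miR-155", "TP53INP1"),
    ("miR-200a", "ZEB1"),
]
mirna_lncrna = [
    ("miR-21", "HOTAIR"), ("miR-21", "MALAT1"),
    ("miR-155", "NEAT1"), ("miR-200a", "HOTAIR"),
]


def cerna_pairs(mirna_mrna, mirna_lncrna):
    """Map each (lncRNA, mRNA) pair to the set of miRNAs they share."""
    mrnas_by_mirna = defaultdict(set)
    for mirna, mrna in mirna_mrna:
        mrnas_by_mirna[mirna].add(mrna)

    shared = defaultdict(set)
    for mirna, lncrna in mirna_lncrna:
        # Every mRNA targeted by this miRNA is a candidate ceRNA partner
        for mrna in mrnas_by_mirna.get(mirna, ()):
            shared[(lncrna, mrna)].add(mirna)
    return dict(shared)


pairs = cerna_pairs(mirna_mrna, mirna_lncrna)
for (lncrna, mrna), mirnas in sorted(pairs.items()):
    print(f"{lncrna} - {mrna}: shared miRNAs = {sorted(mirnas)}")
```

Pairs that share more miRNAs are usually considered stronger candidates. With this toy data every pair shares exactly one.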

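Common errors and fixes

| Error message | Cause | Fix |
|---|---|---|
| miRNA name not found | Non-standard miRNA name | Use the standard miRBase name (e.g. hsa-miR-21-5p) |
| No targets predicted | Wrong miRNA sequence | Make sure it is the mature miRNA sequence, not the pre-miRNA |
| Too many targets | Seed too short (6mer matches are too frequent) | Keep only 7mer-m8 and 8mer matches |
| UTR sequence error | 3'UTR sequence retrieved incorrectly | Re-download it from UCSC or Ensembl |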
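
The first two problems in the table can often be caught before submitting anything to a tool. The sketch below is a rough pre-check, not a validator: `NAME_PATTERN` is a simplified approximation of miRBase-style names (real names have more variants), and the 18-25 nt window is an assumed heuristic for telling a mature miRNA from a much longer pre-miRNA hairpin.

```python
import re

# Simplified approximation of names such as hsa-miR-21-5p or hsa-let-7a-5p
NAME_PATTERN = re.compile(r"^[a-z]{3,4}-(miR|let)-\d+[a-z]*(-\d+)?(-[35]p)?$")


def check_mirna_input(name, seq):
    """Return a list of likely problems with a miRNA name and sequence."""
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append("name does not look like a miRBase-style name")

    rna = seq.upper().replace("T", "U")  # accept DNA-style input
    if set(rna) - set("ACGU"):
        problems.append("sequence contains characters other than A, C, G, U/T")
    if not 18 <= len(rna) <= 25:
        problems.append(f"length {len(rna)} nt is outside 18-25 nt (pre-miRNA?)")
    return problems


print(check_mirna_input("hsa-miR-21-5p", "UAGCUUAUCAGACUGAUGUUGA"))
print(check_mirna_input("miR21", "UAGC" * 20))
```

An empty list means no obvious problem was found. It does not guarantee that the name exists in miRBase.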

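Cheat sheet

========================================
Non-coding RNA Target Prediction Cheat Sheet
========================================

[miRNA target prediction tools]
TargetScan            → conservation + pairing-based scoring (the classic choice)
miRDB                 → machine-learning prediction (score > 80 is high confidence)
miRTarBase            → experimentally validated interactions (reference standard)
ENCORI/starBase       → CLIP-seq data mining

[Seed match types]
8mer                  → most stringent, most reliable
7mer-m8               → fairly stringent
7mer-A1               → intermediate
6mer                  → most permissive, many false positives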

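[lncRNA target prediction]
ceRNA analysis        → miRcode + ENCORI
RNA interaction       → IntaRNA / RNAplex
Co-expression         → WGCNA
ChIRP-seq/CHART       → experimental mapping of lncRNA binding sites

[lncRNA databases]
NONCODE               → comprehensive ncRNA database
LNCipedia             → human lncRNA annotation
lncRNAdb              → functionally annotated lncRNAs

[miRNA databases]
miRBase               → miRNA sequences and nomenclature (authoritative)
miRDB                 → target prediction
miRTarBase            → experimentally validated targets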

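[Target validation strategy]
Computational         → intersect several tools (predicted by ≥2 tools)
Expression correlation → miRNA up → target down
Experimental          → luciferase reporter assay (gold standard)
CLIP-seq              → large-scale experimental identification of real targets

[Interview questions]
Q: What is the miRNA seed sequence?
A: Nucleotides 2-8 of the miRNA, which pair with complementary sites in the target's 3'UTR

Q: What is the ceRNA hypothesis?
A: lncRNAs compete for miRNA binding and thereby relieve the miRNA's repression of its targets

Q: How do you validate a miRNA target?
A: With a luciferase reporter assay (repression is lost when the seed site is mutated)
========================================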

References: TargetScan | miRDB | miRTarBase | ENCORI | Bartel, Cell 2009