DSPy: Programmatic Prompt Optimization

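Why Learn It

Traditional prompt engineering is a craft: you adjust the prompt wording by repeated trial and error. DSPy turns it into engineering:

- Programmatic definition: declare inputs and outputs in code instead of hand-writing prompt text
- Automatic optimization: a compiler searches for well-performing prompts and few-shot examples
- Modular composition: LLM calls are combined into pipelines like building blocks
- Evaluable: a built-in evaluation framework quantifies the effect of each optimization
- Model-agnostic: switching models does not require rewriting prompts; you recompile instead

The core idea of DSPy: the prompt is an implementation detail that developers should not have to write by hand. Developers only define the task's inputs and outputs and the quality criteria.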

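Core Concepts

Plain-Language Explanation

Think of DSPy as a "compiler for LLM applications":

- Signature: defines what the task is (inputs → outputs)
- Module: implements the strategy for how to do it (Chain-of-Thought, etc.)
- Optimizer: automatically searches for good prompts and examples
- Metric: defines what counts as good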
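To make "the prompt is an implementation detail" concrete, here is a minimal pure-Python sketch of how a declared signature could be rendered into prompt text. `ToySignature` and `render_prompt` are hypothetical names invented for this illustration; the layout of the rendered text is an assumption and does not reproduce the format DSPy actually sends to the model.

```python
from dataclasses import dataclass, field


@dataclass
class ToySignature:
    """Hypothetical stand-in for a signature: an instruction plus named fields."""
    instruction: str
    inputs: dict[str, str] = field(default_factory=dict)   # name -> description
    outputs: dict[str, str] = field(default_factory=dict)  # name -> description


def render_prompt(sig: ToySignature, **values: str) -> str:
    """Turn the declaration and concrete input values into prompt text."""
    missing = [name for name in sig.inputs if name not in values]
    if missing:
        raise ValueError(f"missing input fields: {missing}")
    lines = [sig.instruction, ""]
    for name in sig.inputs:
        lines.append(f"{name}: {values[name]}")
    # Ask the model to fill in each output field, in declaration order.
    for name, desc in sig.outputs.items():
        lines.append(f"{name} ({desc}):")
    return "\n".join(lines)


qa_sig = ToySignature(
    instruction="Answer the user's question.",
    inputs={"question": "the question asked by the user"},
    outputs={"answer": "a concise, accurate answer"},
)
prompt = render_prompt(qa_sig, question="What is a vector database?")
print(prompt)
```

The developer only edits the declaration; the wording of the rendered text can then change (for example, after optimization) without touching the calling code.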

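Concept Reference Table

Concept	Description	Analogy
Signature	Declares the task's input and output fields	Function signature def f(x) -> y
Module	Encapsulates one LLM-calling strategy	Class / component
Predict	The simplest module (a direct call)	Plain function call
ChainOfThought	A module that reasons before answering	"Think step by step"
Program	A complete pipeline composed of modules	Application
Optimizer/Teleprompter	The compiler that automatically optimizes prompts	Compiler, e.g. gcc -O3
Metric	A function that scores output quality	Unit-test assertion
Example	A sample used for training or evaluation	Test case
Assertion	A runtime constraint check	assert statement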

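Installation and Configuration

Installation

pip install -U dspy   # published as dspy-ai in older releases

# Or the latest development version
pip install git+https://github.com/stanfordnlp/dspy.git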

Configuring the LLM

import dspy

# OpenAI
lm = dspy.LM("openai/gpt-4o-mini", api_key="sk-...")
dspy.configure(lm=lm)

# Local Ollama
lm = dspy.LM("ollama_chat/llama3.2", api_base="http://localhost:11434")
dspy.configure(lm=lm)

# Azure OpenAI
lm = dspy.LM(
    "azure/gpt-4o-mini",
    api_base="https://xxx.openai.azure.com/",
    api_key="...",
    api_version="2024-02-01"
)
dspy.configure(lm=lm)

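Quick Start

Your First DSPy Program

import dspy

# 1. Configure the LLM (the API key is read from the OPENAI_API_KEY environment variable)
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# 2. Define a Signature (the task's inputs and outputs)
class QA(dspy.Signature):
    """Answer the user's question."""
    question: str = dspy.InputField(desc="The question asked by the user")
    answer: str = dspy.OutputField(desc="A concise, accurate answer")

# 3. Create a Module
qa = dspy.Predict(QA)

# 4. Call it
result = qa(question="What is a vector database?")
print(result.answer)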

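Chain-of-Thought Reasoning

# ChainOfThought adds a reasoning step before the answer
cot_qa = dspy.ChainOfThought(QA)
result = cot_qa(question="If a number is divisible by 6, must it also be divisible by 3? Why?")

# The reasoning field is named `reasoning` in DSPy 2.5+; older releases called it `rationale`
print(f"Reasoning: {result.reasoning}")
print(f"Answer: {result.answer}")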

Multi-Step Pipeline

class SearchQuery(dspy.Signature):
    """Turn the question into a search query."""
    question: str = dspy.InputField()
    query: str = dspy.OutputField(desc="An optimized search query")

class AnswerWithContext(dspy.Signature):
    """Answer the question based on the context."""
    context: str = dspy.InputField(desc="Relevant documents that were retrieved")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

class RAGPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        self.query_gen = dspy.ChainOfThought(SearchQuery)
        self.answer = dspy.ChainOfThought(AnswerWithContext)

    def forward(self, question):
        # Step 1: generate the search query
        query = self.query_gen(question=question).query

        # Step 2: retrieve (simulated)
        context = self.retrieve(query)

        # Step 3: generate the answer
        return self.answer(context=context, question=question)

    def retrieve(self, query):
        # Replace with a real retriever in practice
        return f"Relevant document content about '{query}'..."

rag = RAGPipeline()
result = rag(question="What is the role of the GIL in Python?")
print(result.answer)
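The `retrieve` method above is only a stub. As a rough illustration of what could replace it during local experiments, the following sketch ranks a small in-memory corpus by keyword overlap. `tokenize`, `keyword_retrieve`, and the `DOCS` corpus are hypothetical and not part of DSPy; a real pipeline would more likely use a vector store or a search service.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents ranked by how many query tokens they share."""
    query_tokens = tokenize(query)
    scored = []
    for position, doc in enumerate(documents):
        overlap = len(query_tokens & tokenize(doc))
        if overlap > 0:
            # Negative overlap sorts best-first; position keeps ties stable.
            scored.append((-overlap, position, doc))
    scored.sort()
    return [doc for _, _, doc in scored[:k]]


# Hypothetical corpus, for illustration only.
DOCS = [
    "The GIL is a mutex that lets only one thread execute Python bytecode at a time.",
    "A vector database stores embeddings and supports similarity search.",
    "Python threads are useful for I/O-bound work despite the GIL.",
]

hits = keyword_retrieve("What does the Python GIL do?", DOCS, k=2)
context = "\n".join(hits)
print(context)
```

Inside `RAGPipeline.retrieve`, the stub's return value could then be swapped for `"\n".join(keyword_retrieve(query, DOCS))`.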

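Advanced Usage

1. Automatic Optimization (Compilation)

import dspy
from dspy.teleprompt import BootstrapFewShot

# Prepare training data
trainset = [
    dspy.Example(
        question="What is machine learning?",
        answer="Machine learning is a subfield of AI that lets computers learn patterns from data without being explicitly programmed"
    ).with_inputs("question"),
    dspy.Example(
        question="What are the advantages of Python?",
        answer="Python's advantages include concise syntax, a rich ecosystem, and suitability for rapid development and data science"
    ).with_inputs("question"),
    # ... more examples
]

# Define the evaluation metric
def metric(example, prediction, trace=None):
    # Crude heuristic: the answer is reasonably long and contains the keyword "is".
    # It never compares against example.answer, so treat it as a placeholder.
    return len(prediction.answer) > 20 and "is" in prediction.answer.lower()

# Compile (optimize)
optimizer = BootstrapFewShot(metric=metric, max_bootstrapped_demos=3)
compiled_qa = optimizer.compile(
    dspy.ChainOfThought(QA),
    trainset=trainset
)

# The compiled program now carries the few-shot demos selected during compilation
result = compiled_qa(question="What is deep learning?")
print(result.answer)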
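The `metric` above checks only the length and a keyword, so it never compares the prediction with the reference answer. A slightly stricter alternative is token-level F1 against `example.answer`. The sketch below is pure Python and uses `SimpleNamespace` objects as stand-ins for examples and predictions; `token_f1`, `f1_metric`, and the `threshold` value are assumptions made for illustration. It also assumes answers are space-delimited text (for Chinese text you would need a different tokenizer) and follows the common convention of returning a strict boolean when `trace` is supplied during bootstrapping.

```python
import re
from collections import Counter
from types import SimpleNamespace


def token_f1(reference: str, candidate: str) -> float:
    """Token-level F1 between a reference answer and a candidate answer."""
    ref_tokens = re.findall(r"\w+", reference.lower())
    cand_tokens = re.findall(r"\w+", candidate.lower())
    if not ref_tokens or not cand_tokens:
        return 0.0
    # Multiset intersection: a token counts as often as it appears in both.
    shared = sum((Counter(ref_tokens) & Counter(cand_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(cand_tokens)
    recall = shared / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def f1_metric(example, prediction, trace=None, threshold=0.5):
    """Return a float score for evaluation, or pass/fail when a trace is given."""
    score = token_f1(example.answer, prediction.answer)
    if trace is not None:
        # Assumed convention: while bootstrapping, keep only demos above the threshold.
        return score >= threshold
    return score


# Stand-in objects that only need an `answer` attribute.
example = SimpleNamespace(answer="Machine learning lets computers learn from data")
good = SimpleNamespace(answer="Machine learning lets computers learn patterns from data")
bad = SimpleNamespace(answer="It is a kind of database")
print(f1_metric(example, good), f1_metric(example, bad))
```

Because the function keeps the `(example, prediction, trace=None)` calling shape, it can be passed wherever `metric` is used above.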

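2. Advanced Optimizers

from dspy.teleprompt import (
    BootstrapFewShot,      # Automatically selects few-shot demos
    BootstrapFewShotWithRandomSearch,  # + random search over demo sets
    MIPROv2,               # Proposes instructions and optimizes them together with demos
    BootstrapFinetune,     # Generates fine-tuning data
)
# Note: the original MIPRO optimizer is not exported by recent DSPy releases; use MIPROv2.

# MIPROv2: automatically generates and optimizes instructions
optimizer = MIPROv2(
    metric=metric,
    auto=None,          # Assumption: recent releases reject explicit num_candidates/num_trials unless auto is disabled
    num_candidates=10,  # Generate 10 candidate instructions
    init_temperature=1.0
)
compiled = optimizer.compile(
    dspy.ChainOfThought(QA),
    trainset=trainset,
    num_trials=20  # Try 20 combinations
)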
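The parameters `num_candidates` and `num_trials` are easier to reason about with a picture of the search loop in mind. The following is a deliberately simplified sketch, not DSPy's implementation: it samples random pairs of instruction and demo subset and keeps the best-scoring pair, whereas MIPROv2 itself uses a more sample-efficient search strategy. `random_search`, `toy_score`, and the candidate lists are hypothetical; `toy_score` stands in for "run the program on a validation set and average the metric".

```python
import random


def random_search(instructions, demo_pool, score_fn, num_trials=20, demos_per_trial=2, seed=0):
    """Try random (instruction, demo subset) pairs and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed so the search is reproducible
    best_score, best_candidate = float("-inf"), None
    for _ in range(num_trials):
        instruction = rng.choice(instructions)
        demos = tuple(sorted(rng.sample(demo_pool, demos_per_trial)))
        score = score_fn(instruction, demos)
        if score > best_score:
            best_score, best_candidate = score, (instruction, demos)
    return best_candidate, best_score


# Hypothetical candidates, for illustration only.
INSTRUCTIONS = ["Answer briefly.", "Answer step by step.", "Answer with a definition first."]
DEMO_POOL = ["demo_a", "demo_b", "demo_c", "demo_d"]


def toy_score(instruction, demos):
    """Toy objective: rewards one particular instruction and one particular demo."""
    score = 0.5
    if "definition" in instruction:
        score += 0.3
    if "demo_c" in demos:
        score += 0.1
    return score


best, best_score = random_search(INSTRUCTIONS, DEMO_POOL, toy_score, num_trials=50)
print(best, round(best_score, 2))
```

More trials raise the chance of finding a strong combination but cost one full validation pass each, which is why compile time grows with `num_trials`.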

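3. Assertions

# Note: dspy.Assert belongs to the assertion API of older DSPy releases (2.5 and earlier).
# Newer releases replace it with dspy.Refine / dspy.BestOfN; check your installed version.
class CitedAnswer(dspy.Signature):
    """Answer the question and cite sources."""
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()
    citations: list[str] = dspy.OutputField(desc="List of cited sources")

class RAGWithAssertions(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought(CitedAnswer)

    def forward(self, context, question):
        result = self.answer(context=context, question=question)

        # Assertion: the answer must include citations
        dspy.Assert(
            len(result.citations) > 0,
            "The answer must cite at least one source"
        )

        # Assertion: the answer must not be too short
        dspy.Assert(
            len(result.answer) > 50,
            "The answer needs more than 50 characters"
        )

        return result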
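The idea behind runtime assertions is a validate-and-retry loop: generate, check the constraints, and on failure try again with the failure messages fed back in. The sketch below shows that loop in plain Python so it can run without an LLM. `generate_with_checks`, `fake_generate`, and `CHECKS` are hypothetical names; this is an illustration of the pattern under those assumptions, not how DSPy implements it internally.

```python
from typing import Callable, Optional


def generate_with_checks(
    generate: Callable[[str, Optional[str]], dict],
    question: str,
    checks: list[tuple[Callable[[dict], bool], str]],
    max_retries: int = 2,
) -> dict:
    """Call `generate`, validate the result, and retry with feedback on failure."""
    feedback = None
    for _ in range(max_retries + 1):
        result = generate(question, feedback)
        failures = [message for check, message in checks if not check(result)]
        if not failures:
            return result
        # Pass the failed constraints back so the next attempt can correct them.
        feedback = "; ".join(failures)
    raise ValueError(f"constraints still failing after {max_retries} retries: {feedback}")


# Hypothetical generator standing in for an LLM call: it only adds a citation
# once it has been told that one is required.
def fake_generate(question, feedback):
    answer = "The GIL allows only one thread to run Python bytecode at a time."
    citations = ["doc-1"] if feedback and "cite" in feedback else []
    return {"answer": answer, "citations": citations}


CHECKS = [
    (lambda r: len(r["citations"]) > 0, "cite at least one source"),
    (lambda r: len(r["answer"]) > 50, "the answer needs more than 50 characters"),
]

checked = generate_with_checks(fake_generate, "What does the GIL do?", CHECKS)
print(checked["citations"])
```

Bounding the number of retries matters: each retry is another model call, and a constraint the model cannot satisfy should surface as an error rather than loop forever.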

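4. Retrieval Integration

import dspy
# ChromadbRM lives in its own submodule and needs the chromadb package installed.
# Assumption: your installed DSPy release still ships this retriever.
from dspy.retrieve.chromadb_rm import ChromadbRM

# Configure the retrieval model (omit embedding_function to use the default)
retriever = ChromadbRM(
    collection_name="my_docs",
    persist_directory="./chroma_db",
    k=3
)

# lm is the dspy.LM instance created in the configuration section
dspy.configure(lm=lm, rm=retriever)

# Use the built-in Retrieve module
class SimpleRAG(dspy.Module):
    def __init__(self, k=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=k)
        self.answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        docs = self.retrieve(question).passages
        context = "\n".join(docs)
        return self.answer(context=context, question=question)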
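The string `"context, question -> answer"` is a shorthand signature: input field names on the left of the arrow, output field names on the right, with an optional type after a colon. To show how such a string can be read, here is a rough pure-Python parser. `parse_signature` is a hypothetical helper written for this illustration and is not DSPy's own parser; it treats unannotated fields as `str` and does not handle types that themselves contain commas.

```python
def parse_signature(spec: str) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split "a, b -> c: float" into (name, type) pairs for inputs and outputs."""
    if spec.count("->") != 1:
        raise ValueError("expected exactly one '->' separating inputs from outputs")
    left, right = spec.split("->")

    def parse_fields(part: str) -> list[tuple[str, str]]:
        fields = []
        for item in part.split(","):
            # A field without an annotation is treated as a string.
            name, _, type_name = item.partition(":")
            name, type_name = name.strip(), type_name.strip() or "str"
            if not name.isidentifier():
                raise ValueError(f"invalid field name: {name!r}")
            fields.append((name, type_name))
        return fields

    return parse_fields(left), parse_fields(right)


inputs, outputs = parse_signature("context, question -> answer")
print(inputs, outputs)
```

The shorthand is convenient for quick experiments; the class-based form used earlier is the better fit once fields need descriptions or a task docstring.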

5. Multi-Model Collaboration

# Different steps use different models
class MultiModelPipeline(dspy.Module):
    def __init__(self):
        super().__init__()
        # A fast model handles classification
        self.classifier = dspy.Predict("question -> category")
        # A strong model handles generation
        self.generator = dspy.ChainOfThought("category, question -> answer")

    def forward(self, question):
        # Classify with the small model
        with dspy.context(lm=dspy.LM("openai/gpt-4o-mini")):
            cat = self.classifier(question=question)

        # Generate with the large model
        with dspy.context(lm=dspy.LM("openai/gpt-4o")):
            answer = self.generator(category=cat.category, question=question)

        return answer

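6. Evaluation Framework

from dspy.evaluate import Evaluate

# Define evaluation data
devset = [
    dspy.Example(
        question="What are the main differences between Python and Java?",
        answer="Python is a dynamically typed, interpreted language; Java is a statically typed, compiled language"
    ).with_inputs("question"),
    # ... more
]

# Define the metric
def accuracy_metric(example, prediction, trace=None):
    # An LLM can judge semantic similarity; the score is assumed to fall in [0, 1]
    judge = dspy.Predict("reference, prediction -> score: float")
    result = judge(reference=example.answer, prediction=prediction.answer)
    return float(result.score) > 0.7

# Run the evaluation
evaluator = Evaluate(
    devset=devset,
    metric=accuracy_metric,
    num_threads=4,
    display_progress=True
)

# DSPy 2.x returns a percentage here; newer releases may return a result object with a .score attribute
score = evaluator(compiled_qa)
print(f"Accuracy: {score}%")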
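Conceptually, an evaluator runs the program on every example in the dev set, applies the metric, and reports the average as a percentage. The sketch below spells that out in plain Python with stand-in objects so it runs without an LLM. `evaluate`, `toy_program`, `exact_match`, and `toy_devset` are hypothetical names for illustration; the real `Evaluate` class additionally handles threading, progress display, and error tolerance.

```python
from types import SimpleNamespace


def evaluate(program, devset, metric):
    """Run `program` on every example and return the average metric as a percentage."""
    if not devset:
        raise ValueError("devset must not be empty")
    total = 0.0
    for example in devset:
        prediction = program(question=example.question)
        # Boolean metrics count as 0 or 1; float metrics contribute their value.
        total += float(metric(example, prediction))
    return round(100.0 * total / len(devset), 2)


# Hypothetical stand-ins: a "program" that knows one answer, and an exact-match metric.
def toy_program(question):
    known = {"What is 2 + 2?": "4"}
    return SimpleNamespace(answer=known.get(question, "unknown"))


def exact_match(example, prediction, trace=None):
    return example.answer.strip() == prediction.answer.strip()


toy_devset = [
    SimpleNamespace(question="What is 2 + 2?", answer="4"),
    SimpleNamespace(question="What is 3 + 3?", answer="6"),
]
pass_rate = evaluate(toy_program, toy_devset, exact_match)
print(f"Accuracy: {pass_rate}%")
```

Running the same loop before and after compilation, on examples that were not used for training, gives the before/after comparison that justifies an optimization.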

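FAQ

Q1: How does DSPy differ from hand-written prompts?

Aspect	Hand-written prompt	DSPy
Method	Trial-and-error iteration	Programming + automatic optimization
Maintainability	Poor (long strings)	Good (modular code)
Switching models	Prompts need rewriting	Recompile
Evaluation	Manual	Built-in framework
Best suited for	Simple one-off calls	Complex multi-step pipelines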

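Q2: How much training data does optimization need?

As a rough guide:

- BootstrapFewShot: 5-20 examples are enough
- MIPRO: works better with 20-100 examples
- Fine-tuning: 100+ examples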

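Q3: What if compilation is slow?

- Reduce num_trials
- Use a faster model (gpt-4o-mini)
- Cache the compiled result: compiled.save("optimized_qa.json")
- Load it later into a program with the same structure: compiled = dspy.ChainOfThought(QA); compiled.load("optimized_qa.json")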

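Q4: How do I debug a DSPy program?

# Record a trace of module calls
dspy.configure(lm=lm, trace=[])

result = compiled_qa(question="test")

# Print the full trace (each step records a predictor with its inputs and outputs)
for step in dspy.settings.trace:
    print(step)

# Or inspect the prompts actually sent and the responses received
dspy.inspect_history(n=3)  # Show the 3 most recent LLM calls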

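Q5: Can it be used together with LangChain?

Yes. DSPy focuses on prompt optimization, while LangChain focuses on orchestrating tools and chains. A DSPy-optimized module can be called from within a step of a LangChain pipeline.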

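Reference Resources

- DSPy official documentation: complete documentation
- DSPy GitHub: source code
- DSPy paper: academic paper
- DSPy Cookbook: collection of examples
- Stanford NLP: research team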