Observable Framework: The Complete Guide

Why Learn Observable Framework

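A natural home for data apps: Observable Framework builds data-driven static sites. It is not a general-purpose static site generator; it is designed for data visualizations, dashboards, and reports, and it brings data loading, transformation, and presentation together in one project.
Data loaders rethink the data pipeline: any program that writes a file to standard output can serve as a data loader, whether a Python script, an R script, a SQL query, a shell command, or even a Rust program. Framework runs loaders at build time (and on demand during preview), caches their output, and ships the resulting files with the site.
Mix languages in one project: you can preprocess data with Python, query a database with SQL, visualize with JavaScript and Observable Plot, and build custom charts with D3, letting each language do what it does best.
Interactivity is first-class: Framework is built on Observable's reactive runtime, which tracks dependencies between variables automatically. When a slider's value changes, every chart that depends on it updates, with no event listeners or state-management code to write.
Static deployment, no runtime server cost: the build output is plain static files that can be hosted on any static host (GitHub Pages, Vercel, Netlify, S3). There is no server to pay for and no cold start, and on a CDN-backed host pages load quickly worldwide.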

Core Concepts

What Observable Framework Is (in Plain Terms)

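Think of it as "a Markdown static site with data capabilities." An ordinary Markdown blog holds only text and images. An Observable Framework Markdown page can also embed JavaScript code blocks that run in the browser to read data, bind interactive inputs, and render charts.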

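Where does the data come from? You write a Python, SQL, or shell script and place it under the source root (conventionally in src/data/). Framework runs it automatically at build time, caches the result, and your pages can then reference the output file directly.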
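To show how little a loader needs, here is a minimal sketch of a Python data loader. The file name `src/data/hello.json.py` and its fields are hypothetical; the only contract assumed is the one described above, that the loader writes the file's contents to standard output.

```python
"""Hypothetical data loader src/data/hello.json.py (illustrative only)."""
import json
import sys


def build_payload() -> dict:
    # Placeholder data; a real loader would query a database, an API, or raw files.
    return {"greeting": "hello", "values": [1, 2, 3]}


def main(out=sys.stdout) -> None:
    # Whatever is written to stdout becomes the contents of data/hello.json.
    json.dump(build_payload(), out)


if __name__ == "__main__":
    main()
```

A page at src/index.md could then read it with `await FileAttachment("data/hello.json").json()`.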

Architecture

┌────────────────────────────────────────────────┐
│  Build time                                    │
│  ┌─────────────────────────────────────────┐   │
│  │ Data Loaders (.py, .sql, .R, .sh, .ts)  │   │
│  │ run scripts → .csv, .json, .parquet     │   │
│  └─────────────────┬───────────────────────┘   │
│                    ↓                           │
│  ┌─────────────────────────────────────────┐   │
│  │ Markdown Pages (.md)                    │   │
│  │ text + JS blocks + charts + inputs      │   │
│  └─────────────────┬───────────────────────┘   │
│                    ↓                           │
│  ┌─────────────────────────────────────────┐   │
│  │ Static Site (_site/)                    │   │
│  │ HTML + JS + CSS + Data files            │   │
│  └─────────────────────────────────────────┘   │
└────────────────────────────────────────────────┘
                     ↓ deploy
┌────────────────────────────────────────────────┐
│  Runtime (browser)                             │
│  - loads prebuilt data files                   │
│  - runs JS code blocks                         │
│  - renders interactive charts                  │
│  - re-runs dependent cells on user input       │
└────────────────────────────────────────────────┘

Data Loaders

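File name	Runs with	Output format	Typical use
`data/sales.csv.py`	Python	CSV	Data cleaning and transformation
`data/users.json.ts`	TypeScript	JSON	API aggregation
`data/stats.parquet.R`	R	Parquet	Statistical analysis
`data/report.csv.sql`	SQL (DuckDB; needs a custom interpreter, not built in)	CSV	Database queries
`data/metrics.json.sh`	Shell	JSON	System commands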

Naming rule: <output file name>.<output format>.<loader language extension>. For example, sales.csv.py produces sales.csv.
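The naming rule can be made concrete with a small hypothetical helper that splits a loader's file name into the file it serves, the output format, and the loader's language extension. Framework performs this matching internally; the function below only illustrates the convention.

```python
from pathlib import PurePosixPath


def parse_loader_name(filename: str) -> tuple[str, str, str]:
    """Split '<name>.<format>.<language>' into (served file, format, language).

    Hypothetical helper: it only illustrates the naming convention.
    """
    path = PurePosixPath(filename)
    language = path.suffix.lstrip(".")       # loader extension, e.g. "py"
    served = path.with_suffix("")            # drop it: data/sales.csv.py -> data/sales.csv
    fmt = served.suffix.lstrip(".")          # output format, e.g. "csv"
    if not language or not fmt:
        raise ValueError(f"not a <name>.<format>.<language> file name: {filename}")
    return str(served), fmt, language
```

For example, `parse_loader_name("data/sales.csv.py")` returns `("data/sales.csv", "csv", "py")`, and a plain `data/sales.csv` is rejected because it has only one extension.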

Observable Framework vs Quarto vs Jupyter Book

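Feature	Observable Framework	Quarto	Jupyter Book
Focus	Data app websites	Scientific and technical publishing	Computational narratives
Primary language	JavaScript (Observable)	Python/R/Julia	Python/R (any Jupyter kernel)
Interactivity	Native reactivity	Limited (OJS cells)	Requires widgets
Data loading	Multi-language data loaders	Inside code cells	Inside code cells
Output	Static website	HTML/PDF/Word/PowerPoint	HTML/PDF
Chart libraries	Observable Plot, D3	Matplotlib, Plotly	Matplotlib
SQL support	DuckDB built in	Requires configuration	Not directly supported
Performance	Fast (data precomputed at build time)	Depends on rendering	Moderate
Deployment	Static files	Static or server	Static
Learning curve	Medium (requires JavaScript)	Low	Low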

Installation and Setup

Installing Observable Framework

# Requires Node.js 18 or later
node --version

# Create a new project
npm init @observablehq

# Answer the interactive prompts, for example:
# - project name
# - whether to install dependencies

cd my-project

Project Structure

my-project/
├── src/
│   ├── data/              # data loaders
│   │   ├── sales.csv.py   # Python → CSV
│   │   ├── users.json.ts  # TypeScript → JSON
│   │   └── metrics.parquet.sql  # SQL → Parquet (needs a custom interpreter)
│   ├── components/        # reusable JS components
│   │   └── chart.js
│   ├── index.md           # home page
│   ├── dashboard.md       # dashboard page
│   └── analysis.md        # analysis page
├── observablehq.config.ts # framework configuration
├── package.json
└── .env                   # environment variables

Configuration File

// observablehq.config.ts
export default {
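  title: "My Data App",
  pages: [
    { name: "Home", path: "/" },
    { name: "Dashboard", path: "/dashboard" },
    {
      name: "Analysis",
      pages: [
        { name: "Sales analysis", path: "/analysis/sales" },
        { name: "User analysis", path: "/analysis/users" },
      ],
    },
  ],
  head: '<link rel="icon" href="/favicon.ico">',
  header: "Data Analytics Platform",
  footer: "© 2024 Data Team",
  style: "/custom.css",
  toc: true,        // show the on-page table of contents
  pager: true,      // show previous/next page links
  root: "src",      // source root for pages and data loaders
  output: "_site",  // build output directory (Framework's default is "dist")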
};

Development Commands

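# Development mode (live reload)
npm run dev
# Open http://localhost:3000

# Build the static site
npm run build

# Preview the build output with any static file server, e.g.:
npx http-server _site

# Deploy to Observable Cloud
npm run deploy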

Quick Start: A Minimal 5-Minute Example

src/index.md:

# Data Visualization Example

This is an Observable Framework page.

```js
// Load the data produced by the loader src/data/temperatures.csv.py
const data = await FileAttachment("data/temperatures.csv").csv({typed: true});
```

## Interactive Input

```js
const year = view(Inputs.range([2000, 2024], {step: 1, value: 2024, label: "Year"}));
```

```js
const filtered = data.filter(d => d.year === year);
```

## Temperature Chart

```js
Plot.plot({
  title: `Monthly temperatures in ${year}`,
  x: {label: "Month"},
  y: {label: "Temperature (°C)", grid: true},
  marks: [
    Plot.lineY(filtered, {x: "month", y: "temperature", stroke: "city"}),
    Plot.dot(filtered, {x: "month", y: "temperature", fill: "city"})
  ]
})
```

Selected year: **${year}**, with ${filtered.length} rows.

src/data/temperatures.csv.py:

import pandas as pd
import numpy as np
import sys

np.random.seed(42)
records = []
for year in range(2000, 2025):
    for month in range(1, 13):
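        for city in ["Beijing", "Shanghai", "Guangzhou"]:
            base = {"Beijing": 5, "Shanghai": 15, "Guangzhou": 22}[city]  # annual mean, °C
            # Seasonal cycle: -15 °C in January, +15 °C in July
            seasonal = 15 * np.sin((month - 1) / 12 * 2 * np.pi - np.pi/2)
            temp = base + seasonal + np.random.randn() * 2  # Gaussian noise, sd 2 °C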
            records.append({"year": year, "month": month, "city": city, "temperature": round(temp, 1)})

df = pd.DataFrame(records)
df.to_csv(sys.stdout, index=False)

Run:

npm run dev
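If pandas and NumPy are not installed, the same kind of synthetic dataset can be generated with the standard library alone. This is a sketch, not a drop-in replacement: it draws noise with `random.gauss` instead of `np.random.randn`, so the values differ from the NumPy version even with the same seed. The city names and base temperatures are the illustrative values used above.

```python
import csv
import math
import random
import sys

BASE = {"Beijing": 5, "Shanghai": 15, "Guangzhou": 22}  # illustrative annual means (°C)


def generate(seed: int = 42) -> list[dict]:
    rng = random.Random(seed)  # private generator so the output is reproducible
    rows = []
    for year in range(2000, 2025):
        for month in range(1, 13):
            for city, base in BASE.items():
                # Seasonal curve: -15 °C in January, +15 °C in July.
                seasonal = 15 * math.sin((month - 1) / 12 * 2 * math.pi - math.pi / 2)
                temp = base + seasonal + rng.gauss(0, 2)
                rows.append({"year": year, "month": month, "city": city,
                             "temperature": round(temp, 1)})
    return rows


def main(out=sys.stdout) -> None:
    writer = csv.DictWriter(out, fieldnames=["year", "month", "city", "temperature"])
    writer.writeheader()
    writer.writerows(generate())


if __name__ == "__main__":
    main()
```

It yields 25 years × 12 months × 3 cities = 900 rows with the same columns the page expects.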

Advanced Usage

Scenario 1: Loading Data with SQL (DuckDB)

src/data/analysis.csv.sql:

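-- Intended to be run by DuckDB, which can query CSV/Parquet files directly.
-- Note: .sql is not a built-in loader extension; register an interpreter for it
-- via the interpreters option in observablehq.config.ts, or run this query from
-- a .sh loader that calls the DuckDB CLI.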

SELECT
    strftime(date, '%Y-%m') as month,
    category,
    SUM(amount) as total_sales,
    COUNT(*) as order_count,
    AVG(amount) as avg_order
FROM read_csv_auto('raw_data/orders.csv')
WHERE date >= '2024-01-01'
GROUP BY 1, 2
ORDER BY 1, 2;

Use it in a page:

```js
const analysis = await FileAttachment("data/analysis.csv").csv({typed: true});
```

```js
Inputs.table(analysis)
```
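The same aggregation can also run inside an ordinary Python loader, which avoids configuring an interpreter for `.sql` files. The sketch below swaps DuckDB for Python's built-in `sqlite3` module so it needs no extra packages. SQLite's `strftime` takes the format string first, and the column names `date`, `category`, and `amount` are assumptions about `raw_data/orders.csv`. It would live in a hypothetical `src/data/analysis.csv.py`, replacing the `.sql` file, since both would produce `data/analysis.csv`.

```python
import csv
import io
import sqlite3
import sys

QUERY = """
SELECT strftime('%Y-%m', date) AS month,
       category,
       SUM(amount) AS total_sales,
       COUNT(*)    AS order_count,
       AVG(amount) AS avg_order
FROM orders
WHERE date >= '2024-01-01'
GROUP BY 1, 2
ORDER BY 1, 2
"""


def aggregate(csv_text: str) -> list[tuple]:
    # Load the raw CSV into an in-memory SQLite table, then run the query.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (date TEXT, category TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(r["date"], r["category"], float(r["amount"])) for r in rows],
    )
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


def main() -> None:
    # Relative to the project root, which is the loader's working directory.
    with open("raw_data/orders.csv", newline="", encoding="utf-8") as f:
        result = aggregate(f.read())
    writer = csv.writer(sys.stdout)
    writer.writerow(["month", "category", "total_sales", "order_count", "avg_order"])
    writer.writerows(result)
```

The loader file would end with a call to `main()`; keeping `aggregate` separate makes the query easy to test on a small in-memory sample.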

Scenario 2: A Python + Observable Plot Visualization Pipeline

src/data/processed.json.py:

import json
import sys
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the raw data (path relative to the project root) and keep numeric columns
df = pd.read_csv("raw_data/features.csv")
features = df.select_dtypes(include="number")

# PCA: standardize the features, then project onto two principal components
scaler = StandardScaler()
scaled = scaler.fit_transform(features)
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)

result = {
    "points": [{"x": float(x), "y": float(y), "label": label}
               for (x, y), label in zip(components, df["category"])],
    "variance_explained": pca.explained_variance_ratio_.tolist(),
}

json.dump(result, sys.stdout)

src/analysis.md:

# PCA Analysis

```js
const pca = await FileAttachment("data/processed.json").json();
```

```js
Plot.plot({
  title: "PCA projection",
  color: {legend: true},
  marks: [
    Plot.dot(pca.points, {x: "x", y: "y", fill: "label", opacity: 0.7}),
  ]
})
```

Explained variance: PC1 ${(pca.variance_explained[0] * 100).toFixed(1)}% + PC2 ${(pca.variance_explained[1] * 100).toFixed(1)}%

Scenario 3: API Data Refreshed on a Schedule

src/data/github-stars.json.ts:

const repos = ["d3/d3", "observablehq/plot", "observablehq/framework"];

const results = await Promise.all(
  repos.map(async (repo) => {
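    const res = await fetch(`https://api.github.com/repos/${repo}`);
    // Fail the loader (and the build) on errors such as rate limiting
    if (!res.ok) throw new Error(`GitHub API returned ${res.status} for ${repo}`);
    const data = await res.json();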
    return {
      name: repo,
      stars: data.stargazers_count,
      forks: data.forks_count,
      updated: data.updated_at,
    };
  })
);

process.stdout.write(JSON.stringify(results));

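The data is fetched at build time, not live in the browser; pair the loader with a scheduled CI/CD rebuild to keep it up to date.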
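A Python variant of the same loader, sketched with only the standard library's `urllib`. GitHub's REST API requires a `User-Agent` header (the value used here is an arbitrary placeholder), and unauthenticated requests are rate-limited, so letting `urlopen` raise `HTTPError` on a 4xx/5xx response makes the loader fail loudly instead of emitting partial data. The fetch function is a parameter so the transformation can be tested offline; `fetch_repo`, `summarize`, and `main` are hypothetical names.

```python
import json
import sys
import urllib.request

REPOS = ["d3/d3", "observablehq/plot", "observablehq/framework"]


def fetch_repo(repo: str) -> dict:
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}",
        # GitHub's REST API requires a User-Agent; this value is a placeholder.
        headers={"User-Agent": "stars-loader-example",
                 "Accept": "application/vnd.github+json"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # which aborts the loader instead of writing incomplete data.
    with urllib.request.urlopen(req, timeout=30) as res:
        return json.load(res)


def summarize(repos, fetch=fetch_repo) -> list[dict]:
    # Keep only the fields the page needs.
    results = []
    for repo in repos:
        data = fetch(repo)
        results.append({
            "name": repo,
            "stars": data["stargazers_count"],
            "forks": data["forks_count"],
            "updated": data["updated_at"],
        })
    return results


def main() -> None:
    json.dump(summarize(REPOS), sys.stdout)
```

Used as a loader in place of the TypeScript file (for example a hypothetical `src/data/github-stars.json.py`), the file would end with a call to `main()`.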

Scenario 4: Dashboard Layout

---
title: Operations Dashboard
toc: false
---

<div class="grid grid-cols-4">
  <div class="card">
    <h2>Daily active users</h2>
    <span class="big">${d3.format(",")(metrics.dau)}</span>
  </div>
  <div class="card">
    <h2>Conversion rate</h2>
    <span class="big">${(metrics.conversion * 100).toFixed(1)}%</span>
  </div>
  <div class="card">
    <h2>Average order value</h2>
    <span class="big">¥${metrics.aov.toFixed(0)}</span>
  </div>
  <div class="card">
    <h2>NPS</h2>
    <span class="big">${metrics.nps}</span>
  </div>
</div>

```js
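const metrics = await FileAttachment("data/metrics.json").json();
// growth and revenue are used by the charts below; these file names are hypothetical
const growth = await FileAttachment("data/growth.csv").csv({typed: true});
const revenue = await FileAttachment("data/revenue.csv").csv({typed: true});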
```

<div class="grid grid-cols-2">
  <div class="card">

  ```js
  Plot.plot({
    title: "User growth trend",
    height: 300,
    marks: [
      Plot.areaY(growth, {x: "date", y: "users", fill: "steelblue", opacity: 0.3}),
      Plot.lineY(growth, {x: "date", y: "users", stroke: "steelblue"})
    ]
  })
  ```

  </div>
  <div class="card">

  ```js
  Plot.plot({
    title: "Revenue by source",
    height: 300,
    marks: [
      Plot.barX(revenue, Plot.groupY({x: "sum"}, {y: "source", x: "amount", fill: "source"}))
    ]
  })
  ```

  </div>
</div>
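The dashboard reads `metrics.dau`, `metrics.conversion` (a fraction, multiplied by 100 on the page), `metrics.aov`, and `metrics.nps` from `data/metrics.json`. A hypothetical `src/data/metrics.json.py` loader could compute them as sketched below. The input shapes are assumptions, and NPS follows its standard definition: the percentage of promoters (scores 9 or 10) minus the percentage of detractors (scores 0 through 6).

```python
import json
import sys


def compute_metrics(active_user_ids, visits: int, orders, nps_scores) -> dict:
    """Hypothetical inputs:
    active_user_ids: ids of users active today (duplicates allowed)
    visits: number of sessions; orders: list of order amounts
    nps_scores: survey answers on a 0-10 scale
    """
    n = len(nps_scores)
    promoters = sum(1 for s in nps_scores if s >= 9)
    detractors = sum(1 for s in nps_scores if s <= 6)
    return {
        "dau": len(set(active_user_ids)),                         # distinct users
        "conversion": len(orders) / visits if visits else 0.0,    # fraction, not percent
        "aov": sum(orders) / len(orders) if orders else 0.0,      # average order value
        "nps": round(100 * (promoters - detractors) / n) if n else 0,
    }


def write_metrics(metrics: dict, out=sys.stdout) -> None:
    # The loader's stdout becomes data/metrics.json.
    json.dump(metrics, out)
```

For example, 2 promoters and 1 detractor out of 4 responses give an NPS of (2 - 1) / 4 × 100 = 25.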

Scenario 5: A Local Data Warehouse with DuckDB + Parquet

src/data/warehouse.parquet.py:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import sys

# Read the raw transaction data (path relative to the project root)
df = pd.read_csv("raw_data/transactions.csv", parse_dates=["date"])

# Aggregate per week (freq='W': weeks ending on Sunday) and per category
summary = df.groupby([pd.Grouper(key='date', freq='W'), 'category']).agg({
    'amount': ['sum', 'mean', 'count'],
    'customer_id': 'nunique'
}).reset_index()

summary.columns = ['week', 'category', 'total_amount', 'avg_amount', 'tx_count', 'unique_customers']

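# Output Parquet (columnar and compressed, usually much smaller than CSV).
# Write into an in-memory buffer first, then copy the bytes to stdout.
table = pa.Table.from_pandas(summary, preserve_index=False)
buf = pa.BufferOutputStream()
pq.write_table(table, buf)
sys.stdout.buffer.write(buf.getvalue().to_pybytes())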

Query the Parquet file in the page with DuckDB-Wasm:

```js
const db = await DuckDBClient.of({warehouse: FileAttachment("data/warehouse.parquet")});
```

```js
const topCategories = db.query(`
  SELECT category, SUM(total_amount) as revenue
  FROM warehouse
  WHERE week >= '2024-01-01'
  GROUP BY category
  ORDER BY revenue DESC
  LIMIT 10
`);
```

```js
Inputs.table(topCategories)
```
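In the warehouse loader, `pd.Grouper(key='date', freq='W')` uses pandas' `W-SUN` alias: each week ends on a Sunday and is labeled with that Sunday's date. To check the loader's output by hand, the sketch below mirrors that weekly binning and the four aggregates using only the standard library; the `(date, category, amount, customer_id)` input tuples are an assumption.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_ending_sunday(d: date) -> date:
    # weekday() is 0 for Monday and 6 for Sunday, so this moves d forward to its Sunday.
    return d + timedelta(days=6 - d.weekday())


def weekly_summary(transactions) -> list[dict]:
    """transactions: iterable of (date, category, amount, customer_id) tuples."""
    groups = defaultdict(list)
    for d, category, amount, customer in transactions:
        groups[(week_ending_sunday(d), category)].append((amount, customer))
    rows = []
    for (week, category), items in sorted(groups.items()):
        amounts = [a for a, _ in items]
        rows.append({
            "week": week,
            "category": category,
            "total_amount": sum(amounts),
            "avg_amount": sum(amounts) / len(amounts),
            "tx_count": len(amounts),
            "unique_customers": len({c for _, c in items}),
        })
    return rows
```

So a Monday transaction is counted in the week labeled with the following Sunday, while a Sunday transaction stays in its own week.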

Scenario 6: A Custom D3 Visualization

src/components/forceGraph.js:

import * as d3 from "npm:d3";

export function forceGraph(nodes, links, {width = 640, height = 400} = {}) {
  const svg = d3.create("svg")
    .attr("viewBox", [0, 0, width, height])
    .attr("width", width)
    .attr("height", height);

  const simulation = d3.forceSimulation(nodes)
    .force("link", d3.forceLink(links).id(d => d.id).distance(50))
    .force("charge", d3.forceManyBody().strength(-100))
    .force("center", d3.forceCenter(width / 2, height / 2));

  const link = svg.append("g")
    .selectAll("line")
    .data(links)
    .join("line")
    .attr("stroke", "#999")
    .attr("stroke-opacity", 0.6);

  const node = svg.append("g")
    .selectAll("circle")
    .data(nodes)
    .join("circle")
    .attr("r", 8)
    .attr("fill", d => d3.schemeCategory10[d.group % 10])
    .call(drag(simulation));

  simulation.on("tick", () => {
    link
      .attr("x1", d => d.source.x).attr("y1", d => d.source.y)
      .attr("x2", d => d.target.x).attr("y2", d => d.target.y);
    node
      .attr("cx", d => d.x).attr("cy", d => d.y);
  });

  function drag(simulation) {
    return d3.drag()
      .on("start", (event) => { if (!event.active) simulation.alphaTarget(0.3).restart(); event.subject.fx = event.subject.x; event.subject.fy = event.subject.y; })
      .on("drag", (event) => { event.subject.fx = event.x; event.subject.fy = event.y; })
      .on("end", (event) => { if (!event.active) simulation.alphaTarget(0); event.subject.fx = null; event.subject.fy = null; });
  }

  return svg.node();
}

Usage:

```js
import {forceGraph} from "./components/forceGraph.js";

const graph = await FileAttachment("data/network.json").json();
```

```js
forceGraph(graph.nodes, graph.links, {width: 800, height: 500})
```
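`forceGraph` expects `nodes` that carry an `id` and a numeric `group` (used to pick a color) and `links` whose `source` and `target` are node ids. A hypothetical `src/data/network.json.py` loader that builds this shape from an edge list might look like the sketch below; the edge list and group mapping are illustrative inputs.

```python
import json
import sys


def build_graph(edges, group_of=None) -> dict:
    """edges: iterable of (source_id, target_id); group_of: optional id -> int mapping."""
    group_of = group_of or {}
    ids, seen, links = [], set(), []
    for source, target in edges:
        for node_id in (source, target):
            if node_id not in seen:  # first-seen order keeps the output stable
                seen.add(node_id)
                ids.append(node_id)
        links.append({"source": source, "target": target})
    # Nodes without an explicit group default to group 0.
    nodes = [{"id": i, "group": group_of.get(i, 0)} for i in ids]
    return {"nodes": nodes, "links": links}


def main(edges, group_of=None, out=sys.stdout) -> None:
    # The loader's stdout becomes data/network.json.
    json.dump(build_graph(edges, group_of), out)
```

The loader would end by calling `main` with the real edge list, for example one read from a file under the project root.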

FAQ and Troubleshooting

Problem 1: A data loader fails

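# Check the terminal running npm run dev; loader errors are reported there

# Run the loader by hand from the project root to debug it
python src/data/my-loader.csv.py > /tmp/test.csv

# Make sure the data is written to stdout (keep logs on stderr)
# Python: print() or sys.stdout.write()
# R: cat() or write.csv(df, stdout())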
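Because stdout becomes the data file, a loader should keep diagnostics elsewhere. A sketch of the pattern, with a placeholder `load_rows` standing in for real work: progress messages go to stderr, and a failure returns a non-zero exit status so the error is reported rather than an empty or partial file being produced.

```python
import csv
import sys


def load_rows() -> list[dict]:
    # Placeholder for real work (database query, API call, file parsing).
    return [{"id": 1, "value": 3.5}, {"id": 2, "value": 4.0}]


def run(out=sys.stdout, log=sys.stderr) -> int:
    try:
        rows = load_rows()
    except Exception as exc:
        # Report on stderr and signal failure instead of emitting partial output.
        print(f"loader failed: {exc}", file=log)
        return 1
    print(f"writing {len(rows)} rows", file=log)  # diagnostics never go to stdout
    writer = csv.DictWriter(out, fieldnames=["id", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return 0
```

The loader file would end with `sys.exit(run())`.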

Problem 2: A data file cannot be found

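A data loader runs with the working directory of the command that started the build or dev server (normally the project root), not src/data/: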

# Correct: relative to the project root
df = pd.read_csv("raw_data/input.csv")

# Wrong: relative to the data loader file's own location
df = pd.read_csv("../raw_data/input.csv")
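If you would rather have paths that do not depend on where the command is run, a loader can resolve them from its own location. A sketch with a hypothetical `project_path` helper, assuming the loader lives in `src/data/` so that the project root is two directories above it:

```python
from pathlib import Path


def project_path(loader_file: str, *parts: str) -> Path:
    # <root>/src/data/<loader>: parents[0] is src/data, parents[1] is src,
    # parents[2] is the project root.
    root = Path(loader_file).resolve().parents[2]
    return root.joinpath(*parts)
```

Inside a loader: `df = pd.read_csv(project_path(__file__, "raw_data", "input.csv"))`.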

Problem 3: FileAttachment paths

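// FileAttachment paths are relative to the current .md file, and the
// argument must be a string literal so Framework can detect it statically.
// Referencing src/data/sales.csv from src/index.md:
const data = await FileAttachment("data/sales.csv").csv();

// Note: refer to a loader's output by its file name minus the loader extension
// src/data/sales.csv.py → FileAttachment("data/sales.csv")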

Problem 4: A reactive variable does not update

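// In Observable Framework, top-level variables declared in a code block are reactive.
// Caveat: each variable may be defined in only one code block.

// Code block 1
const x = view(Inputs.range([0, 100]));

// Code block 2 (reacts to changes in x)
const y = x * 2; // when x changes, this block re-runs and y is recomputed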
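Conceptually, the runtime records which variables each code block reads and re-runs the dependent blocks when one of them changes. The toy model below illustrates only that idea; it is not how Observable's runtime is implemented (the real one also handles promises, generators, and invalidation), and unlike a production runtime this toy may recompute a cell more than once when dependencies form a diamond.

```python
class Graph:
    """Toy reactive graph: each cell is a function of named inputs."""

    def __init__(self):
        self.values = {}
        self.cells = {}  # name -> (function, list of dependency names)

    def define(self, name, fn, deps=()):
        self.cells[name] = (fn, list(deps))
        self._compute(name)

    def set(self, name, value):
        # An input changed (like moving a slider): store it and notify dependents.
        self.values[name] = value
        self._propagate(name)

    def _compute(self, name):
        fn, deps = self.cells[name]
        self.values[name] = fn(*(self.values[d] for d in deps))

    def _propagate(self, changed):
        # Recompute every cell that reads `changed`, then its dependents in turn.
        for name, (_, deps) in self.cells.items():
            if changed in deps:
                self._compute(name)
                self._propagate(name)
```

Mirroring the two code blocks above: after `g.set("x", 10)` and `g.define("y", lambda x: x * 2, ["x"])`, calling `g.set("x", 21)` leaves `g.values["y"]` at 42.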

Problem 5: Built data files are too large

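# Pre-aggregate in the data loader instead of shipping raw data
# Bad: sending 1 million raw rows to the browser
# Good: aggregating them into a summary of about 1,000 rows in the loader

# Use Parquet (usually much smaller than CSV)
# sales.parquet.py instead of sales.csv.py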

Problem 6: Deployment

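# Deploy to Observable Cloud
npm run deploy

# Or build, then deploy to any static host
npm run build
# Upload the _site/ directory to Vercel/Netlify/S3

# GitHub Pages: build in a GitHub Actions workflow
# (e.g. .github/workflows/deploy.yml) and publish the _site/ directory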

Resources

Official documentation: https://observablehq.com/framework/
Observable Plot: https://observablehq.com/plot/
GitHub: https://github.com/observablehq/framework
Example projects: https://github.com/observablehq/framework/tree/main/examples
Observable notebooks (for learning Observable JavaScript): https://observablehq.com/
D3.js documentation: https://d3js.org/
DuckDB-Wasm: https://duckdb.org/docs/api/wasm