
Deer-flow: a long-horizon harness for parallel LangGraph sub-agents

2026-05-01 · tony · about 43 min read · about 8,540 words


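Over the past six months I have run several long-horizon agent harnesses (OpenManus, Symphony, GeneralAgent) and kept hitting the same pain points: any task longer than 30 minutes blows the context window, sub-agents do not actually run in parallel (they are looped to look parallel), and a misconfigured sandbox lets the agent corrupt files on the host.

ByteDance open-sourced Deer-flow on February 27, 2026, and it is clearly aimed at these pain points. It hit #1 on GitHub Trending within 24 hours of release and now has 64.4K stars and 8.5K forks. It builds on LangGraph and LangChain and layers its own sub-agent fan-out, sandbox, memory, message gateway, and skill system on top. It is not a framework but a harness, meaning an execution container: the agent does not just emit text, it has a real filesystem, bash, and a Python runtime to carry out tasks.

This write-up records how I took Deer-flow from a fresh install to running an hour-scale task: "research the latest RAG papers and turn them into a slide deck". Along the way I compare it with Manus, SuperAgent, and OpenManus, which are often confused with it, to make Deer-flow's position clear.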

[Figure: Deer-flow architecture diagram]

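1. What Deer-flow is

The official description of Deer-flow 2.0 is "a super agent harness that researches, codes, and creates". To understand its design, first untangle a few concepts that are easy to mix up.

1.1 Harness vs. framework vs. product

Many people compare Deer-flow with LangGraph and CrewAI as if they were the same kind of thing, but they sit at different layers:

Framework (LangGraph, LangChain, CrewAI): gives you building blocks; you assemble the agent loop, wire up tools, and handle memory yourself
Harness (Deer-flow, ChatGPT's deep research): sub-agent parallelism, sandbox, memory, and message gateway come pre-assembled; you only write skills and prompts
Product (Manus, ChatGPT Operator, Devin): fully closed; you only get a chat interface and cannot see what is underneath

Deer-flow is a harness, so it sits on top of LangGraph. The README says so explicitly: "built on LangGraph for multi-agent orchestration and LangChain for LLM interactions". It does not reinvent either one; it puts its effort into the layer above.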

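1.2 Scale and history

From the GitHub stats and the changelog:

Open-sourced on 2026/02/27; reached #1 on GitHub Trending the same day
64.4K stars / 8.5K forks within two months
Python 70.2% / TypeScript 16.3% (the frontend uses Next.js)
MIT License
Version 2.0 is "a ground-up rewrite that shares no code with version 1": a rewrite, not a version bump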

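ByteDance choosing MIT over Apache 2.0 is an interesting decision. Both licenses are permissive and allow closed-source derivatives; MIT is shorter and carries fewer conditions (no NOTICE-file or change-notice requirements), though it also lacks Apache 2.0's explicit patent grant. Either way it is friendly to adopters.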

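2. The four runtime components

At its core, Deer-flow is four things under a lead agent: sub-agents, sandbox, memory, and message gateway. Together they determine what kinds of task it can handle.

2.1 Sub-agent fan-out

This is where it differs most from the alternatives. Quoting the README: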

The lead agent can spawn sub-agents on the fly — each with its own scoped context, tools, and termination conditions. Sub-agents run in parallel when possible, report back structured results.

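In practice the lead agent splits the task into N independent subtasks during the plan phase (for example, "look up 5 papers" becomes 5 sub-agents with one paper each), then spawns them in parallel through a LangGraph fan-out node. Each sub-agent gets a scoped context: not the full conversation history, just the task brief from the lead agent plus the tools it needs.

The biggest benefit of this design is that the context does not blow up. A traditional single-thread agent fills its context once a run goes past 30 minutes. Deer-flow confines each sub-agent's context to its own task, and the lead agent only sees the structured results, which keeps the token budget in a reasonable range.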
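To make the scoped-context idea concrete, here is a minimal sketch in plain asyncio. It is not Deer-flow's code and does not use the LangGraph API; TaskBrief, SubAgentResult, and run_sub_agent are names I made up for illustration, and the "work" is a short sleep standing in for an LLM/tool loop.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """What a sub-agent receives: a brief and a tool list, not the chat history."""
    task_id: str
    instruction: str
    tools: list[str] = field(default_factory=list)


@dataclass
class SubAgentResult:
    """Structured result handed back to the lead agent."""
    task_id: str
    summary: str


async def run_sub_agent(brief: TaskBrief) -> SubAgentResult:
    # Stand-in for a real LLM/tool loop; only the brief is visible here.
    await asyncio.sleep(0.01)
    return SubAgentResult(task_id=brief.task_id, summary=f"done: {brief.instruction}")


async def fan_out(briefs: list[TaskBrief]) -> list[SubAgentResult]:
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(run_sub_agent(b) for b in briefs))


briefs = [
    TaskBrief(f"paper-{i}", f"summarize paper {i}", ["arxiv_download"])
    for i in range(1, 6)
]
results = asyncio.run(fan_out(briefs))
for r in results:
    print(r.task_id, "->", r.summary)
```

The point of the design is visible in the types: the lead agent only ever holds five small SubAgentResult objects, never the five full transcripts.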

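2.2 Sandbox: Local, Docker, and K8s modes

Each task gets its own sandbox. The docs list three backends:

Local execution (runs directly on the host; fast but risky)
Docker container (one isolated container per task; the recommended default)
Kubernetes pod (needs a provisioner service; suited to shared multi-user deployments)

Inside the sandbox the agent reads and writes under /mnt/user-data/, which has three subdirectories: uploads, workspace, and outputs. The layout is predictable for the agent because the prompt spells it out: downloaded material goes in uploads, intermediate files in workspace, and final deliverables in outputs. That avoids the situation where the agent scatters files around and cannot find them later.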
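A fixed layout like this is also easy to enforce in code. The helper below is a sketch of my own, not something from the Deer-flow repo: sandbox_path is a hypothetical function that maps an area name and a relative path onto /mnt/user-data/ and rejects anything that would escape it.

```python
from pathlib import PurePosixPath

SANDBOX_ROOT = PurePosixPath("/mnt/user-data")
ALLOWED_AREAS = ("uploads", "workspace", "outputs")


def sandbox_path(area: str, relative: str) -> PurePosixPath:
    """Return a path under /mnt/user-data/<area>, rejecting escapes."""
    if area not in ALLOWED_AREAS:
        raise ValueError(f"unknown area: {area!r}")
    rel = PurePosixPath(relative)
    # Absolute paths and ".." components could point outside the area.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"path escapes the sandbox area: {relative!r}")
    return SANDBOX_ROOT / area / rel


print(sandbox_path("outputs", "deck.pptx"))
print(sandbox_path("workspace", "notes/paper-1.md"))
```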

2.3 Memory: persistence across sessions

Across sessions, DeerFlow builds a persistent memory of your profile, preferences, and accumulated knowledge.

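The memory implementation has an interesting detail: "memory updates now skip duplicate fact entries at apply time, so repeated preferences and context do not accumulate endlessly". In other words it deduplicates at apply time, so a preference like "I like using PostgreSQL" is not written repeatedly and does not waste tokens.

This is a common problem with agent memory: early implementations (mem0, letta) tend to let memory grow without bound, until looking something up in memory is slower than just asking the LLM. Deduplicating in the apply layer is the right choice.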
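Apply-time dedup can be sketched in a few lines. I have not read how Deer-flow compares facts, so the normalization rule below (lowercase, collapse whitespace, drop a trailing period or exclamation mark) is purely my assumption, and normalize and apply_updates are illustrative names.

```python
import re


def normalize(fact: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing '.' or '!'."""
    collapsed = re.sub(r"\s+", " ", fact.strip().lower())
    return collapsed.rstrip(".!")


def apply_updates(memory: list[str], updates: list[str]) -> list[str]:
    """Append only facts whose normalized form is not already stored."""
    seen = {normalize(f) for f in memory}
    merged = list(memory)
    for fact in updates:
        key = normalize(fact)
        if key and key not in seen:
            seen.add(key)
            merged.append(fact.strip())
    return merged


memory = ["I prefer PostgreSQL"]
memory = apply_updates(memory, ["i prefer  postgresql.", "Slides should be 16:9"])
print(memory)
```

Exact-match normalization only catches near-verbatim repeats; paraphrased duplicates would need something like embedding similarity, which is outside this sketch.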

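2.4 Message gateway: across IM platforms

This component was probably driven by ByteDance's internal needs. It has built-in support for six platforms (Telegram, Slack, Feishu/Lark, WeChat, WeCom, DingTalk) over long-polling or WebSocket, so no public IP is needed.

In practice: you tell Deer-flow in Slack to "put together a report on last week's competitor activity", and when it finishes it pushes the result straight back into the Slack thread. This is Slack-bot behavior, not Slack-app behavior (the difference being that the user does not have to pull messages).

For individual users this feature is optional, but for a team running Deer-flow as a shared agent service, the message gateway is effectively a free deployment layer.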
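Why long-polling removes the need for a public IP is easiest to see in code: the gateway only makes outbound requests and advances an offset as it consumes updates. The sketch below uses an in-memory FakeTransport that I invented; it is not any platform's real SDK and not Deer-flow's gateway code.

```python
from typing import Callable


class FakeTransport:
    """In-memory stand-in for an IM platform's 'get updates' endpoint."""

    def __init__(self, messages: list[str]) -> None:
        self._updates = list(enumerate(messages, start=1))

    def get_updates(self, offset: int, timeout: float) -> list[tuple[int, str]]:
        # A real endpoint would hold the request open for up to `timeout` seconds.
        pending = [(i, text) for i, text in self._updates if i >= offset]
        return pending[:2]  # at most two updates per call


def poll(transport: FakeTransport, handle: Callable[[str], None], max_rounds: int) -> int:
    """Pull updates in a loop and return the next offset to request."""
    offset = 1
    for _ in range(max_rounds):
        for update_id, text in transport.get_updates(offset, timeout=30.0):
            handle(text)
            offset = update_id + 1  # acknowledge by advancing the offset
    return offset


received: list[str] = []
next_offset = poll(FakeTransport(["a", "b", "c"]), received.append, max_rounds=3)
print(received, next_offset)
```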

[Figure: task lifecycle]

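3. Installation and deployment

Installing Deer-flow is a fairly smooth experience, but there are a few pitfalls to watch for.

3.1 Three installation modes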

git clone https://github.com/bytedance/deer-flow.git
cd deer-flow

# Interactive wizard (about 2 minutes)
make setup

# Docker mode (recommended)
make docker-init    # pull the sandbox image
make docker-start   # start the services

# Local development
make install        # install dependencies and git hooks
make dev            # run locally

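After startup the web UI is at http://localhost:2026.

3.2 Recommended resources

The docs give concrete numbers (not a vague "it depends"):

Local development: 4 vCPU / 8 GB RAM / 20 GB SSD (8 vCPU / 16 GB RAM recommended)
Docker development: same as above, with extra headroom because of the many bind mounts
Production: 8-16 vCPU / 16-32 GB RAM / 40 GB+ SSD

In my testing, 8 GB RAM is barely enough for a single-user deep research task; once the sub-agent fan-out grows (5 or more) the machine starts swapping, so I recommend going straight to 16 GB.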

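3.3 LLM provider abstraction

Deer-flow treats the LLM as a plug-in, with built-in support for:

OpenAI (GPT-4o, GPT-5)
Anthropic Claude (via API or Claude Code OAuth; the latter needs no separately purchased API key)
DeepSeek, Kimi, Doubao (ByteDance's own)
OpenRouter and other OpenAI-compatible gateways
Local deployment: vLLM with reasoning model support

The docs specifically say a recommended model must have long context (100k+ tokens), reasoning, multimodal input, and reliable tool use. Claude and GPT-5 are the defaults, and DeepSeek's reasoning model has also been tested.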

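4. Hands-on: researching RAG papers and producing a slide deck

The tasks that best show off Deer-flow are ones that span multiple sources and need structured output.

4.1 The task prompt

The prompt I gave the lead agent was: "Research RAG-related papers published from January to April 2026 (especially long-context retrieval and multi-hop reasoning) and organize them into a 12-slide deck with citations."

On a traditional single agent this task has two problems: (1) reading papers one by one blows up the context; (2) by the time it writes the slides, it has forgotten what it found earlier.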

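4.2 Deer-flow's fan-out behavior

The execution flow from the actual trace:

Plan (about 2 minutes): the lead agent splits the task into 5 subtasks: (a) keyword search and collecting the paper list, (b)-(e) a deep read of each of four papers
Fan-out (parallel): 4 read sub-agents are spawned at once, each given one paper's arXiv URL and a scoped context of just "read this paper, write a summary, list the key figures"
Execute (about 25 minutes): the 4 sub-agents each run the arxiv_download tool in their own Docker sandbox to fetch the PDF, convert it to markdown with markitdown, and write a summary
Converge: the lead agent receives the 4 structured summaries, adds the overview from the keyword search, and assembles a deck outline
Output: the built-in slide-deck skill generates a PPTX, with speaker notes and a citation footer on every slide

The whole task took about 38 minutes and roughly 180K tokens (50K for the lead agent, 30-35K for each of the 4 sub-agents). The same task on single-thread Claude Code goes over 250K tokens and requires me to step in midway and reset the context.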
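The token figures are easy to sanity-check. The snippet below only redoes the arithmetic from the numbers quoted above; the variable names are mine, and 250K is treated as a lower bound for the single-thread run.

```python
LEAD_TOKENS = 50_000
SUB_AGENTS = 4
SUB_TOKENS_LOW, SUB_TOKENS_HIGH = 30_000, 35_000

# Total for the fan-out run: lead agent plus four sub-agents.
low = LEAD_TOKENS + SUB_AGENTS * SUB_TOKENS_LOW
high = LEAD_TOKENS + SUB_AGENTS * SUB_TOKENS_HIGH
print(f"fan-out total: {low:,} to {high:,} tokens")

# Compare the worst-case fan-out total with the single-thread lower bound.
SINGLE_THREAD_TOKENS = 250_000
saving = 1 - high / SINGLE_THREAD_TOKENS
print(f"saving vs single thread: at least {saving:.0%}")
```

So the "roughly 180K" figure sits in the middle of the 170K to 190K range, and even the worst case is about a quarter cheaper than the single-thread run.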

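4.3 The value of sandbox isolation

Midway, one sub-agent hit a PDF parsing failure while fetching its paper, and the Python process inside its sandbox crashed with an exception. The lead agent received the failure signal, triggered a retry, and switched to a fallback tool (grobid instead of markitdown). The other 3 sub-agents kept running, completely unaffected.

This is the value of the sandbox: without one, an error in a single sub-agent can contaminate the whole session. Deer-flow's fault isolation is real isolation.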
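The retry-with-fallback behavior can be imitated at the orchestration level with asyncio. This sketch only models the control flow (one task fails, its siblings finish, the failed one is retried with another tool); it does not model process-level sandbox isolation, and parse_primary and parse_fallback are hypothetical stand-ins rather than the real markitdown or grobid calls.

```python
import asyncio


class ParseError(Exception):
    """Raised by the hypothetical primary parser on a bad PDF."""


async def parse_primary(paper: str) -> str:
    # Fails for one paper to simulate the broken PDF from the trace.
    if paper == "paper-3":
        raise ParseError(f"cannot parse {paper}")
    return f"{paper}: parsed by primary"


async def parse_fallback(paper: str) -> str:
    return f"{paper}: parsed by fallback"


async def read_all(papers: list[str]) -> dict[str, str]:
    # return_exceptions=True collects failures as values instead of
    # raising on the first one, so every sibling result is kept.
    first = await asyncio.gather(
        *(parse_primary(p) for p in papers), return_exceptions=True
    )
    results: dict[str, str] = {}
    for paper, outcome in zip(papers, first):
        if isinstance(outcome, Exception):
            results[paper] = await parse_fallback(paper)  # retry with the other tool
        else:
            results[paper] = outcome
    return results


results = asyncio.run(read_all([f"paper-{i}" for i in range(1, 5)]))
for line in results.values():
    print(line)
```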

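5. How it differs from other agent harnesses

These are the comparisons I get asked about most.

5.1 vs. Symphony / OpenManus

Symphony and OpenManus are single-thread agent loops with no sub-agent fan-out. They suit linear tasks where one task list runs start to finish (booking a flight, crawling a single site), but are slow on tasks that need several sources queried in parallel.

Deer-flow's strength is parallelism, at the cost of needing a strong enough LLM for the plan phase (a model with under 100K context and no reasoning cannot produce a good fan-out plan).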

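5.2 vs. Manus / GeneralAgent / ChatGPT deep research

Manus and ChatGPT deep research are delivered as closed products: you cannot fork them, change the prompts, swap the LLM, or plug in your own tools.

Deer-flow is a harness: you write skills (workflow definitions in markdown), plug in your own MCP servers, and switch LLM providers; everything can be changed. The cost is that you operate it yourself.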

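5.3 vs. writing it directly on LangGraph

"Can I just build my own on LangGraph?" Of course, but you will spend a lot of time reinventing the sandbox, memory dedup, and message gateway that Deer-flow has already finished. The docs put it plainly: "DeerFlow differs from LangGraph or CrewAI by providing opinions and defaults rather than just building blocks".

If 80% of your tasks are deep research, code scaffolding, or slide generation, using Deer-flow directly is far faster than writing your own (roughly ten times, by my estimate). If you need highly customized orchestration (say, a special agent voting mechanism), Deer-flow's opinions will get in your way, and building directly on LangGraph is the better choice.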

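6. Risks and production notes

Real risks I ran into over a few weeks of use.

6.1 Sub-agents burning through tokens

The first time I ran deep research I set no sub-agent limit, and the lead agent spawned 8 at once. Each ran for 30 minutes, total usage went past 600K tokens, and more than half of my Anthropic API quota was gone.

The fix: in the lead agent's system prompt, enforce "spawn at most 4 sub-agents" or "only spawn sub-agent when token budget allows". The Deer-flow config also has a max_concurrent_subagents field, but it has no limit by default.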
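A prompt-level rule is only a request to the model, so I like to think of the cap as something enforced in code as well. The sketch below shows the general technique with asyncio.Semaphore; it is not how Deer-flow implements max_concurrent_subagents (I have not checked), and run_capped is an illustrative name.

```python
import asyncio

MAX_CONCURRENT = 4  # illustrative cap, mirroring the "at most 4" rule


async def run_capped(n_tasks: int, limit: int) -> int:
    """Run n_tasks fake sub-agents, never more than `limit` at once.

    Returns the highest number of sub-agents observed running together.
    """
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def sub_agent(index: int) -> None:
        nonlocal running, peak
        async with semaphore:  # waits here while `limit` agents are active
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stand-in for real work
            running -= 1

    await asyncio.gather(*(sub_agent(i) for i in range(n_tasks)))
    return peak


peak = asyncio.run(run_capped(n_tasks=8, limit=MAX_CONCURRENT))
print("peak concurrency:", peak)
```

With 8 requested sub-agents and a cap of 4, all 8 still run, but in two waves; the peak never exceeds the cap.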

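6.2 A misconfigured sandbox contaminating the host

Running the sandbox in Local mode is risky: once the agent has bash, it can in principle run any command. The docs state explicitly: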

DeerFlow is designed by default to be deployed in a local trusted environment.

My advice: always use the Docker or K8s sandbox in production, and keep local mode for development only. Docker mode comes with namespace isolation and a read-only host mount by default, which is relatively safe.
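For a sense of what "locked down" means in Docker terms, here is a sketch that builds a docker run command line with standard flags. This is not Deer-flow's actual sandbox configuration: the image name, host path, volume name, and resource limits are all placeholders I picked for illustration.

```python
def docker_run_args(image: str, host_uploads: str, workdir_volume: str) -> list[str]:
    """Build a `docker run` argv for a restricted, throwaway container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",      # no network; relax this if the task must download
        "--read-only",            # container root filesystem is read-only
        "--memory", "2g",
        "--cpus", "2",
        # Host directory mounted read-only for input files.
        "-v", f"{host_uploads}:/mnt/user-data/uploads:ro",
        # Named volume as the only writable location.
        "-v", f"{workdir_volume}:/mnt/user-data/workspace",
        image,
    ]


args = docker_run_args("example/sandbox:latest", "/srv/uploads", "task-123")
print(" ".join(args))
```

The list form can be passed straight to subprocess.run without shell quoting issues.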

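6.3 Bias accumulating in memory

Memory dedup prevents repeated writes, but soft memory such as personal preferences still accumulates bias. After a month I noticed the agent automatically applying my earlier stylistic preferences (for example, using type hints when writing code), which becomes noise when it carries over into unrelated tasks.

Regularly inspecting memory and manually removing entries that should not be kept long-term is a necessary maintenance task; the repo provides a make memory-inspect tool for this.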
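One possible cleanup policy, sketched below, is to keep facts but drop preferences that have not been used for a while. This is my own idea of a policy, not what make memory-inspect does, and the MemoryEntry shape (text, kind, last_used) is an assumption rather than Deer-flow's storage format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MemoryEntry:
    text: str
    kind: str        # "fact" or "preference" (illustrative categories)
    last_used: date


def prune(entries: list[MemoryEntry], today: date, max_age_days: int) -> list[MemoryEntry]:
    """Keep all facts; drop preferences not used within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in entries if e.kind != "preference" or e.last_used >= cutoff]


entries = [
    MemoryEntry("Works mostly with PostgreSQL", "fact", date(2026, 3, 1)),
    MemoryEntry("Prefers type hints in code", "preference", date(2026, 3, 1)),
    MemoryEntry("Prefers 16:9 slides", "preference", date(2026, 4, 28)),
]
kept = prune(entries, today=date(2026, 5, 1), max_age_days=30)
print([e.text for e in kept])
```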

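6.4 Security: generated artifacts are forced to download

The docs call out an XSS mitigation: "forcing generated artifacts with active web content types to download as attachments rather than render inline". That is, HTML/JS generated by the agent is not rendered directly in the web UI; it is forced to download.

This is a necessary design. If agent-generated HTML were rendered directly, a malicious prompt could get the agent to write a script that steals cookies. Deer-flow sets this baseline in the right place.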
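The usual way to force a download is the Content-Disposition response header. The function below sketches that general technique; it is not Deer-flow's code, and the ACTIVE_TYPES set is my own guess at which content types count as "active".

```python
ACTIVE_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "image/svg+xml",
    "application/javascript",
    "text/javascript",
}


def artifact_headers(content_type: str, filename: str) -> dict[str, str]:
    """Response headers for serving a generated artifact."""
    # Ignore parameters such as "; charset=utf-8" when classifying the type.
    base_type = content_type.split(";", 1)[0].strip().lower()
    # Strip characters that could break out of the quoted filename.
    safe_name = filename.replace('"', "").replace("\r", "").replace("\n", "")
    disposition = "attachment" if base_type in ACTIVE_TYPES else "inline"
    return {
        "Content-Type": content_type,
        "X-Content-Type-Options": "nosniff",  # stop browsers from re-guessing the type
        "Content-Disposition": f'{disposition}; filename="{safe_name}"',
    }


print(artifact_headers("text/html; charset=utf-8", "report.html"))
print(artifact_headers("image/png", "chart.png"))
```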

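7. Where it fits

Three sweet spots:

Deep research across multiple sources: competitor analysis, paper surveys, market scans, any task of the form "look up N things, then synthesize a report"
Long-running automated delivery: end-to-end work from brief to slide deck, from spec to code scaffold, from raw data to dashboard
A shared agent service for a team: with the message gateway it can become an internal Slack bot that every colleague can give instructions to

Where it does not fit: low-latency interaction (agent thinking plus sandbox execution adds seconds of latency), heavily customized multi-agent voting mechanisms, and plain chat completion (using a sledgehammer to crack a nut).

Deer-flow is not trying to replace IDE-level agents like Claude Code or Cursor. Its place is the "researcher / analyst / report generator" layer: you hand it a task in the afternoon and come back in the evening to the result. It shows that long-horizon agents can be engineered, not just demoed as toys.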

