Hugging Face's Transformers Code Agent sets a new record on the GAIA benchmark 🏅
Original: Our Transformers Code Agent beats the GAIA benchmark 🏅
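Hugging Face has released a new Transformers Code Agent that solves complex tasks by having the model write and execute Python code. The approach achieved state-of-the-art (SOTA) results on the GAIA benchmark, which evaluates general AI assistant capabilities, showing that code execution as an agent's reasoning tool is far more flexible and efficient than traditional JSON tool calling. The project is fully open source, giving developers a new option for building high-performance agents.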
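To make the contrast between JSON tool calling and code actions concrete, here is a minimal sketch in plain Python. It does not use the `transformers` agent API; the tool `search_population`, its toy data, and the helpers `run_json_action` and `run_code_action` are all hypothetical names invented for illustration. The point it tries to show is that a JSON action typically describes one tool call per step, while a code action can chain several calls and combine their results in a single step.

```python
import json

# Hypothetical toy tool with made-up data; not part of any real library.
def search_population(city: str) -> int:
    data = {"Paris": 2_100_000, "Lyon": 520_000, "Nice": 340_000}
    return data[city]

TOOLS = {"search_population": search_population}

def run_json_action(action_text: str):
    """JSON-style tool calling: each action names one tool and its arguments."""
    action = json.loads(action_text)
    return TOOLS[action["tool"]](**action["arguments"])

def run_code_action(code: str):
    """Code-style action: run a model-written snippet that may chain tool calls.

    Assumption: the snippet stores its answer in a variable named `result`.
    Note that exec() with trimmed builtins is NOT a real sandbox; this only
    illustrates the idea.
    """
    namespace = {"__builtins__": {"sum": sum}, **TOOLS}
    exec(code, namespace)
    return namespace["result"]

cities = ["Paris", "Lyon", "Nice"]

# JSON style: three separate actions, and the caller must add up the results.
json_actions = [
    json.dumps({"tool": "search_population", "arguments": {"city": c}})
    for c in cities
]
json_total = sum(run_json_action(a) for a in json_actions)

# Code style: one action performs all three lookups and the aggregation.
code_action = 'result = sum(search_population(c) for c in ["Paris", "Lyon", "Nice"])'
code_total = run_code_action(code_action)

print(len(json_actions), json_total, code_total)
```

Both paths arrive at the same total, but the JSON path needs three actions plus external glue logic, whereas the code path needs one. A production agent would execute generated code in a restricted interpreter rather than with a bare `exec`.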
The Hugging Face team published a blog post announcing that their Code Agent, built with the `transformers` library, achieved a breakthrough score on the GAIA (General AI Assistants) benchmark. GAIA is an evaluation designed to test an AI assistant's ability to carry out complex, multi-step, multimodal real-world tasks such as web browsing, file handling, coding, and mathematical computation, and it remains extremely challenging for current large language models (LLMs).
Summaries are AI-generated; the original article is authoritative.