Hugging Face Blog · Jul 1, 2024, 12:00 AM

Hugging Face's Transformers Code Agent sets a new record on the GAIA benchmark 🏅

Original: Our Transformers Code Agent beats the GAIA benchmark 🏅

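Hugging Face has released a new Transformers Code Agent that solves complex tasks by having the model write and execute Python code. The approach achieved state-of-the-art (SOTA) results on GAIA, a benchmark that evaluates general AI assistant capabilities, showing that code execution as an agent's action format is considerably more flexible and efficient than traditional JSON tool calling. The project is fully open source, giving developers a new option for building high-performance agents.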
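To make the contrast between JSON tool calling and code execution concrete, here is a minimal sketch. It is not the Hugging Face implementation: the tools (`search_population`, `add`), their data, and the helpers `run_json_action` and `run_code_action` are hypothetical names invented for illustration. It only shows why a single code action can compose several tool calls that a JSON format has to issue one at a time.

```python
import json

# Hypothetical tools, invented for this illustration only.
def search_population(city: str) -> int:
    data = {"Paris": 2_100_000, "Lyon": 520_000}  # made-up lookup table
    return data[city]

def add(a: int, b: int) -> int:
    return a + b

TOOLS = {"search_population": search_population, "add": add}

def run_json_action(action_json: str):
    """Execute one JSON tool call: a single tool with literal arguments."""
    action = json.loads(action_json)
    return TOOLS[action["tool"]](**action["arguments"])

def run_code_action(code: str):
    """Execute a model-written snippet that may compose several tools.

    Only the tools are exposed as names. Emptying __builtins__ is NOT a
    real sandbox; a production agent needs a restricted interpreter.
    """
    namespace = {"__builtins__": {}, **TOOLS}
    exec(code, namespace)
    return namespace["result"]

# JSON tool calling: three separate actions, with intermediate values
# passed back into the next call by the surrounding loop.
paris = run_json_action(
    '{"tool": "search_population", "arguments": {"city": "Paris"}}'
)
lyon = run_json_action(
    '{"tool": "search_population", "arguments": {"city": "Lyon"}}'
)
json_total = run_json_action(
    json.dumps({"tool": "add", "arguments": {"a": paris, "b": lyon}})
)

# Code action: one snippet nests the same three tool calls.
code_total = run_code_action(
    'result = add(search_population("Paris"), search_population("Lyon"))'
)

print(json_total, code_total)
```

In this toy setup both routes reach the same total, but the JSON route needs three model turns while the code route needs one, which is the kind of flexibility the summary attributes to code execution.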

The Hugging Face team published a blog post announcing that their Code Agent, built with the `transformers` library, achieved a state-of-the-art score on the GAIA (General AI Assistants) benchmark. GAIA is designed to test an AI assistant's ability to carry out complex, multi-step, multimodal real-world tasks such as web browsing, file handling, coding, and mathematical computation, and it remains extremely challenging for current large language models (LLMs).



Summaries are AI-generated; the original article is authoritative.