Hugging Face Blog · Feb 18, 2026, 4:15 PM · important 80

IBM and UC Berkeley Introduce IT-Bench and MAST: A New Benchmark and Framework for Diagnosing Why Enterprise AI Agents Fail

Original: IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

### The Pain Points of Enterprise AI Agents in Production: Why Do They Keep Failing?

As large language models (LLMs) have rapidly advanced…

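IBM Research and the University of California, Berkeley (UC Berkeley) have jointly released the IT-Bench benchmark and the MAST diagnostic framework. IT-Bench simulates real enterprise IT operations environments, while MAST is designed to dissect the underlying reasons AI agents fail when executing multi-step tasks. The research indicates that enterprise agent failures often stem from tool-call errors, state-tracking failures, and error accumulation rather than simply from insufficient LLM capability, which gives a clear direction for future AIOps optimization.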



Summaries are AI-generated; the original article is authoritative.