Google DeepMind launches a "cognitive framework" for evaluating progress toward AGI, alongside a Kaggle hackathon to build new evaluation standards

Original: Measuring progress toward AGI: A cognitive framework

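Google DeepMind has announced a "cognitive framework" for measuring progress toward artificial general intelligence (AGI), intended to address the way traditional static benchmarks lose their validity through data contamination. The framework focuses on evaluating a system's underlying cognitive abilities, such as reasoning, planning, and learning. Alongside it, DeepMind has launched a hackathon on Kaggle, inviting developers and researchers worldwide to help design evaluation tools that are more robust and that genuinely reflect progress toward AGI.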

As large language models (LLMs) advance rapidly, traditional AI evaluation benchmarks such as MMLU and GSM8K are running into the twin problems of "saturation" and "data contamination." Many models post impressive scores on these tests, but often they have simply memorized patterns from their training data rather than developed general understanding and reasoning capabilities. To measure progress toward artificial general intelligence (AGI) more accurately, Google DeepMind has proposed a new "Cognitive Framework." Rather than testing only a model's performance on specific static tasks, the framework shifts the focus to the system's underlying cognitive mechanisms, such as systematic generalization, causal reasoning, long-horizon planning, active learning, and self-correction.

Summaries are AI-generated; the original article is authoritative.