Google DeepMind introduces a "cognitive framework" for evaluating progress toward AGI and launches a Kaggle hackathon to build new evaluation benchmarks
Google DeepMind Blog·89 days ago·Release
As large language models (LLMs) advance rapidly, traditional AI evaluation benchmarks such as MMLU and GSM8K increasingly face the twin challenges…