Rethinking How to Measure AI Intelligence: Google DeepMind Launches Open-Source Evaluation Platform Game Arena ★ 78
Google DeepMind Blog·234 days ago·New Tool
With the rapid advancement of artificial intelligence, traditional static benchmarks (such as MMLU and GSM8K) are facing serious challenges. Many frontier…