Hugging Face Blog | Jun 23, 2023, 12:00 AM | important 75
What Is Really Going On with the Open LLM Leaderboard? A Deep Dive into Evaluation Score Discrepancies
Original: What's going on with the Open LLM Leaderboard?
### Background: The Gap Between Leaderboard Scores and Paper Results

By mid-2023, Hugging Face's Open LLM Leaderboard had become the…
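This article examines why model scores on the Hugging Face Open LLM Leaderboard, MMLU in particular, differ from the numbers reported in the models' official papers. Hugging Face points out that evaluation results are highly sensitive to the prompt format, the few-shot setup, and the way token probabilities are computed. To keep comparisons fair and reproducible, the leaderboard evaluates every model with EleutherAI's lm-evaluation-harness, and the post calls on the community to establish standardized evaluation protocols.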
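
To make the sensitivity to token-probability computation concrete, here is a minimal sketch of how the same model output can be scored as right or wrong depending on the rule used to pick an answer. Everything in it is an assumption made for illustration: the log-probabilities are invented, the function names are hypothetical, and the four rules are generic scoring schemes rather than the exact logic of lm-evaluation-harness or any other specific implementation.

```python
import math

# Hypothetical log-probabilities for the token that follows "Answer:".
# All numbers are made up for illustration; no real model produced them.
next_token_logprobs = {
    " A": math.log(0.10),
    " B": math.log(0.25),
    " C": math.log(0.15),
    " D": math.log(0.05),
    " The": math.log(0.45),  # the model would rather start a sentence
}

# Hypothetical summed log-probability of each full answer text,
# plus the (made-up) number of tokens in that answer.
full_answer_logprobs = {"A": -9.0, "B": -12.5, "C": -7.5, "D": -11.0}
answer_token_counts = {"A": 3, "B": 5, "C": 2, "D": 4}

LETTERS = {"A", "B", "C", "D"}


def pick_among_letters(logprobs):
    """Compare only the four answer letters; ignore every other token."""
    letters = {t.strip(): lp for t, lp in logprobs.items() if t.strip() in LETTERS}
    return max(letters, key=letters.get)


def pick_unrestricted(logprobs):
    """Take the most likely token overall; None if it is not a letter."""
    best = max(logprobs, key=logprobs.get).strip()
    return best if best in LETTERS else None


def pick_full_answer(seq_logprobs):
    """Compare the summed log-probability of each full answer text."""
    return max(seq_logprobs, key=seq_logprobs.get)


def pick_full_answer_normalized(seq_logprobs, token_counts):
    """Same as above, but divide by answer length in tokens."""
    per_token = {k: lp / token_counts[k] for k, lp in seq_logprobs.items()}
    return max(per_token, key=per_token.get)


gold = "B"
predictions = {
    "letters_only": pick_among_letters(next_token_logprobs),
    "unrestricted": pick_unrestricted(next_token_logprobs),
    "full_answer": pick_full_answer(full_answer_logprobs),
    "full_answer_normalized": pick_full_answer_normalized(
        full_answer_logprobs, answer_token_counts
    ),
}

for rule, pred in predictions.items():
    print(f"{rule:24s} predicted={pred!s:5s} correct={pred == gold}")
```

With these invented numbers, two of the four rules mark the question as correct and two mark it as wrong, even though the underlying model output is identical. This is the kind of divergence that makes scores from different evaluation setups hard to compare directly.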