Latest in AI

Hugging Face and IBM jointly launch the Open Agent Leaderboard: a new benchmark for evaluating open-source AI agent performance ★ 80
Hugging Face Blog · 27 days ago · Release
Hugging Face and IBM Research have jointly announced the launch of the "Open Agent Leaderboard," aimed at establishing an objective, standardized, and fully…
QIMMA ⛰: the first quality-first Arabic large language model (LLM) leaderboard
Hugging Face Blog · 54 days ago · Release
The Technology Innovation Institute (TII) of the United Arab Emirates — the organization behind the well-known open-source model Falcon — has officially…
Hugging Face adds new tracks to the Open ASR Leaderboard: a focus on multilingual and long-form speech recognition trends ★ 75
Hugging Face Blog · 205 days ago · Release
Hugging Face recently made a major upgrade to its flagship "Open ASR Leaderboard," officially launching two brand-new evaluation tracks: "Multilingual" and…
Hugging Face introduces new standards for Arabic LLM evaluation: Arabic instruction following (IFEval) and an updated AraGen
Hugging Face Blog · 432 days ago · Release
Hugging Face recently announced a major upgrade to its Arabic Large Language Model (LLM) leaderboard, aiming to provide a more credible and comprehensive…
Hugging Face launches Math-Verify: correcting math evaluation bias on the Open LLM Leaderboard ★ 78
Hugging Face Blog · 485 days ago · New Tool
Hugging Face's Open LLM Leaderboard has long served as an important barometer for measuring the capabilities of open-source large language models (LLMs)…
Hugging Face launches the second-generation open Arabic LLM leaderboard (Open Arabic LLM Leaderboard 2)
Hugging Face Blog · 489 days ago · Release
Hugging Face, in collaboration with its partners, has officially launched the "Open Arabic LLM Leaderboard 2.0." With the explosive growth of Arabic large…
Rethinking Arabic LLM evaluation: the AraGen benchmark and 3C3H evaluation framework arrive on Hugging Face
Hugging Face Blog · 557 days ago · Release
Background and Challenges: The Difficulty of Evaluating Non-English LLMs. In the current landscape of large language model (LLM) development, evaluating…
Hugging Face launches the new "Open Japanese LLM Leaderboard" to accelerate evaluation of Japanese large language models ★ 75
Hugging Face Blog · 571 days ago · New Tool
Hugging Face has officially launched the "Open Japanese LLM Leaderboard," a community-driven platform dedicated to evaluating the performance of…
Hugging Face launches the Open FinLLM Leaderboard: an open-source evaluation benchmark built for financial large language models ★ 75
Hugging Face Blog · 618 days ago · Release
Hugging Face has officially launched the "Open FinLLM Leaderboard" — a new platform dedicated to evaluating and tracking the performance of large language…
Hugging Face and Artificial Analysis launch the "Text to Image" leaderboard and arena ★ 75
Hugging Face Blog · 738 days ago · New Tool
Hugging Face has partnered with independent AI evaluation organization Artificial Analysis to officially launch the "Text to Image Leaderboard & Arena." This…
Hugging Face launches the Open Arabic LLM Leaderboard to accelerate the evaluation and development of Arabic large language models
Hugging Face Blog · 761 days ago · Release
Hugging Face has announced the launch of the "Open Arabic LLM Leaderboard," an important initiative aimed at advancing Arabic natural language processing (NLP)…
Hugging Face launches an open leaderboard for Hebrew LLMs, advancing the evaluation of non-English AI models
Hugging Face Blog · 770 days ago · Release
Hugging Face has officially launched the "Open Leaderboard for Hebrew LLMs," an open-source evaluation platform specifically designed for Hebrew large language…
Hugging Face teams up with Artificial Analysis to launch an LLM performance and cost leaderboard ★ 75
Hugging Face Blog · 772 days ago · New Tool
Hugging Face has announced a partnership with the independent AI performance analytics firm Artificial Analysis, officially integrating its "LLM Performance…
Hugging Face launches the Open Chain of Thought (CoT) Leaderboard: focused on evaluating the reasoning and chain-of-thought abilities of open-source models ★ 75
Hugging Face Blog · 782 days ago · Release
Hugging Face has announced the launch of the new "Open Chain of Thought (CoT) Leaderboard," a public platform specifically designed to evaluate and compare the…
Hugging Face launches the Open Medical-LLM Leaderboard: standardized evaluation of large language models in healthcare ★ 75
Hugging Face Blog · 786 days ago · Release
Hugging Face has announced the official launch of the "Open Medical-LLM Leaderboard" in collaboration with researchers from Open Life Science AI and the…
Hugging Face and Upstage launch the Open Ko-LLM Leaderboard: leading the Korean LLM evaluation ecosystem
Hugging Face Blog · 845 days ago · Release
Hugging Face and South Korea's leading AI startup Upstage have jointly announced the launch of the "Open Ko-LLM Leaderboard." This is a brand-new evaluation…
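Hugging Face launches the "Hallucination Leaderboard," an open-source effort to quantify hallucination rates in large language models ★ 75
Hugging Face Blog · 867 days ago · Release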
While large language models (LLMs) have demonstrated remarkable generative capabilities across many domains, "hallucination" — where a model confidently…
Hugging Face launches the AI Secure LLM Safety Leaderboard: in-depth evaluation of LLM trustworthiness based on the DecodingTrust framework ★ 75
Hugging Face Blog · 870 days ago · Release
Introduction: Capability Is Not Safety: A New Benchmark for LLM Safety Evaluation. As large language models (LLMs) are adopted more deeply across…
How to build your own Hugging Face leaderboard: a complete guide using the Vectara hallucination leaderboard as an example ★ 75
Hugging Face Blog · 884 days ago · Tutorial
In the open-source AI community, the Hugging Face Open LLM Leaderboard serves as an important benchmark for evaluating model capabilities. However, many…
Open LLM Leaderboard: a deep dive into the DROP benchmark and the practice of gaming the leaderboard ★ 75
Hugging Face Blog · 926 days ago · Commentary
The Hugging Face Open LLM Leaderboard has long served as an important benchmark for the community to evaluate the capabilities of open-source models. However…
Hugging Face launches the new Object Detection Leaderboard
Hugging Face Blog · 1,000 days ago · New Tool
Hugging Face has officially launched the "Object Detection Leaderboard," a brand-new evaluation platform designed for the computer vision field. With the rapid…
What is really going on with the Open LLM Leaderboard? An in-depth analysis of evaluation score discrepancies ★ 75
Hugging Face Blog · 1,087 days ago · Commentary
Background: The Gap Between Leaderboard Scores and Paper Results. By mid-2023, Hugging Face's Open LLM Leaderboard had become the community's go-to platform…