Hugging Face and IBM Research have jointly announced the launch of the "Open Agent Leaderboard," aimed at establishing an objective, standardized, and fully…
The Technology Innovation Institute (TII) of the United Arab Emirates — the organization behind the well-known open-source model Falcon — has officially…
Hugging Face recently rolled out a major upgrade to its flagship "Open ASR Leaderboard," launching two new evaluation tracks: "Multilingual" and…
Hugging Face recently announced a major upgrade to its Arabic Large Language Model (LLM) leaderboard, aiming to provide a more credible and comprehensive…
Hugging Face's Open LLM Leaderboard has long served as an important barometer for measuring the capabilities of open-source large language models (LLMs)…
Hugging Face, in collaboration with its partners, has officially launched the "Open Arabic LLM Leaderboard 2.0." With the explosive growth of Arabic large…
### Background and Challenges: The Difficulty of Evaluating Non-English LLMs
In the current landscape of large language model (LLM) development, evaluating…
Hugging Face has officially launched the "Open Japanese LLM Leaderboard," a community-driven platform dedicated to evaluating the performance of…
Hugging Face has officially launched the "Open FinLLM Leaderboard" — a new platform dedicated to evaluating and tracking the performance of large language…
Hugging Face has partnered with independent AI evaluation organization Artificial Analysis to officially launch the "Text to Image Leaderboard & Arena." This…
Hugging Face has announced the launch of the "Open Arabic LLM Leaderboard," an important initiative aimed at advancing Arabic natural language processing (NLP)…
Hugging Face has officially launched the "Open Leaderboard for Hebrew LLMs," an open-source evaluation platform specifically designed for Hebrew large language…
Hugging Face has announced a partnership with the independent AI performance analytics firm Artificial Analysis, officially integrating its "LLM Performance…
Hugging Face has announced the launch of the new "Open Chain of Thought (CoT) Leaderboard," a public platform specifically designed to evaluate and compare the…
Hugging Face has announced the official launch of the "Open Medical-LLM Leaderboard" in collaboration with researchers from Open Life Science AI and the…
Hugging Face and South Korea's leading AI startup Upstage have jointly announced the launch of the "Open Ko-LLM Leaderboard." This is a brand-new evaluation…
While large language models (LLMs) have demonstrated remarkable generative capabilities across many domains, "hallucination" — where a model confidently…
### Introduction: Capability Is Not Safety — A New Benchmark for LLM Safety Evaluation
As large language models (LLMs) are adopted more deeply across…
In the open-source AI community, the Hugging Face Open LLM Leaderboard serves as an important benchmark for evaluating model capabilities. However, many…
The Hugging Face Open LLM Leaderboard has long served as an important benchmark for the community to evaluate the capabilities of open-source models. However…
Hugging Face has officially launched the "Object Detection Leaderboard," a brand-new evaluation platform designed for the computer vision field. With the rapid…
### Background: The Gap Between Leaderboard Scores and Paper Results
By mid-2023, Hugging Face's Open LLM Leaderboard had become the community's go-to platform…