As multimodal large language models (such as GPT-4o, Gemini, and various open-source audio models) continue to proliferate, AI's ability to process audio has…
Background and Challenges: The Difficulty of Evaluating Non-English LLMs. In the current landscape of large language model (LLM) development, evaluating…
Hugging Face has officially launched the "Open Japanese LLM Leaderboard," a community-driven platform dedicated to evaluating the performance of…
This article from the Hugging Face blog introduces "The First Multilingual LLM Debate Competition." As large language models (LLMs) have rapidly advanced…
As large language models (LLMs) have rapidly advanced, traditional static benchmarks (such as MMLU) have increasingly faced saturation and gaming problems. As…
Hugging Face has officially launched the "Open FinLLM Leaderboard" — a new platform dedicated to evaluating and tracking the performance of large language…
The Hugging Face team and its collaborators have jointly launched a new benchmark called "BenCzechMark," designed to evaluate the understanding and generation…
The Hugging Face team published a blog post announcing that their Code Agent, developed using the `transformers` library, achieved a breakthrough score on the…
As large language models (LLMs) have made tremendous strides in code generation, the long-standing industry gold standard — the HumanEval benchmark — has…
Hugging Face has announced the official launch of the "Open Medical-LLM Leaderboard" in collaboration with researchers from Open Life Science AI and the…
As code large language models (Code LLMs) develop rapidly, fairly and accurately evaluating their capabilities has become a major challenge. Traditional…
Hugging Face has announced the launch of a new multimodal benchmark and leaderboard called "ConTextual," aimed at addressing the shortcomings of existing…
Hugging Face recently announced the launch of "TTS Arena" (Text-to-Speech Arena), a brand-new open-source platform specifically designed for evaluating…
Hugging Face has announced the launch of the new **NPHardEval** leaderboard — a benchmark specifically designed to evaluate the reasoning capabilities of large…
Hugging Face has partnered with Patronus AI — a startup focused on LLM evaluation and defense — to officially launch the **Enterprise Scenarios Leaderboard**…
As large language models (LLMs) and generative AI exploded in popularity, demand for computing power surged dramatically, leaving Nvidia GPUs (such as the…
In the field of natural language processing (NLP), text embeddings — the technique of converting text into real-valued vectors — are a foundational technology…
In late 2022, as massive language models like BLOOM and OPT emerged one after another, the AI community faced a core pain point: how to effectively and…