Latest in AI

Showing: Benchmark, Researchers

Evaluating Audio Reasoning: Hugging Face Launches the Big Bench Audio Benchmark ★ 75
Hugging Face Blog · 541 days ago · Release
As multimodal large language models (such as GPT-4o, Gemini, and various open-source audio models) continue to proliferate, AI's ability to process audio has…
Rethinking Arabic LLM Evaluation: The AraGen Benchmark and 3C3H Evaluation Framework Arrive on Hugging Face
Hugging Face Blog · 557 days ago · Release
Background and Challenges: The Difficulty of Evaluating Non-English LLMs. In the current landscape of large language model (LLM) development, evaluating…
Hugging Face Launches a New "Open Japanese LLM Leaderboard" to Accelerate Evaluation of Japanese LLMs ★ 75
Hugging Face Blog · 571 days ago · New Tool
Hugging Face has officially launched the "Open Japanese LLM Leaderboard," a community-driven platform dedicated to evaluating the performance of…
Letting Large Models Debate: The First Multilingual LLM Debate Competition ★ 75
Hugging Face Blog · 571 days ago · Release
This article from the Hugging Face blog introduces "The First Multilingual LLM Debate Competition." As large language models (LLMs) have rapidly advanced…
Hugging Face and Atla Launch "Judge Arena": A New Benchmark for Evaluating LLMs as Judges ★ 80
Hugging Face Blog · 572 days ago · Release
As large language models (LLMs) have rapidly advanced, traditional static benchmarks (such as MMLU) have increasingly faced saturation and gaming problems. As…
Hugging Face Launches the Open FinLLM Leaderboard: An Open-Source Evaluation Benchmark Built for Financial LLMs ★ 75
Hugging Face Blog · 618 days ago · Release
Hugging Face has officially launched the "Open FinLLM Leaderboard" — a new platform dedicated to evaluating and tracking the performance of large language…
🇨🇿 BenCzechMark: Can Your LLM Understand Czech? A New Czech Benchmark Is Released
Hugging Face Blog · 621 days ago · Release
The Hugging Face team and its collaborators have jointly launched a new benchmark called "BenCzechMark," designed to evaluate the understanding and generation…
Hugging Face's Transformers Code Agent Sets a New Record on the GAIA Benchmark 🏅 ★ 80
Hugging Face Blog · 713 days ago · Release
The Hugging Face team published a blog post announcing that their Code Agent, developed using the `transformers` library, achieved a breakthrough score on the…
BigCodeBench: The Next-Generation Code LLM Benchmark and Successor to HumanEval ★ 80
Hugging Face Blog · 726 days ago · Release
As large language models (LLMs) have made tremendous strides in code generation, the long-standing industry gold standard — the HumanEval benchmark — has…
Hugging Face Launches the Open Medical-LLM Leaderboard: Standardized Evaluation of Large Language Models in Healthcare ★ 75
Hugging Face Blog · 786 days ago · Release
Hugging Face has announced the official launch of the "Open Medical-LLM Leaderboard" in collaboration with researchers from Open Life Science AI and the…
Introducing the LiveCodeBench Leaderboard: Comprehensive, Contamination-Free Evaluation of Code LLMs ★ 75
Hugging Face Blog · 789 days ago · Release
As code large language models (Code LLMs) develop rapidly, fairly and accurately evaluating their capabilities has become a major challenge. Traditional…
Hugging Face Launches the ConTextual Leaderboard: Evaluating Multimodal Models' Joint Image-Text Reasoning in Text-Rich Scenes ★ 75
Hugging Face Blog · 831 days ago · Release
Hugging Face has announced the launch of a new multimodal benchmark and leaderboard called "ConTextual," aimed at addressing the shortcomings of existing…
Hugging Face Launches TTS Arena: Crowdsourced Blind Community Testing of Speech Synthesis Models ★ 75
Hugging Face Blog · 838 days ago · New Tool
Hugging Face recently announced the launch of "TTS Arena" (Text-to-Speech Arena), a brand-new open-source platform specifically designed for evaluating…
Hugging Face Launches the NPHardEval Leaderboard: Revealing LLM Reasoning Abilities Through Computational Complexity and Dynamic Updates ★ 75
Hugging Face Blog · 863 days ago · Release
Hugging Face has announced the launch of the new NPHardEval leaderboard, a benchmark specifically designed to evaluate the reasoning capabilities of large…
Hugging Face Launches the "Enterprise Scenarios Leaderboard": An LLM Evaluation Benchmark Designed for Real-World Applications ★ 75
Hugging Face Blog · 865 days ago · Release
Hugging Face has partnered with Patronus AI, a startup focused on LLM evaluation and defense, to officially launch the Enterprise Scenarios Leaderboard…
Training and Inference Speed Showdown: Habana Gaudi®2 Outperforms Nvidia A100 80GB
Hugging Face Blog · 1,278 days ago · Commentary
As large language models (LLMs) and generative AI exploded in popularity, demand for computing power surged dramatically, leaving Nvidia GPUs (such as the…
MTEB: The Massive Text Embedding Benchmark Officially Launches ★ 85
Hugging Face Blog · 1,334 days ago · Release
In the field of natural language processing (NLP), text embeddings — the technique of converting text into real-valued vectors — are a foundational technology…
Very Large Language Models and How to Evaluate Them: Hugging Face Launches Zero-Shot Evaluation on the Hub ★ 75
Hugging Face Blog · 1,350 days ago · New Tool
In late 2022, as massive language models like BLOOM and OPT emerged one after another, the AI community faced a core pain point: how to effectively and…
