Latest in AI

Showing:BenchmarkDevelopersMistralClear ×

🔥 Trending today

anthropic6 export-controls4 model-access3 amazon3 national-security2 open-source2 ai-regulation2 government-policy2 enterprise-ai2 compliance2

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

介紹 HELMET：全面評估長文本語言模型（Long-context LLMs）的新一代基準測試★ 80
Hugging Face Blog424 days agoRelease
### Background and Pain Points: Moving Beyond the Overly Simple "Needle in a Haystack" Test In recent years, the context window length supported by large…
Hugging Face 推出 NPHardEval 排行榜：透過計算複雜度與動態更新揭示大型語言模型的推理能力★ 75
Hugging Face Blog863 days agoRelease
Hugging Face has announced the launch of the new **NPHardEval** leaderboard — a benchmark specifically designed to evaluate the reasoning capabilities of large…
Hugging Face 推出「企業情境排行榜」：專為真實世界應用設計的 LLM 評測基準★ 75
Hugging Face Blog865 days agoRelease
Hugging Face has partnered with Patronus AI — a startup focused on LLM evaluation and defense — to officially launch the **Enterprise Scenarios Leaderboard**…