Latest in AI

Showing:BenchmarkOpen-sourceClear ×

🔥 Trending today

anthropic6 export-controls4 model-access3 spacex3 amazon3 national-security2 open-source2 governance2 ai-regulation2 government-policy2

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

推出 LiveCodeBench 排行榜：全面且無污染的程式碼大語言模型評估★ 75
Hugging Face Blog789 days agoRelease
As code large language models (Code LLMs) develop rapidly, fairly and accurately evaluating their capabilities has become a major challenge. Traditional…
Hugging Face 推出 ConTextual 排行榜：評估多模態模型在富含文本場景中的圖文聯合推理能力★ 75
Hugging Face Blog831 days agoRelease
Hugging Face has announced the launch of a new multimodal benchmark and leaderboard called "ConTextual," aimed at addressing the shortcomings of existing…
Hugging Face 推出 NPHardEval 排行榜：透過計算複雜度與動態更新揭示大型語言模型的推理能力★ 75
Hugging Face Blog863 days agoRelease
Hugging Face has announced the launch of the new **NPHardEval** leaderboard — a benchmark specifically designed to evaluate the reasoning capabilities of large…
Hugging Face 推出「企業情境排行榜」：專為真實世界應用設計的 LLM 評測基準★ 75
Hugging Face Blog865 days agoRelease
Hugging Face has partnered with Patronus AI — a startup focused on LLM evaluation and defense — to officially launch the **Enterprise Scenarios Leaderboard**…
訓練與推論速度大對決：Habana Gaudi®2 效能超越 Nvidia A100 80GB
Hugging Face Blog1,278 days agoCommentary
As large language models (LLMs) and generative AI exploded in popularity, demand for computing power surged dramatically, leaving Nvidia GPUs (such as the…
超大型語言模型及其評估方法：Hugging Face 推出 Hub 上的零樣本評估★ 75
Hugging Face Blog1,350 days agoNew Tool
In late 2022, as massive language models like BLOOM and OPT emerged one after another, the AI community faced a core pain point: how to effectively and…

← PreviousPage 2