Hugging Face Blog | Mar 11, 2025, 8:40 PM

Open R1 Update #3: Hugging Face Releases Open-Source Reasoning Models and GRPO Training Optimization Details

Original: Open R1: Update #3


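Hugging Face has published the third technical update for Open R1, its open-source reasoning model project. The update centers on the release of new reasoning models based on Qwen/Llama and a detailed account of how the team trains with GRPO (Group Relative Policy Optimization) using TRL. The team resolved the "reward hacking" problem it encountered during training and open-sourced the complete training datasets and recipes, significantly lowering the barrier for the community to reproduce DeepSeek-R1's reasoning capabilities.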
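For readers unfamiliar with GRPO, the "group relative" part can be illustrated with a minimal sketch. It is not taken from the Open R1 or TRL code. It only shows the general idea of scoring each sampled completion against the other completions for the same prompt. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    The population standard deviation and eps=1e-6 are illustrative
    assumptions; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# If every completion gets the same reward, all advantages are zero,
# so that group contributes no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Completions that score above their group's average get a positive advantage and are reinforced, while those below average are discouraged. This is also why a flawed reward function can be exploited: the policy is pushed toward whatever scores higher than its peers, whether or not the reasoning is sound.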

Since its launch, Hugging Face's Open R1 project has aimed to replicate the reasoning capabilities of DeepSeek-R1 in a fully open-source manner. In Update #3, the research team reports several advances: the release of new models, optimizations to the training framework, and the public disclosure of key technical details.


open-source llama other huggingface #reasoning #rlhf #grpo #fine-tuning #open-r1

Summaries are AI-generated; the original article is authoritative.