No GPU Left Idle: Unlocking High-Performance Reinforcement Learning Training with Co-located vLLM in TRL

Original: No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

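Hugging Face's TRL team has introduced support for co-located vLLM. In online reinforcement learning training such as PPO and GRPO, the generation phase is often the performance bottleneck. By running training and the vLLM inference engine on the same GPUs, this approach shares model weights seamlessly and uses vLLM's efficient generation, which significantly improves GPU utilization and shortens overall training time.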

When large language models are trained with reinforcement learning from human feedback (RLHF), whether with PPO or the recently popular GRPO, each step typically has two main phases: the **generation phase** (rollout) and the **update phase** (training/optimization). Traditionally, the generation phase relies on the standard Hugging Face `generate()` function, which is relatively slow. As a result, expensive GPUs sit at very low utilization during generation, and this phase becomes a serious performance bottleneck.
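To make the bottleneck argument concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the original article: the function names and the timings (80 s of generation, 20 s of training per step, a 4x faster generation engine) are hypothetical values chosen for illustration. It only shows how the share of a step spent generating limits the overall gain from a faster generation engine.

```python
def generation_share(t_gen: float, t_train: float) -> float:
    """Fraction of one RL step spent in the generation (rollout) phase."""
    if t_gen < 0 or t_train < 0 or t_gen + t_train == 0:
        raise ValueError("phase times must be non-negative and not both zero")
    return t_gen / (t_gen + t_train)


def step_speedup(t_gen: float, t_train: float, gen_speedup: float) -> float:
    """Overall step speedup when only generation becomes `gen_speedup` times faster.

    The update phase is assumed unchanged, so this is an Amdahl's-law style estimate.
    """
    if gen_speedup <= 0:
        raise ValueError("gen_speedup must be positive")
    old_step = t_gen + t_train
    new_step = t_gen / gen_speedup + t_train
    return old_step / new_step


if __name__ == "__main__":
    # Hypothetical per-step timings in seconds (assumptions, not measurements).
    t_gen, t_train = 80.0, 20.0
    print(f"generation share of step: {generation_share(t_gen, t_train):.0%}")
    print(f"step speedup with 4x faster generation: "
          f"{step_speedup(t_gen, t_train, 4.0):.2f}x")
```

Under these assumed numbers, generation takes most of each step, so speeding up that phase alone yields a large but bounded overall gain; the update phase sets the ceiling.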

Summaries are AI-generated; the original article is authoritative.