This blog post published by the ServiceNow AI team delves into the major transition of the open-source large language model inference engine vLLM from V0 to…
In today's AI landscape, the performance gap between open-weights models (such as Meta's Llama family) and closed-source models (such as OpenAI's GPT and…
As large language models (LLMs) become increasingly widespread, more and more companies are attempting to deploy AI agents in e-commerce customer service and…
Nathan Lambert, a prominent AI expert, former Alignment Scientist at Hugging Face, and founder of the popular newsletter Interconnects, recently wrote about…
Hugging Face has officially announced the release of TRL (Transformer Reinforcement Learning) v1.0. This is a major milestone, marking TRL's transformation…
This article takes a deep dive into one of the most contentious topics in artificial intelligence: AI "self-improvement" and whether it will trigger a "fast…
With the success of reasoning models such as DeepSeek-R1, reinforcement learning (RL/RLHF) has become a critical technique for improving the alignment and…
The Hugging Face official blog has announced a collaboration with RapidFire AI, bringing a revolutionary performance improvement to its popular TRL…
This article, published on the Hugging Face Blog, explores one of the most cutting-edge topics in the AI field today: **the challenges of alignment and…
In the reinforcement learning from human feedback (RLHF) training process for large language models — whether PPO or the recently popular GRPO — there are…
Since the explosive rise of DeepSeek-R1, GRPO (Group Relative Policy Optimization) has become the most widely discussed reinforcement learning (RL) technique…
Hugging Face's Open R1 project aims to fully open-source and replicate the training pipeline of DeepSeek-R1's reasoning model. In the latest fourth update…
Since its launch, Hugging Face's Open R1 project has been dedicated to replicating the reasoning capabilities of DeepSeek-R1 in a fully open-source manner. In…
Hugging Face has officially published the second technical update (Update #2) for the Open R1 project, which aims to replicate DeepSeek-R1's reasoning model…
### Project Background: Recreating the Open-Source Miracle of DeepSeek-R1 The emergence of DeepSeek-R1 sent shockwaves through the global AI community…
### Introduction: An Important Piece of the Open-Source Image Generation Puzzle As text-to-image (T2I) technology advances rapidly, ensuring that AI-generated…
The open-source data curation and annotation platform Argilla has officially released version 2.4, with the core of this update being deep integration with…
### Background In the current development of large language models (LLMs), high-quality alignment data (such as the preference data required for RLHF and DPO)…
In recent years, methods such as Direct Preference Optimization (DPO) have become mainstream for large language model (LLM) alignment, as they eliminate the…
### Background and Challenge: The High-Quality Data Bottleneck In the current development of generative AI and large language models (LLMs), the industry…
This technical blog post from Hugging Face takes an in-depth look at the latest techniques in "preference tuning," with a particular focus on **Direct…
This technical blog post from Hugging Face takes an in-depth look at the critical "implementation details" that are routinely glossed over in academic papers…
### Background and Pain Points Traditional RLHF (Reinforcement Learning from Human Feedback), while achieving enormous success with models like ChatGPT…
In the development of large language models (LLMs), RLHF (Reinforcement Learning from Human Feedback) is the critical step for aligning models with human…
This classic blog post from Hugging Face provides an extremely valuable hands-on guide for the open-source community, detailing how to fine-tune the LLaMA…
This technical blog post from Hugging Face introduces how to combine TRL (Transformer Reinforcement Learning) and PEFT (Parameter-Efficient Fine-Tuning)…
Amid the generative AI wave sparked by ChatGPT, Hugging Face published this in-depth article exploring how to transform "base language models" — which can only…
The release of ChatGPT in late 2022 triggered an explosion in generative AI, and the most critical technology behind it is Reinforcement Learning from Human…
Proximal Policy Optimization (PPO) is a deep reinforcement learning (DRL) algorithm proposed by OpenAI in 2017. Due to its ease of implementation, training…
This classic tutorial from Hugging Face is the first part of its "Deep Reinforcement Learning Course," designed to give readers a solid foundation in…