Hugging Face Officially Introduces Decision Transformers: Treating Reinforcement Learning as a Sequence Modeling Task

Original: Introducing Decision Transformers on Hugging Face 🤗

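Hugging Face has announced official support for the Decision Transformer (DT) in its transformers library. The model departs from traditional reinforcement learning (RL) methods: instead of using value functions or policy gradients, it treats states, actions, and target returns as a sequence and uses GPT-like self-attention to predict the next action. The integration substantially lowers the barrier to offline reinforcement learning (Offline RL), letting developers train decision-making models with the familiar Transformer toolchain.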

Hugging Face has announced official support for the Decision Transformer (DT) in its `transformers` library. The model takes a different approach from traditional reinforcement learning (RL). Conventional RL algorithms such as DQN and PPO typically rely on dynamic programming, value function estimation, or policy gradients, and can become unstable or diverge when trained on a fixed offline dataset (Offline RL). The Decision Transformer instead reframes RL as conditional sequence modeling: past states, actions, and target "returns-to-go" (the cumulative reward still to be collected from a given step onward) are arranged into a single sequence and fed directly into a GPT-style autoregressive Transformer, which predicts the next action.
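To make the sequence framing concrete, here is a minimal sketch in plain Python of how returns-to-go could be computed from a recorded episode and interleaved with states and actions. It does not use the `transformers` API. The helper names `returns_to_go` and `build_sequence`, the toy episode, and the undiscounted sum are illustrative assumptions; the (return-to-go, state, action) ordering per timestep follows the usual description of the Decision Transformer rather than anything stated in this summary.

```python
from typing import Any, List, Tuple


def returns_to_go(rewards: List[float]) -> List[float]:
    """Return-to-go at step t: sum of rewards from t to the episode end.

    Assumption: no discounting, which is the common choice for this setup.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step adds its own reward to the future total.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_sequence(
    states: List[Any], actions: List[Any], rewards: List[float]
) -> List[Tuple[str, int, Any]]:
    """Interleave (return-to-go, state, action) triples, one per timestep.

    Each entry is (token_kind, timestep, value). A real model would embed
    these values; here they stay as raw Python objects for illustration.
    """
    sequence = []
    for t, (g, s, a) in enumerate(zip(returns_to_go(rewards), states, actions)):
        sequence.append(("return_to_go", t, g))
        sequence.append(("state", t, s))
        sequence.append(("action", t, a))
    return sequence


# Hypothetical three-step episode from an offline dataset.
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]
rewards = [1.0, 0.0, 2.0]

tokens = build_sequence(states, actions, rewards)
for token in tokens:
    print(token)
```

At inference time the same layout would be used, except that the first return-to-go is a target chosen by the user and the model is asked to fill in each action token in turn.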

Summaries are AI-generated; the original article is authoritative.