Hugging Face Blog · Jan 20, 2026, 3:20 AM · important 80

Microsoft Releases Differential Transformer V2: Major Gains in Differential Attention Efficiency and Long-Context Performance

Original: Differential Transformer V2


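Microsoft has published Differential Transformer V2 (Diff-Transformer V2) on Hugging Face. It keeps V1's design of subtracting two attention maps to cancel noise, and focuses on reducing the compute and memory overhead of that design. The new version introduces highly optimized CUDA kernels and FlashAttention integration, and ships pretrained models with Hugging Face integration, so developers can deploy models with strong long-context and noise-robust behavior at lower cost.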

Microsoft's research team has officially published **Differential Transformer V2 (Diff-Transformer V2)** on Hugging Face.
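The idea of subtracting two attention maps can be illustrated with a small sketch. This follows the V1-style formulation, `(softmax(Q1·K1ᵀ/√d) − λ·softmax(Q2·K2ᵀ/√d))·V`, and is not the V2 implementation, whose details the summary does not give. The function names are hypothetical, `lam` is passed as a fixed scalar for simplicity (an assumption; it is a learned quantity in the original design), and masking, multi-head handling, and normalization are left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(q, k):
    """Row-wise softmax of scaled dot products; q and k are lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
        for qi in q
    ]


def diff_attention(q1, k1, q2, k2, v, lam):
    """Illustrative differential attention: (A1 - lam * A2) @ V.

    Hypothetical helper, not a library API. Returns (output, weights).
    """
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    # Subtract the second map so attention mass common to both cancels out.
    weights = [
        [x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)
    ]
    # Weighted sum of value vectors, one output row per query.
    out = [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(len(v[0]))]
        for row in weights
    ]
    return out, weights
```

Because each softmax row sums to 1, every row of the combined weights sums to `1 - lam`. Unlike ordinary attention weights, individual entries can be negative.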



Summaries are AI-generated; the original article is authoritative.