Hugging Face Blog | Jan 20, 2026, 3:20 AM | important 80
Microsoft Releases Differential Transformer V2: Major Gains in Differential Attention Efficiency and Long-Context Performance
Original: Differential Transformer V2
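Microsoft has published Differential Transformer V2 (Diff-Transformer V2) on Hugging Face. V2 keeps the V1 design of subtracting two attention maps to cancel noise, and focuses on reducing the compute and memory overhead of that design. The new version introduces highly optimized CUDA kernels and FlashAttention integration, and ships pretrained models with Hugging Face integration, so developers can deploy models with strong long-context and noise-robust behavior at lower cost.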
Microsoft's research team has officially published **Differential Transformer V2 (Diff-Transformer V2)** on Hugging Face.
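To make the "subtract two attention maps" idea concrete, here is a minimal pure-Python sketch of the V1-style differential attention computation. It is only an illustration under stated assumptions: the function names are hypothetical, `lam` is a fixed scalar here (in the original design it is a learned quantity), there is no causal masking or multi-head logic, and nothing in it reflects the V2 CUDA kernels or FlashAttention integration, which this summary does not describe in detail.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(Q, K):
    """Row-wise softmax(Q K^T / sqrt(d)) for small list-of-lists matrices."""
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K])
        for q in Q
    ]


def diff_attention(Q1, K1, Q2, K2, V, lam=0.5):
    """Hypothetical helper: (softmax(Q1 K1^T) - lam * softmax(Q2 K2^T)) V.

    `lam` is a plain scalar in this sketch (an assumption made for brevity).
    Returns the differential attention map and the weighted output.
    """
    A1 = attention_map(Q1, K1)
    A2 = attention_map(Q2, K2)
    # Attention mass that both maps assign to the same key is cancelled here.
    diff = [
        [a1 - lam * a2 for a1, a2 in zip(r1, r2)]
        for r1, r2 in zip(A1, A2)
    ]
    out = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in diff
    ]
    return diff, out


if __name__ == "__main__":
    Q1 = [[1.0, 0.0], [0.0, 1.0]]
    K1 = [[1.0, 0.0], [0.0, 1.0]]
    Q2 = [[0.5, 0.5], [0.5, 0.5]]
    K2 = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    diff_map, output = diff_attention(Q1, K1, Q2, K2, V, lam=0.5)
    print(diff_map)
    print(output)
```

Because each softmax row sums to 1, every row of the differential map sums to `1 - lam`, and when both maps are identical and `lam` is 1 the output vanishes entirely. That is the sense in which attention shared by the two maps is treated as noise and removed.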
Summaries are AI-generated; the original article is authoritative.