Fit More Parameters and Train Faster in Hugging Face with ZeRO via DeepSpeed and FairScale
Hugging Face Blog·1,972 days ago·Release
As the parameter counts of Transformer models such as GPT and T5 grow exponentially, deep learning runs up against a severe "Memory Wall". With limited…