GaLore: A Breakthrough Technique for Training Large Language Models on Consumer-Grade Hardware

Original: GaLore: Advancing Large Model Training on Consumer-grade Hardware

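Hugging Face introduces GaLore (Gradient Low-Rank Projection), a new memory-efficient training method. Unlike LoRA, GaLore projects gradients into a low-rank space, which sharply reduces the memory taken up by optimizer states. This lets developers run full-parameter fine-tuning, or even pre-training from scratch, of a 7B model on a single consumer-grade GPU with 24GB of memory (such as an RTX 4090).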

As the parameter counts of large language models (LLMs) have skyrocketed, the hardware requirements for training and fine-tuning these models have risen accordingly. In full-parameter training with optimizers like Adam, the optimizer states alone typically consume two to three times as much memory as the model weights themselves, which has made training large models on consumer-grade hardware nearly impossible. Parameter-efficient fine-tuning (PEFT) techniques such as LoRA (Low-Rank Adaptation) have eased memory pressure during fine-tuning, but they cannot be applied directly to pre-training from scratch, and they limit the expressive capacity of the learnable parameters.
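To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions that are not stated in this summary: it counts only Adam's two moment buffers (the low end of the "two to three times" range), and it assumes the projected variant stores one `m x rank` projection matrix plus two `rank x n` moment buffers per weight matrix, which is how low-rank gradient projection is commonly described. The layer shape, the rank, and all function names are hypothetical and chosen only for illustration.

```python
def adam_state_elements(m: int, n: int) -> int:
    """Elements held by Adam for an m x n weight matrix.

    Adam keeps two moment estimates (first and second) per weight.
    """
    return 2 * m * n


def low_rank_state_elements(m: int, n: int, rank: int) -> int:
    """Elements held when gradients are projected to a rank-`rank` space.

    Assumption: one m x rank projection matrix plus two rank x n moment
    buffers (projection applied on the row side, i.e. m <= n).
    """
    if not 0 < rank <= min(m, n):
        raise ValueError("rank must be between 1 and min(m, n)")
    return m * rank + 2 * rank * n


def to_mib(elements: int, bytes_per_element: int = 4) -> float:
    """Convert an element count to MiB (default: 32-bit floats)."""
    return elements * bytes_per_element / 2**20


if __name__ == "__main__":
    m, n, rank = 4096, 4096, 128  # hypothetical layer shape and rank
    full = adam_state_elements(m, n)
    low = low_rank_state_elements(m, n, rank)
    print(f"full Adam states: {to_mib(full):.1f} MiB")  # 128.0 MiB
    print(f"low-rank states:  {to_mib(low):.1f} MiB")   # 6.0 MiB
    print(f"reduction:        {full / low:.1f}x")       # 21.3x
```

The estimate covers optimizer states for a single matrix only. Weights, gradients, and activations still need their own memory, so the saving for a whole model is smaller than this per-layer ratio suggests.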

Summaries are AI-generated; the original article is authoritative.