Hugging Face Blog, Nov 20, 2024, 12:00 AM

The Evolution of Hugging Face's Storage Architecture: From Files to Chunks for Better Storage Efficiency

Original: From Files to Chunks: Improving HF Storage Efficiency

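Hugging Face has announced a new storage optimization that moves from traditional file-level storage (such as Git LFS) to chunk-based storage. Using content-defined chunking (CDC) and content-addressed storage (CAS), the Hub can deduplicate data across repositories. This saves a large amount of space when storing fine-tuned and merged models, and it significantly speeds up uploads and downloads.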
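To make the CDC plus CAS idea concrete, here is a minimal sketch in Python. It is not the Hub's implementation: the gear-style rolling hash, the `MASK`, `MIN_CHUNK`, and `MAX_CHUNK` values, and the `chunk` and `ChunkStore` names are all assumptions chosen for illustration. It only shows how cutting boundaries from the content itself, and keying each chunk by its hash, lets two nearly identical files share most of their stored bytes.

```python
import hashlib
import random

# Hypothetical demo parameters, not the Hub's real settings.
MASK = (1 << 6) - 1   # cut when the low 6 hash bits are zero (~64-byte spacing)
MIN_CHUNK = 16        # never cut a chunk shorter than this
MAX_CHUNK = 256       # always cut once a chunk reaches this length

# Gear table: one fixed pseudo-random 32-bit value per possible byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list:
    """Split data at content-defined boundaries using a gear-style rolling hash."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        # The low bits of h depend only on the last few bytes, so boundaries
        # follow the content rather than absolute file offsets.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Toy content-addressed store: each chunk is keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> list:
        """Store a file and return the ordered list of chunk keys (its manifest)."""
        keys = []
        for c in chunk(data):
            key = hashlib.sha256(c).hexdigest()
            self.blobs.setdefault(key, c)  # identical chunks are stored once
            keys.append(key)
        return keys

    def get(self, keys: list) -> bytes:
        """Rebuild a file from its manifest."""
        return b"".join(self.blobs[k] for k in keys)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.blobs.values())


if __name__ == "__main__":
    rng = random.Random(1)
    base = bytes(rng.getrandbits(8) for _ in range(20_000))
    # Stand-in for a fine-tuned variant: same bytes with a small insertion.
    tuned = base[:10_000] + b"small edit" + base[10_000:]

    store = ChunkStore()
    store.put(base)
    store.put(tuned)
    print("logical bytes:", len(base) + len(tuned))
    print("stored bytes: ", store.stored_bytes())
```

In this sketch, every chunk that ends before the edit point is byte-identical in both files, so it is stored only once. A file-level scheme would instead store both files in full because their overall hashes differ.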

The Hugging Face Hub currently hosts millions of AI models, datasets, and applications (Spaces), with total storage reaching the hundreds of petabytes. As open-source fine-tuning, quantization, and model merging (for models like Llama and Mistral) have flourished, the traditional file-based storage approach (such as Git LFS) has come under heavy strain in both storage space and bandwidth.


open-source other huggingface #storage #deduplication #git-lfs #infrastructure #cas

Summaries are AI-generated; the original article is authoritative.