Hugging Face Blog, Nov 26, 2024, 12:00 AM

Rearchitecting Hugging Face Uploads and Downloads


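As the scale of hosted models and datasets surged, Hugging Face's traditional Git-LFS architecture ran into bottlenecks such as slow metadata handling and locking. In response, the team rearchitected uploads and downloads: file storage was decoupled from Git, transfers moved to an in-house HTTP mechanism and the Rust-based `hf-transfer` tool, and direct S3 connections and CDN caching were optimized. The change substantially improved transfer speed and stability for models in the hundreds of gigabytes and for datasets with millions of files.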
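One way to picture an HTTP-based transfer of a very large file is to split it into byte ranges that can be requested independently. The following is a minimal sketch under that assumption only; it is not taken from the article or from `hf-transfer`, and the names `plan_byte_ranges` and `range_header` are hypothetical helpers introduced for illustration.

```python
# Illustrative sketch: planning HTTP Range requests for a large file.
# ASSUMPTION: the transfer splits the file into fixed-size chunks; the
# summary does not say how hf-transfer actually schedules its requests.

def plan_byte_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    ranges = []
    start = 0
    while start < total_size:
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> str:
    """Format the value of an HTTP Range header for one chunk."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    for start, end in plan_byte_ranges(10, 4):
        print(range_header(start, end))
```

Each planned range could then be fetched by a separate worker and written at its own offset, so a failed chunk can be retried without restarting the whole file.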

Hugging Face, the world's largest open-source AI platform, currently hosts over 1.2 million models, datasets, and Space applications. With the explosion of generative AI, model sizes (Llama 3 70B and Mixtral, for example, often run to hundreds of gigabytes) and dataset scales (some containing millions of small files) have grown rapidly. This growth exposed performance bottlenecks in Hugging Face's early architecture, which was built on Git and Git-LFS (Large File Storage).
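For context on why Git-LFS ties large files to Git, recall that Git-LFS commits a small text pointer in place of each large file and stores the content elsewhere. The sketch below builds such a pointer following the public Git-LFS pointer format; the function name `lfs_pointer` is a hypothetical helper, and nothing here reflects Hugging Face's internal code.

```python
# Illustrative sketch: building a Git-LFS pointer file for some content.
# The three-line layout (version, oid, size) follows the Git-LFS pointer
# format; the helper name is an assumption made for this example.
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git-LFS pointer describing `content`."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    print(lfs_pointer(b"model weights would go here"), end="")
```

Git tracks only this pointer, so every large file still passes through Git's metadata and commit machinery even though its bytes live in separate storage.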


Summaries are AI-generated; the original article is authoritative.