Hugging Face Blog · Mar 28, 2025, 12:00 AM

Accelerating Large Language Model (LLM) Inference with TGI on Intel Gaudi

Original: 🚀 Accelerating LLM Inference with TGI on Intel Gaudi


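Hugging Face has announced that its Text Generation Inference (TGI) framework now includes an Intel Gaudi accelerator backend. The collaboration lets developers deploy high-performance LLMs directly on Intel Gaudi 2 and Gaudi 3 chips, with optimizations such as continuous batching and tensor parallelism. For enterprises, it offers a cost-effective, easy-to-deploy alternative to NVIDIA hardware for AI inference.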

Hugging Face's official blog has announced that Text Generation Inference (TGI), its widely adopted open-source LLM inference framework, now supports Intel Gaudi AI accelerators, including Gaudi 2 and the latest Gaudi 3, as an official backend. The integration is meant to give developers and enterprises a high-performance, cost-effective alternative to NVIDIA GPUs at a time when AI chips are in short supply and expensive.
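For readers who want a sense of what using such a deployment looks like, here is a minimal client sketch. It is not taken from the original article. It assumes a TGI server is already running and reachable over HTTP, and the address `http://localhost:8080` and the helper names `build_generate_payload` and `generate` are hypothetical. The `/generate` route and its `inputs` / `parameters.max_new_tokens` fields are part of TGI's standard HTTP API, which should behave the same way whichever hardware backend serves the model.

```python
import json
import urllib.request

# Assumption: a TGI server is already running at this (hypothetical) address.
TGI_URL = "http://localhost:8080"


def build_generate_payload(prompt: str, max_new_tokens: int = 32) -> dict:
    """Build the JSON body expected by TGI's /generate endpoint."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def generate(prompt: str, max_new_tokens: int = 32, base_url: str = TGI_URL) -> str:
    """POST a prompt to /generate and return the generated text."""
    body = json.dumps(build_generate_payload(prompt, max_new_tokens)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The server replies with a JSON object containing "generated_text".
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["generated_text"]
```

With a server running, `generate("What is Intel Gaudi?")` would return the completion as a string. Continuous batching and tensor parallelism happen on the server side, so the client code does not change with them.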


Summaries are AI-generated; the original article is authoritative.