Hugging Face TGI Announces Multi-Backend Support: Integrating TensorRT-LLM and vLLM
Original: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference
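Hugging Face's Text Generation Inference (TGI) has announced support for a multi-backend architecture, officially integrating NVIDIA TensorRT-LLM and vLLM. With this update, developers no longer have to choose between TGI's production-grade features (such as the tokenizer, tool calling, and safety guardrails) and the peak performance of other engines. Users can now, through simple configuration, draw on TRT-LLM's hardware optimizations or vLLM's high throughput directly within TGI.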
Text Generation Inference (TGI), Hugging Face's open-source LLM inference and deployment framework, has received a major architectural update, officially announcing support for a "multi-backend" architecture. The first wave of integrated backend engines includes NVIDIA's TensorRT-LLM and the widely adopted open-source inference engine vLLM.
Source: Hugging Face Blog. Summaries are AI-generated; the original article is authoritative.