Hugging Face Blog | Jan 16, 2025, 12:00 AM

Hugging Face TGI Announces Multi-Backend Support: Integrating TensorRT-LLM and vLLM

Original: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference

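Hugging Face's Text Generation Inference (TGI) has announced support for a multi-backend architecture, officially integrating NVIDIA TensorRT-LLM and vLLM. With this update, developers no longer have to choose between TGI's production-grade features (such as the tokenizer, tool calling, and safety guardrails) and the peak performance of other engines. Through simple configuration, users can now draw on TRT-LLM's hardware optimizations or vLLM's high throughput directly from within TGI.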

Text Generation Inference (TGI), Hugging Face's open-source LLM inference and deployment framework, has received a major architectural update: official support for a "multi-backend" architecture. The first wave of integrated backend engines includes NVIDIA's TensorRT-LLM and the widely adopted open-source inference engine vLLM.

Summaries are AI-generated; the original article is authoritative.