Hugging Face Blog · Jul 18, 2024, 12:00 AM

TGI Multi-LoRA: Deploy Once, Serve 30 Fine-Tuned Models at the Same Time

Original: TGI Multi-LoRA: Deploy Once, Serve 30 Models


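Hugging Face's Text Generation Inference (TGI) now supports Multi-LoRA serving. Developers deploy a single base model (such as Llama 3) on the GPU, then dynamically load and concurrently serve up to 30 different LoRA fine-tuned adapters. The approach sharply reduces the GPU memory and hardware cost of multi-model deployment, and optimized batching keeps latency low, making it a significant optimization for LLMOps.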
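To make the "one base model, many adapters" idea concrete, here is a minimal sketch of how a client might pick an adapter per request. It is not taken from this summary: the `/generate` route, the `adapter_id` parameter, the local address, and the adapter name `my-org/support-adapter` are assumptions for illustration and should be checked against the TGI documentation for the version you run.

```python
import json
import urllib.request


def build_generate_request(prompt, adapter_id=None, max_new_tokens=40):
    """Build a JSON payload for a text generation request.

    Assumption: the server selects a LoRA adapter through an
    `adapter_id` field inside `parameters`. Omitting it is assumed
    to fall back to the plain base model.
    """
    parameters = {"max_new_tokens": max_new_tokens}
    if adapter_id is not None:
        parameters["adapter_id"] = adapter_id
    return {"inputs": prompt, "parameters": parameters}


def send(payload, base_url="http://127.0.0.1:3000"):
    """POST the payload to an assumed local server and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Two requests against the same deployment, each routed to a different
# (hypothetical) adapter. Only the payload differs between them.
support_request = build_generate_request(
    "How do I reset my password?", adapter_id="my-org/support-adapter"
)
base_request = build_generate_request("Hello, who are you?")
```

Calling `send(support_request)` would only work against a running server with that adapter loaded; the point of the sketch is that switching models is a per-request field rather than a separate deployment.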

Hugging Face has announced a major update to Text Generation Inference (TGI), its open-source text generation inference engine: the **Multi-LoRA serving feature**. The feature targets the high GPU VRAM and hardware costs that enterprises and developers face when deploying multiple fine-tuned models.
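A rough back-of-the-envelope calculation shows why sharing one base model saves memory. A LoRA adapter replaces a full update of a `d_out x d_in` weight matrix with two low-rank factors, adding `rank * (d_in + d_out)` parameters. The hidden size of 4096 and rank of 16 below are illustrative assumptions, not figures from the article, and the sketch counts a single weight matrix rather than a whole model.

```python
def full_params(d_in, d_out):
    """Parameter count of one dense weight matrix."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


HIDDEN = 4096      # assumed hidden size, for illustration only
RANK = 16          # assumed LoRA rank, for illustration only
NUM_ADAPTERS = 30  # the adapter count mentioned in the article title

base = full_params(HIDDEN, HIDDEN)
adapter = lora_params(HIDDEN, HIDDEN, RANK)

# Fraction of the dense matrix that one adapter adds.
adapter_ratio = adapter / base

# Per-matrix cost of 30 separate fine-tuned copies versus
# one shared base matrix plus 30 adapters.
separate_copies = NUM_ADAPTERS * base
shared_base = base + NUM_ADAPTERS * adapter
```

Under these assumptions each adapter adds under 1% of the matrix it modifies, so 30 adapters on a shared base cost far less than 30 separate copies. Real savings depend on which layers are adapted, the rank, and the precision used.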


Tags: llama, mistral, open-source, tgi, huggingface, lora, llmops, inference, gpu-optimization

Summaries are AI-generated; the original article is authoritative.