Hugging Face Blog | May 10, 2022, 12:00 AM

Accelerated Inference with Optimum and Transformers Pipelines


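Hugging Face shows how to integrate Optimum, its hardware optimization toolkit, with the popular Transformers Pipelines. Developers can now load models in ONNX format and pass them directly into a Pipeline, achieving substantial latency reductions and throughput gains on CPU or GPU. The update removes the tedious step of manually exporting models to ONNX, which greatly simplifies deployment to production.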
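A minimal sketch of that workflow, assuming a recent Optimum release with ONNX Runtime installed (`pip install optimum[onnxruntime]`). The checkpoint name and the `build_onnx_classifier` helper are illustrative choices, not taken from the post. Current Optimum versions use `export=True` to convert a PyTorch checkpoint to ONNX on the fly; releases from around the time of the post used `from_transformers=True` instead.

```python
# Illustrative checkpoint (an assumption, not from the post); any
# sequence-classification model on the Hugging Face Hub should work.
DEFAULT_MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def build_onnx_classifier(model_id: str = DEFAULT_MODEL_ID,
                          provider: str = "CPUExecutionProvider"):
    """Return a Transformers text-classification pipeline backed by ONNX Runtime."""
    # Imported inside the function so this file can be imported even where
    # optimum / onnxruntime are not installed.
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    # export=True converts the PyTorch checkpoint to ONNX while loading,
    # so no separate manual export step is needed.
    model = ORTModelForSequenceClassification.from_pretrained(
        model_id, export=True, provider=provider
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The ONNX Runtime model is handed to the regular Transformers pipeline factory.
    return pipeline("text-classification", model=model, tokenizer=tokenizer)
```

With network access to the Hub, `classifier = build_onnx_classifier()` followed by `classifier("Optimum makes deployment easier.")` returns a list of dicts with `label` and `score` keys. Passing `provider="CUDAExecutionProvider"` targets a GPU, which assumes the GPU build of ONNX Runtime is installed.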

When deploying Transformer models in production, reducing inference latency and increasing throughput while keeping computational costs under control has long been one of the biggest challenges developers face. To address it, Hugging Face launched the Optimum project, an extension of the Transformers library that provides optimization tools tailored to hardware from specific vendors (such as Intel, AMD, NVIDIA, AWS, and others).
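To check latency and throughput gains on your own hardware, a small standard-library timing harness can compare a baseline pipeline against an ONNX-backed one. This is a hedged sketch: the `benchmark` helper and its metric names are hypothetical, and it measures sequential single-request latency only, not batched or concurrent serving.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def benchmark(fn: Callable[[str], object], inputs: Sequence[str],
              warmup: int = 3, runs: int = 20) -> dict[str, float]:
    """Time fn on each input; report p50/p95 latency (ms) and throughput (calls/s)."""
    if not inputs:
        raise ValueError("inputs must not be empty")

    # Warm-up calls (excluded from timing) let caches and lazy init settle.
    for _ in range(warmup):
        fn(inputs[0])

    latencies_ms: list[float] = []
    start = time.perf_counter()
    for _ in range(runs):               # `runs` full passes over the inputs
        for x in inputs:
            t0 = time.perf_counter()
            fn(x)
            latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start

    latencies_ms.sort()
    # Nearest-rank 95th percentile on the sorted latencies.
    p95_index = math.ceil(0.95 * len(latencies_ms)) - 1
    return {
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "throughput_per_s": len(latencies_ms) / elapsed if elapsed > 0 else float("inf"),
    }
```

For a fair comparison, run `benchmark(baseline, texts)` and `benchmark(onnx_pipe, texts)` on the same machine with the same inputs, where `baseline` is a plain `pipeline(...)` and `onnx_pipe` is an ONNX Runtime-backed one (both placeholder names).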


open-source huggingface #inference #onnx #quantization #optimization

Summaries are AI-generated; the original article is authoritative.