Hugging Face Blog | Feb 6, 2023, 12:00 AM

Accelerating PyTorch Transformer Model Inference with Intel Sapphire Rapids (Part 2)

Original: Accelerating PyTorch Transformers with Intel Sapphire Rapids - part 2

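This is the second article in a series produced by Hugging Face in collaboration with Intel, and it focuses on inference acceleration. It explains how to use the Intel AMX technology built into Intel's 4th-generation Xeon Scalable processors (Sapphire Rapids), together with Hugging Face Optimum Intel and IPEX, to run mixed-precision inference in BF16 and INT8. The tests reported show a several-fold performance improvement for Transformer models, with only a very small amount of code changed.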

This article is the second installment of a Hugging Face series on accelerating PyTorch Transformer models on Intel's 4th-generation Xeon Scalable Processors (codenamed Sapphire Rapids), with a focus on inference optimization.
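
To make the two reduced-precision formats named in the summary more concrete, here is a minimal pure-Python sketch of what BF16 and INT8 do to a numeric value. It is an illustration only: it is not how IPEX or Optimum Intel implement these formats, the function names `to_bfloat16`, `quantize_int8`, and `dequantize_int8` are hypothetical, and the symmetric per-tensor INT8 scheme is one common choice assumed here for simplicity.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    bfloat16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa
    bits of an IEEE-754 float32. NaN and infinity are not handled here.
    """
    # Reinterpret the value as its 32-bit single-precision bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that get dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def quantize_int8(values):
    """Symmetric per-tensor quantization (an assumed scheme for illustration).

    Maps each float to an integer in [-127, 127] using one shared scale.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    print(to_bfloat16(3.14159))  # 3.140625

    weights = [0.5, -1.5, 0.25, 2.0]
    codes, scale = quantize_int8(weights)
    print(codes, scale)
    print(dequantize_int8(codes, scale))
```

BF16 keeps the full exponent range of float32, so values can be converted directly at the cost of mantissa precision. INT8 needs a scale factor derived from the data, which is why INT8 workflows generally involve a quantization or calibration step.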

other pytorch huggingface #inference #cpu-acceleration #intel-amx #quantization #transformers

Summaries are AI-generated; the original article is authoritative.