This article is the second installment of a Hugging Face series on accelerating PyTorch Transformer models on Intel's 4th-generation Xeon Scalable Processors…
"Document AI" has been a key driver of enterprise digital transformation in recent years, aimed at automating the processing of unstructured documents such as…
As Transformer models become increasingly prevalent in natural language processing (NLP) and computer vision (CV), efficiently deploying these large models in…
This technical blog post from Hugging Face documents in detail the practical process of optimizing inference for BLOOM, the open-source multilingual large…
This article introduces the deep integration between Hugging Face and the bitsandbytes library, aimed at solving the enormous memory challenges posed by…
When deploying Transformer models in production, reducing inference latency and increasing throughput while keeping computational costs under control has…
This case study focuses on the performance of "Hugging Face Infinity" — Hugging Face's high-performance inference container solution — on modern CPUs…
This blog post is the second part of a technical guide co-authored by Hugging Face and Intel, designed to show developers how to push the inference performance…
Hugging Face has officially launched a new open-source toolkit called "Optimum" — an optimization and hardware acceleration library designed specifically for…
In this technical blog post, the Hugging Face team explains in detail how they achieved up to a 100x inference speedup on Transformer models for customers of…