A developer on Reddit shared a Dockerized implementation of Nemotron 3.5 ASR, migrating from Parakeet. The system supports over 40 languages and features a native streaming architecture that avoids full-file buffering. Using the onnxruntime-genai backend, it achieves 4.5x real-time speed on CPU, with CUDA support planned but untested.
Hugging Face officially published Transformers.js v4 on NPM, marking a major milestone for running local AI models within the JavaScript ecosystem…
Hugging Face has officially launched Transformers.js v3, the most significant update to this web-based machine learning library since its release…
During Microsoft Build 2024, Hugging Face announced a further strategic collaboration with Microsoft, aimed at providing developers with a more seamless…
SD Turbo and SDXL Turbo are single-step/few-step text-to-image models from Stability AI, with their core innovation being Adversarial Diffusion Distillation…
Hugging Face officially announced a deep collaboration with Microsoft to integrate ONNX Runtime (ORT) into the Hugging Face ecosystem. This partnership enables…
This official Hugging Face blog post explores in depth how to use the Transformers.js library to run machine learning (ML) models directly in the browser…
As the scale of deep learning models (such as Transformers) continues to grow, training these models demands enormous computational resources and time. To help…
"Document AI" is a key driver of enterprise digital transformation in recent years, aimed at automating the processing of unstructured documents such as…
When deploying Transformer models in production, latency and throughput are typically the key factors determining the quality of the user experience. ONNX…
When deploying Transformer models in production, reducing inference latency and increasing throughput while keeping computational costs under control has…
This case study focuses on the performance of "Hugging Face Infinity" — Hugging Face's high-performance inference container solution — on modern CPUs…
In many real-world enterprise production environments, although GPUs offer extremely high throughput for deep learning inference, CPUs remain indispensable due…
In this technical blog post, the Hugging Face team reveals in detail how they achieved up to 100x speedup in inference for Transformer models for customers of…