Latest in AI

Showing: cpu-inference · Developers

Offline CPU Voice Loop for Ollama and LM Studio Agents
r/LocalLLaMA top day · 3 days ago · New Tool
An r/LocalLLaMA post introduces an offline voice loop for talking to local models through Ollama, LM Studio, or vLLM. The stack uses Silero VAD, Parakeet TDT 0.6B v3 STT, and Supertonic TTS 3, all running on CPU so GPU memory stays available for the LLM. The author reports measured CPU-only benchmarks, agent integrations, cross-platform installers, and an MIT-licensed GitHub release.
Rust-native CPU-only LFM2.5-8B-A1B inference library "bebelm" published as cargo crate
r/LocalLLaMA top day · 5 days ago · New Tool
Community developer maximecb has published bebelm, a Rust-native, GPU-free inference implementation of Liquid AI's LFM2.5-8B-A1B model, available on crates.io. Decode speed reaches ~37 tokens/s on a Ryzen 7950X with a ~7 GB memory footprint; prefill is unoptimized and currently runs at about the same speed as decode. The library supports tool-use callbacks, weight sharing across multiple Agent instances with independent KV caches, and Agent cloning to skip repeated prefill on shared prompts.
Google Cloud C4 Instances Team Up with Intel and Hugging Face to Cut Total Cost of Ownership (TCO) by 70% for Open-Source GPT Models ★ 75
Hugging Face Blog · 241 days ago · Release
Google Cloud has announced a deep collaboration with Intel and Hugging Face on its new C4 instances to comprehensively optimize open-source GPT (Generative…
Benchmarking Language Model Performance on 5th Gen Intel Xeon Processors on GCP (C4 Instances)
Hugging Face Blog · 544 days ago · Commentary
This technical blog post from Hugging Face provides a detailed benchmark of running large language models (LLMs) on Google Cloud Platform's (GCP) new C4…
Introducing AMD 5th Gen EPYC™ Processors: Hugging Face and AMD Unlock the AI Inference Potential of CPUs ★ 75
Hugging Face Blog · 612 days ago · Release
AMD has officially launched its 5th-generation EPYC processor, codenamed "Turin," and Hugging Face has promptly published a blog post detailing the deep…
Blazing-Fast SetFit Inference on Xeon Processors with 🤗 Optimum Intel
Hugging Face Blog · 802 days ago · Tutorial
SetFit (Sentence Transformer Fine-Tuning) is a few-shot text classification framework co-developed by Hugging Face, Intel Labs, and other organizations. Rather…
Accelerating StarCoder on Xeon Processors with 🤗 Optimum Intel: Q8/Q4 Quantization and Speculative Decoding
Hugging Face Blog · 866 days ago · Tutorial
This Hugging Face blog post explores in detail how to use the `Optimum Intel` library to accelerate inference for the StarCoder code-generation model on Intel…
Optimizing Stable Diffusion on Intel CPUs with NNCF and 🤗 Optimum
Hugging Face Blog · 1,116 days ago · Tutorial
In the current boom of generative AI, image generation models like Stable Diffusion have become widely popular thanks to their remarkable capabilities…
Smaller Is Better: Q8-Chat, an Efficient Generative AI Experience on Intel Xeon Processors
Hugging Face Blog · 1,125 days ago · Release
This article introduces the latest outcome of a collaboration between Hugging Face and Intel: "Q8-Chat," a project designed to demonstrate how to efficiently…
Accelerating Stable Diffusion Inference on Intel CPUs
Hugging Face Blog · 1,174 days ago · Tutorial
This technical blog post from Hugging Face provides a detailed guide on optimizing and accelerating Stable Diffusion model inference on Intel CPUs…
Case Study: Millisecond Latency with Hugging Face Infinity and Modern CPUs
Hugging Face Blog · 1,613 days ago · New Tool
This case study focuses on the performance of "Hugging Face Infinity" — Hugging Face's high-performance inference container solution — on modern CPUs…
Scaling Up BERT Inference Performance on CPUs (Part 1)
Hugging Face Blog · 1,881 days ago · Tutorial
In many real-world enterprise production environments, although GPUs offer extremely high throughput for deep learning inference, CPUs remain indispensable due…