Latest in AI

Offline CPU Voice Loop for Ollama and LM Studio Agents
r/LocalLLaMA top day · 3 days ago · New Tool
An r/LocalLLaMA post introduces an offline voice loop for talking to local models through Ollama, LM Studio, or vLLM. The stack uses Silero VAD, Parakeet TDT 0.6B v3 STT, and Supertonic TTS 3, all running on CPU so GPU memory stays available for the LLM. The author reports CPU-only benchmarks, agent integrations, cross-platform installers, and an MIT-licensed GitHub release.
AMD Highlights Unified Memory Architecture for Future AI Systems
r/LocalLLaMA top day · 3 days ago · Hardware
A Reddit post in r/LocalLLaMA links to coverage of AMD discussing unified memory architecture and its role in future product roadmaps. The post says AMD believes UMA could help shape next-generation architectures and notes Ryzen AI MAX 400 series systems, also referred to by the community as Gorgon Halo. It frames the topic as part of an ongoing LocalLLaMA discussion about whether unified-memory x86 systems could matter for local AI workloads.
Lemonade v10.7 Adds Omni Models, Benchmarks, and Cross-Vendor GPU Support
r/LocalLLaMA top day · 4 days ago · Release
Lemonade v10.7 marks a project-level shift toward working-group-driven development, with 19 contributors involved in the release. The update improves LMX-Omni virtual models for Open WebUI and OpenAI-compatible multimedia clients, introduces the `lemonade bench` CLI, and expands backend support. CUDA, Vulkan, llama.cpp, stable-diffusion.cpp, FastFlowLM, and vLLM are part of the broader push toward cross-vendor local AI performance.
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
NVIDIA Blog · 4 days ago · Release
Google DeepMind released DiffusionGemma, an experimental open model built for fast text generation. NVIDIA says it optimized the model for GeForce RTX GPUs, RTX PRO platforms, and DGX Spark systems. Instead of generating text one word at a time, DiffusionGemma produces multiple words in parallel to reduce latency for single-user workloads.
TTS Benchmark Revamped with Objective Standards and Blind ELO Voting (46 Models)
r/LocalLLaMA top day · 5 days ago · Benchmark
Reddit user UkieTechie has revamped their TTS benchmark platform with objective scoring standards and live blind voting, now covering 46 speech synthesis models. Hosted as a Hugging Face Space, the arena lets users vote on audio quality without knowing the model name, and the votes feed a dynamic Elo leaderboard. The project is open source on GitHub and welcomes community submissions of new models.
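For readers unfamiliar with how blind pairwise votes can become a leaderboard, here is a minimal sketch of the standard Elo update. It is not taken from the benchmark's code: the function names, the K-factor of 32, and the 1500 starting rating are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind vote.

    k is an assumed K-factor; the benchmark's actual value is not stated.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the two ratings always sum to the same total.
    new_b = rating_b - k * (score_a - exp_a)
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level; a listener prefers model A.
    print(update_elo(1500.0, 1500.0, a_won=True))
```

Because each vote moves the two ratings by equal and opposite amounts, a win over a higher-rated model shifts the leaderboard more than a win over a lower-rated one.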
Omi Med STT v1: Open-Weight Medical ASR Fine-Tuned from Parakeet 0.6B
r/LocalLLaMA top day · 5 days ago · Release · ★ 72
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. On the author’s held-out medical benchmark, he reports a 2.37% medical WER and 145× realtime speed on local A10 compute.
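As background for the WER figure, the sketch below computes the conventional word error rate: word-level edit distance divided by the number of reference words. How the author's "medical WER" is defined is not stated in the post, so this is the generic metric only, and the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcript pair: one substituted word out of four.
    print(word_error_rate("patient takes metformin daily",
                          "patient takes metformin nightly"))
```

A domain-specific variant would typically restrict or reweight the scored words, for example to drug names and clinical terms, but that choice belongs to the benchmark's author.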
LocalLLaMA post tier list
r/LocalLLaMA top day · 6 days ago · Opinion
The author proposes a tier list for r/LocalLLaMA posts in response to complaints about declining post quality. Top-tier posts include new local model releases with GGUF/MLX or benchmark data, meaningful optimizations, complete hardware performance reports, and well-analyzed research. Low-tier posts include repeated toy benchmarks, unrelated cloud AI chatter, AI-generated slop, and thinly disguised ads for Claude-wrapper startups.
mtmd adds video input support in llama.cpp
r/LocalLLaMA top day · 6 days ago · Release · ★ 72
ggml-org/llama.cpp merged PR #24269, adding video input support to mtmd through mtmd-cli and /chat/completions, which also enables the web UI path. The implementation invokes a locally installed ffmpeg subprocess instead of bundling codec support, and currently extracts visual frames only, with no audio support yet. It was tested with Qwen3-VL-2B in CLI and Gemma 4 E4B in web UI, making local multimodal video experiments more accessible.
NVIDIA, KRAFTON, NC and T1 Celebrate RTX Spark at Korea’s PC Bangs
NVIDIA Blog · 7 days ago · Hardware
After unveiling RTX Spark at GTC Taipei during COMPUTEX, NVIDIA brought the platform to South Korea’s gaming community. Jensen Huang visited T1 Base Camp and PC bangs in Seoul to show how RTX Spark targets local AI, creation and high-performance gaming on slim Windows laptops and compact desktops. Demos included League of Legends, VALORANT, PUBG, Subnautica 2, CINDER CITY, AION 2 and an unreleased NVIDIA ACE-powered PUBG Ally character.
Google's Gemma 4 12B is designed to run on 16GB RAM laptops
Ars Technica AI · 10 days ago · Release
Google introduced Gemma 4 12B, an open model aimed at running locally on laptops with 16GB of RAM. The model uses a new encoding scheme and token prediction to improve efficiency relative to its size. Its practical importance depends on real-world benchmarks, but it could lower the barrier for private, offline, and local multimodal AI workflows.
Microsoft Build 2026 Brings Agent Development Tools to Local Workflows
INSIDE 硬塞 AI · 11 days ago · New Tool · ★ 72
At Build 2026, Microsoft announced a set of agent development tools including the GitHub Copilot desktop app, Project Rayfin backend automation, Windows terminal and container updates, and Surface RTX Spark Dev Box. The releases point to an end-to-end workflow for building and running AI agents locally. The focus is platform integration rather than a single model breakthrough.
Holo3.1: Fast & Local Computer Use Agents
Hugging Face Blog · 12 days ago · Release
The Hugging Face Blog published a post titled “Holo3.1: Fast & Local Computer Use Agents.” Judging from the title alone, Holo3.1 focuses on computer-use agents, with speed and local execution as its stated themes. The source text was not provided, so architecture, supported platforms, benchmarks, licensing, hardware requirements, and availability cannot be confirmed.
GGML and llama.cpp Officially Join Hugging Face to Safeguard the Long-Term Development of Local AI
Hugging Face Blog · 114 days ago · Business · ★ 95
A historic milestone has arrived in the open-source AI world: GGML and llama.cpp — the open-source projects founded by Georgi Gerganov that laid the…
Smol2Operator: A Post-Training Guide and Models for Lightweight GUI Agents for Computer Use
Hugging Face Blog · 264 days ago · Release · ★ 80
Background and Challenge: The Rise of Local "Computer Use." With Anthropic's introduction of Computer Use and the development of various OS-level agents…
Hugging Face Launches SmolLM: An Ultra-Lightweight yet Powerful Family of Small Local Models (135M, 360M, and 1.7B)
Hugging Face Blog · 698 days ago · Release · ★ 82
Hugging Face has officially launched a new family of ultra-lightweight language models called "SmolLM." As generative AI continues to evolve, while large…
Tutorial: Generating Images in One Second on a Mac with Latent Consistency Models (LCM)
Replicate Blog · 963 days ago · Tutorial
This technical guide from Replicate provides detailed instructions on how to locally deploy and run Latent Consistency Models (LCMs) on Macs equipped with…
Running Stable Diffusion Locally on an M1 Mac GPU
Replicate Blog · 1,383 days ago · Tutorial
With the open-sourcing of Stable Diffusion, running powerful AI image generation models locally has become a real possibility. This guide published by…
