Latest in AI

Open-Source Desktop GUI Brings Claude Code CLI Workflows Into a Visual Interface
INSIDE 硬塞 AI · 2 days ago · New Tool
An open-source project has introduced a desktop GUI for Claude Code CLI, aiming to make terminal-based coding sessions easier to manage visually. Built with Tauri 2, the app adds multi-tab sessions, history, and visual configuration controls around the existing command-line experience. The project is positioned as a companion to Claude Code rather than a replacement for developers who prefer direct CLI use.
Offline CPU Voice Loop for Ollama and LM Studio Agents
r/LocalLLaMA top day · 3 days ago · New Tool
An r/LocalLLaMA post introduces an offline voice loop for talking to local models through Ollama, LM Studio, or vLLM. The stack uses Silero VAD, Parakeet TDT 0.6B v3 STT, and Supertonic TTS 3, all running on CPU so GPU memory stays available for the LLM. The author reports measured CPU-only benchmarks, agent integrations, cross-platform installers, and an MIT-licensed GitHub release.
qwen3.6-27b Users Report Repeated Tool Call Loops
r/LocalLLaMA top day · 3 days ago · Incident
A Reddit user on r/LocalLLaMA says qwen3.6-27b can fall into repeated tool-call loops during use. They report spending two days adjusting parameters such as temperature and top-k without resolving the issue. The post is a troubleshooting question rather than a confirmed bug report, asking whether other local model users have seen similar behavior.
Benchmarking Google Eloquent Exposes Major On-Device Dictation Reliability Issues
r/LocalLLaMA top day · 3 days ago · Benchmark
A LocalLLaMA user tried to benchmark Google’s new fully local dictation app, Eloquent, against open ASR models such as Qwen3-ASR and NVIDIA Parakeet V3. The tester reported that roughly half of dictations returned only fragments, even during manual use. When Eloquent produced complete transcripts, its word error rate was competitive, but the missing-output behavior made the app unreliable for evaluation and practical use.
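The word error rate mentioned above is commonly computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the reference length. The sketch below assumes that standard definition; the post does not describe the tester's actual scoring script, and the function name and lowercase/whitespace normalization are illustrative choices.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (word missing from output)
                curr[j - 1] + 1,                  # insertion (extra word in output)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a fragment counts every missing word as a deletion, so a dictation that returns only half of the reference scores a WER of about 0.5 even if every returned word is correct. That is one reason fragmentary outputs make a head-to-head comparison hard to interpret.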
LocalLLaMA User Weighs QAT Gemma 31B GGUF Quants for RTX 3060
r/LocalLLaMA top day · 3 days ago · Commentary
A Reddit user with an RTX 3060 12GB and 32GB DDR3 RAM is evaluating new QAT-based Gemma 31B GGUF quantizations. They currently run an older Unsloth Gemma 31B IQ3_XXS build at long context, with some tensor and mmproj offloading to CPU. The post asks which Q2-Q3 quant to choose, whether QAT changes quality expectations, and whether MTP would help or hurt under tight VRAM limits.
πfs: the data-free filesystem that “stores” data in π
Hacker News (AI keywords) · 3 days ago · New Tool
πfs is an open-source FUSE-style filesystem built around a deliberately absurd idea: data does not need to be stored if it can be located in pi. It records metadata such as file names and positions in pi, then reconstructs content from those locations. The project is more technical humor and conceptual demonstration than practical storage or AI tooling.
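The idea of keeping only a position and reconstructing content from pi can be shown with a toy sketch. This is not the project's implementation: it uses the first 50 decimal digits of pi as a hard-coded string, works on digit strings rather than file bytes, and the names `locate` and `reconstruct` are hypothetical.

```python
# First 50 decimal digits of pi after the "3." (toy lookup table).
PI_DECIMALS = "14159265358979323846264338327950288419716939937510"


def locate(digits: str):
    """Return (offset, length) of the first occurrence in PI_DECIMALS, or None."""
    offset = PI_DECIMALS.find(digits)
    if offset < 0:
        return None
    return offset, len(digits)


def reconstruct(offset: int, length: int) -> str:
    """Rebuild the original digits from the stored metadata alone."""
    return PI_DECIMALS[offset:offset + length]
```

Here `locate("265")` yields the metadata pair `(5, 3)`, and `reconstruct(5, 3)` gives back `"265"`. The joke becomes visible quickly: for longer sequences the offset tends to need at least as many digits as the data it replaces.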
Seeking the Best Open-Source Coding AI for an RTX 5070 PC
r/LocalLLaMA top day · 3 days ago · Opinion
A Reddit user on r/LocalLLaMA is looking for the most powerful open-source AI coding model that can run on their Windows 11 desktop. Their system includes an AMD Ryzen 7 7700 CPU, RTX 5070 GPU, and 32GB of DDR5 RAM. The intended use cases are writing, coding, and debugging, but the post itself does not include benchmark results, candidate models, or community recommendations.
Lemonade v10.7 Adds Omni Models, Benchmarks, and Cross-Vendor GPU Support
r/LocalLLaMA top day · 4 days ago · Release
Lemonade v10.7 marks a project-level shift toward working-group-driven development, with 19 contributors involved in the release. The update improves LMX-Omni virtual models for Open WebUI and OpenAI-compatible multimedia clients, introduces the `lemonade bench` CLI, and expands backend support. CUDA, Vulkan, llama.cpp, stable-diffusion.cpp, FastFlowLM, and vLLM are part of the broader push toward cross-vendor local AI performance.
Charting Local LLM Releases: 2025 Was the Peak, Not 2026
r/LocalLLaMA top day · 4 days ago · Commentary
An r/LocalLLaMA community member shared visualizations tracking the volume of local LLM releases over time. Contrary to the perception that 2026 has been an unusually prolific year, the data indicates that releases peaked in 2025. The poster attributes the misperception to outsized quality improvements in 2026, which made the year feel more eventful than the release counts show.
Without Open Source LLMs, US AI Companies Could Have Monopolized the Technology
r/LocalLLaMA top day · 4 days ago · Opinion
This r/LocalLLaMA post argues that open-source LLMs are an ethical duty because AI has broad social impact. The author worries that without open models, US AI companies could have monopolized access and potentially limited availability to US firms. They also frame China’s release of powerful open-source LLMs as a contribution to humanity, despite political disagreements.
Without open LLM competition, closed-source LLM companies will become insatiable
r/LocalLLaMA top day · 4 days ago · Opinion
An r/LocalLLaMA user criticizes closed-source LLM providers, singling out Anthropic and its $200/month users. The post argues that without open-source model competition, proprietary AI companies could become more arrogant and less accountable to customers. The source offers little concrete context beyond an image and opinionated commentary, so it is best read as a community sentiment post rather than a verified product incident.
TTS Benchmark Revamped with Objective Standards and Blind Elo Voting (46 Models)
r/LocalLLaMA top day · 5 days ago · Benchmark
Reddit user UkieTechie has revamped their TTS benchmark platform with objective scoring standards and live blind voting, now covering 46 speech synthesis models. Hosted as a Hugging Face Space, the arena lets users vote on audio quality without knowing the model name, producing a dynamic Elo leaderboard. The project is open source on GitHub and welcomes community submissions of new models.
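A blind pairwise vote can feed a leaderboard through the classic Elo update, sketched below. The summary does not say which rating variant, K-factor, or starting rating the arena uses, so the values here (K = 32, 400-point scale) are assumptions taken from the textbook formula, and the function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0):
    """Return the new (winner, loser) ratings after a single blind vote."""
    # The winner gains more when the win was unexpected (low expected score).
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta
```

With two models that both start at 1000, a single vote moves them to 1016 and 984. The update is zero-sum, so the total rating across the pool stays constant as votes accumulate.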
Rick & Morty
r/LocalLLaMA top day · 5 days ago · Commentary
This r/LocalLLaMA top-day post is a short image meme titled “Rick & Morty.” The only accompanying text says, “nobody expected HF there,” suggesting surprise at HF appearing in the image’s context. There are no technical claims, model details, releases, or benchmarks, so its value is mainly as a small signal of community culture around Hugging Face / HF and local LLM discussions.
How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces ★ 72
Hugging Face Blog · 5 days ago · Tutorial
This Hugging Face blog post demonstrates how AI agents can use Spaces as modular tools. By chaining an image generation Space with a 3D rendering Space, an agent generated art assets and placed them inside a virtual 3D gallery. The example illustrates how any Space can serve as an API for agentic workflows.
NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain
Hugging Face Blog · 5 days ago · New Tool
NeuroBait is a Hugging Face community project built to help with ADHD task-initiation freeze rather than diagnosis or to-do planning. It fine-tunes google/gemma-3-12b-it with LoRA to produce short, warm, context-aware nudges. The project uses Unsloth and Modal for training, then deploys on a Hugging Face Space with Gradio, transformers, peft, and a runtime LoRA adapter.
A 4B Edge-Deployable Cognitive Model Built in China
量子位 QbitAI · 5 days ago · Release
QbitAI’s headline says a domestic Chinese team has built a 4B-parameter “cognitive model” suitable for edge deployment. The framing links it to a model direction previously associated with Andrej Karpathy. Since the article body was not provided, details such as the model name, architecture, benchmark results, hardware requirements, open-source status, and licensing remain unverified.
Siri AI at WWDC 2026 ★ 72
Simon Willison's Weblog · 5 days ago · Commentary
Simon Willison says Apple’s 2024 Apple Intelligence rollout made him cautious, so he will believe the WWDC 2026 Siri AI claims only after seeing results. He notes the new features look more feasible, especially with a custom Gemini-derived model running on Private Cloud Compute. He also highlights vision LLM screen understanding and the new Core AI library for running PyTorch-derived models on Apple hardware.
LocalLLaMA post tier list
r/LocalLLaMA top day · 5 days ago · Opinion
The author proposes a tier list for r/LocalLLaMA posts in response to complaints about declining post quality. Top-tier posts include new local model releases with GGUF/MLX or benchmark data, meaningful optimizations, complete hardware performance reports, and well-analyzed research. Low-tier posts include repeated toy benchmarks, unrelated cloud AI chatter, AI-generated slop, and thinly disguised ads for Claude-wrapper startups.
When every other post is an AI benchmark, best-model question, or slop app
r/LocalLLaMA top day · 5 days ago · Commentary
This r/LocalLLaMA post is a meme-like complaint about the subreddit’s recent content quality. The author points to repeated AI-generated benchmark reports, recurring “best model” questions, and hastily built apps or engines presented as groundbreaking. It is not a technical release or evidence-based analysis, but it reflects frustration with noise, hype, and low-effort AI-generated discussion in local model communities.
Show HN: Gitdot – a better GitHub, open-source, anti-AI, written in Rust
Hacker News (AI keywords) · 6 days ago · New Tool
Gitdot appeared on Hacker News as a Show HN project claiming to be “a better GitHub.” The title says it is open-source, written in Rust, and explicitly anti-AI. No article body was provided, so details about features, licensing, deployment, maturity, and how it differs from GitHub cannot be confirmed from the source.
Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem
Hugging Face Blog · 6 days ago · New Tool
Pakistan Notice Helper is a Build Small Hackathon project focused on suspicious notices in Pakistan, including bank, courier, tax, telecom, police, and government-style messages. It accepts text or screenshots, supports English and Urdu, and returns risk labels, red flags, explanations, and safer next steps. The author discusses choosing Qwen3.5 4B Q8 with llama.cpp, Modal, Gradio, and Hugging Face Spaces after balancing quality, cost, latency, cold starts, and safety constraints.
Thoughts on Gemma4 12B vs 26A4B: Which Is Better?
r/LocalLLaMA top day · 6 days ago · Opinion
The post asks the LocalLLaMA community to compare Gemma4 12B and 26A4B, explicitly excluding the 31B model from discussion. The user is mainly interested in creative tasks, writing, and chatting, with coding treated as optional rather than central. No benchmarks or examples are provided, so the post is best read as a model-selection question about subjective quality and practical use.
Best Local TTS Solution
r/LocalLLaMA top day · 6 days ago · Commentary
An r/LocalLLaMA user says they have tested many local TTS tools, but none match ElevenLabs for expressiveness, voices, and cloning. They list moss-nano and Kokoro as the best edge-device candidates so far, with edgeTTS as a free/cloud option. The post asks for community experience connecting agents such as Hermes, openclaw, or opencode to Telegram voice notes or real-time voice conversations.
User Shares Gemma 4 QAT Experience: Improved Quality and MTP Speedups
r/LocalLLaMA top day · 6 days ago · Opinion
A Reddit user shared their experience with the Gemma 4 31B QAT (Quantization-Aware Training) model. They report that, compared to traditional GGUF quants like Q6_K_L, the QAT version delivers noticeable quality improvements in roleplay and long-context tasks. They also report that combining the QAT model with Multi-Token Prediction (MTP) raised generation speed from ~20 t/s to as much as 50 t/s.
"Fully Hallucinated Operating System" Simulates an Entire OS via LLM Prompts
r/LocalLLaMA top day · 6 days ago · Commentary
A popular Reddit post highlights a video demonstrating a "Fully Hallucinated Operating System" run entirely inside an LLM. By prompting the model to act as a terminal, it simulates file systems, network requests, and command execution purely through text generation. While impractical for production, this experiment showcases the impressive state-tracking and "world model" capabilities of modern LLMs.
llama-server Router Mode: Pinned Model Grabs CUDA Context on All GPUs, Causing OOM
r/LocalLLaMA top day · 6 days ago · Commentary
A Reddit user highlighted a limitation in llama-server's router mode (`--models-preset`): child processes spawn and initialize CUDA contexts on all available GPUs, even when pinned to a single card. When other GPUs are fully utilized by a large model, launching a smaller model fails with a CUDA OOM error because it cannot allocate the context stub on the maxed-out cards. Currently, child processes inherit the base environment, preventing per-model `CUDA_VISIBLE_DEVICES` configuration.
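Outside router mode, the usual way to keep a process off busy GPUs is to give each server its own `CUDA_VISIBLE_DEVICES` before it starts. The sketch below shows that general pattern with separate `llama-server` processes; it is not a router-mode feature and not a fix proposed in the post. The helper names and the model path in the usage note are hypothetical, and only the `-m` and `--port` flags are used.

```python
import os
import subprocess


def build_child_env(gpu_ids, base_env=None):
    """Copy the environment and restrict CUDA to the given GPU indices."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA only enumerates devices listed here, so no context is created
    # on GPUs that are left out.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(gpu) for gpu in gpu_ids)
    return env


def launch_server(model_path, port, gpu_ids):
    """Start one llama-server process pinned to specific GPUs (assumed on PATH)."""
    cmd = ["llama-server", "-m", model_path, "--port", str(port)]
    return subprocess.Popen(cmd, env=build_child_env(gpu_ids))
```

For example, `launch_server("small-model.gguf", 8081, [1])` would start a server that sees only GPU 1. The trade-off is that routing between models then has to happen in front of the servers, for instance in a reverse proxy, rather than inside llama-server itself.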
Exploring 2-bit QAT: Can Ultra-Compressed Large Models Outperform 4-bit Models Half Their Size?
r/LocalLLaMA top day · 6 days ago · Commentary
A popular Reddit thread on r/LocalLLaMA discusses the potential of 2-bit Quantization-Aware Training (QAT) for large MoE models (120B to 400B). While current QAT efforts focus on 4-bit, users speculate whether a 2-bit QAT model could fit into consumer hardware (64GB/128GB RAM) and outperform a 4-bit model half its size. This approach is proposed as a practical alternative to training ternary (1.58-bit) LLMs from scratch.
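The memory side of this comparison is simple arithmetic: weight storage is roughly the parameter count times bits per weight, divided by 8. The sketch below covers weights only and assumes decimal gigabytes; it ignores quantization scales, KV cache, and activations, all of which add overhead in practice. The function name is illustrative.

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    # billions of parameters * bits per parameter / 8 bits per byte = GB
    return params_billions * bits_per_weight / 8.0


for params, bits in [(120, 2), (400, 2), (200, 4)]:
    print(f"{params}B at {bits}-bit: about {weight_size_gb(params, bits):.0f} GB")
```

By this estimate a 120B model at 2-bit needs about 30 GB and a 400B model about 100 GB, which is the same as a 200B model at 4-bit. Since a 2-bit model and a 4-bit model of half the size occupy the same space, the thread's question comes down to which one has better quality at equal memory.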
NVFP4 Support Merged in llama.cpp: How to Use 4-bit Blackwell Quantization
r/LocalLLaMA top day · 6 days ago · Commentary
Following the merge of native NVFP4 (NVIDIA FP4) support in llama.cpp, users are exploring how to use the format on Blackwell GPUs (such as the RTX 50-series). The discussion focuses on converting NVFP4 safetensors (like Gemma 4 QAT) to GGUF format and whether importance matrices (imatrix) are required. Native support could bring significant performance gains for local LLM inference on this hardware.
GMKtec Announces EVO-X3 Mini PC, Teases 192GB Ryzen AI MAX+ 495 "Strix Halo" Monster ★ 78
r/LocalLLaMA top day · 7 days ago · Hardware
GMKtec has announced its EVO-X3 mini PC with upgraded I/O, including OCuLink and Wi-Fi 7. More importantly for local AI enthusiasts, the company teased a future model powered by AMD's flagship "Strix Halo" Ryzen AI MAX+ 495 APU. The teased model is said to support up to 192GB of LPDDR5X memory, positioning it as a cost-effective alternative to Apple Silicon for running large local LLMs.
Clustering 3x Jetson Orin Nano Supers for Distributed AI
r/LocalLLaMA top day · 7 days ago · Tutorial
A developer has shared a practical guide on clustering three NVIDIA Jetson Orin Nano Super boards, leveraging their Ampere CUDA cores and unified memory. This project is part of 'smolcluster,' an initiative to make distributed AI training and inference accessible using everyday hardware like Macs, Raspberry Pis, and Jetsons. The series aims to explore whether heterogeneous clusters (mixing different hardware architectures) can effectively run local LLMs.
