Latest in AI

Showing:Open-sourceClear ×

🔥 Trending today

anthropic7 export-controls5 model-access3 ai-infrastructure3 spacex3 amazon3 national-security2 open-source2 governance2 ai-policy2

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

OpenLumara Creator Challenges Reddit to Hack Its Public Agent Instance
r/LocalLLaMA top day4 days agoIncident
The creator of OpenLumara posted a public challenge asking r/LocalLLaMA users to try breaking into a Discord-hosted instance of the local-model agent. They claimed common prompt-engineering attacks would not work because modules and sandboxes were heavily locked down. The post later listed several successful findings, including missing path traversal protection, an authorization-check bypass, and another undisclosed exploit pending a fix.
Reddit Debate: Apple and Microsoft Push Local-First AI
r/LocalLLaMA top day4 days agoOpinion
A Reddit user claims Apple and Microsoft have both made strong moves toward local-first AI, pointing to Apple Core AI materials and Microsoft Surface Laptop Ultra announcements. The post argues that Apple’s emphasis on local, private, no-cost AI and Microsoft’s Surface/Nvidia direction could reshape expectations for consumer hardware. However, it is an opinion-driven market prediction, not a confirmed financial or technical analysis.
Qwen3.6-MTP-27B on Tesla V100: llama.cpp Throughput Tuning Question
r/LocalLLaMA top day4 days agoBenchmark
A Reddit user is running Qwen3.6-MTP-27B-MTP in Q4_K_M GGUF format with llama.cpp server on a 32GB Tesla V100. They report one peak of 55 tokens per second, but typical throughput is closer to 44-48 TPS. The post asks whether flags such as parallelism, speculative MTP draft settings, KV cache quantization, flash attention, and a 262K context window are limiting performance without improving output quality.
How Useful Is qwopus Compared With Qwen3.6 27B for Coding?
r/LocalLLaMA top day4 days agoOpinion
A Reddit user on r/LocalLLaMA asks for practical comparisons between qwopus and Qwen3.6 27B, specifically for coding work. They note conflicting community opinions, with some users calling qwopus worse and others saying it is much better. In their own simple tests, they did not notice clear differences and want feedback from people using these models for agentic coding.
Charting Local LLM Releases: 2025 Was the Peak, Not 2026
r/LocalLLaMA top day4 days agoCommentary
A r/LocalLLaMA community member shared visualizations tracking the volume of local LLM releases over time. Contrary to the perception that 2026 has been an unusually prolific year, the data indicates the actual release peak occurred in 2025. The poster attributes the misperception to the outsized quality improvements in 2026 making it feel more eventful than it quantitatively was.
Gemma 4 12B Unified Audio Loses Speech Attention with Large System Prompts
r/LocalLLaMA top day4 days agoCommentary
A developer building a single-pass voice assistant with Gemma 4 12B unified (encoder-free audio/vision/text model) finds that audio attention collapses once the system prompt grows to ~21k tokens. The model then ignores or hallucinates instead of responding to the spoken input. The issue reproduces identically on vLLM, llama.cpp, and LiteRT-LM, pointing to an architectural attention-saturation limit rather than a stack-specific bug.
Without Open Source LLMs, US AI Companies Could Have Monopolized the Technology
r/LocalLLaMA top day4 days agoOpinion
This r/LocalLLaMA post argues that open-source LLMs are an ethical duty because AI has broad social impact. The author worries that without open models, US AI companies could have monopolized access and potentially limited availability to US firms. They also frame China’s release of powerful open-source LLMs as a contribution to humanity, despite political disagreements.
Anthropic Is Accused of Nerfing Fable for Other LLM Development
r/LocalLLaMA top day4 days agoCommentary
A r/LocalLLaMA post claims Anthropic may be intentionally limiting Fable when users ask it to help build other LLMs. The source is a short Reddit post with screenshot context, not a formal benchmark or verified disclosure. Discussion centers on trust in hosted closed models, unclear safety boundaries, and why local or open-weight LLMs may be necessary for serious AI development work.
Unsloth releases GGUF version of Cohere North-Mini-Code 1.0 (30B A3B MoE) on Hugging Face
r/LocalLLaMA top day4 days agoRelease
Unsloth uploaded a GGUF version of Cohere's North-Mini-Code 1.0 to Hugging Face, making local inference possible for this 30B A3B MoE coding-focused model. The poster links the release to llama.cpp PR #24260, suggesting new architecture support may be required. No benchmarks or test results have been shared yet; this is an early community resource post.
Anthropic Claude Fable 5: Mythos-Class Power with Controversial Terms★ 84
Latent Space4 days agoRelease
Anthropic released Claude Fable 5 as its first broadly available Mythos-class model, alongside restricted Mythos 5 access. Benchmarks and ecosystem reports show strong gains in coding, long-horizon agentic tasks, research, and vision. The controversy centers on 30-day retention for Mythos-class traffic and silent interventions that may reduce effectiveness on frontier LLM development tasks, raising trust, reproducibility, and open AI concerns.
Without open LLM competition, closed-source LLM companies will become insatiable
r/LocalLLaMA top day4 days agoOpinion
A r/LocalLLaMA user criticizes closed-source LLM providers, singling out Anthropic and its $200/month users. The post argues that without open-source model competition, proprietary AI companies could become more arrogant and less accountable to customers. The source offers little concrete context beyond an image and opinionated commentary, so it is best read as a community sentiment post rather than a verified product incident.
Releasing Apodex-1.0 Smol Models (0.8B, 2B, 4B Open-Weights) Optimized for Agentic Verification + AgentHarness Evals
r/LocalLLaMA top day4 days agoRelease
Apodex 1.0 launches with open-weight models at 0.8B, 2B, and 4B, trained not for general generation but for specialized sub-agent roles—fact-checking external claims and verifying tool call outputs before passing results to a main controller. The design targets long-horizon agent workflows where routing small tasks to lightweight models avoids wasteful use of 70B+ models at every step. AgentHarness, an open-source evaluation framework for local multi-step agent pipelines, is released alongside the weights.
Furiosa AI inference chip could be a game changer for local LLMs
r/LocalLLaMA top day4 days agoHardware
A r/LocalLLaMA post discusses Furiosa AI’s RNGD inference chip, citing TSMC 5nm, Hynix HBM3, 48GB VRAM, 1.5TB/s bandwidth, and 180W TDP. The author argues it could matter for local LLM users if Furiosa opens its programming interface and works with llama.cpp on a GGML backend. The post later clarifies Furiosa is not selling to consumers; this is a wish and market commentary, not a launch.
Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech
Hugging Face Blog4 days agoBenchmark
Code-switching—where bilingual speakers blend two languages in a single utterance—is common in markets like Taiwan, Singapore, and India, yet most ASR benchmarks focus on monolingual audio. ServiceNow AI evaluates frontier speech recognition models specifically on this mixed-language scenario. The findings help enterprise teams make informed ASR model choices when deploying voice agents for multilingual customer-facing applications.
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
r/LocalLLaMA top day4 days agoPaper
OSCAR applies offline-precomputed rotation matrices—derived from spectral covariance analysis—to reshape KV tensor distributions before 2-bit quantization, suppressing outliers and reducing rounding error. The rotation adds negligible inference overhead since it requires no runtime learning. GGUF downloads for Gemma-4-12B-it, Qwen3-32B, and Qwen3-4B-Thinking are available, with llama.cpp and sglang integrations and an arXiv paper.
SCAIL-2: Open-Source End-to-End Character Animation Without Intermediate Pose Representations
r/LocalLLaMA top day4 days agoRelease
SCAIL-2 by zai-org removes the reliance on skeleton maps and inpainting masks common in prior character animation pipelines, driving characters directly from video in an end-to-end manner. Trained on 60K synthesized motion pairs using SCAIL-Preview, Wan-Animate, and MoCha via a Unified Motion Transfer Interface with RoPE design, the model develops emergent abilities beyond its teacher models. Capabilities include cross-identity character replacement, animal-driving scenarios, and zero-shot support for SAM3D-Body mesh rendering.
Releasing Cohere North Mini Code
r/LocalLLaMA top day4 days agoRelease
Cohere’s Jay Alammar announced the official release of North Mini Code after early community feedback from r/LocalLLaMA. Weights are available on Hugging Face, including an fp8 version, and the model can be tried for free through OpenCode. For vLLM deployment, Cohere recommends using vLLM main for now and installing cohere_melody for accurate response parsing, while noting community requests for quantization and llama.cpp support.
Watch agents fight: a live challenge to speed up Gemma 4 E4B inference on a single A10G
r/LocalLLaMA top day5 days agoBenchmark
A public HuggingFace Spaces dashboard hosts a live competition where AI agents race to optimize Gemma 4 E4B inference throughput on a single NVIDIA A10G GPU. The challenge gamifies ML inference engineering, letting anyone watch agents explore quantization and scheduling strategies in real time. Optimization recipes surfaced by the competition offer practical value for developers targeting single-GPU self-hosted Gemma 4 deployments.
Cohere North Mini Code 1.0
r/LocalLLaMA top day5 days agoRelease
CohereLabs’ North Mini Code 1.0 appears to have moved from early access to final release, with weights available on Hugging Face. The Reddit post describes it as a 30B A3B coding model. Its Artificial Analysis overall score of 28 trails Qwen 3.6 35B at 43, but its coding index score of 33 is close to Qwen’s 35 and above Gemma 4 26B’s 22.
Unsloth Gemma 4 QAT MTP assistant models now available
r/LocalLLaMA top day5 days agoRelease
A r/LocalLLaMA post notes that Unsloth’s Gemma 4 QAT MTP assistant models are now available in GGUF format. The root directories include q8_0 files named mtp-gemma-4-*.gguf, while MTP folders contain q8_0 and larger quantized variants. The listed releases cover 12B, 26B-A4B, 31B, E2B, E2B mobile, E4B, and E4B mobile it-qat-GGUF repositories.
TTS Benchmark Revamped with Objective Standards and Blind ELO Voting (46 Models)
r/LocalLLaMA top day5 days agoBenchmark
Reddit user UkieTechie has revamped their TTS benchmark platform with objective scoring standards and live blind voting, now covering 46 speech synthesis models. Hosted on Hugging Face Space, the arena lets users vote on audio quality without knowing the model name, generating a dynamic ELO leaderboard. The project is open-source on GitHub and welcomes community submissions of new models.
Single-slot half-height PCIe V100 with NVLink appears in China
r/LocalLLaMA top day5 days agoHardware
A r/LocalLLaMA post says a Bilibili creator has shown a single-slot, half-height PCIe V100 with NVLink on a custom PCB. The card is described as 16 cm long, passively cooled by default, capped at 75W, with another version supporting up to 300W. The 16GB model is expected around or below ¥1500, with a 32GB version reportedly planned, but it is not yet available for purchase.
Rick & Morty
r/LocalLLaMA top day5 days agoCommentary
This r/LocalLLaMA top-day post is a short image meme titled “Rick & Morty.” The only accompanying text says, “nobody expected HF there,” suggesting surprise at HF appearing in the image’s context. There are no technical claims, model details, releases, or benchmarks, so its value is mainly as a small signal of community culture around Hugging Face / HF and local LLM discussions.
Google Introduces Gemma 4 12B: A Unified, Encoder-Free Multimodal Model★ 85
Google DeepMind Blog5 days agoRelease
Google DeepMind has unveiled Gemma 4 12B, a next-generation open-weights model featuring a unified, encoder-free multimodal architecture. By eliminating the traditional separate vision encoder (such as ViT), it processes diverse modalities directly within a single Transformer network. This design simplifies training, reduces inference latency, and enhances cross-modal alignment, marking a significant milestone for open-source AI.
Apple Announced a New On-Device Inference Engine for Apple Silicon
r/LocalLLaMA top day5 days agoRelease
Apple announced CoreAI at WWDC, which the post frames as a possible future replacement for CoreML and an alternative to MLX, llama.cpp, and torch for optimized on-device inference. Models still need conversion through Python scripts, and current supported models appear mostly from mid-2025. No performance data is available yet; the author expects it may trail MLX on GPU, but Apple’s 20B on-device foundation model claim suggests larger app-bundled models could become possible.
Jetson Orin NX Build for Hermes Agent + Benchmarking
r/LocalLLaMA top day5 days agoHardware
The post describes turning an unused Jetson Orin NX into a compact local LLM server for Hermes Agent testing. The goals were low noise, over 10 tok/s generation, 300 tok/s prompt processing, at least 65K context, and a custom case. After testing Gemma 4, Qwen 3.6, and many quant variants, the author reports Gemma 4 26B A4B UD Q2_K_XL reaching 66K context and 10.21 tok/s near 60K context.
How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces★ 72
Hugging Face Blog5 days agoTutorial
This Hugging Face blog post demonstrates how AI agents can use Spaces as modular tools. By chaining an image generation Space with a 3D rendering Space, an agent automatically generated art assets and placed them inside a virtual 3D gallery. This highlights the power of Hugging Face's ecosystem, where any Space can serve as an API for agentic workflows.
TinySearch v0.2.0: Lightweight Open Web-Search Tool for Local LLMs Now Defaults to SearXNG
r/LocalLLaMA top day5 days agoRelease
TinySearch is a lightweight open-source MCP/FastAPI tool that crawls, chunks, and reranks web results into an 8k-token context blob for small local LLMs. Version 0.2.0 replaces DuckDuckGo with SearXNG as the default backend after DDG began rate-limiting and CAPTCHAing automated requests. Users can point it at a self-hosted SearXNG instance; it integrates with Cline, Roo, and OpenCode agent setups.
NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain
Hugging Face Blog5 days agoNew Tool
NeuroBait is a Hugging Face community project built to help with ADHD task-initiation freeze rather than diagnosis or to-do planning. It fine-tunes google/gemma-3-12b-it with LoRA to produce short, warm, context-aware nudges. The project uses Unsloth and Modal for training, then deploys on a Hugging Face Space with Gradio, transformers, peft, and a runtime LoRA adapter.
ByteDance Open-Sources Bernini, a Unified Framework for AI Video Editing★ 74
量子位 QbitAI5 days agoRelease
ByteDance’s commercial technology team has open-sourced Bernini, a unified framework for AI video generation and editing. Its design separates semantic planning from visual rendering: an MLLM-based planner understands text, source videos, images, and video references, then a DiT-based renderer produces the final video. The released Bernini-R includes inference code and weights, while the full planner-enabled version is still being prepared.

← PreviousPage 2Next →

Latest in AI

OpenLumara Creator Challenges Reddit to Hack Its Public Agent Instance

Reddit Debate: Apple and Microsoft Push Local-First AI

Qwen3.6-MTP-27B on Tesla V100: llama.cpp Throughput Tuning Question

How Useful Is qwopus Compared With Qwen3.6 27B for Coding?

Charting Local LLM Releases: 2025 Was the Peak, Not 2026

Gemma 4 12B Unified Audio Loses Speech Attention with Large System Prompts

Without Open Source LLMs, US AI Companies Could Have Monopolized the Technology

Anthropic Is Accused of Nerfing Fable for Other LLM Development

Unsloth releases GGUF version of Cohere North-Mini-Code 1.0 (30B A3B MoE) on Hugging Face

Anthropic Claude Fable 5: Mythos-Class Power with Controversial Terms★ 84

Without open LLM competition, closed-source LLM companies will become insatiable

Releasing Apodex-1.0 Smol Models (0.8B, 2B, 4B Open-Weights) Optimized for Agentic Verification + AgentHarness Evals

Furiosa AI inference chip could be a game changer for local LLMs

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization

SCAIL-2: Open-Source End-to-End Character Animation Without Intermediate Pose Representations

Releasing Cohere North Mini Code

Watch agents fight: a live challenge to speed up Gemma 4 E4B inference on a single A10G

Cohere North Mini Code 1.0

Unsloth Gemma 4 QAT MTP assistant models now available

TTS Benchmark Revamped with Objective Standards and Blind ELO Voting (46 Models)

Single-slot half-height PCIe V100 with NVLink appears in China

Rick & Morty

Google Introduces Gemma 4 12B: A Unified, Encoder-Free Multimodal Model★ 85

Apple Announced a New On-Device Inference Engine for Apple Silicon

Jetson Orin NX Build for Hermes Agent + Benchmarking

How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces★ 72

TinySearch v0.2.0: Lightweight Open Web-Search Tool for Local LLMs Now Defaults to SearXNG

NeuroBait: I fine-tuned a model to spark dopamine for ADHD brain

ByteDance Open-Sources Bernini, a Unified Framework for AI Video Editing★ 74