Latest in AI

"Fully Hallucinated Operating System" Simulates an Entire OS via LLM Prompts
r/LocalLLaMA top day · 7 days ago · Commentary
A popular Reddit post highlights a video demonstrating a "Fully Hallucinated Operating System" run entirely inside an LLM. By prompting the model to act as a terminal, it simulates file systems, network requests, and command execution purely through text generation. While impractical for production, this experiment showcases the impressive state-tracking and "world model" capabilities of modern LLMs.
club-3090 Adds Experimental FP8 Support for Qwen3.6-27B
r/LocalLLaMA top day · 7 days ago · New Tool
The open-source project club-3090 has rolled out experimental FP8 quantization support for Qwen3.6-27B. Dual RTX 3090 users have been anticipating the update, which lets them run the model with significantly lower VRAM requirements. According to reports, the official Qwen3.6-27B-FP8 model performs virtually identically to the original unquantized BF16 version.
How much do amd64 microarchitecture levels help in Go?
Hacker News (AI keywords) · 7 days ago · Benchmark
Daniel Lemire tests Go’s GOAMD64 levels using Roaring Bitmaps on a modern Intel Xeon. v2 brings strong gains where popcnt matters, while v3 adds further speedups in dense bitmap and set-operation workloads through AVX2. v4, despite implying AVX-512 support, shows no meaningful improvement in these benchmarks, likely due to current Go compiler limitations.
llama-server Router Mode: Pinned Model Grabs CUDA Context on All GPUs, Causing OOM
r/LocalLLaMA top day · 7 days ago · Commentary
A Reddit user highlighted a limitation in llama-server's router mode (`--models-preset`): child processes spawn and initialize CUDA contexts on all available GPUs, even when pinned to a single card. When other GPUs are fully utilized by a large model, launching a smaller model fails with a CUDA OOM error because it cannot allocate the context stub on the maxed-out cards. Currently, child processes inherit the base environment, preventing per-model `CUDA_VISIBLE_DEVICES` configuration.
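One way to sidestep this class of problem, outside router mode, is to start one server process per model and give each its own environment. The sketch below assumes that approach; it is not a fix for `--models-preset` itself. The helper names, model path, and port are hypothetical, while `--model` and `--port` are existing llama-server flags.

```python
import os
import subprocess


def build_child_env(gpu_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA only enumerates the devices listed here, so the child process
    # cannot create a context on the other cards.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


def launch_server(model_path, port, gpu_ids, binary="llama-server"):
    """Start one server process pinned to specific GPUs (hypothetical helper)."""
    cmd = [binary, "--model", model_path, "--port", str(port)]
    return subprocess.Popen(cmd, env=build_child_env(gpu_ids))
```

Calling `launch_server("small-model.gguf", 8081, [1])` (placeholder path and port) would start a process that sees only GPU 1, so it never touches the cards already filled by the larger model.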
Is this the dawn of the Tokenpocalypse?
TechCrunch AI · 7 days ago · Business
TechCrunch discusses Microsoft’s GitHub Copilot pricing changes as a sign that subsidized AI usage may be ending. As Anthropic and other major AI companies prepare for public-market scrutiny, profitability and usage-cost risks will become harder to ignore. The piece argues that higher prices, usage caps, and broader business-model changes may be necessary if AI labs want to survive beyond investor-subsidized growth.
Qwen 3.6 27B DeepSWE Benchmark Results Highlight Gap Between Local and Closed-Source Models
r/LocalLLaMA top day · 7 days ago · Benchmark
A community benchmark of Qwen 3.6 27B on DeepSWE yielded a score of 1.79%, placing 18th of 20 and slightly outperforming Haiku 4.5. Run on a single RTX 6000 Blackwell GPU via vLLM with reasoning enabled, the test averaged 32 minutes and 44k output tokens per task. The author notes that while Qwen 3.6 27B represents a 'poor man's local SOTA,' the massive gap compared to frontier closed models suggests local LLMs are struggling to keep pace in complex coding.
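The per-task averages imply an end-to-end generation rate, which can be worked out directly. This is a rough sketch that treats "44k" as exactly 44,000 tokens and counts all wall-clock time, so it is an average over the whole task rather than a raw decode speed.

```python
def effective_tokens_per_second(output_tokens, minutes):
    """Average output tokens per wall-clock second for one task."""
    return output_tokens / (minutes * 60)


rate = effective_tokens_per_second(44_000, 32)
print(f"{rate:.1f} tokens/s")  # prints "22.9 tokens/s"
```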
Amazing Digital Dentures (a failed project)
Hugging Face Blog · 7 days ago · Commentary
The post appears to discuss a project called “Amazing Digital Dentures,” explicitly framed as a failed project. Because the article body was not provided, the specific technical stack, models, tools, datasets, and reasons for failure cannot be verified. Based on the title and URL path, it may be a hackathon-style project retrospective focused on prototyping challenges and lessons learned.
Exploring 2-bit QAT: Can Ultra-Compressed Large Models Outperform 4-bit Models Half Their Size?
r/LocalLLaMA top day · 7 days ago · Commentary
A popular Reddit thread on r/LocalLLaMA discusses the potential of 2-bit Quantization Aware Training (QAT) for large MoE models (120B to 400B). While current QAT efforts focus on 4-bit, users speculate whether a 2-bit QAT model could fit into consumer hardware (64GB/128GB RAM) and outperform a 4-bit model of half its size. This approach is proposed as a practical alternative to training ternary (1.58-bit) LLMs from scratch.
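The memory side of that comparison is simple arithmetic: weight storage is roughly parameters times bits per weight, divided by eight. The sketch below is an approximation that ignores quantization scales, KV cache, and runtime overhead; the function name is illustrative.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), ignoring overhead."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for params in (120, 400):
    two_bit = weight_memory_gb(params, 2)
    half_size_four_bit = weight_memory_gb(params / 2, 4)
    print(f"{params}B @ 2-bit: {two_bit:.0f} GB | "
          f"{params // 2}B @ 4-bit: {half_size_four_bit:.0f} GB")
```

Under these assumptions a 120B model at 2 bits needs about 30 GB and a 400B model about 100 GB, the same footprint as a 4-bit model with half the parameters. The open question in the thread is therefore quality at equal memory, not size.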
Mythograph Atelier #1 - Abstract Art That Means Something to You
Hugging Face Blog · 7 days ago · Commentary
Only the title is available, so this summary is necessarily inferential. The post appears to be the first entry in a Mythograph Atelier series about abstract art that carries personal meaning. It may interest designers, creators, and AI art users exploring ways to turn memory, emotion, or symbolism into generative visual work.
MTP and QAT: What is the Relation? Running Gemma 4 31B in llama.cpp
r/LocalLLaMA top day · 7 days ago · Commentary
A popular Reddit thread addresses user confusion over running Gemma 4 31B locally. It distinguishes between MTP (Multi-Token Prediction for inference speedup) and QAT (Quantization-Aware Training for preserving 4-bit quality). It also confirms that llama.cpp's new MTP support requires updated GGUF files and a secondary draft model file for acceleration.
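For readers unfamiliar with why a draft component speeds up inference, here is a toy sketch of generic greedy draft-and-verify decoding. It is not llama.cpp's MTP implementation, and how MTP maps onto this scheme is an assumption; the toy "models" are plain functions over integer tokens.

```python
def speculative_step(context, draft_next, target_next, k=4):
    """One greedy draft-and-verify step; returns the tokens accepted."""
    # 1. The cheap draft proposes k tokens autoregressively.
    proposed = []
    for _ in range(k):
        proposed.append(draft_next(context + proposed))

    # 2. The target checks each proposal. A real engine would do this in
    #    one batched forward pass; a plain loop is used here for clarity.
    accepted = []
    for token in proposed:
        expected = target_next(context + accepted)
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)

    # All k proposals matched, so the target contributes one bonus token.
    accepted.append(target_next(context + accepted))
    return accepted


def toy_target(tokens):
    """Toy target model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


def toy_draft(tokens):
    """Toy draft model: agrees with the target except after token 3."""
    return 0 if tokens[-1] == 3 else (tokens[-1] + 1) % 10
```

The accepted tokens always equal what the target alone would have produced, so output quality is unchanged; the gain comes from accepting several tokens per target pass when the draft guesses well.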
If LLMs Have Human-Like Attributes, Then So Does Age of Empires II
Hacker News (AI keywords) · 7 days ago · Paper
The paper argues that claims about LLMs having human-like attributes, such as morality or language understanding, can be methodologically fragile. By building and training a simple neural network on Age of Empires II, the author suggests such attributes may not be empirically unique to LLMs. The key recommendation is to define explicit measurement criteria and use a null assumption of LLM non-uniqueness before drawing anthropomorphic conclusions.
Building from Zero After Addiction, Prison, and a Felony
Hacker News (AI keywords) · 7 days ago · Commentary
Gavin Ray recounts entering juvenile prison at 14, becoming a felon at 19, and losing stability to addiction. The essay follows his path back through software work, open source, Hasura, and people willing to judge him by future contribution rather than only past record. AI is not the focus; Claude Code is only mentioned as the tool used to generate the OpenGraph SVG image.
NVFP4 Support Merged in llama.cpp: How to Use 4-bit Blackwell Quantization
r/LocalLLaMA top day · 7 days ago · Commentary
Following the merge of native NVFP4 (NVIDIA FP4) support in llama.cpp, users are exploring how to leverage this format on Blackwell GPUs (such as the RTX 50-series). The discussion focuses on converting NVFP4 safetensors checkpoints (such as Gemma 4 QAT) to GGUF format and whether importance matrices (imatrix) are required. Native support promises significant performance gains for local LLM execution on next-gen hardware.
Notion restores access to Anthropic after service disruption
TechCrunch AI · 7 days ago · Incident
Notion restored access to Anthropic following a service disruption that affected availability. The report notes that Notion’s head of product was surprised by how widely the update was reposted. The incident highlights how dependent AI-enabled products have become on upstream model providers and reliability planning.
Gemma-4-26B-A4B QAT Variant Performs Poorly in llama.cpp Compared to Non-QAT Version
r/LocalLLaMA top day · 7 days ago · Benchmark
A LocalLLaMA user highlighted that the newly released QAT (Quantization-Aware Training) variant of Google's Gemma-4-26B-A4B model underperforms compared to its non-QAT predecessor. Testing via llama.cpp on a chessboard SVG generation task showed significant rendering errors in the QAT version. The non-QAT GGUF version, however, produced highly accurate results under identical settings.
Office-open-xml-viewer: Office XML document viewer rendering to HTML Canvas
Hacker News (AI keywords) · 7 days ago · New Tool
office-open-xml-viewer is an open-source browser viewer for Office Open XML documents, rendering DOCX, XLSX, and PPTX files to HTML Canvas. Its parsers are written in Rust and compiled to WebAssembly, while rendering uses the Canvas 2D API. The README also says the full codebase was implemented by Claude through iterative prompting, making it notable as an AI-assisted software development case.
Control 3D Avatars with Natural Language Using "Program as Weights" (programasweights)
r/LocalLLaMA top day · 7 days ago · New Tool
Developer Yuntian Deng introduced "programasweights," a framework that compiles plain-English descriptions into tiny, local action programs (loops, parallel tracks) to control 3D avatars. Instead of pre-defined buttons, users can command complex sequences like "wave while walking, then jump." The runtime code is open-source and runs entirely offline in the browser or via Python.
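To make "action programs with loops and parallel tracks" concrete, here is one hypothetical way such a program could be represented as a small tree. This is an illustrative data structure only, not the programasweights format; every class name and duration is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Action:
    name: str
    seconds: float


@dataclass(frozen=True)
class Parallel:
    tracks: Tuple["Node", ...]  # run at the same time


@dataclass(frozen=True)
class Sequence:
    steps: Tuple["Node", ...]  # run one after another


@dataclass(frozen=True)
class Loop:
    body: "Node"
    times: int


Node = Union[Action, Parallel, Sequence, Loop]


def duration(node: Node) -> float:
    """Total playback time of a program tree."""
    if isinstance(node, Action):
        return node.seconds
    if isinstance(node, Parallel):
        return max(duration(t) for t in node.tracks)
    if isinstance(node, Sequence):
        return sum(duration(s) for s in node.steps)
    if isinstance(node, Loop):
        return node.times * duration(node.body)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# "wave while walking, then jump" with made-up durations
program = Sequence((
    Parallel((Action("wave", 1.5), Action("walk", 2.0))),
    Action("jump", 0.5),
))
```

In this representation the parallel block lasts as long as its longest track, so `duration(program)` is 2.0 seconds of walking and waving plus 0.5 seconds for the jump.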
OpenAI is still working on that ‘super app’
TechCrunch AI · 7 days ago · Business
OpenAI is reportedly preparing a revamped ChatGPT in the coming weeks, positioned as a “super app” with coding tools and AI agents. The strategy aims to improve competitiveness with Anthropic, especially for business users, while moving OpenAI closer to profitability before an IPO. TechCrunch frames this as a continued shift away from standalone “side quests” and toward ChatGPT as the central product gateway.
GMKtec Announces EVO-X3 Mini PC, Teases 192GB Ryzen AI MAX+ 495 "Strix Halo" Monster
r/LocalLLaMA top day · 7 days ago · Hardware
GMKtec has announced its EVO-X3 mini PC with upgraded I/O, including OCuLink and Wi-Fi 7. More importantly for local AI enthusiasts, the company teased a future model powered by AMD's flagship "Strix Halo" Ryzen AI MAX+ 495 APU. This upcoming monster will support up to 192GB of LPDDR5X memory, offering a highly anticipated, cost-effective alternative to Apple Silicon for running large local LLMs.
End of an Era for Budget LLM Rigs: User's X99 Motherboard Dies
r/LocalLLaMA top day · 7 days ago · Hardware
A popular Reddit post on r/LocalLLaMA highlights a user's X99 motherboard finally dying. The Intel X99 platform, paired with cheap recycled Xeon CPUs, has long been a legendary budget choice for running local LLMs with multiple GPUs. The post triggered a wave of nostalgic "F" comments, marking the gradual end of these classic DIY budget rigs.
Show HN: GentleOS – A Pair of Hobby OSes for Vintage 32-bit and 16-bit PCs
Hacker News (AI keywords) · 7 days ago · New Tool
GentleOS is an open-source hobby project by a solo developer, consisting of two minimal operating systems targeting vintage 32-bit and 16-bit x86 PC hardware. Posted as a Show HN submission, the project is purely a retro computing and systems programming exercise with no AI or ML components. This article is not AI-related and holds minimal relevance for an AI-focused audience.
Qwen3.6 35B-A3B on a Laptop: A Local LLM "Zero to One" Milestone
r/LocalLLaMA top day · 7 days ago · Opinion
A Reddit user detailed running Qwen3.6 35B-A3B (IQ3_XXS quantization) on an ASUS Zenbook Pro 14 (RTX 4060 8GB VRAM, 64GB RAM). Using llama.cpp, they achieved 27 TPS at 32k context and 18 TPS at 256k context. This setup serves as a highly capable, fully private local agent for file operations, CLI execution, and brainstorming, bypassing cloud privacy concerns.
Managing Multiple MCP Servers: How to Prevent Context Pollution and Token Waste
r/LocalLLaMA top day · 7 days ago · Commentary
A popular Reddit thread on r/LocalLLaMA addresses the challenge of loading multiple Model Context Protocol (MCP) servers at startup, which floods the context window with tool definitions. Users are discussing potential solutions, including using MCP proxies/hubs to route requests through a single endpoint or implementing lazy-loading. This highlights a growing need for better orchestration tools as the local MCP ecosystem expands.
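The lazy-loading idea can be sketched as a registry that puts only one-line summaries into the prompt and fetches a full tool definition the first time it is needed. This is a minimal illustration with hypothetical names; it does not use any MCP SDK or reflect a specific proxy's design.

```python
class LazyToolRegistry:
    """Expose short tool summaries up front; load full schemas on demand."""

    def __init__(self, loaders):
        # loaders: {tool_name: (summary, zero-arg callable returning the schema)}
        self._loaders = loaders
        self._cache = {}

    def index(self):
        """Compact listing that would go into the context at startup."""
        return [f"{name}: {summary}"
                for name, (summary, _) in sorted(self._loaders.items())]

    def schema(self, name):
        """Fetch the full definition only when this tool is requested."""
        if name not in self._cache:
            _, load = self._loaders[name]
            self._cache[name] = load()
        return self._cache[name]
```

With this shape, the startup cost in tokens scales with the number of summaries rather than the size of every schema, and each schema is loaded at most once.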
start-llama: A Handy CLI Launcher for llama-server with Easy Customization
r/LocalLLaMA top day · 8 days ago · New Tool
A developer has released 'start-llama', a command-line utility designed to simplify launching llama-server (llama.cpp). It allows users to manage sensible default configurations, support multiple server binaries, and apply per-model or command-line overrides. This tool streamlines local LLM deployment into a single, easily configurable step.
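The layering described here (defaults, then per-model settings, then command-line overrides) can be sketched as successive dictionary merges. This is not start-llama's code; the precedence order and helper names are assumptions, and the config keys are assumed to match llama-server's long flag names.

```python
def resolve_config(defaults, per_model, cli_overrides):
    """Merge settings; later layers win (assumed order: defaults < model < CLI)."""
    merged = dict(defaults)
    merged.update(per_model)
    merged.update(cli_overrides)
    return merged


def to_argv(binary, model_path, config):
    """Turn a resolved config into a command line for the server binary."""
    argv = [binary, "--model", model_path]
    for key, value in sorted(config.items()):
        argv += [f"--{key}", str(value)]
    return argv


config = resolve_config(
    defaults={"ctx-size": 8192, "port": 8080},
    per_model={"ctx-size": 32768},   # hypothetical per-model setting
    cli_overrides={"port": 9000},    # hypothetical one-off override
)
argv = to_argv("llama-server", "model.gguf", config)
```

The resulting `argv` could be handed to `subprocess.run`; keeping the merge separate from the launch makes each layer easy to inspect.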
sqlite: A CGo-free port of SQLite/SQLite3
Hacker News (AI keywords) · 8 days ago · Release
This project provides a CGo-free SQLite/SQLite3 implementation for Go, useful when developers want pure-Go builds and simpler cross-platform deployment. It keeps the familiar SQLite embedded database model while integrating with Go’s database/sql workflow. Recent releases upgraded SQLite, improved text/time scanning performance, added backup progress helpers, and expanded virtual table and sqlite-vec related support.
Reddit Discusses: What is Your Most Unusual Non-LLM AI Tool for Daily Use?
r/LocalLLaMA top day · 8 days ago · Commentary
A popular thread on Reddit's r/LocalLLaMA asks users to share their most unusual or underrated non-LLM AI tools used in daily workflows. While LLMs dominate the spotlight, many developers and power users emphasize that single-purpose models—such as Whisper for transcription, Demucs for audio separation, and Segment Anything (SAM) for vision—offer superior efficiency and lower costs. The discussion highlights a growing trend toward practical, lightweight, and local AI solutions for specific tasks.
Anthropic, please ship an official Claude Desktop for Linux
Hacker News (AI keywords) · 8 days ago · Opinion
The available source only provides the title, which asks Anthropic to ship an official Claude Desktop app for Linux. It appears to be a community feature request rather than a confirmed product announcement. Without the issue body or official response, there is no basis to infer Anthropic’s plans, timeline, or technical reasoning.
Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them
Hacker News (AI keywords) · 8 days ago · Business
The author uses a Claude Code coding experiment to estimate the API-equivalent cost of serious LLM coding. They argue simple chats are cheap, but complex reasoning and multi-file coding can burn large amounts of visible and hidden tokens. The piece is skeptical and estimate-driven, concluding that current $100/month plans may be heavily subsidized and economically fragile.
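The kind of estimate described can be sketched as token counts multiplied by per-million-token prices and compared against the subscription fee. Every number below is a placeholder chosen for illustration, not a figure from the article or a real price list.

```python
def api_equivalent_cost(input_tokens, output_tokens,
                        usd_per_m_input, usd_per_m_output):
    """Cost of a workload if billed at per-million-token API prices."""
    return (input_tokens / 1e6 * usd_per_m_input
            + output_tokens / 1e6 * usd_per_m_output)


def subsidy_ratio(api_cost, subscription_price):
    """How many dollars of API-equivalent usage each subscription dollar buys."""
    return api_cost / subscription_price


# Placeholder monthly usage and prices (hypothetical values)
cost = api_equivalent_cost(
    input_tokens=200_000_000, output_tokens=10_000_000,
    usd_per_m_input=3.0, usd_per_m_output=15.0,
)
ratio = subsidy_ratio(cost, subscription_price=100.0)
```

The ratio is highly sensitive to how many hidden tokens (re-read context, reasoning, tool output) are counted as input, which is why such estimates vary widely.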
llama.cpp Gemma4 MTP Support Merged
r/LocalLLaMA top day · 8 days ago · Release
llama.cpp PR #23398 was merged on June 7, 2026, adding MTP support for Gemma4 models. The author reports over 2x average speedup on dense models, no observed speedup on MoE, and replicated AIME-26 results around 87%. Support currently covers the 31B and 26B-A4B variants, while E4B and E2B are not supported yet; multi-GPU setups may need extra draft-device configuration.
LLMs are eroding my software engineering career and I don't know what to do
Hacker News (AI keywords) · 8 days ago · Opinion
The author argues that LLMs are eroding three pillars of his software engineering career: domain knowledge, debugging skill, and architecture judgment. Tools like ChatGPT, Claude, Claude Code, Codex, MCP, Sentry MCP, and DataDog MCP increasingly handle design, implementation, and difficult production bugs. The essay frames this as a labor-market concern, not just a tooling debate: if expertise becomes promptable, engineers may struggle to remain differentiated.

