Latest in AI

MTP and QAT: What is the Relation? Running Gemma 4 31B in llama.cpp
r/LocalLLaMA top day · 7 days ago · Commentary
A popular Reddit thread addresses user confusion over running Gemma 4 31B locally. It distinguishes between MTP (Multi-Token Prediction for inference speedup) and QAT (Quantization-Aware Training for preserving 4-bit quality). It also confirms that llama.cpp's new MTP support requires updated GGUF files and a secondary draft model file for acceleration.
If LLMs Have Human-Like Attributes, Then So Does Age of Empires II
Hacker News (AI keywords) · 7 days ago · Paper
The paper argues that claims about LLMs having human-like attributes, such as morality or language understanding, can be methodologically fragile. By building and training a simple neural network on Age of Empires II, the author suggests such attributes may not be empirically unique to LLMs. The key recommendation is to define explicit measurement criteria and use a null assumption of LLM non-uniqueness before drawing anthropomorphic conclusions.
GMKtec Announces EVO-X3 Mini PC, Teases 192GB Ryzen AI MAX+ 495 "Strix Halo" Monster ★ 78
r/LocalLLaMA top day · 7 days ago · Hardware
GMKtec has announced its EVO-X3 mini PC with upgraded I/O, including OCuLink and Wi-Fi 7. More importantly for local AI enthusiasts, the company teased a future model powered by AMD's flagship "Strix Halo" Ryzen AI MAX+ 495 APU. This upcoming monster will support up to 192GB of LPDDR5X memory, offering a highly anticipated, cost-effective alternative to Apple Silicon for running large local LLMs.
Reddit Discusses: What is Your Most Unusual Non-LLM AI Tool for Daily Use?
r/LocalLLaMA top day · 7 days ago · Commentary
A popular thread on Reddit's r/LocalLLaMA asks users to share their most unusual or underrated non-LLM AI tools used in daily workflows. While LLMs dominate the spotlight, many developers and power users emphasize that single-purpose models—such as Whisper for transcription, Demucs for audio separation, and Segment Anything (SAM) for vision—offer superior efficiency and lower costs. The discussion highlights a growing trend toward practical, lightweight, and local AI solutions for specific tasks.
llama.cpp Gemma4 MTP Support Merged
r/LocalLLaMA top day · 7 days ago · Release
llama.cpp PR #23398 was merged on June 7, 2026, adding MTP support for Gemma4 models. The author reports over 2x average speedup on dense models, no observed speedup on MoE, and replicated AIME-26 results around 87%. Support currently covers 31B and 26B-4B variants, while E4B and E2B are not supported yet; multi-GPU may need extra draft-device configuration.
LLMs are eroding my software engineering career and I don't know what to do
Hacker News (AI keywords) · 7 days ago · Opinion
The author argues that LLMs are eroding three pillars of his software engineering career: domain knowledge, debugging skill, and architecture judgment. Tools like ChatGPT, Claude, Claude Code, Codex, MCP, Sentry MCP, and DataDog MCP increasingly handle design, implementation, and difficult production bugs. The essay frames this as a labor-market concern, not just a tooling debate: if expertise becomes promptable, engineers may struggle to remain differentiated.
Dockerized Nemotron 3.5 ASR: Better Multilingual Support & Streaming (4.5x CPU Speed)
r/LocalLLaMA top day · 7 days ago · New Tool
A developer on Reddit shared a Dockerized implementation of Nemotron 3.5 ASR, migrating from Parakeet. The system supports over 40 languages and features a native streaming architecture that avoids full-file buffering. Using the onnxruntime-genai backend, it achieves 4.5x real-time speed on CPU, with CUDA support planned but untested.
NVIDIA, KRAFTON, NC and T1 Celebrate RTX Spark at Korea’s PC Bangs
NVIDIA Blog · 7 days ago · Hardware
After unveiling RTX Spark at GTC Taipei during COMPUTEX, NVIDIA brought the platform to South Korea’s gaming community. Jensen Huang visited T1 Base Camp and PC bangs in Seoul to show how RTX Spark targets local AI, creation and high-performance gaming on slim Windows laptops and compact desktops. Demos included League of Legends, VALORANT, PUBG, Subnautica 2, CINDER CITY, AION 2 and an unreleased NVIDIA ACE-powered PUBG Ally character.
Major P2P Issues in Israel and Possibly Other Middle East Countries
Hacker News (AI keywords) · 7 days ago · Incident
A GitHub issue in ValveSoftware/GameNetworkingSockets reports major P2P issues affecting Israel and possibly other Middle East countries. No issue body was provided, so details such as root cause, versions, reproduction steps, and maintainer response are unknown. Developers using P2P networking should treat this as a regional connectivity incident worth monitoring, especially for games or real-time applications with Middle East users.
Show HN: Oproxy - inspect and modify network traffic from the browser
Hacker News (AI keywords) · 8 days ago · New Tool
Oproxy is a local HTTP, HTTPS, and SOCKS5 proxy with a browser-based management UI. It captures requests and responses, supports replay and Compose workflows, and can export HAR, cURL, Fetch, and Python snippets. Advanced features include HTTPS MITM, mock responses, throttling, breakpoints, DNS overrides, Lua scripts, and an OpenAI-compatible assistant for preparing confirmed proxy changes.
Sem: A Git-Based Primitive for Code Understanding, Not LSPs
Hacker News (AI keywords) · 8 days ago · New Tool
Sem is a CLI from Ataraxy Labs that layers semantic code understanding on top of Git. Instead of line-based diffs, it reports changed functions, classes, methods, and types. It offers diff, blame, impact, log, entities, and context commands, with JSON output and AI-oriented context generation, though its accuracy claims still need independent validation.
Five labs, five minds: building a multi-model finance drama on small models
Hugging Face Blog · 8 days ago · Commentary
Based only on the title, the post likely describes a multi-model experiment where five model-like roles collaborate or clash in a finance-themed scenario. The emphasis appears to be on using small models rather than one large model, possibly to create a staged analytical or narrative experience. Without the article text, specific models, tools, architecture, and results cannot be verified.
Meta confirms thousands of Instagram accounts hacked via AI chatbot abuse ★ 76
Hacker News (AI keywords) · 8 days ago · Incident
Meta confirmed a vulnerability in Instagram’s AI-assisted account recovery system that let attackers redirect password reset links to attacker-controlled emails. At least 20,225 users were notified, with compromised accounts potentially exposing profile data, posts, direct messages, and activity. Meta says it has disabled the affected chatbot flow, removed the vulnerable code path, and asked impacted users to reset passwords through verified channels.
Police in England and Wales told to halt AI use in court statements
Hacker News (AI keywords) · 8 days ago · Regulation
Based only on the headline, police in England and Wales have been told to halt AI use in court statements. The article text is unavailable, so the issuing authority, scope, rationale, and any specific incident cannot be confirmed. The topic points to broader concerns around accuracy, auditability, accountability, and procedural fairness when AI is used in legal or policing documents.
Nvidia is proposing a beast of a CPU system for Windows PCs
Hacker News (AI keywords) · 8 days ago · Hardware
Based only on the title, Nvidia appears to be proposing a high-end CPU system for Windows PCs. That could signal deeper ambitions beyond GPUs and AI accelerators into the core PC platform. However, no article text is available, so the architecture, specs, partners, timing, and product positioning remain unconfirmed.
Meta Keeps Delaying the Release of Its New AI Model to Developers
Hacker News (AI keywords) · 8 days ago · Release
The WSJ reports that Meta has repeatedly delayed the developer release of a new AI model after previously signaling it would arrive “soon.” Public summaries say the delay has stretched for nearly two months, with no scheduled API launch date at the time of reporting. The story matters less as a benchmark claim and more as a signal about Meta’s AI execution, developer ecosystem strategy, and monetization timeline.
Persona Atlas: Mapping How Famous Minds Think
Hugging Face Blog · 8 days ago · New Tool
The title suggests Persona Atlas is a project focused on representing or exploring the thinking styles of famous figures. The source text is unavailable, so its format, methods, data, model use, and results cannot be verified. It may be relevant to persona modeling, AI role-play, conversational agents, or thought-style visualization, but the practical impact remains unclear without the full post.
LLM Research Papers: The 2026 List (January to May)
Ahead of AI (Raschka) · 8 days ago · Paper
Sebastian Raschka compiles a curated reference list of LLM papers he bookmarked from January through May 2026. The list is not comprehensive; it is organized around topics useful for future articles, lectures, code examples, and research work. Public sections emphasize reasoning, RL, efficient inference, long context, agent systems, tool use, coding agents, diffusion language models, and serving infrastructure.
The Smart TV in Your Living Room Is a Node in the AI Scraping Economy ★ 74
Hacker News (AI keywords) · 8 days ago · Ethics
Include Security examines how Bright Data’s SDK supplies residential proxy capacity through partner apps on phones and connected TVs. The post argues smart TVs are especially attractive because they are always powered, often on fast Wi-Fi, and rarely monitored. It details public configuration endpoints, peer tunnel behavior, telemetry, VPN visibility bypasses, bandwidth limits, and practical DNS or network-blocking defenses.
Running Python code in a sandbox with MicroPython and WASM
Simon Willison's Weblog · 8 days ago · New Tool
Simon Willison describes his latest attempt to safely run Python plugin-style code inside his own applications. The alpha package micropython-wasm uses MicroPython compiled to WebAssembly, executed through the maintained wasmtime Python library. His goals include clean PyPI installation, CPU and memory limits, controlled file and network access, host functions, and reliable documentation.
Thousand Token Wood: shipping a multi-agent economy on a 3B model
Hugging Face Blog · 9 days ago · Tutorial
Based on the title, this Hugging Face Blog post presents Thousand Token Wood, a project shipping a multi-agent economy on a 3B model. The likely focus is practical system design under small-model constraints, rather than a new frontier-scale model release. Without the original text, details such as the exact model, architecture, benchmarks, code availability, and results cannot be confirmed.
Hermes Agent – Open-source AI agent with persistent memory
Hacker News (AI keywords) · 9 days ago · New Tool
Hermes Agent is an open-source autonomous agent by Nous Research, designed to run on your own server or machine with persistent local memory. It offers messaging gateways, scheduled automations, browser control, parallel sub-agents, reusable skills, and multiple LLM provider options. The project also targets MLOps and research workflows, including tool-calling trajectory generation, RL experiments, and exportable fine-tuning data.
Warren's Abstract Machine: A Tutorial Reconstruction
Hacker News (AI keywords) · 9 days ago · Tutorial
This repository preserves Hassan Ait-Kaci’s out-of-print tutorial on the Warren Abstract Machine, a key execution model for Prolog and logic programming systems. It is not a new AI model or product launch, but a useful historical and educational resource. The material is most relevant to developers and researchers interested in symbolic AI, compilers, unification, backtracking, and logic language runtimes.
Transformers are inherently succinct ★ 74
Hacker News (AI keywords) · 9 days ago · Paper
This paper studies transformer expressivity through succinctness: how compactly a formalism describes a language. It proves fixed-precision transformers can be exponentially more succinct than LTL and RNNs, and doubly exponentially more succinct than finite automata. The same succinctness makes verification hard, with basic problems such as emptiness and equivalence shown to be EXPSPACE-complete.
How to Stop Shipping Low-Quality RL Environments (with Examples)
Latent Space · 9 days ago · Tutorial
The post argues that low-quality RL environments are not harmless infrastructure bugs; they can make models worse by feeding them broken learning signals. Based on years of inspecting trajectories, the author highlights recurring environment and harness failures that teams need to fix. The practical lesson is to debug the training environment, grader, and interaction traces before blaming the model or scaling training.
Mantine DataTable source repo compromised; owner account suspended ★ 74
Hacker News (AI keywords) · 9 days ago · Incident
A GitHub security notice says Mantine DataTable and other repositories received unauthorized commits through the github-actions bot. The npm packages were reported safe; the risk is to developers who recently cloned or pulled the source and then opened it in VS Code, Cursor, Claude Code, or Gemini, or ran npm test. A later update links the payload to the Miasma / Shai-Hulud worm family and says a stolen credential is the likely path.
Launch HN: General Instinct (YC P26) - Frontier models on edge devices
Hacker News (AI keywords) · 9 days ago · New Tool
General Instinct is a YC P26 company introduced through a Launch HN post. Its headline positioning is bringing frontier models to edge devices, suggesting local or embedded AI deployment rather than purely cloud-based inference. Since no article body is available, details such as supported models, hardware, benchmarks, pricing, and developer tooling cannot be verified from the provided source.
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency ★ 72
Hacker News (AI keywords) · 9 days ago · Release
Google released new Gemma 4 checkpoints optimized with Quantization-Aware Training to preserve quality after compression. The release includes Q4_0 checkpoints and a mobile-focused quantization format that can reduce Gemma 4 E2B memory use to about 1GB, or below 1GB for a text-only configuration. The models are available through Hugging Face and supported across llama.cpp, Ollama, LM Studio, LiteRT-LM, Transformers.js, SGLang, vLLM, MLX, and Unsloth.
pg_durable: Microsoft open sources in-database durable execution
Hacker News (AI keywords) · 9 days ago · Release
Microsoft has open sourced pg_durable on GitHub, described in the title as an in-database durable execution project. From the name, it likely relates to PostgreSQL and persistence of execution state inside the database. Since no article body or README content was provided, details such as architecture, maturity, licensing, and production readiness cannot be confirmed.
Ask HN: What is your (AI) dev tech stack / workflow?
Hacker News (AI keywords) · 9 days ago · Commentary
An Ask HN thread asks developers to share their current AI-assisted development setup, to inform the author's upcoming in-person workshops. The author wants guidance for beginners and working developers, with use cases ranging from static sites to FastAPI tools and Linux home automation. Replies cover Claude Code, Cursor, GitHub Copilot, VS Code, spec-driven development, TDD, multi-agent workflows, reviews, and quality control.
