Latest in AI

Showing:ResearchersOpen-sourceClear ×

🔥 Trending today

anthropic5 open-source3 amazon3 export-controls3 national-security2 model-access2 ai-regulation2 government-policy2 geopolitics2 privacy2

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

Introducing Mistral Small 4★ 78
Mistral AI News6 days agoRelease
Mistral Small 4 is the next major release in the Mistral Small family, unifying Magistral-style reasoning, Pixtral-style multimodality, and Devstral-style coding agents. It uses a MoE architecture with 119B total parameters, 6B active parameters per token, a 256k context window, and configurable reasoning effort. The model is available via Mistral API, AI Studio, Hugging Face, open-source serving stacks, and NVIDIA deployment options.
VAST Raises Nearly $200M and Reveals Its Project Eden World Model Roadmap★ 74
量子位 QbitAI6 days agoBusiness
VAST completed nearly $200 million in A+ and A++ financing after its March 2026 Series A. The company also unveiled Project Eden, a world model approach that separates persistent state transition from generative visual rendering. The roadmap targets persistent virtual environments, multiplayer interaction, reusable scenes, AI-native sandbox creation, and embodied AI simulation, while acknowledging unresolved challenges in complex physics and autonomous state maintenance.
World’s First Robot Training Property: 300,000 Chinese Homes for Robots
量子位 QbitAI6 days agoRelease
Daxiao Robot and CUHK MMLab introduced Kairos-Homeworld, an open project with 300,000 Chinese residential floor plans and 5,000 interactive 3D home scenes. It can generate full household environments from prompts, including layouts, furniture, objects, and physical properties. The article frames it alongside Kairos 3.0-4B as part of a broader embodied AI stack: world model, data, and environment.
Huawei Cloud Launches Agentic AI Products for Enterprise AI Infrastructure★ 72
量子位 QbitAI6 days agoRelease
Huawei Cloud announced an Agentic Infra framework at its INSPIRE event, covering token generation, persistent memory, unified scheduling, and secure autonomous runtime. The release includes AICS, AMS, CCE Volcano Next, AgentSphere, ModelArts Next, AgentArts, and the open-source openJiuwen project. It also introduced industry AI zones, CloudRobo for embodied AI, security offerings, and an ecosystem plan with major Chinese model vendors.
CVPR 2026 Highlights Guangdong as He Kaiming and GDUT Team Stand Out★ 76
量子位 QbitAI6 days agoPaper
CVPR 2026 named Google DeepMind’s D4RT as Best Paper for fast dynamic 4D scene reconstruction from video. Honorable mentions included Meta’s SAM 3D and NVIDIA’s NitroGen, while TRELLIS.2 won Best Student Paper. The article emphasizes Chinese researcher visibility, ResNet and YOLO receiving the Longuet-Higgins Prize, and a GDUT-led undergraduate-heavy ChordEdit team breaking through among major labs and elite universities.
JoyAI-Echo open-source framework targets stable 5-minute AI long videos★ 72
量子位 QbitAI6 days agoNew Tool
QbitAI reports that JD’s team has open-sourced JoyAI-Echo, a long audio-video generation framework for multi-minute AI videos. It targets character drift, unstable voice, slow inference, and blurry output through cross-modal memory, memory-driven post-training, and lightweight real-time super-resolution. The system also includes a Director Agent for script planning, shot-level generation, localized edits, and iterative video production.
Thoughts on Gemma4 12B vs 26A4B: Which Is Better?
r/LocalLLaMA top day6 days agoOpinion
The post asks the LocalLLaMA community to compare Gemma4 12B and 26A4B, explicitly excluding the 31B model from discussion. The user is mainly interested in creative tasks, writing, and chatting, with coding treated as optional rather than central. No benchmarks or examples are provided, so the post is best read as a model-selection question about subjective quality and practical use.
Gemma 4 31B FP8 Matches Claude Sonnet 4.6 Medium in Custom Benchmark★ 75
r/LocalLLaMA top day6 days agoBenchmark
A Reddit user shared benchmark results showing Google's Gemma 4 31B (FP8) performing on par with Claude Sonnet 4.6 Medium. The custom evaluation harness tested complex tasks including Neo4j Cypher queries, entity extraction, agentic tool calling, Python coding, and multi-vector retrieval synthesis. This highlights how quantized mid-sized open-source models are closing the gap with leading proprietary frontier models.
Best Local TTS Solution
r/LocalLLaMA top day6 days agoCommentary
A r/LocalLLaMA user says they have tested many local TTS tools, but none match ElevenLabs for expressiveness, voices, and cloning. They list moss-nano and Kokoro as the best edge-device candidates so far, with edgeTTS as a free/cloud option. The post asks for community experience connecting agents such as Hermes, openclaw, or opencode to Telegram voice notes or real-time voice conversations.
A Matter Wi-Fi Light Bulb in Rust on the Raspberry Pi Pico 2 W
Hacker News (AI keywords)6 days agoHardware
This GitHub repository collects Rust Embassy examples for Raspberry Pi Pico 2 and Pico 2 W. Its Matter Wi-Fi light example uses rs-matter, BLE commissioning, and Wi-Fi connectivity so the board can appear as a standard smart bulb in Home Assistant, Apple Home, or Google Home. The project is mainly relevant to embedded Rust and smart-home developers, not AI model users.
The Open Source Community is backing OpenEnv for Agentic RL
Hugging Face Blog6 days agoCommentary
The title indicates that OpenEnv is being positioned around agentic reinforcement learning. The confirmed signal is community support from the open-source ecosystem, not specific technical claims. Without the full article, details such as contributors, features, integrations, benchmarks, or adoption status should be treated as unknown.
"Fully Hallucinated Operating System" Simulates an Entire OS via LLM Prompts
r/LocalLLaMA top day6 days agoCommentary
A popular Reddit post highlights a video demonstrating a "Fully Hallucinated Operating System" run entirely inside an LLM. By prompting the model to act as a terminal, it simulates file systems, network requests, and command execution purely through text generation. While impractical for production, this experiment showcases the impressive state-tracking and "world model" capabilities of modern LLMs.
Exploring 2-bit QAT: Can Ultra-Compressed Large Models Outperform 4-bit Models Half Their Size?
r/LocalLLaMA top day7 days agoCommentary
A popular Reddit thread on r/LocalLLaMA discusses the potential of 2-bit Quantization Aware Training (QAT) for large MoE models (120B to 400B). While current QAT efforts focus on 4-bit, users speculate whether a 2-bit QAT model could fit into consumer hardware (64GB/128GB RAM) and outperform a 4-bit model of half its size. This approach is proposed as a practical alternative to training ternary (1.58-bit) LLMs from scratch.
Gemma-4-26B-A4B QAT Variant Performs Poorly in llama.cpp Compared to Non-QAT Version
r/LocalLLaMA top day7 days agoBenchmark
A LocalLLaMA user highlighted that the newly released QAT (Quantization-Aware Training) variant of Google's Gemma-4-26B-A4B model underperforms compared to its non-QAT predecessor. Testing via llama.cpp on a chessboard SVG generation task showed significant rendering errors in the QAT version. The non-QAT GGUF version, however, produced highly accurate results under identical settings.
Office-open-xml-viewer: Office XML document viewer rendering to HTML Canvas
Hacker News (AI keywords)7 days agoNew Tool
office-open-xml-viewer is an open-source browser viewer for Office Open XML documents, rendering DOCX, XLSX, and PPTX files to HTML Canvas. Its parsers are written in Rust and compiled to WebAssembly, while rendering uses the Canvas 2D API. The README also says the full codebase was implemented by Claude through iterative prompting, making it notable as an AI-assisted software development case.
Control 3D Avatars with Natural Language Using "Program as Weights" (programasweights)
r/LocalLLaMA top day7 days agoNew Tool
Developer Yuntian Deng introduced "programasweights," a framework that compiles plain-English descriptions into tiny, local action programs (loops, parallel tracks) to control 3D avatars. Instead of pre-defined buttons, users can command complex sequences like "wave while walking, then jump." The runtime code is open-source and runs entirely offline in the browser or via Python.
llama.cpp Gemma4 MTP Support Merged
r/LocalLLaMA top day7 days agoRelease
llama.cpp PR #23398 was merged on June 7, 2026, adding MTP support for Gemma4 models. The author reports over 2x average speedup on dense models, no observed speedup on MoE, and replicated AIME-26 results around 87%. Support currently covers 31B and 26B-4B variants, while E4B and E2B are not supported yet; multi-GPU may need extra draft-device configuration.
Clustering 3x Jetson Nano Orin Supers for Distributed AI
r/LocalLLaMA top day7 days agoTutorial
A developer has shared a practical guide on clustering three NVIDIA Jetson Nano Orin Super boards, leveraging their Ampere CUDA cores and unified memory. This project is part of 'smolcluster,' an initiative to make distributed AI training and inference accessible using everyday hardware like Macs, Raspberry Pis, and Jetsons. The series aims to explore whether heterogeneous clusters (mixing different hardware architectures) can effectively run local LLMs.
Persona Atlas: Mapping How Famous Minds Think
Hugging Face Blog8 days agoNew Tool
The title suggests Persona Atlas is a project focused on representing or exploring the thinking styles of famous figures. The source text is unavailable, so its format, methods, data, model use, and results cannot be verified. It may be relevant to persona modeling, AI role-play, conversational agents, or thought-style visualization, but the practical impact remains unclear without the full post.
LLM Research Papers: The 2026 List (January to May)
Ahead of AI (Raschka)8 days agoPaper
Sebastian Raschka compiles a curated reference list of LLM papers he bookmarked from January through May 2026. The list is not comprehensive, but organized around topics useful for future articles, lectures, code examples, and research work. Public sections emphasize reasoning, RL, efficient inference, long context, agent systems, tool use, coding agents, diffusion language models, and serving infrastructure.
Thousand Token Wood: shipping a multi-agent economy on a 3B model
Hugging Face Blog8 days agoTutorial
Based on the title, this Hugging Face Blog post presents Thousand Token Wood, a project shipping a multi-agent economy on a 3B model. The likely focus is practical system design under small-model constraints, rather than a new frontier-scale model release. Without the original text, details such as the exact model, architecture, benchmarks, code availability, and results cannot be confirmed.
Hermes Agent – Open-source AI agent with persistent memory
Hacker News (AI keywords)8 days agoNew Tool
Hermes Agent is an open-source autonomous agent by Nous Research, designed to run on your own server or machine with persistent local memory. It offers messaging gateways, scheduled automations, browser control, parallel sub-agents, reusable skills, and multiple LLM provider options. The project also targets MLOps and research workflows, including tool-calling trajectory generation, RL experiments, and exportable fine-tuning data.
Tiny hackable CUDA language model implementation
Hacker News (AI keywords)9 days agoNew Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, with CUDA toolkit listed in setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency★ 72
Hacker News (AI keywords)9 days agoRelease
Google released new Gemma 4 checkpoints optimized with Quantization-Aware Training to preserve quality after compression. The release includes Q4_0 checkpoints and a mobile-focused quantization format that can reduce Gemma 4 E2B memory use to about 1GB, or below 1GB for a text-only configuration. The models are available through Hugging Face and supported across llama.cpp, Ollama, LM Studio, LiteRT-LM, Transformers.js, SGLang, vLLM, MLX, and Unsloth.
Arithmetic Without Numbers: How LLMs Do Math
Hacker News (AI keywords)9 days agoCommentary
The article asks whether LLM arithmetic is memorization, heuristics, real computation, or experimental assistance. It summarizes Rune experiments that decode operations and operands from frozen Llama activations, then route them to Python under a no-parser rule. The strongest supported claim is narrow: activation-derived tool arguments worked in scoped audits, while residual-state JIT replacement, long-number generation, and cross-model transfer remain brittle.
Fine-tuning an LLM to write docs like it's 1995
Hacker News (AI keywords)9 days agoTutorial
The author builds a corpus from old Microsoft manuals, cleans OCR text, generates instruction-style JSONL examples, and fine-tunes Llama 3.1 8B and Qwen 2.5 7B with QLoRA. Tests cover malloc(), a fictional Win32 API, and a deliberately anachronistic REST API prompt. Qwen fine-tunes transfer the period documentation style best, but the experiment also shows hallucination risks, tuning complexity, and why these models augment rather than replace technical writers.
Magenta RealTime 2: An Open, Locally Runnable Real-Time Music Model★ 74
Hacker News (AI keywords)9 days agoRelease
Magenta RealTime 2 is an open-weights live music model designed for interactive performance rather than offline prompt-to-song generation. It supports real-time control through MIDI, audio, and text, and can run as standalone apps, DAW plugins, or embedded music software. Google Magenta also released a Python library, C++ MLX inference engine, models, and example applications for musicians and developers.
Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining
Hugging Face Blog10 days agoTutorial
The post appears to focus on generating synthetic Q&A data from task seeds for Nemotron pretraining. Rather than a model launch, it likely emphasizes data generation and pretraining corpus design. Because the original article text is unavailable here, concrete claims about dataset scale, benchmarks, or implementation details should not be inferred.
Reve 2 and Ideogram 4: Layouts in Imagegen
Latent Space10 days agoRelease
Latent Space’s roundup frames image composition as a major barrier now being tackled by layout-aware image models. Reve 2.0 emphasizes precise generation and editing with layouts, while Ideogram 4.0 uses bounding boxes tied to region descriptions. The issue also covers MAI-Thinking-1, Gemma 4 12B, open audio models, agent execution layers, and model-routing cost debates.
Designing the hf CLI as an agent-optimized way to work with the Hub
Hugging Face Blog10 days agoCommentary
Based only on the title, this Hugging Face post appears to explain how the hf CLI is being designed for AI agents working with the Hub. It likely focuses on command-line ergonomics, automation, and predictable interactions with Hub resources. Without the full text, specific features, supported agents, or implementation details should not be inferred.

← PreviousPage 4Next →

Latest in AI

Introducing Mistral Small 4★ 78

VAST Raises Nearly $200M and Reveals Its Project Eden World Model Roadmap★ 74

World’s First Robot Training Property: 300,000 Chinese Homes for Robots

Huawei Cloud Launches Agentic AI Products for Enterprise AI Infrastructure★ 72

CVPR 2026 Highlights Guangdong as He Kaiming and GDUT Team Stand Out★ 76

JoyAI-Echo open-source framework targets stable 5-minute AI long videos★ 72

Thoughts on Gemma4 12B vs 26A4B: Which Is Better?

Gemma 4 31B FP8 Matches Claude Sonnet 4.6 Medium in Custom Benchmark★ 75

Best Local TTS Solution

A Matter Wi-Fi Light Bulb in Rust on the Raspberry Pi Pico 2 W

The Open Source Community is backing OpenEnv for Agentic RL

"Fully Hallucinated Operating System" Simulates an Entire OS via LLM Prompts

Exploring 2-bit QAT: Can Ultra-Compressed Large Models Outperform 4-bit Models Half Their Size?

Gemma-4-26B-A4B QAT Variant Performs Poorly in llama.cpp Compared to Non-QAT Version

Office-open-xml-viewer: Office XML document viewer rendering to HTML Canvas

Control 3D Avatars with Natural Language Using "Program as Weights" (programasweights)

llama.cpp Gemma4 MTP Support Merged

Clustering 3x Jetson Nano Orin Supers for Distributed AI

Persona Atlas: Mapping How Famous Minds Think

LLM Research Papers: The 2026 List (January to May)

Thousand Token Wood: shipping a multi-agent economy on a 3B model

Hermes Agent – Open-source AI agent with persistent memory

Tiny hackable CUDA language model implementation

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency★ 72

Arithmetic Without Numbers: How LLMs Do Math

Fine-tuning an LLM to write docs like it's 1995

Magenta RealTime 2: An Open, Locally Runnable Real-Time Music Model★ 74

Task-Seeded Synthetic Q&A Generation for Nemotron Pretraining

Reve 2 and Ideogram 4: Layouts in Imagegen

Designing the hf CLI as an agent-optimized way to work with the Hub