Latest in AI

Benchmarking Google Eloquent Exposes Major On-Device Dictation Reliability Issues
r/LocalLLaMA top day · 4 days ago · Benchmark
A LocalLLaMA user tried to benchmark Google’s new fully local dictation app, Eloquent, against open ASR models such as Qwen3-ASR and NVIDIA Parakeet V3. The tester reported that roughly half of dictations returned only fragments, even during manual use. When Eloquent produced complete transcripts, its word error rate was competitive, but the missing-output behavior made the app unreliable for evaluation and practical use.
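For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a minimal, generic implementation and not the tester's actual harness; the function name and the sample sentences are made up for illustration. It also shows why fragment-only outputs are so damaging: every missing word counts as a deletion.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample data, not taken from the Reddit post.
reference = "please schedule the meeting for nine tomorrow morning"
complete_output = "please schedule the meeting for nine tomorrow morning"
fragment_output = "please schedule the"

wer_complete = word_error_rate(reference, complete_output)
wer_fragment = word_error_rate(reference, fragment_output)
print(f"complete: {wer_complete:.3f}, fragment: {wer_fragment:.3f}")
```

A benchmark that averages WER over all clips would be dominated by the truncated ones, which is consistent with the tester's conclusion that the app could not be evaluated fairly.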
Amazon Borrows Another $17.5 Billion From Banks as AI Spending Keeps Rising
TechCrunch AI · 4 days ago · Business
TechCrunch reports that Amazon borrowed $17.5 billion from banks shortly after a bond sale. The article frames the move within the broader AI arms race, where companies are spending heavily to keep pace. The available text does not specify how the loan will be used, but it highlights growing debt pressure tied to escalating AI investment.
DiffusionGemma: Google Launches High-Speed Open-Weight Gemma Diffusion Model ★ 76
Simon Willison's Weblog · 4 days ago · Release
Simon Willison highlights Google’s new DiffusionGemma, an Apache 2 licensed open-weight Gemma model. He connects it to last year’s brief Gemini Diffusion preview, which he measured at 857 tokens per second. NVIDIA is currently hosting the model for free on its NIM cloud API, where Willison generated 2,409 tokens in 4.4 seconds, implying at least 500 tokens per second.
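The throughput figure follows directly from the two numbers quoted in the summary. A minimal sketch of the arithmetic, using only those figures (the helper name is illustrative, and the result is an average over the whole response, which may include network and queueing time):

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average generation throughput over a whole response."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures quoted above: 2,409 tokens generated in 4.4 seconds on NIM.
nim_rate = tokens_per_second(2409, 4.4)

# Willison's earlier Gemini Diffusion measurement, for comparison.
gemini_diffusion_rate = 857.0

print(f"NIM-hosted DiffusionGemma: {nim_rate:.1f} tokens/s")
print(f"Share of the Gemini Diffusion figure: {nim_rate / gemini_diffusion_rate:.0%}")
```

The computed average is comfortably above the 500 tokens per second floor mentioned in the summary, while staying below the 857 tokens per second measured for the Gemini Diffusion preview.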
Google DeepMind Releases DiffusionGemma: Open Source Model with 4x Local AI Execution Speed Improvement
Ars Technica AI · 4 days ago · Release
Google DeepMind has released DiffusionGemma, an open-source model that brings diffusion-based generation to text tasks. Unlike autoregressive LLMs that generate one token at a time, diffusion models can produce outputs in parallel, dramatically cutting latency. The result is reportedly a 4x speed improvement for local AI inference, making on-device deployment significantly more practical.
Show HN: Building a Map of People Who Lived in the Roman Empire
Hacker News (AI keywords) · 4 days ago · New Tool
A creator posted to Hacker News a personal project mapping individuals who lived in the Roman Empire, hosted at roman-names.com. The project appears to be a digital humanities effort to visualize historical population data geographically. No AI-specific content or tooling is mentioned in the source title or body.
LocalLLaMA User Weighs QAT Gemma 31B GGUF Quants for RTX 3060
r/LocalLLaMA top day · 4 days ago · Commentary
A Reddit user with an RTX 3060 12GB and 32GB DDR3 RAM is evaluating new QAT-based Gemma 31B GGUF quantizations. They currently run an older Unsloth Gemma 31B IQ3_XXS build at long context, with some tensor and mmproj offloading to CPU. The post asks which Q2-Q3 quant to choose, whether QAT changes quality expectations, and whether MTP would help or hurt under tight VRAM limits.
Robotaxi Safety Must Be Built In, Not Added Later
NVIDIA Blog · 4 days ago · Commentary
NVIDIA argues that robotaxi safety requires more than perception and driving decisions. The post presents Halos OS as a production safety foundation covering a certifiable OS, standardized interfaces, AI guardrails and large-scale validation. It also highlights global robotaxi collaborations using DRIVE Hyperion and the broader Halos stack across training, simulation and in-vehicle inference.
πfs: the data-free filesystem that “stores” data in π
Hacker News (AI keywords) · 4 days ago · New Tool
πfs is an open-source FUSE-style filesystem built around a deliberately absurd idea: data does not need to be stored if it can be located in pi. It records metadata such as file names and positions in pi, then reconstructs content from those locations. The project is more technical humor and conceptual demonstration than practical storage or AI tooling.
Claude Fable 5 won't answer basic biology questions despite being marketed for biology skills
The Verge AI · 4 days ago · Incident
Anthropic launched Claude Fable 5 as its most powerful model yet, specifically touting its biology capabilities. However, users found the model refuses to answer basic high-school-level biology questions, instead handing queries off to the previous flagship model. The contradiction raises questions about overly aggressive safety filters undermining the model's advertised strengths.
Policy on the AI Exponential ★ 72
Hacker News (AI keywords) · 4 days ago · Opinion
Anthropic CEO Dario Amodei has published a policy essay on his personal blog about the challenge of governing AI's exponential capability growth. The piece addresses how governments and institutions must adapt their regulatory frameworks to keep pace with rapidly accelerating AI. Amodei is one of the most influential voices in AI safety, so his policy views carry significant weight with lawmakers, researchers, and industry leaders.
Apple Intelligence Enables Safari to Generate Extensions with Natural Language
INSIDE 硬塞 AI · 4 days ago · Release
INSIDE reports that Apple is adding several AI features to Safari, led by a natural-language extension creation feature called “Describe Extension.” Users can describe what they want, and Apple Intelligence helps turn that request into a practical Safari extension. The article frames this as bringing vibe coding to everyday browser customization, though implementation details, model architecture, safety controls, and quality limits are not provided.
Seeking the Best Open-Source Coding AI for an RTX 5070 PC
r/LocalLLaMA top day · 4 days ago · Opinion
A Reddit user on r/LocalLLaMA is looking for the most powerful open-source AI coding model that can run on their Windows 11 desktop. Their system includes an AMD Ryzen 7 7700 CPU, RTX 5070 GPU, and 32GB of DDR5 RAM. The intended use cases are writing, coding, and debugging, but the post itself does not include benchmark results, candidate models, or community recommendations.
llama.cpp Merges MTP Optimization Removing Padding and Extra D2D Copies
r/LocalLLaMA top day · 4 days ago · Release
llama.cpp merged PR #24086, which changes ggml_gated_delta_net so MTP passes snapshot count K as an operation parameter instead of deriving it from tensor shape. The change removes a padding workaround and copies emitted snapshots into the recurrent cache with a single strided ggml_cpy. Benchmarks on DGX Spark with Qwen3.6-35B-A3B-UD-Q4_K_M.gguf showed about a 4% throughput gain, with wall time falling from 21.71s to 20.91s.
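The "about 4%" figure can be checked from the two wall times quoted above. The sketch below assumes the benchmark processed the same fixed workload before and after the change, so throughput is inversely proportional to wall time; the helper name is illustrative and is not part of llama.cpp.

```python
def speedup_stats(before_s: float, after_s: float) -> tuple[float, float]:
    """Return (wall-time reduction %, throughput gain %) for a fixed workload."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("wall times must be positive")
    time_saved_pct = (before_s - after_s) / before_s * 100
    # Same work in less time: throughput scales with before_s / after_s.
    throughput_gain_pct = (before_s / after_s - 1) * 100
    return time_saved_pct, throughput_gain_pct


# Wall times quoted in the summary: 21.71s before, 20.91s after.
time_saved, throughput_gain = speedup_stats(21.71, 20.91)
print(f"wall time reduced by {time_saved:.2f}%")
print(f"throughput increased by {throughput_gain:.2f}%")
```

Under that assumption the two percentages differ slightly (the time reduction is measured against the old run, the throughput gain against the new one), and both round to the roughly 4% reported.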
Microsoft says it totally understands why students are booing AI-hyping graduation speakers
The Verge AI · 4 days ago · Commentary
Graduating students across the US have been booing and heckling commencement speakers who promote AI, with clips going viral online. Microsoft Vice Chair Brad Smith responded with a lengthy blog post acknowledging students' concerns and calling for dialogue. The episode highlights a growing disconnect between tech industry optimism about AI and the anxieties of young people entering the workforce.
The future of AI regulation is courting the strangest, most anxious bedfellows
The Verge AI · 4 days ago · Regulation
Regulator, The Verge's subscription newsletter on DC tech politics, returns after a two-week hiatus. The piece focuses on how AI regulation is drawing together unusual, anxious political bedfellows in Washington. With the 2026 midterms approaching, AI policy is becoming a surprisingly cross-partisan battleground.
New Framework for Auditing Machine Unlearning
Google Research Blog · 4 days ago · Paper
Machine unlearning lets models selectively forget specific training data, critical for GDPR compliance and AI safety. However, approximate unlearning algorithms lack objective verification mechanisms, making it hard to confirm unlearning actually occurred. Google Research's new auditing framework addresses this gap with quantifiable metrics to assess unlearning quality and make forgetting claims auditable.
Google Won't Admit It's Using YouTube Creators' Music to Train Its Lyria AI
The Verge AI · 4 days ago · Regulation
A group of independent musicians has filed a lawsuit against Google, claiming it illegally used their YouTube-uploaded songs to train its Lyria 3 music AI model. Google has responded to the suit but refuses to openly confirm or deny whether YouTube content is used as training data. The case raises urgent questions about creator rights and consent when platform uploads become AI fuel.
Nobody needs AI to search the Internet, court says in Google ruling ★ 74
Ars Technica AI · 4 days ago · Regulation
Ars Technica reports that Google lost a German court fight involving AI Overview, with the court rejecting the idea that AI is necessary for searching the Internet. The ruling matters because AI search products summarize web content in ways that may reduce visits to original sources. If courts treat AI summaries as optional rather than essential search infrastructure, Google and rivals may face tougher legal limits around content use, attribution, and publisher impact.
Claude Desktop Spins Up a VM with No Way to Stop It
Hacker News (AI keywords) · 4 days ago · Incident
GitHub issue #29045 in the anthropics/claude-code repo reports that Claude Desktop automatically spins up a virtual machine without user consent or control. The core problem is the absence of any stop mechanism, leaving the VM running indefinitely and consuming system resources. This raises concerns about transparency, resource management, and user control over Claude Desktop's execution environment.
'AI-pilled' firms spend $7,500 per employee per month on AI
TechCrunch AI · 4 days ago · Business
According to the Ramp AI Index, the most aggressive AI adopters spend roughly $7,500 per employee each month on AI tools. The report notes this figure hasn't yet surpassed a typical engineer's salary — with the word 'yet' carrying significant weight. For founders and CFOs, this signals AI tooling costs are graduating from rounding errors to a budget category rivaling headcount.
Microsoft restricts internal employee use of Claude Fable 5 over data retention concerns
The Verge AI · 4 days ago · Incident
Microsoft has restricted internal employee use of Claude Fable 5, citing concerns over Anthropic's new data retention policies attached to the model. The move comes despite Microsoft rapidly deploying the model to GitHub Copilot and Azure AI Foundry customers externally. The situation highlights growing tension between commercial AI adoption and internal compliance standards at major tech firms, where third-party data retention terms can block internal use even when a product is actively sold to customers.
Security Researchers Criticize Anthropic Fable Safeguards as Too Strict
Hacker News (AI keywords) · 4 days ago · Ethics
Anthropic released Fable as a public but limited version of its cybersecurity-focused Mythos model. Security researchers say its guardrails trigger on broad cyber-related wording, blocking tasks like blog analysis, secure coding, and code review. The restrictions aim to reduce malware, software compromise, and biology-related misuse, but the current implementation may frustrate legitimate security work.
FlashMemory-DeepSeek-V4: Ultra-Long Context via Lookahead Sparse Attention
r/LocalLLaMA top day · 4 days ago · Paper
FlashMemory-DeepSeek-V4 introduces Lookahead Sparse Attention (LSA), a predictive inference paradigm that retains only query-critical KV chunks in GPU memory instead of the full cache. A Neural Memory Indexer, trained independently using a backbone-free dual-encoder strategy, proactively forecasts which historical tokens will matter next. The system compresses average KV cache footprint by 86.5% and exceeds 90% compression at 500K-token scales, while delivering a slight accuracy gain of +0.6% on long-context benchmarks.
DiffusionGemma: 4x faster text generation ★ 74
Google DeepMind Blog · 4 days ago · Release
Google’s DiffusionGemma is an Apache 2.0 experimental open model using text diffusion instead of standard autoregressive decoding. The 26B MoE model activates 3.8B parameters during inference and is designed for low-latency local workflows. Google claims up to 4x faster generation on dedicated GPUs, while noting that output quality is below standard Gemma 4 and production-quality use cases should still prefer Gemma 4.
Reddit User Asks for Updates on Taalas LLM Accelerator Chips
r/LocalLLaMA top day · 4 days ago · Hardware
A Reddit user in r/LocalLLaMA is looking for updates on Taalas chips, referencing earlier claims that the company planned to embed or hardcode a mid-tier LLM into its hardware. The post asks what model might be used, when the chip could arrive, and what pricing might look like. The source itself provides no confirmed answers, specifications, launch date, model name, or pricing information.
Lemonade v10.7 Adds Omni Models, Benchmarks, and Cross-Vendor GPU Support
r/LocalLLaMA top day · 4 days ago · Release
Lemonade v10.7 marks a project-level shift toward working-group-driven development, with 19 contributors involved in the release. The update improves LMX-Omni virtual models for Open WebUI and OpenAI-compatible multimedia clients, introduces the `lemonade bench` CLI, and expands backend support. CUDA, Vulkan, llama.cpp, stable-diffusion.cpp, FastFlowLM, and vLLM are part of the broader push toward cross-vendor local AI performance.
Google will save your Lens photos, Search Live recordings, and Translate audio for AI training
The Verge AI · 4 days ago · Ethics
Google has notified users via email that it will begin saving multimedia inputs—images from Google Lens, real-time recordings from Search Live, and audio from Translate—under a new 'Search Services History' setting. This data will be retained and potentially used to train and improve Google's AI models. Users concerned about privacy should review their account settings to manage or disable this data collection.
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
NVIDIA Blog · 4 days ago · Release
Google DeepMind released DiffusionGemma, an experimental open model built for fast text generation. NVIDIA says it optimized the model for GeForce RTX GPUs, RTX PRO platforms, and DGX Spark systems. Instead of generating text one word at a time, DiffusionGemma produces multiple words in parallel to reduce latency for single-user workloads.
DiffusionGemma: 4x Faster Text Generation
r/LocalLLaMA top day · 4 days ago · Release
Google has announced DiffusionGemma, a text-generation model that applies diffusion-based techniques to the Gemma architecture, claiming speeds four times faster than standard autoregressive generation. Unlike conventional language models that predict tokens one at a time, diffusion-based methods generate text through iterative denoising, enabling parallel output. The release, published on Google's official blog, drew immediate attention from the local-LLM community for its potential inference-efficiency gains.
DiffusionGemma: The Developer Guide — Google Developers Blog
r/LocalLLaMA top day · 4 days ago · Tutorial
Google has released a comprehensive developer guide for DiffusionGemma, a text-generation model that uses masked diffusion rather than autoregressive next-token prediction. Unlike standard Gemma models, DiffusionGemma iteratively denoises a fully masked sequence to produce output, enabling a fundamentally different generation paradigm. The guide targets developers looking to integrate or experiment with diffusion-based LLMs using Google's tooling.
