Latest in AI

AINews: Fable and Mythos Access Suspended Over Cybersecurity Risk ★ 76
Latent Space · yesterday · Incident
Anthropic’s Claude Fable 5 and Mythos 5 were abruptly suspended after a US export-control directive tied to a possible jailbreak and national cybersecurity risk. The roundup frames the event as a new “model sovereignty” warning for teams relying on closed frontier APIs. It also covers Kimi-K2.7-Code, MiniMax M3, DeepSWE replacing SWE-Bench Pro, agent-inference benchmarks, sandboxing, and Gemini-SQL2.
Google Sues Chinese Cybercrime Network Over Gemini-Aided Scam Sites
Ars Technica AI · 2 days ago · Incident
Google filed a lawsuit against an alleged Chinese cybercrime network called Outsider Enterprise, claiming it used Gemini to help build scam websites at scale. The operation reportedly sent millions of messages and targeted hundreds of thousands of smartphone users with phishing pages impersonating mobile carriers and other services. The case highlights how generative AI can lower the cost of cybercrime while raising pressure on AI providers to police misuse.
[AINews] Open Models, Model Labs vs Agent Labs, and the Untrainable ★ 72
Latent Space · 3 days ago · Commentary
This AINews issue uses Sarah Guo’s essay as a lens for current AI industry debates: where open models matter, how agent labs differ from model labs, and what cannot be trained away. It also recaps discourse around Anthropic Fable/Mythos, Fable 5’s capabilities, Google’s DiffusionGemma, and maturing agent infrastructure. The central takeaway is that durable value may lie in integration, customer translation, maintenance, and intent rather than model scores alone.
Benchmarking Google Eloquent Exposes Major On-Device Dictation Reliability Issues
r/LocalLLaMA top day · 3 days ago · Benchmark
A LocalLLaMA user tried to benchmark Google’s new fully local dictation app, Eloquent, against open ASR models such as Qwen3-ASR and NVIDIA Parakeet V3. The tester reported that roughly half of dictations returned only fragments, even during manual use. When Eloquent produced complete transcripts, its word error rate was competitive, but the missing-output behavior made the app unreliable for evaluation and practical use.
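For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration, not the tester's actual harness; the function name and the sample sentences are made up for this example. It shows why a fragmentary transcript scores so poorly: every missing word counts as a deletion.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample data, not taken from the benchmark in the post.
reference = "please schedule a follow up visit next week"
complete = "please schedule a follow up visit next weak"
fragment = "please schedule"

print(word_error_rate(reference, complete))
print(word_error_rate(reference, fragment))
```

A complete transcript with one wrong word stays close to zero, while a two-word fragment of an eight-word utterance loses three quarters of the reference, which is why missing output dominates any aggregate score.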
DiffusionGemma: Google Launches High-Speed Open-Weight Gemma Diffusion Model ★ 76
Simon Willison's Weblog · 3 days ago · Release
Simon Willison highlights Google’s new DiffusionGemma, an Apache 2 licensed open-weight Gemma model. He connects it to last year’s brief Gemini Diffusion preview, which he measured at 857 tokens per second. NVIDIA is currently hosting the model for free on its NIM cloud API, where Willison generated 2,409 tokens in 4.4 seconds, implying at least 500 tokens per second.
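The throughput figure follows directly from the two numbers quoted above. The snippet below is a small arithmetic check; the helper name is made up for this example, and it assumes the 4.4 seconds is wall-clock time for the whole request, in which case network and queueing overhead would make the result a lower bound on raw generation speed.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Average throughput over a single request."""
    return tokens / seconds


# Figures quoted in the summary above.
nim_rate = tokens_per_second(2409, 4.4)
print(f"{nim_rate:.1f} tokens/s")
```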
Google DeepMind Releases DiffusionGemma: Open Source Model with 4x Local AI Execution Speed Improvement
Ars Technica AI · 3 days ago · Release
Google DeepMind has released DiffusionGemma, an open-source model that brings diffusion-based generation to text tasks. Unlike autoregressive LLMs that generate one token at a time, diffusion models can produce outputs in parallel, dramatically cutting latency. The result is reportedly a 4x speed improvement for local AI inference, making on-device deployment significantly more practical.
DiffusionGemma: 4x faster text generation ★ 74
Google DeepMind Blog · 4 days ago · Release
Google’s DiffusionGemma is an Apache 2.0 experimental open model using text diffusion instead of standard autoregressive decoding. The 26B MoE model activates 3.8B parameters during inference and is designed for low-latency local workflows. Google claims up to 4x faster generation on dedicated GPUs, while noting that output quality is below standard Gemma 4 and production-quality use cases should still prefer Gemma 4.
DiffusionGemma: 4x Faster Text Generation
r/LocalLLaMA top day · 4 days ago · Release
Google has announced DiffusionGemma, a text-generation model that applies diffusion-based techniques to the Gemma architecture, claiming speeds four times faster than standard autoregressive generation. Unlike conventional language models that predict tokens one at a time, diffusion-based methods generate text through iterative denoising, enabling parallel output. The release, published on Google's official blog, drew immediate attention from the local-LLM community for its potential inference-efficiency gains.
DiffusionGemma: The Developer Guide — Google Developers Blog
r/LocalLLaMA top day · 4 days ago · Tutorial
Google has released a comprehensive developer guide for DiffusionGemma, a text-generation model that uses masked diffusion rather than autoregressive next-token prediction. Unlike standard Gemma models, DiffusionGemma iteratively denoises a fully masked sequence to produce output, enabling a fundamentally different generation paradigm. The guide targets developers looking to integrate or experiment with diffusion-based LLMs using Google's tooling.
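The general shape of masked-diffusion decoding can be sketched with a toy loop: start from a fully masked sequence, ask a model for a guess and a confidence at every masked position, commit the most confident guesses, and repeat. The code below is only an illustration of that idea, not DiffusionGemma's algorithm or Google's API. `toy_predict`, `TARGET`, and the confidence formula are hypothetical stand-ins for a real denoising model.

```python
MASK = "<mask>"

# Hypothetical "ground truth" that the stand-in model always predicts.
TARGET = ["diffusion", "models", "fill", "in", "many", "tokens", "per", "step"]


def toy_predict(seq):
    """Stand-in for a denoiser: returns {position: (token, confidence)}."""
    predictions = {}
    for i, token in enumerate(seq):
        if token != MASK:
            continue
        neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
        revealed = sum(t != MASK for t in neighbours)
        # Made-up rule: more context nearby means higher confidence.
        confidence = 0.5 + 0.2 * revealed - 0.01 * i
        predictions[i] = (TARGET[i], confidence)
    return predictions


def denoise(length, tokens_per_step):
    """Iteratively unmask the most confident positions until none remain."""
    seq = [MASK] * length
    steps = 0
    while MASK in seq:
        predictions = toy_predict(seq)
        ranked = sorted(predictions, key=lambda i: predictions[i][1], reverse=True)
        for i in ranked[:tokens_per_step]:
            seq[i] = predictions[i][0]
        steps += 1
    return seq, steps


sequence, steps = denoise(len(TARGET), tokens_per_step=3)
print(" ".join(sequence), f"({steps} steps)")
```

Because several positions are committed per step, the eight-token toy sequence completes in three passes rather than eight, which is the intuition behind the parallel-generation speed claims.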
DiffusionGemma: 4x Faster Text Generation ★ 76
Hacker News (AI keywords) · 4 days ago · Release
Google released DiffusionGemma, a 26B MoE experimental open model using text diffusion instead of token-by-token autoregressive decoding. It can generate blocks of text in parallel, reaching up to 4x faster output on dedicated GPUs. The model targets local, speed-sensitive workflows, but Google says its output quality is below standard Gemma 4 and recommends Gemma 4 for quality-critical production use.
NotebookLM Upgrades Into an Agent That Proactively Conducts Research ★ 72
INSIDE 硬塞 AI · 4 days ago · Release
Google is upgrading NotebookLM from a note-focused assistant into a research agent capable of multi-step work. The updated tool can analyze across documents, search the web, and help automate broader research workflows. It can also export results into formats such as presentations and documents, making it more useful for students, researchers, educators, and content creators who need to move from source material to finished outputs.
Claude Mythos 5 Released: 50 Million Lines of Code in One Day ★ 74
量子位 QbitAI · 4 days ago · Release
QbitAI says Anthropic introduced Claude Fable 5 for general users and Claude Mythos 5 for a small set of trusted users. The article highlights software engineering, long-context work, native vision, memory, and scientific research capabilities. It also focuses on a safety-routing design where Fable 5 downgrades high-risk requests to Claude Opus 4.8 instead of simply refusing.
Gemma 4 12B Unified Audio Loses Speech Attention with Large System Prompts
r/LocalLLaMA top day · 4 days ago · Commentary
A developer building a single-pass voice assistant with Gemma 4 12B unified (an encoder-free audio/vision/text model) reports that audio attention collapses once the system prompt grows to ~21k tokens. The model then ignores the spoken input or hallucinates instead of responding to it. The issue reproduces identically on vLLM, llama.cpp, and LiteRT-LM, which points to an architectural attention-saturation limit rather than a stack-specific bug.
Anthropic Claude Fable 5: Mythos-Class Power with Controversial Terms ★ 84
Latent Space · 4 days ago · Release
Anthropic released Claude Fable 5 as its first broadly available Mythos-class model, alongside restricted Mythos 5 access. Benchmarks and ecosystem reports show strong gains in coding, long-horizon agentic tasks, research, and vision. The controversy centers on 30-day retention for Mythos-class traffic and silent interventions that may reduce effectiveness on frontier LLM development tasks, raising trust, reproducibility, and open AI concerns.
German court rules Google liable for false answers in AI Overviews, declaring them Google's own words ★ 72
Hacker News (AI keywords) · 4 days ago · Regulation
A landmark German court ruling has declared that Google's AI Overviews are legally Google's own words, not neutral third-party aggregations. This makes Google directly liable for false or misleading answers generated by the feature, removing the 'just a tool' defense. The ruling is among the first globally to apply traditional media liability frameworks to generative AI search results.
OSCAR RotationZoo - Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
r/LocalLLaMA top day · 4 days ago · Paper
OSCAR applies offline-precomputed rotation matrices—derived from spectral covariance analysis—to reshape KV tensor distributions before 2-bit quantization, suppressing outliers and reducing rounding error. The rotation adds negligible inference overhead since it requires no runtime learning. GGUF downloads for Gemma-4-12B-it, Qwen3-32B, and Qwen3-4B-Thinking are available, with llama.cpp and sglang integrations and an arXiv paper.
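The rotate-then-quantize idea can be illustrated with a toy example. The sketch below is not OSCAR's method: it swaps in a fixed orthonormal Hadamard matrix for the covariance-derived rotation, uses a simple absmax 2-bit quantizer, and operates on a single made-up four-element vector. All names are hypothetical. It only shows why spreading an outlier across channels before quantization can reduce rounding error.

```python
# The four values a 2-bit code can represent, before scaling.
LEVELS = (-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0)

# Orthonormal 4x4 Hadamard matrix, a stand-in for a learned offline rotation.
HADAMARD = [[0.5 * s for s in row] for row in (
    (1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1))]


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def quantize_2bit(vector):
    """Absmax scaling followed by rounding to the nearest of four levels."""
    scale = max(abs(x) for x in vector) or 1.0
    codes = [min(range(4), key=lambda k: abs(x / scale - LEVELS[k])) for x in vector]
    return codes, scale


def dequantize(codes, scale):
    return [LEVELS[c] * scale for c in codes]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vector with one large outlier channel.
vector = [8.0, 0.5, -0.4, 0.3]

# Baseline: quantize directly.
plain = dequantize(*quantize_2bit(vector))

# Rotate, quantize, dequantize, then rotate back with the transpose.
rotated = matvec(HADAMARD, vector)
restored = matvec(transpose(HADAMARD), dequantize(*quantize_2bit(rotated)))

err_plain = squared_error(vector, plain)
err_rotated = squared_error(vector, restored)
print(f"plain: {err_plain:.2f}  rotated: {err_rotated:.2f}")
```

Without rotation the outlier sets the scale and the small channels are forced onto levels far from their true values; after rotation the energy is spread evenly and all four values fit the 2-bit grid much better. Since the rotation matrix is fixed ahead of time, the only runtime cost in this sketch is the matrix multiply itself.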
Microsoft's open source tools were hacked to steal passwords of AI developers ★ 78
Hacker News (AI keywords) · 5 days ago · Incident
Microsoft temporarily removed several open source GitHub projects while investigating suspected malicious content. The affected repos were linked to Azure and developer workflows involving AI coding tools such as Claude Code, Gemini CLI, and VS Code. Security researchers said the malware could steal passwords and sensitive credentials when compromised tools were opened, though Microsoft has not disclosed how many users were affected.
Anyone seen benchmarks comparing Gemma 4 4-bit QAT vs. 8-bit standard quants?
r/LocalLLaMA top day · 5 days ago · Benchmark
An r/LocalLLaMA user is looking for benchmarks comparing Gemma 4 4-bit QAT models, via Unsloth, against standard 8-bit non-QAT quantized models. They understand that QAT is expected to preserve much of the BF16 baseline accuracy, but want hard numbers against traditional 8-bit PTQ. The post highlights scattered feedback but no clear head-to-head evaluation yet.
Omi Med STT v1: Open-Weight Medical ASR Fine-Tuned from Parakeet 0.6B ★ 72
r/LocalLLaMA top day · 5 days ago · Release
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. In the author’s held-out medical benchmark, it reports 2.37% medical-WER and 145× realtime on local A10 compute.
Introducing FrontierCode ★ 78
Hacker News (AI keywords) · 5 days ago · Benchmark
Cognition launched FrontierCode, a coding benchmark focused on mergeability rather than only functional correctness. It evaluates correctness, tests, scope discipline, style, and repository-specific quality standards. Built with open-source maintainers and extensive quality control, it shows current frontier models still struggle: Claude Opus 4.8 scores 13.4% on the hardest Diamond subset, ahead of GPT-5.5 and Gemini 3.1 Pro.
Gemini 3.5 and Antigravity come to Google NotebookLM
Ars Technica AI · 5 days ago · Release
Google is upgrading NotebookLM with Gemini 3.5 and Antigravity, pushing the product beyond source-based Q&A into more agentic research workflows. The update adds a secure cloud computer for each notebook, enabling code execution, deeper analysis, and richer file outputs. For now, availability is limited to AI Ultra and enterprise customers, with broader rollout planned later.
NotebookLM’s Gemini 3.5 upgrade adds a cloud computer and help finding sources
The Verge AI · 6 days ago · Release
Google is rolling out broad updates to NotebookLM, its AI-powered note-taking and research app launched in 2023. The app now uses Google’s upgraded Gemini 3.5 model, which the company says should provide more accurate and reliable responses. The update also adds a cloud computer and help finding sources, expanding NotebookLM beyond source-based Q&A into a broader research assistant workflow.
[3090] Gemma4 QAT + MTP quick TPS numbers
r/LocalLLaMA top day · 6 days ago · Benchmark
An r/LocalLLaMA user shared quick throughput numbers for Gemma 4 QAT with MTP speculative decoding on an RTX 3090 24GB setup. They report roughly 1.2-1.8x TPS improvement, with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark, using 11 task categories and noting stochastic variation from sampling at temperature 1.0.
Gemma 4 Chat Template now has preserve thinking
r/LocalLLaMA top day · 6 days ago · Release
An r/LocalLLaMA post notes that Gemma 4’s chat template now has “preserve thinking.” The linked discussion points to google/gemma-4-31B-it on Hugging Face, suggesting a template-level change rather than a new model release or benchmark. The original post does not provide detailed usage notes, defaults, compatibility information, or measured effects.
Google DeepMind RCT in Sierra Leone Shows Gemini's Guided Learning Boosts Education ★ 72
Google DeepMind Blog · 6 days ago · Paper
Google DeepMind released results from a randomized controlled trial (RCT) in Sierra Leone evaluating AI's impact on education. The study found that Gemini’s "Guided Learning" feature, which guides students instead of just giving answers, significantly boosted engagement. This research provides rigorous empirical evidence that AI tutoring can accelerate learning and help bridge educational gaps in resource-constrained regions.
Upgrading agentic coding capabilities with the new Devstral models ★ 72
Mistral AI News · 6 days ago · Release
Mistral AI announced two Devstral updates focused on agentic coding workflows: Devstral Small 1.1 and Devstral Medium. Devstral Small 1.1 remains a 24B Apache 2.0 open model and reaches 53.6% on SWE-Bench Verified. Devstral Medium reaches 61.6%, is available through Mistral’s API, and supports private deployment and custom finetuning for enterprises.
Voxtral ★ 78
Mistral AI News · 6 days ago · Release
Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, multilingual detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.
Altman, Amodei, and Hassabis Unite to Back DNA Safety Legislation
量子位 QbitAI · 6 days ago · Regulation
Based on the headline and public reporting, the article covers a rare joint push by Sam Altman, Dario Amodei, Demis Hassabis, and other AI leaders for US biosecurity legislation. They are asking lawmakers to require synthetic DNA and RNA providers to screen customers, orders, and records. The concern is that advanced AI could lower the knowledge barrier for designing dangerous biological agents.
Introducing Claude Opus 4.8 ★ 82
Anthropic News · 6 days ago · Release
Anthropic introduced Claude Opus 4.8 as an upgrade over Opus 4.7, with stronger benchmark performance across coding, agentic skills, reasoning, and knowledge work. The release also adds dynamic workflows in Claude Code, effort controls in claude.ai and Cowork, and new Messages API support for system entries inside the messages array. Pricing for regular usage remains unchanged, while fast mode is now cheaper than previous models.
Thoughts on Gemma4 12B vs 26A4B: Which Is Better?
r/LocalLLaMA top day · 6 days ago · Opinion
The post asks the LocalLLaMA community to compare Gemma4 12B and 26A4B, explicitly excluding the 31B model from discussion. The user is mainly interested in creative tasks, writing, and chatting, with coding treated as optional rather than central. No benchmarks or examples are provided, so the post is best read as a model-selection question about subjective quality and practical use.
