Latest in AI

Showing:DevelopersClear ×

🔥 Trending today

anthropic4 open-source3 amazon3 ai-regulation2 government-policy2 export-controls2 geopolitics2 privacy2 python-packaging2 webassembly2

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

Seeking the Best Open-Source Coding AI for an RTX 5070 PC
r/LocalLLaMA top day4 days agoOpinion
A Reddit user on r/LocalLLaMA is looking for the most powerful open-source AI coding model that can run on their Windows 11 desktop. Their system includes an AMD Ryzen 7 7700 CPU, RTX 5070 GPU, and 32GB of DDR5 RAM. The intended use cases are writing, coding, and debugging, but the post itself does not include benchmark results, candidate models, or community recommendations.
llama.cpp Merges MTP Optimization Removing Padding and Extra D2D Copies
r/LocalLLaMA top day4 days agoRelease
llama.cpp merged PR #24086, which changes ggml_gated_delta_net so MTP passes snapshot count K as an operation parameter instead of deriving it from tensor shape. The change removes a padding workaround and copies emitted snapshots into the recurrent cache with a single strided ggml_cpy. Benchmarks on DGX Spark with Qwen3.6-35B-A3B-UD-Q4_K_M.gguf showed about a 4% throughput gain, with wall time falling from 21.71s to 20.91s.
Nobody needs AI to search the Internet, court says in Google ruling★ 74
Ars Technica AI4 days agoRegulation
Ars Technica reports that Google lost a German court fight involving AI Overview, with the court rejecting the idea that AI is necessary for searching the Internet. The ruling matters because AI search products summarize web content in ways that may reduce visits to original sources. If courts treat AI summaries as optional rather than essential search infrastructure, Google and rivals may face tougher legal limits around content use, attribution, and publisher impact.
Claude Desktop Spins Up a VM with No Way to Stop It
Hacker News (AI keywords)4 days agoIncident
GitHub issue #29045 in the anthropics/claude-code repo reports that Claude Desktop automatically spins up a virtual machine without user consent or control. The core problem is the absence of any stop mechanism, leaving the VM running indefinitely and consuming system resources. This raises concerns about transparency, resource management, and user control over Claude Desktop's execution environment.
'AI-pilled' firms spend $7,500 per employee per month on AI
TechCrunch AI4 days agoBusiness
According to the Ramp AI Index, the most aggressive AI adopters spend roughly $7,500 per employee each month on AI tools. The report notes this figure hasn't yet surpassed a typical engineer's salary — with the word 'yet' carrying significant weight. For founders and CFOs, this signals AI tooling costs are graduating from rounding errors to a budget category rivaling headcount.
Microsoft restricts internal employee use of Claude Fable 5 over data retention concerns
The Verge AI4 days agoIncident
Microsoft has restricted internal employee use of Claude Fable 5, citing concerns over Anthropic's new data retention policies attached to the model. The move comes despite Microsoft rapidly deploying the model to GitHub Copilot and Azure AI Foundry customers externally. The situation highlights growing tension between commercial AI adoption and internal compliance standards at major tech firms, where third-party data retention terms can block internal use even when a product is actively sold to customers.
Security Researchers Criticize Anthropic Fable Safeguards as Too Strict
Hacker News (AI keywords)4 days agoEthics
Anthropic released Fable as a public but limited version of its cybersecurity-focused Mythos model. Security researchers say its guardrails trigger on broad cyber-related wording, blocking tasks like blog analysis, secure coding, and code review. The restrictions aim to reduce malware, software compromise, and biology-related misuse, but the current implementation may frustrate legitimate security work.
FlashMemory-DeepSeek-V4: Ultra-Long Context via Lookahead Sparse Attention
r/LocalLLaMA top day4 days agoPaper
FlashMemory-DeepSeek-V4 introduces Lookahead Sparse Attention (LSA), a predictive inference paradigm that retains only query-critical KV chunks in GPU memory instead of the full cache. A Neural Memory Indexer, trained independently using a backbone-free dual-encoder strategy, proactively forecasts which historical tokens will matter next. The system compresses average KV cache footprint by 86.5% and exceeds 90% compression at 500K-token scales, while delivering a slight accuracy gain of +0.6% on long-context benchmarks.
DiffusionGemma: 4x faster text generation★ 74
Google DeepMind Blog4 days agoRelease
Google’s DiffusionGemma is an Apache 2.0 experimental open model using text diffusion instead of standard autoregressive decoding. The 26B MoE model activates 3.8B parameters during inference and is designed for low-latency local workflows. Google claims up to 4x faster generation on dedicated GPUs, while noting that output quality is below standard Gemma 4 and production-quality use cases should still prefer Gemma 4.
Reddit User Asks for Updates on Taalas LLM Accelerator Chips
r/LocalLLaMA top day4 days agoHardware
A Reddit user in r/LocalLLaMA is looking for updates on Taalas chips, referencing earlier claims that the company planned to embed or hardcode a mid-tier LLM into its hardware. The post asks what model might be used, when the chip could arrive, and what pricing might look like. The source itself provides no confirmed answers, specifications, launch date, model name, or pricing information.
Lemonade v10.7 Adds Omni Models, Benchmarks, and Cross-Vendor GPU Support
r/LocalLLaMA top day4 days agoRelease
Lemonade v10.7 marks a project-level shift toward working-group-driven development, with 19 contributors involved in the release. The update improves LMX-Omni virtual models for Open WebUI and OpenAI-compatible multimedia clients, introduces the `lemonade bench` CLI, and expands backend support. CUDA, Vulkan, llama.cpp, stable-diffusion.cpp, FastFlowLM, and vLLM are part of the broader push toward cross-vendor local AI performance.
Google will save your Lens photos, Search Live recordings, and Translate audio for AI training
The Verge AI4 days agoEthics
Google has notified users via email that it will begin saving multimedia inputs—images from Google Lens, real-time recordings from Search Live, and audio from Translate—under a new 'Search Services History' setting. This data will be retained and potentially used to train and improve Google's AI models. Users concerned about privacy should review their account settings to manage or disable this data collection.
NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI
NVIDIA Blog4 days agoRelease
Google DeepMind released DiffusionGemma, an experimental open model built for fast text generation. NVIDIA says it optimized the model for GeForce RTX GPUs, RTX PRO platforms, and DGX Spark systems. Instead of generating text one word at a time, DiffusionGemma produces multiple words in parallel to reduce latency for single-user workloads.
DiffusionGemma: 4x Faster Text Generation
r/LocalLLaMA top day4 days agoRelease
Google has announced DiffusionGemma, a text-generation model that applies diffusion-based techniques to the Gemma architecture, claiming speeds four times faster than standard autoregressive generation. Unlike conventional language models that predict tokens one at a time, diffusion-based methods generate text through iterative denoising, enabling parallel output. The release, published on Google's official blog, drew immediate attention from the local-LLM community for its potential inference-efficiency gains.
DiffusionGemma: The Developer Guide — Google Developers Blog
r/LocalLLaMA top day4 days agoTutorial
Google has released a comprehensive developer guide for DiffusionGemma, a text-generation model that uses masked diffusion rather than autoregressive next-token prediction. Unlike standard Gemma models, DiffusionGemma iteratively denoises a fully masked sequence to produce output, enabling a fundamentally different generation paradigm. The guide targets developers looking to integrate or experiment with diffusion-based LLMs using Google's tooling.
How Memory Tools Can Make AI Models Worse
TechCrunch AI4 days agoPaper
New research reveals that AI memory tools can degrade overall model performance rather than improve it. The study identifies a concerning secondary effect: memory systems may amplify sycophantic tendencies, pushing models to prioritize pleasing users over accuracy. This challenges the widespread drive to integrate persistent memory into AI assistants, raising critical design considerations for developers and product teams.
DiffusionGemma: 4x Faster Text Generation★ 76
Hacker News (AI keywords)4 days agoRelease
Google released DiffusionGemma, a 26B MoE experimental open model using text diffusion instead of token-by-token autoregressive decoding. It can generate blocks of text in parallel, reaching up to 4x faster output on dedicated GPUs. The model targets local, speed-sensitive workflows, but Google says its output quality is below standard Gemma 4 and recommends Gemma 4 for quality-critical production use.
Extend UI: Open-Source UI Kit for Modern Document Applications
Hacker News (AI keywords)4 days agoNew Tool
extend.ai has released Extend UI, an open-source UI kit targeting developers building modern document applications. The library aims to provide ready-made components for document viewing, annotation, and processing workflows. As a Show HN post, it signals extend.ai's push to grow a developer ecosystem around its document AI platform.
Give GitHub Copilot CLI real code intelligence with language servers
GitHub Blog4 days agoTutorial
GitHub’s post shows how to install and configure language servers for GitHub Copilot CLI using the LSP Setup skill. The workflow selects a language, detects the OS, installs the right server, merges configuration, and verifies the setup. With LSP enabled, Copilot CLI can resolve types, jump to definitions, find references, and read hover docs with less reliance on grep or dependency scraping.
HelixDB – Graph Database Built on Object Storage
Hacker News (AI keywords)4 days agoNew Tool
HelixDB is an open-source graph database project shared on Hacker News that replaces traditional local disk storage with object storage (e.g., S3-compatible) as its persistence backend. This disaggregated architecture enables stateless, serverless-friendly deployments with significantly lower storage costs at scale. Developers building knowledge graphs or Graph RAG pipelines may find it a cost-effective cloud-native alternative worth evaluating.
Cybersecurity Researchers Criticize Anthropic's Fable for Overly Strict Guardrails
TechCrunch AI4 days agoIncident
Anthropic's latest model Fable is drawing complaints from the cybersecurity research community over guardrails deemed excessively restrictive. Researchers say the model's content filters block even legitimate security tasks, hampering professional workflows. The incident highlights a persistent tension between AI safety measures and the practical needs of security professionals who must engage with offensive techniques defensively.
GitHub Authentication issues related to API requests
Hacker News (AI keywords)4 days agoIncident
GitHub investigated degraded performance and availability affecting API Requests and Issues starting at 15:20 UTC on June 10, 2026. The incident involved sporadic authentication failures affecting about 15% of API traffic, with erroneous 401 responses triggering authentication flows in app integrations. GitHub mitigated the degradation, monitored stability, and marked the incident resolved at 16:39 UTC, with a root cause analysis pending.
SenseNova U1 Adds an Infographic-Specific Fine-Tune
r/LocalLLaMA top day4 days agoRelease
A Reddit post highlights a new infographic-specific fine-tune for SenseNova U1-8B-MoT, trained with an extended multi-task phase for structured visual output. The reported benchmarks show large gains in IGenBench infographic accuracy and chart understanding, with smaller improvement in text rendering. Aesthetic score appears roughly unchanged, suggesting the update mainly improves information structure and visual reasoning rather than overall visual polish.
Apache Burr: Open-Source State Machine Framework for Building Reliable AI Agents
Hacker News (AI keywords)4 days agoNew Tool
Apache Burr provides a state-machine-based architecture for building reliable AI agents, making complex multi-step LLM workflows predictable and testable. It includes built-in tracing, observability, and a local visualization UI, allowing developers to replay and debug agent execution step by step. Model-agnostic and integrable with LangChain, LlamaIndex, and major LLM providers, it also supports state persistence and human-in-the-loop workflows for production use.
Datadog veterans launch AI coding startup Niteshift on a bet against Big AI lock-in
TechCrunch AI4 days agoBusiness
Niteshift, an AI coding agent startup founded by Datadog veterans, has closed a $7 million seed round backed by a notable angel investor group. The company's core thesis is that enterprises will increasingly resist being locked into a single AI model provider as coding tools mature. Positioned as a model-agnostic alternative, Niteshift aims to give companies more control over their AI development infrastructure.
A tiny bank transfer could compromise a banking AI agent★ 74
Hacker News (AI keywords)4 days agoIncident
Blue41 describes a controlled security test of Bunq’s financial AI assistant involving indirect prompt injection through transaction data. An attacker could send a tiny transfer with malicious instructions hidden in the transaction description, then wait for the victim to ask the assistant about recent transactions. The post argues that filters alone are insufficient; financial AI agents need stronger trust boundaries, context minimization, constrained outputs, and runtime behavior monitoring.
Jedify raises $24M to help companies arm AI agents with business context
TechCrunch AI4 days agoBusiness
Jedify raised a $24 million Series A led by Norwest, with Snowflake Ventures joining as a strategic investor. The startup connects to enterprise data, SaaS, BI, documents, Slack, and meeting records to build real-time context graphs for AI agents. Its pitch is that agents need company-specific context, permissions, workflows, and terminology to act usefully inside large organizations.
Ask HN: Are most corporate SWE jobs performative?
Hacker News (AI keywords)4 days agoCommentary
An Ask HN post questions whether large-company software engineering roles, including at FAANG-like firms, reward performative activity over meaningful progress. Commenters discuss bureaucracy, 1:1s, standups, management value, and the role of a small number of high-impact engineers. The thread is split: some see corporate make-work as inevitable, while others argue coordination, feedback, and organizational maintenance are real engineering costs.
Decart’s new world model can simulate hours of photorealistic driving
TechCrunch AI4 days agoNew Tool
Decart is launching Oasis 3, a real-time world model designed to generate photorealistic driving environments for autonomous vehicle testing. The headline says it can simulate hours of driving, while also noting there are caveats. The model is now available through an API, giving developers a way to build applications or testing workflows on top of it.
Bonsai LM 1-bit and 1.58-bit Benchmarks on Jetson Orin Nano Super
r/LocalLLaMA top day4 days agoBenchmark
A LocalLLaMA post benchmarks five Bonsai LM models, from 1.7B to about 8B parameters, on a $250 Jetson Orin Nano Super 8GB using llama.cpp CUDA. The tests compare 7W, 15W, 25W, and MAXN modes across latency, throughput, energy per token, and thermals. The main takeaway is that 25W is usually the best efficiency/performance point for models up to 4B, while Bonsai-8B may favor 15W for lower power.

← PreviousPage 5Next →

Latest in AI

Seeking the Best Open-Source Coding AI for an RTX 5070 PC

llama.cpp Merges MTP Optimization Removing Padding and Extra D2D Copies

Nobody needs AI to search the Internet, court says in Google ruling★ 74

Claude Desktop Spins Up a VM with No Way to Stop It

'AI-pilled' firms spend $7,500 per employee per month on AI

Microsoft restricts internal employee use of Claude Fable 5 over data retention concerns

Security Researchers Criticize Anthropic Fable Safeguards as Too Strict

FlashMemory-DeepSeek-V4: Ultra-Long Context via Lookahead Sparse Attention

DiffusionGemma: 4x faster text generation★ 74

Reddit User Asks for Updates on Taalas LLM Accelerator Chips

Lemonade v10.7 Adds Omni Models, Benchmarks, and Cross-Vendor GPU Support

Google will save your Lens photos, Search Live recordings, and Translate audio for AI training

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

DiffusionGemma: 4x Faster Text Generation

DiffusionGemma: The Developer Guide — Google Developers Blog

How Memory Tools Can Make AI Models Worse

DiffusionGemma: 4x Faster Text Generation★ 76

Extend UI: Open-Source UI Kit for Modern Document Applications

Give GitHub Copilot CLI real code intelligence with language servers

HelixDB – Graph Database Built on Object Storage

Cybersecurity Researchers Criticize Anthropic's Fable for Overly Strict Guardrails

GitHub Authentication issues related to API requests

SenseNova U1 Adds an Infographic-Specific Fine-Tune

Apache Burr: Open-Source State Machine Framework for Building Reliable AI Agents

Datadog veterans launch AI coding startup Niteshift on a bet against Big AI lock-in

A tiny bank transfer could compromise a banking AI agent★ 74

Jedify raises $24M to help companies arm AI agents with business context

Ask HN: Are most corporate SWE jobs performative?

Decart’s new world model can simulate hours of photorealistic driving

Bonsai LM 1-bit and 1.58-bit Benchmarks on Jetson Orin Nano Super