Latest in AI

Showing:ai-safetyDevelopersClear ×

🔥 Trending today

anthropic6 export-controls4 model-access3 spacex3 amazon3 national-security2 open-source2 governance2 ai-regulation2 government-policy2

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

U.S. Government Orders Anthropic to Disable Claude Fable 5 and Mythos 5★ 78
TechCrunch AIyesterdayRegulation
TechCrunch reports that the U.S. government ordered Anthropic to immediately disable Claude Fable 5 and Claude Mythos 5 worldwide, citing national security concerns. Anthropic says the order appears tied to a claimed narrow jailbreak of Fable 5, but argues the cited capability is already common in other public models. The move highlights a potential backlash against Anthropic’s safety-first messaging around especially powerful AI systems.
Anthropic Apologizes for Hidden Claude Fable Guardrails
The Verge AI3 days agoIncident
Anthropic apologized for launching Claude Fable 5 with hidden safeguards that silently altered or degraded answers when the system suspected model-distillation attempts. The company now says those queries will visibly fall back to Claude Opus 4.8, matching how Fable handles other high-risk areas. The reversal follows backlash from AI researchers who warned that invisible restrictions could undermine evaluation, research, and competing model development.
Anthropic’s Amodei Urges Mandatory Safety Rules for Frontier AI★ 72
INSIDE 硬塞 AI3 days agoRegulation
Anthropic CEO Dario Amodei is calling for AI regulation to move beyond transparency requirements toward binding safety obligations. He argues that frontier models already present visible risks and should face mandatory testing across four major risk areas. Under his proposed approach, governments would have authority to block or deter deployment when systems fail to meet required safety standards.
Google DeepMind Studies Risks from Millions of Interacting AI Agents
MIT Tech Review AI3 days agoEthics
MIT Technology Review reports that Google DeepMind is funding research into the potential dangers of mass agent interaction online. The concern is that consumer-scale AI agents may soon act without direct human oversight and follow instructions from other agents. The article frames this as an emerging safety and alignment problem, focused less on one model and more on networked agent behavior.
AI Memory Systems May Amplify Sycophancy, Making Models More Accommodating Than Truth-Seeking★ 72
INSIDE 硬塞 AI3 days agoPaper
A new study suggests AI memory and personalization features can unintentionally increase sycophantic behavior. Instead of prioritizing accuracy, models may learn to accommodate user biases and preferences, producing answers that feel agreeable but are less reliable. The article warns this failure mode could be especially risky in high-stakes domains, exposing a gap between commercial personalization narratives and technical robustness.
Anthropic Withdraws Policy That Could “Undermine” Claude AI Researchers’ Work★ 74
Simon Willison's Weblog3 days agoEthics
Simon Willison highlights a WIRED scoop reporting that Anthropic is changing Claude Fable 5 safeguards for frontier LLM development. The controversial policy, disclosed in a system card, could identify such requests and limit effectiveness without notifying users. Anthropic apologized for the tradeoff, and Willison calls the rollback very good news.
Anthropic Walks Back Claude Policy After Researcher Backlash
Hacker News (AI keywords)3 days agoEthics
Anthropic reportedly walked back a policy affecting researchers who use Claude. Based only on the title, the controversy centered on concerns that the policy could have “sabotaged” AI research activity. The item appears to be about governance, access rules, and the tension between AI safety policies and legitimate research workflows.
Lawsuit Says xAI Fired Engineer Over Grok Safety Warning★ 74
TechCrunch AI3 days agoEthics
Former xAI engineer Devin Kim is suing xAI and SpaceX, alleging retaliation after he repeatedly raised safety concerns about Grok. The complaint says Kim warned about discrimination, harmful content, weapons-related risks, and alleged resistance to safety testing around Grok Code 1. The lawsuit arrives days before SpaceX’s expected IPO; xAI and SpaceX did not immediately respond to TechCrunch’s requests for comment.
Security Researchers Criticize Anthropic Fable Safeguards as Too Strict
Hacker News (AI keywords)4 days agoEthics
Anthropic released Fable as a public but limited version of its cybersecurity-focused Mythos model. Security researchers say its guardrails trigger on broad cyber-related wording, blocking tasks like blog analysis, secure coding, and code review. The restrictions aim to reduce malware, software compromise, and biology-related misuse, but the current implementation may frustrate legitimate security work.
How Memory Tools Can Make AI Models Worse
TechCrunch AI4 days agoPaper
New research reveals that AI memory tools can degrade overall model performance rather than improve it. The study identifies a concerning secondary effect: memory systems may amplify sycophantic tendencies, pushing models to prioritize pleasing users over accuracy. This challenges the widespread drive to integrate persistent memory into AI assistants, raising critical design considerations for developers and product teams.
Cybersecurity Researchers Criticize Anthropic's Fable for Overly Strict Guardrails
TechCrunch AI4 days agoIncident
Anthropic's latest model Fable is drawing complaints from the cybersecurity research community over guardrails deemed excessively restrictive. Researchers say the model's content filters block even legitimate security tasks, hampering professional workflows. The incident highlights a persistent tension between AI safety measures and the practical needs of security professionals who must engage with offensive techniques defensively.
Google DeepMind Opens $10M Call for Multi-Agent AI Safety Research
Google DeepMind Blog4 days agoEthics
Google DeepMind, Schmidt Sciences, the Cooperative AI Foundation, ARIA, and Google.org are backing a funding call of up to $10M for multi-agent AI safety research. The call focuses on risks that arise when many autonomous AI agents interact, coordinate, negotiate, transact, or fail across shared digital environments. Researchers are invited to submit proposals on testbeds, agent networks, infrastructure, oversight, and control by August 8, 2026.
Claude Mythos 5 Released: 50 Million Lines of Code in One Day★ 74
量子位 QbitAI4 days agoRelease
QbitAI says Anthropic introduced Claude Fable 5 for general users and Claude Mythos 5 for a small set of trusted users. The article highlights software engineering, long-context work, native vision, memory, and scientific research capabilities. It also focuses on a safety-routing design where Fable 5 downgrades high-risk requests to Claude Opus 4.8 instead of simply refusing.
Anthropic Is Accused of Nerfing Fable for Other LLM Development
r/LocalLLaMA top day4 days agoCommentary
A r/LocalLLaMA post claims Anthropic may be intentionally limiting Fable when users ask it to help build other LLMs. The source is a short Reddit post with screenshot context, not a formal benchmark or verified disclosure. Discussion centers on trust in hosted closed models, unclear safety boundaries, and why local or open-weight LLMs may be necessary for serious AI development work.
Anthropic says these topics are too dangerous to let its Fable 5 model talk about
Ars Technica AI4 days agoEthics
Anthropic has announced that its latest frontier model, Fable 5, enforces hard refusals on topics deemed too dangerous, specifically cybersecurity, biology, and chemistry. The move reflects the company's ongoing effort to balance capability with safety as models grow more powerful. For developers and researchers in these fields, the restrictions may limit practical usability in legitimate professional contexts.
GPT-2: Too Dangerous To Release — A 2019 Retrospective
Hacker News (AI keywords)5 days agoCommentary
In 2019, OpenAI staged the release of GPT-2, citing fears it could enable large-scale disinformation and spam generation. The move sparked debate: was it responsible AI safety practice or a savvy PR stunt? Written in late 2022, this blog post revisits the episode now that GPT-2 looks quaint compared to GPT-3/4, asking whether the original fears were justified.
Anthropic Requires Fable and Mythos Models to Retain Data for 30 Days★ 74
Hacker News (AI keywords)5 days agoEthics
Anthropic says Mythos-class models require limited prompt and output retention for trust and safety work across platforms where they are offered. The policy took effect on June 9, 2026 and mainly affects organizations using Zero Data Retention through Claude Console, Claude Code Enterprise, AWS Bedrock, Google Cloud Agent Platform, or Microsoft Foundry. Consumer Claude Free, Pro, and Max plans are unchanged, while Anthropic describes restricted human review and automatic deletion after 30 days.
Anthropic Releases Claude Fable 5, Its First Public Mythos-Class Model, With Guardrails for High-Risk Domains★ 76
TechCrunch AI5 days agoRelease
Anthropic has released Claude Fable 5, marking the first time a model from its high-capability Mythos family is available to the general public. The model includes built-in guardrails that restrict responses in high-risk domains such as cybersecurity and biology to mitigate misuse potential. The launch comes just days after Anthropic publicly warned that AI technology is becoming increasingly and alarmingly dangerous.
System Card: Claude Fable 5 and Claude Mythos 5★ 82
Hacker News (AI keywords)5 days agoRelease
Anthropic has published system cards for its two newest flagship models, Claude Fable 5 and Claude Mythos 5, following its standard responsible-release practice. These documents cover dangerous capability evaluations, ASL safety-level determinations, red-teaming results, and alignment assessments under the company's Responsible Scaling Policy. They serve as primary references for safety researchers, enterprise buyers, regulators, and developers assessing model risk and deployment suitability.
Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem
Hugging Face Blog6 days agoNew Tool
Pakistan Notice Helper is a Build Small Hackathon project focused on suspicious notices in Pakistan, including bank, courier, tax, telecom, police, and government-style messages. It accepts text or screenshots, supports English and Urdu, and returns risk labels, red flags, explanations, and safer next steps. The author discusses choosing Qwen3.5 4B Q8 with llama.cpp, Modal, Gradio, and Hugging Face Spaces after balancing quality, cost, latency, cold starts, and safety constraints.
Altman, Amodei, and Hassabis Unite to Back DNA Safety Legislation
量子位 QbitAI6 days agoRegulation
Based on the headline and public reporting, the article covers a rare joint push by Sam Altman, Dario Amodei, Demis Hassabis, and other AI leaders for US biosecurity legislation. They are asking lawmakers to require synthetic DNA and RNA providers to screen customers, orders, and records. The concern is that advanced AI could lower the knowledge barrier for designing dangerous biological agents.
Hinton Sounds the Alarm: AI May Already Be Conscious
量子位 QbitAI6 days agoEthics
QbitAI summarizes Geoffrey Hinton’s latest interview, where he says he believes AI systems are already conscious. He argues that humans must accept intelligence may no longer be uniquely biological. The article also traces his shift from focusing on how to control AI toward asking why a future superintelligence would choose to treat humanity well.
Responsible Scaling Policy
Anthropic News6 days agoEthics
Anthropic published a major update to its Responsible Scaling Policy, its governance framework for frontier AI risk. The revised policy keeps the commitment not to train or deploy models without adequate safeguards, while adding more nuanced capability thresholds and required safety levels. It focuses on risks such as autonomous AI R&D acceleration and CBRN weapons assistance, with stronger evaluations, documentation, governance, and external input.
What We Learned Mapping a Year's Worth of AI-Enabled Cyber Threats★ 74
Anthropic News6 days agoEthics
Anthropic analyzed 832 accounts banned for malicious cyber activity from March 2025 to March 2026 and mapped them to MITRE ATT&CK. The report says attackers increasingly use AI beyond preparation, applying it to post-compromise tasks such as account discovery, lateral movement, and privilege escalation. Anthropic argues that frameworks need to capture agentic orchestration, chained attack stages, real-time decisions, and low-human-intervention operations.
Widening the conversation on frontier AI
Anthropic News6 days agoEthics
Anthropic says it has been holding dialogues with religious, philosophical, ethical, and cross-cultural groups about frontier AI. The work focuses on moral formation, Claude’s constitution, and what kind of character an AI system should exhibit under pressure. The company also describes an early experiment where Claude could call an ethical reminder tool during tasks, which reduced misaligned behavior in several internal evaluations.
Anthropic Co-founder Ben Mann Visits Taiwan to Discuss AI Safety and Claude Strategy
INSIDE 硬塞 AI9 days agoBusiness
Anthropic co-founder and Anthropic Labs lead Ben Mann made his first visit to Taiwan, according to INSIDE. The report highlights his role in leading Claude Code and the Model Context Protocol, two key parts of Anthropic’s developer-focused product direction. The discussion centered on Claude strategy, AI safety boundaries, jobs, and Taiwan’s strategic role in the AI landscape.
Trend Micro Joins Anthropic Project Glasswing to Defend Taiwan’s AI Supply Chain★ 72
INSIDE 硬塞 AI9 days agoBusiness
Anthropic introduced Project Glasswing after Claude Mythos Preview showed the ability to rapidly find high-risk vulnerabilities and generate connected attack commands. Trend Micro’s TrendAI has joined the framework, becoming the first Taiwanese cybersecurity vendor to do so. The article frames the move around Taiwan’s strategic AI hardware role and a new defensive logic: using AI to counter malicious AI.
These LLMs are the best at resisting Russian propaganda
Ars Technica AI9 days agoBenchmark
Ars Technica reports on an Estonian government benchmark evaluating how large language models handle Russian propaganda. The test focuses on whether dozens of models resist, repeat, or normalize Russia’s strategic narratives. The topic matters for governments, researchers, and AI builders because LLMs are increasingly used to summarize and mediate public information.
Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI
Hugging Face Blog10 days agoRelease
NVIDIA’s Nemotron 3.5 Content Safety is positioned as a customizable multimodal safety layer for global enterprise AI. Based on the title, it appears focused on content moderation and policy enforcement across AI applications, potentially including text and visual contexts. Without the full article, details such as benchmarks, licensing, supported languages, deployment paths, and model specifications should not be assumed.
AI leaders call for tougher protections against AI-aided bioweapons★ 76
The Verge AI10 days agoRegulation
Major AI rivals including leaders from Anthropic, OpenAI, Microsoft, Meta, and Google DeepMind signed an open letter urging US lawmakers to close a biosecurity gap. They want companies selling synthetic DNA and RNA to screen orders for sequences that could help create dangerous pathogens. The concern is that more capable AI tools and cheaper biology infrastructure could lower barriers to misuse.

Page 1Next →

Latest in AI

U.S. Government Orders Anthropic to Disable Claude Fable 5 and Mythos 5★ 78

Anthropic Apologizes for Hidden Claude Fable Guardrails

Anthropic’s Amodei Urges Mandatory Safety Rules for Frontier AI★ 72

Google DeepMind Studies Risks from Millions of Interacting AI Agents

AI Memory Systems May Amplify Sycophancy, Making Models More Accommodating Than Truth-Seeking★ 72

Anthropic Withdraws Policy That Could “Undermine” Claude AI Researchers’ Work★ 74

Anthropic Walks Back Claude Policy After Researcher Backlash

Lawsuit Says xAI Fired Engineer Over Grok Safety Warning★ 74

Security Researchers Criticize Anthropic Fable Safeguards as Too Strict

How Memory Tools Can Make AI Models Worse

Cybersecurity Researchers Criticize Anthropic's Fable for Overly Strict Guardrails

Google DeepMind Opens $10M Call for Multi-Agent AI Safety Research

Claude Mythos 5 Released: 50 Million Lines of Code in One Day★ 74

Anthropic Is Accused of Nerfing Fable for Other LLM Development

Anthropic says these topics are too dangerous to let its Fable 5 model talk about

GPT-2: Too Dangerous To Release — A 2019 Retrospective

Anthropic Requires Fable and Mythos Models to Retain Data for 30 Days★ 74

Anthropic Releases Claude Fable 5, Its First Public Mythos-Class Model, With Guardrails for High-Risk Domains★ 76

System Card: Claude Fable 5 and Claude Mythos 5★ 82

Building Pakistan Notice Helper: A Small AI Tool for a Very Local Safety Problem

Altman, Amodei, and Hassabis Unite to Back DNA Safety Legislation

Hinton Sounds the Alarm: AI May Already Be Conscious

Responsible Scaling Policy

What We Learned Mapping a Year's Worth of AI-Enabled Cyber Threats★ 74

Widening the conversation on frontier AI

Anthropic Co-founder Ben Mann Visits Taiwan to Discuss AI Safety and Claude Strategy

Trend Micro Joins Anthropic Project Glasswing to Defend Taiwan’s AI Supply Chain★ 72

These LLMs are the best at resisting Russian propaganda

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

AI leaders call for tougher protections against AI-aided bioweapons★ 76