Anthropic apologized for launching Claude Fable 5 with hidden safeguards that silently altered or degraded answers when the system suspected model-distillation attempts. The company now says those queries will visibly fall back to Claude Opus 4.8, matching how Fable handles other high-risk areas. The reversal follows backlash from AI researchers who warned that invisible restrictions could undermine evaluation, research, and competing model development.
Anthropic released Fable as a public but limited version of its cybersecurity-focused Mythos model. Security researchers say its guardrails trigger on broadly cyber-related wording, blocking tasks such as blog analysis, secure coding, and code review. The restrictions are meant to curb malware, software compromise, and biology-related misuse, but the current implementation may frustrate legitimate security work.
Anthropic's latest model, Fable, is drawing complaints from the cybersecurity research community over guardrails that researchers consider excessively restrictive. They say the model's content filters block even legitimate security tasks, hampering professional workflows. The incident highlights a persistent tension between AI safety measures and the practical needs of security professionals, who must study offensive techniques in order to defend against them.
Anthropic has released Claude Fable 5, the first model from its high-capability Mythos family to be made available to the general public. The model includes built-in guardrails that restrict responses in high-risk domains such as cybersecurity and biology to reduce the potential for misuse. The launch comes just days after Anthropic publicly warned that AI technology is becoming increasingly, and alarmingly, dangerous.