Anthropic News · Jun 8, 2026, 9:02 AM

Responsible Scaling Policy

Anthropic updated its Responsible Scaling Policy for managing catastrophic risks from frontier AI systems.

Anthropic published a major update to its Responsible Scaling Policy, its governance framework for frontier AI risk. The revised policy keeps the commitment not to train or deploy models without adequate safeguards, while adding more nuanced capability thresholds and required safety levels. It focuses on risks such as autonomous AI R&D acceleration and CBRN weapons assistance, with stronger evaluations, documentation, governance, and external input.

Anthropic announced an update to its Responsible Scaling Policy (RSP), the internal governance framework the company uses to manage potential catastrophic risks from frontier AI systems. The new version keeps the policy's core commitment: Anthropic will not train or deploy relevant models before implementing safety and security measures sufficient to reduce risk to an acceptable level. The update aims to make the original risk-management approach more granular and flexible, using "capability thresholds" together with "required safeguards" to decide when to raise safety standards.

Anthropic states that all of its models currently operate under the ASL-2 standard, which represents the safety practices commonly seen in the industry today. If a model reaches certain high-risk capabilities, higher-tier safeguard requirements are triggered.

The new RSP lists two categories of capability thresholds. The first is a model that can autonomously complete complex AI research and development tasks that normally require human expertise, which could accelerate AI progress and leave risk management lagging behind. The second is a model that can substantially help people with basic technical backgrounds produce or deploy chemical, biological, radiological, or nuclear (CBRN) weapons. For CBRN risks, Anthropic will require ASL-3 deployment and security safeguards; for autonomous AI R&D capabilities, it may require ASL-4 or higher standards.

The policy also adds routine capability evaluations, safeguard-effectiveness assessments, documented decision processes, internal stress testing, and external expert feedback. Anthropic also reviews its experience executing the older RSP over the past year and acknowledges a few procedural gaps, such as evaluations completed later than scheduled or evaluation records that were not clear enough. The company judged that these cases posed very low risk to model safety.

Overall, this announcement is not a new model or product launch but Anthropic's attempt to institutionalize the risk governance of frontier AI and to show other AI companies a safety policy framework that is open to public inspection and can be revised over time.


Summaries are AI-generated; the original article is authoritative.