Responsible Scaling Policy | EveryCorner

Anthropic announced an update to its Responsible Scaling Policy (RSP), the internal governance framework the company uses to manage potential catastrophic risks from frontier AI systems. The updated policy keeps its core commitment: Anthropic will not train or deploy relevant models until it has implemented safety and security measures sufficient to reduce risk to an acceptable level. The update aims to make the original risk-management approach more granular and flexible by pairing "capability thresholds" with "required safeguards" to decide when safety standards must be raised.

Anthropic states that all of its current models operate under the ASL-2 standard, which represents safety practices commonly seen in the industry today. If a model reaches certain high-risk capabilities, higher-tier safeguard requirements are triggered. The new RSP lists two categories of capability thresholds. The first is a model that can autonomously complete complex AI research and development tasks that would normally require human expertise, which could accelerate AI progress and leave risk management lagging behind. The second is a model that can substantially help people with a basic technical background create or deploy chemical, biological, radiological, or nuclear (CBRN) weapons. For CBRN risks, Anthropic will require ASL-3 deployment and security safeguards; for autonomous AI R&D capabilities, it may require ASL-4 or higher standards.

The policy also adds routine capability evaluations, assessments of safeguard effectiveness, documented decision processes, internal stress testing, and external expert feedback. Anthropic also reviews its experience implementing the previous RSP over the past year and acknowledges a few procedural gaps: for example, some evaluations were completed later than scheduled, and some evaluation records were not clear enough. The company judged that these cases posed very low risk to model safety.

Overall, this announcement is not a new model or product launch. It is Anthropic's attempt to institutionalize risk governance for frontier AI and to show other AI companies a safety policy framework that is open to public scrutiny and can be revised over time.