Anthropic Reveals Claude's 3 Layers of Defense
Anthropic’s agent safety guide: blast radius, VM isolation, OS sandboxes, and three lines of defense. Essential 2026 engineering insights.
Anthropic’s engineering blog posted a substantial update last week: “How we contain Claude across products”. It directly addresses a question I’ve long been curious about: as Claude-style agents become increasingly capable, how exactly does Anthropic keep them from wrecking users’ machines?
After reading it, my biggest takeaway is this: agent safety is not a prompt-engineering problem. The real heavy lifting happens at the system level, through hard boundaries on files, networks, processes, and credentials.
Here’s a breakdown of the core arguments:
Why “how far the damage can spread when it goes wrong” matters more than “will it go wrong”
The article opens with a fundamental shift in thinking.
In the past, AI safety meant asking “Will the model do something wrong?” Now that agents are more powerful and have access to more tools, the question has become: “When it does something wrong, how bad can the consequences actually get?”
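To make the idea of a hard boundary concrete, here is a minimal sketch of my own, not code from Anthropic's post. It assumes a hypothetical agent that is only allowed to write inside one workspace directory, and the function name is_inside_workspace and the example paths are invented for illustration. The point is that the check runs in ordinary code outside the model, so it caps how far a bad action can reach no matter what the model decides to do.

```python
from pathlib import Path


def is_inside_workspace(workspace: Path, candidate: str) -> bool:
    """Return True only if `candidate` resolves to a path inside `workspace`.

    Hypothetical helper for illustration: a real sandbox would enforce this
    at the OS or VM level rather than in application code.
    """
    root = workspace.resolve()
    # Joining with an absolute path discards `root`, so "/etc/passwd"
    # stays "/etc/passwd" and is rejected below.
    target = (root / candidate).resolve()
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # tricks are normalised before the comparison.
    return target == root or root in target.parents


if __name__ == "__main__":
    workspace = Path("/tmp/agent-workspace")  # assumed location
    for requested in ["notes/todo.txt", "../../etc/passwd", "/etc/passwd"]:
        verdict = "allow" if is_inside_workspace(workspace, requested) else "deny"
        print(f"{verdict}: {requested}")
```

A check like this does not try to predict whether the agent will misbehave. It only limits which files a mistake can touch, which is the shift in framing the article describes.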



