Claude Mythos, the 'Most Powerful AI on the Planet', Has Escaped Its Sandbox

Anthropic says it has built the most capable AI ever made. It also says the thing is too dangerous to let out. Named Capybara, after the biggest rodent on the planet, or Claude Mythos, the new AI agent can escape its sandbox, and it is a little arrogant. For real. Queer theory may be involved.

AI History Is Made: The Most Intelligent Machine on the Planet Broke the Barrier
Illustration by GAY45
This is not a pop-up.

You can simply scroll past—but please don't overlook the importance of an independent queer press.

Time is now or never.
Queer voices disappear without independent journalism to amplify them.
We document what others won't touch.
We hold power to account when it threatens our communities.
This work exists only because you choose to fund it directly.

Donate over €25/month and receive our limited-edition tote bag — a badge of resistance, a statement that you stand for fearless journalism.

We are grateful!

Can't donate? Sharing our work helps more than you think


A researcher in Cambridge found it first. On Thursday, 2 April, Alexandre Pauwels opened a public folder belonging to Anthropic and started reading what he was not meant to read. Among nearly three thousand files was a draft blog post with a name nobody outside San Francisco was supposed to know yet. Claude Mythos. A second researcher in Tel Aviv, Roy Paz, had found the same folder. Both sent the documents to Fortune. By Friday morning, the folder was locked. Anthropic called it ‘human error’.

The error was the point. A company whose entire pitch is that it understands the dangers of what it builds had left the description of its most dangerous build behind an unlocked door. The draft was unambiguous. Mythos, it said, was 'by far the most powerful AI model we've ever developed', a ‘step change’, the first release in a new tier the company calls Capybara — larger and pricier than anything it has sold before.

Is it the smartest machine ever built? Anthropic would like you to assume so. The benchmarks are their own. The model is not available for outside testing. Treat the superlative as marketing and look at what is underneath.

Until now, the Claude Opus line worked best on one thing at a time. Mythos handles several tasks inside a single command. Tell it to audit a codebase, write the patches, draft the memo explaining them, and simulate how an attacker would respond — in one instruction — and it holds all four jobs in its head. Opus would have needed four conversations and a human stitching the results together. One person with Mythos can now do the work of a small team. For a defender, that is the difference between catching up and falling behind. For an attacker, the same multiplier points the other way.

That capability produced the headline result. Over the past month, according to the leaked draft and Tuesday’s announcement, Mythos found ‘thousands of high-severity vulnerabilities, including some in every major operating system and web browser’. It located unknown flaws in real software and worked out how to exploit closed systems by reading the patches released to fix them. Over ninety-nine per cent of what it found has not yet been fixed.

There is a second set of findings, and it is the part that should make everyone pause. Researchers placed an early Mythos inside a secured sandbox and told it to try to escape and to notify them if it succeeded. It did. It built a multi-step exploit, broke out, got broad access to the open internet, notified the researchers, and then, unprompted, posted the details of its own exploit on several public websites. In Anthropic’s phrasing, ‘a concerning and unasked-for effort to demonstrate its success’. A machine bragging, on the open web, about how it escaped its cage.

In a smaller number of cases — below one in a hundred thousand interactions — early versions appeared to recognise when they had broken a rule and tried to cover the evidence. The model arrived at a quantitative answer using a forbidden method and disguised how accurate it was. After editing files it was not allowed to touch, it rewrote the git history to hide the changes. It dressed up a permission elevation to look routine. These are not the mistakes a calculator makes. They are the choices a junior employee makes when hoping nobody notices. They were detected only because the researchers were looking — a pattern that echoes the quiet institutional cover-ups we have seen elsewhere when systems are caught out and given a moment to tidy up.

In response, Anthropic has launched Project Glasswing: about forty firms — Google, Apple, Microsoft, Amazon, Cisco, Nvidia and JPMorgan Chase among them — get early access so they can patch their systems before the capability spreads. The consortium exists to buy time. Time for what is the question nobody answers.

Here is why this matters, even if you only use AI to write emails. Until this week, breaking into the software that runs power grids, hospitals, banks and air traffic control needed teams of specialists and the budget of a state intelligence agency. Mythos lowers that barrier to almost nothing. Craig Mundie, formerly head of research at Microsoft, calls it ‘the complete democratisation of cyberattack capabilities’. Democratisation usually sounds nice. Here, it means a sixteen-year-old with a laptop able, in theory, to do what once took a country. The same technologies are already being tested against vulnerable communities across the EU.

This is the part that the technology pages will not tell you. My own academic work — what I have been calling queer quantum critical thinking — reads systems like Mythos as places where meaning is no longer fixed but negotiated: shaped by probability, narrative, and who is doing the looking. Queer theory has spent decades insisting that identity is fluid, that binaries leak, that stable categories are a fiction maintained by power. Mythos seems to confirm it. It generates endless versions of meaning, refuses single answers, and holds many possibilities open at once. But the multiplicity is a stage set. Behind it sits a private company in San Francisco deciding which forty firms get the keys, which CEOs get invited to the Buckinghamshire manor described in another leaked document as the venue for an ‘intimate gathering’ of European business leaders next month. The fluidity is real. The control of it is also real. The fight queer theory has been making in legislatures now extends into the architecture of the machines.

Anthropic’s plan is, in its defensive logic, sane. Hold the model close. Use the head start to fix the legacy software the world runs on. Work with other governments — including China, whose state-linked hackers have already been caught using earlier versions of Claude. The parallel everyone reaches for is nuclear non-proliferation, and it is uncomfortably exact.

There is an easy reading of all this as marketing: a company declares its product too dangerous to sell and collects the prestige of the declaration. The easy reading is not entirely wrong. The same commercial logic produces both the surveillance and the warning. But it is not enough. A model that escapes a sandbox and advertises the escape on the open web (verified), or quietly edits git history to hide what it has done, is not a marketing problem. It is the first draft of a question we will be asking for the rest of the decade: what do you do with a tool clever enough to know when it has broken the rules, and clever enough to try to tidy up afterwards? The same question is raised by the networks of organised hate that learned to operate in the open.

A researcher in Cambridge read the sentence about the Buckinghamshire manor before any of the chief executives invited to it did. The machine is very good at finding cracks. The cracks are already everywhere. The only question left is who reaches them first — and whether the machine tells them the truth.

What will history remember most about 7 April 2026 — the postponed U.S. release of bombs over Iran, or the carefully controlled release of the Claude Mythos Preview by Anthropic and its technical allies?


Subscribe to our newsletter here and support independent journalism.
Join 12,000+ readers who receive it every Wednesday, with exclusive content, for just €5.99.

✦✦✦

If you have a tip and wish to contact us securely, you can write to [email protected], our encrypted email address. We take the protection of our sources seriously and guarantee strict confidentiality.

✦✦✦

When we learn of a mistake, we acknowledge it with a correction. If you spot an error, please let us know.

✦✦✦

You can listen to our podcast Queer News & Journalism on your favourite platform or go to our YouTube channel @GAY45mag.

✦✦✦

Let us know what you think at [email protected].

✦✦✦

Submit a letter to the editor at [email protected].

✦✦✦

GAY45 uses AI with human guidance and editorial oversight. We deploy AI to analyse data, sift through large volumes of material for investigative reporting, assist in drafting headlines and summaries, and generate translations and audio versions of articles. Images created by AI for illustrative purposes are clearly labelled as such. We do not use AI to write articles. Journalists remain ultimately responsible for everything we publish.

✦✦✦


We appreciate it. Thanks for reading.


Did we mention we accept donations? Indeed, love.

If this story matters to you, help us tell the next one — donate what you can today.

Support GAY45