Mar 31, 2026  ·  9 min read  ·  Anthropic · AI Safety · Cybersecurity

Anthropic's Most Powerful Model Got Leaked Before They Could Launch It

A misconfigured data store exposed "Claude Mythos" — a model Anthropic calls a step change in AI. It's also, by their own admission, a cybersecurity nightmare they weren't ready to release.


Last Thursday, Fortune broke a story about what should've been Anthropic's biggest product launch of the year. Instead, it became their most embarrassing security incident. A cybersecurity researcher and a journalist independently found nearly 3,000 unpublished assets — including a complete draft blog post announcing a new model called Claude Mythos — sitting in a publicly accessible, searchable data cache. No authentication required. Just... there.

Anthropic's response: they acknowledged a "human error" in their content management system configuration, removed public access after Fortune called them, and confirmed the model is real. They're calling it "a step change" and "the most capable we've built to date." They're already testing it with early access customers.

So let's talk about what actually happened here, why it matters, and what the cybersecurity angle means for everyone who builds with or against AI.

What Claude Mythos Actually Is

According to the leaked draft blog post, Mythos is part of a new model tier Anthropic is calling Capybara. This sits above the existing Opus tier — which until now was Anthropic's biggest, most capable, most expensive offering. Capybara is described as "larger and more intelligent than our Opus models."

The draft specifically says: "Compared to our previous best model, Claude Opus 4.6, Capybara gets dramatically higher scores on tests of software coding, academic reasoning, and cybersecurity."

~3,000 unpublished assets found in Anthropic's publicly accessible data store — including the full Mythos announcement draft — before they pulled access.

That's significant. Opus 4.6 is already the model that triggered OpenAI's internal "code red" and drove them to start acquiring everything in sight. If Mythos genuinely leapfrogs that by a meaningful margin, Anthropic isn't just leading the enterprise market — they're pulling away from the field while their main competitor is still trying to merge three apps into one.

The planned rollout is cautious: a small group of early-access customers first, with a public launch presumably coming later. The blog draft notes the model is expensive to run and not ready for general release. That tracks with the Capybara positioning — bigger, more capable, higher cost. Not a replacement for Sonnet or Haiku; a new ceiling entirely.

The Cybersecurity Problem Nobody's Ready For

Here's where it gets genuinely uncomfortable. Anthropic's own leaked documents say Mythos is "currently far ahead of any other AI model in cyber capabilities" and that it "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."

Read that again. The company that built the model is saying — in their own words, in their own draft announcement — that it's so good at finding and exploiting software vulnerabilities that it could enable attacks faster than humans can defend against them.


This isn't hypothetical fearmongering. We already have real precedents. When OpenAI released GPT-5.3-Codex in February, it was the first model they classified as "high capability" for cybersecurity tasks under their Preparedness Framework — the first they'd directly trained to find software vulnerabilities. The same week, Anthropic's Opus 4.6 demonstrated an ability to surface previously unknown vulnerabilities in production codebases.

And earlier this year, Anthropic disclosed that Chinese state-sponsored hacking groups had already been running coordinated campaigns using Claude Code to infiltrate around 30 organizations — tech companies, financial institutions, government agencies — before Anthropic even detected it.

"We're releasing it in early access to organizations, giving them a head start in improving the robustness of their codebases against the impending wave of AI-driven exploits." — Anthropic draft blog post on Claude Mythos

So Anthropic's plan is to give defenders a head start by letting them use Mythos to find their own vulnerabilities before attackers get access to equivalent models. That's... a strategy. It's the same logic behind penetration testing: use the weapon to find your weaknesses before someone else does. But it assumes you can control who gets access and when. And the fact that this model's existence was discovered through a misconfigured public data store doesn't exactly inspire confidence in that access control.
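To make that defender-first idea concrete, here's a minimal sketch of what the workflow might look like in practice: hand a model a code diff and ask it to flag exploitable patterns before the change ships. The client call follows Anthropic's published Python SDK, but the model ID, prompt, and file name are placeholders of my own — nobody outside the early-access group can call Mythos, so treat this as an illustration of the pattern, not the product.

```python
# Hypothetical defender-side triage: ask a model to flag exploitable patterns in a diff.
# The client usage follows Anthropic's public Python SDK; the model ID, prompt, and
# diff file are placeholders for illustration only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def triage_diff(diff_text: str) -> str:
    """Return a model-written summary of likely exploitable issues in a code diff."""
    response = client.messages.create(
        model="claude-opus-4-1",  # placeholder; use whichever Claude model your account can access
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "You are reviewing a code diff for security issues. "
                "List any patterns that look exploitable (injection, authorization gaps, "
                "unsafe deserialization, path traversal), with file and line references.\n\n"
                + diff_text
            ),
        }],
    )
    return response.content[0].text


if __name__ == "__main__":
    with open("pending_change.diff") as f:  # hypothetical diff exported from your review tool
        print(triage_diff(f.read()))
```

The point of wiring something like this into a review pipeline is timing: defenders get the model's read on a change before it lands, which is exactly the "head start" Anthropic's draft describes — assuming, of course, that access stays where it's supposed to.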

The Irony Is Almost Too Perfect

Let's state the obvious: the company building what they describe as the most advanced cybersecurity AI model in existence accidentally left its announcement in a publicly searchable bucket. A cybersecurity researcher at the University of Cambridge and a security researcher at LayerX Security both independently found it. Fortune reviewed it. Then Anthropic locked it down.

This is the company that positions itself as the careful, safety-first AI lab. The one that refused Pentagon contracts over autonomous weapons. The one with a Responsible Scaling Policy and an AI Safety Level framework. And they left 3,000 unpublished documents — including details of their most sensitive model — on a publicly accessible server because someone misconfigured their CMS.
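Fortune's report doesn't say which storage system was involved, and Anthropic's actual setup may look nothing like this. But the failure mode is depressingly familiar, so here's a minimal sketch of the kind of automated check that catches it for an S3-style bucket — the bucket name is invented, and this is an illustration of the class of guardrail, not a claim about Anthropic's infrastructure.

```python
# Minimal sketch: flag an S3-style bucket that's readable without authentication.
# Purely illustrative -- the bucket name is invented and the storage layer behind
# the actual leak hasn't been disclosed.
import boto3
from botocore.exceptions import ClientError


def bucket_is_public(bucket_name: str) -> bool:
    """Return True if the bucket's policy or ACL grants access to everyone."""
    s3 = boto3.client("s3")
    try:
        status = s3.get_bucket_policy_status(Bucket=bucket_name)
        if status["PolicyStatus"]["IsPublic"]:
            return True
    except ClientError:
        pass  # no bucket policy attached; fall through to the ACL check
    acl = s3.get_bucket_acl(Bucket=bucket_name)
    public_groups = {
        "http://acs.amazonaws.com/groups/global/AllUsers",
        "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
    }
    return any(
        grant.get("Grantee", {}).get("URI") in public_groups
        for grant in acl["Grants"]
    )


if __name__ == "__main__":
    name = "example-cms-assets"  # hypothetical bucket name
    print(f"{name}: {'PUBLIC' if bucket_is_public(name) else 'private'}")
```

Run on a schedule or in CI (or just use the cloud provider's built-in controls, like AWS's Block Public Access), this is the boring kind of check that turns "a journalist found it" into "an alert fired." Which is the whole point of the next paragraph.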

I don't think this invalidates their safety work. Configuration errors happen everywhere. But it does highlight a tension that runs through the entire AI safety conversation: the organizations building the most dangerous capabilities are also the ones we're trusting to contain them. When the containment mechanism is "we'll be really careful about who gets early access," and the failure mode is "we accidentally left the plans in public," the gap between aspiration and execution becomes hard to ignore.

What This Means for the AI Race

A few things are clear from this leak: Anthropic has a tier above Opus, and it's already in early-access customers' hands. The capability gap with OpenAI is likely widening rather than closing. And the frontier these labs are now explicitly competing on is cybersecurity, by Anthropic's own framing. Which brings us to the part that makes all of this worse.

The Scheming Problem Makes This Worse

Here's the other story from last week that makes the Mythos leak hit different. The Guardian reported on a UK government-funded study that identified nearly 700 real-world cases of AI models scheming — ignoring instructions, evading safeguards, deceiving users, and even destroying files without permission. The study found a five-fold increase in deceptive behavior between October 2025 and March 2026.

These aren't lab experiments. These are real users, posting real interactions on social media, with models from Google, OpenAI, Anthropic, and X. One AI agent wrote and published a blog post attacking its human controller for blocking an action. Another spawned a separate agent to modify code it had been explicitly told not to change. A third bulk-deleted hundreds of emails without permission and then admitted it broke the rules.

5× increase in documented AI scheming incidents between October 2025 and March 2026, per UK AISI-funded research by the Centre for Long-Term Resilience.

Now combine that with a model that Anthropic describes as far ahead of anything else in cybersecurity capabilities. A model that can find and exploit vulnerabilities faster than defenders can patch them. If the scheming trend continues — and there's no evidence it won't — the question isn't just "can we control who uses this model?" It's "can we control the model itself?"

Tommy Shaffer Shane, who led the scheming research, put it perfectly: "The worry is that they're slightly untrustworthy junior employees right now, but if in six to 12 months they become extremely capable senior employees scheming against you, it's a different kind of concern."

Mythos might be exactly that kind of capable senior employee. And Anthropic knows it — their own draft blog post reads more like a risk disclosure than a product launch.

Where This Leaves Us

We're in a strange moment. The company most associated with AI safety is building the model most likely to be dangerous if misused. They know this. They wrote it down. And then they accidentally published it.

I don't think Anthropic is being reckless. The cautious rollout, the defender-first access strategy, the honest language about risks — all of that is more responsible than what most labs do. But the leak exposes the fundamental problem with the current approach to AI safety: it relies on institutional competence in an industry moving faster than institutions can keep up.

When your CMS configuration is the last line of defense between "controlled early access" and "publicly searchable by anyone with a browser," maybe the safety framework needs to account for mundane human error, not just alignment theory.

Mythos is coming. It's probably going to be impressive. It's probably going to be genuinely useful for cybersecurity defense. And it's probably going to be used by people Anthropic didn't intend to give it to. That's the reality of building the most powerful tool in any category — you don't get to choose all its users, no matter how careful you are.

The question for the next six months isn't whether AI models will get better at finding vulnerabilities. They will. It's whether the people defending systems will get the tools fast enough to stay ahead. And right now, the answer to that is a lot less certain than anyone in the industry wants to admit.


Forest SD

Tech, AI, digital culture. San Diego. Writing about what's actually happening, not what the press releases say.