A Boy That Cried Mythos: Verification Is Collapsing Trust in Anthropic

(flyingpenguin.com)

54 points | by taejavu 2 hours ago

17 comments

"Sonnet sees the same two “obvious” bugs. It just cannot close the exploitation step. Mythos’s entire frontier advantage over the prior model is therefore bupkis."

What a bizarre conclusion. It "just" cannot close the exploitation step? "Just?"

Developing the working exploit is the hardest part, not finding the bugs. A self-proclaimed security professional should know this.

How is this stuff even making it to the top of HN? Is it just the trendy Anthropic hate? I wonder if these folks will publicly walk back their statements if Mythos turns out to be legit.

EE84M3i an hour ago

I don't think there is a general consensus in the security community that finding bugs is easier than writing exploits.

redanddead an hour ago

you think anthropic didn't earn their hate?

BoorishBears an hour ago

We already have access to a smaller version of the Mythos tier with Opus 4.7: based on the usual delta between the full fat models and their distills, do you really think Mythos breaks cybersecurity?

It's a good model update. We've had these before, and it looks like OpenAI is gearing up to match it this week.

The Mythos launch has felt like a showman overplaying their hand.

Opus 4.5 put them in an awkward position after everyone went Opus-only and suddenly Sonnet's quota was getting treated like you were asking people to use Haiku.

So a new pretraining run completes and instead of just releasing it as Opus 5, they stick the model in a new tier and name it Mythos Preview, while simultaneously launching Project Glasswing to literally build a mythos around the model.

Some people are even confusing it for some sort of completely new paradigm of model centered on cybersecurity, not realizing it's 'just' a new model tier and the cybersecurity stuff is separate.

While Mythos Preview is simmering, a Sonnet-sized distill gets launched as Opus 4.7, at Opus prices, and fixes the margins and compute needs of the Opus tier again.

Improved pretraining + progress on RL allows it to compete even though it's a smaller model, but some things still regress, like understanding nuance (hence the regression on Tau bench and agentic search).

It's clear they plan to price Mythos like they used to price Opus (so high that you don't see it as a strict replacement for the smaller tiers) and heal the compute crunch just a tad.

The main problem is OpenAI doesn't have to play these games.

They have compute, and GPT-5 is already a very parameter efficient model so they're just going to release their model without the fanfare and mystery.

Mythos might get deflated before they even get to cash in on all the fanfare they created. Unfortunate timing really (if you're Anthropic)

solenoid0937 an hour ago

If Glasswing was a marketing exercise for Anthropic, why did the Linux Foundation issue a joint statement with them? What about Apple? Conspiracy theories aside - what's your Occam's Razor explanation?

redanddead 41 minutes ago

what's wrong in admitting you don't know something for a fact? i would love to see some proof for mythos or a white paper or something

smaller companies, even startups, are held to much much higher standards

is anthropic somehow immune? what have they done to earn that immunity? what good will, good stewardship, good faith have they shown to the developer community in the past few quarters?

call a spade a spade

petesergeant an hour ago

You'd look like an idiot for turning down Anthropic's help, but if Anthropic are over-blowing it, you probably won't have any reputational harm.

vmaurin an hour ago

I wonder if these fanboys will publicly walk back their statements if Mythos turns out to be BS. Remember, in French, mythos is short for mythomanes, which means "pathological liars".

solenoid0937 an hour ago

Will absolutely walk back, but I simply don't think the Linux Foundation, Apple, etc are lying when they are calling Mythos a genuine issue.

There is healthy skepticism and then there is sticking your head in the sand. When companies and orgs with no financial interest in Anthropic issue a joint statement describing a problem, it is likely that the problem is real (unless you go off into wacky conspiracy territory.)

Pay08 44 minutes ago

Has an actual Linux dev said anything about it?

baq an hour ago

I wouldn’t be surprised if the glasswing thing comes with an NDA akin to what the NSA wants you to sign when you join. That would be the Anthropic-optimistic interpretation of the sound of crickets from participants - and ‘responsible disclosure’ would be an ok-ish reason for Anthropic itself to not publish what they found themselves alone.

If it’s indeed as bad as the article says, it’s going to be (yet another) PR disaster, but it won’t matter one bit as the whole industry is compute-constrained, not reputation-constrained. You’ll shout at clouds and them and their competitors and still be paying for tokens.

lubujackson an hour ago

Apparently this doom marketing strategy is working for landing enterprise deals, but boy these AI companies are stirring up consumer hate and fear.

I think the real purpose of the Mythos security sham is to mask that Anthropic simply can't release their new model because their data centers are already on fire. There are so many other red flags pointing to this: the no-Claude-Code-for-Pro-users "test", the AWS data center rental deal, the fact Microsoft rug pulled hard on Copilot, specifically removing Opus... and that's just the past 2 days?

mirashii an hour ago

How about the boy who called nonsense security vulnerabilities. This is the same author who posts with incredulity that the ability to change a config file with a shell command in it gives you the ability to run the shell command you posted and wants it treated as some big CVE. Absolutely inconceivable that you might already have your harness in a sandbox where this is okay, and inconceivable that anyone might have a threat model that says that someone who can edit configuration of a tool can make that tool do arbitrary things allowed by its config.

https://www.flyingpenguin.com/ox-security-report-anthropic-m...

vidarh an hour ago

> The 244-page technical artifact, the thing that would have to survive peer review, refuses to actually quantify.

In what world does this author live where the system card is meant to be a scientific paper?

It's worth being skeptical, but it's nonsense to assume that the system card is meant for him or anyone to be able to reproduce and determine what the model actually did or did not do. We won't know that until it is actually available.

avalys 2 hours ago

Am I supposed to know what a “system card” is?

eichin 2 hours ago

https://www.anthropic.com/system-cards generally, https://news.ycombinator.com/item?id=47679406 more specifically (I'm guessing the article didn't link directly because it's an unreleased model and there are only preview versions maybe, given all the cdn links?)

velcrovan 2 hours ago

Yes