The CAPTCHA arms race: from distorted text to browser identity

(browserbase.com)

69 points | by harsehaj a day ago

58 comments

netik a day ago

So this is basically a shill advertisement ending in "Your AI Agents can avoid captchas if you pay us."

The last example is a false narrative, that captchas will only happen if the "browser looks suspicious". Systems like Altcha put an end to this argument. They don't care if the browser looks suspicious, only that the browser can perform a proof-of-work to get past a captcha designed to slow down the request rate.

When applied consistently, it will effectively block and slow down AI crawlers, which is what this company wants to promote.
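
For readers who haven't seen one, a proof-of-work gate of the kind described above can be sketched in a few lines. This is a generic hashcash-style illustration, not Altcha's actual protocol: the function names, the `challenge:nonce` string format, and the leading-zero-bits difficulty rule are all assumptions made for the example.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a non-zero byte, 8 - bit_length() is its number of leading zero bits.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces until one passes verification."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

The asymmetry is the point: the client needs about 2^difficulty_bits hashes on average, while the server checks the answer with one. A real deployment would also have to expire and track challenges so a solved one can't be replayed.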

chrismorgan a day ago

Proof-of-work is bad rate limiting: https://news.ycombinator.com/item?id=44093918. The playing field is wildly unbalanced. Even naive attackers tend to have a lot more computing power available than a lot of your normal users, and where it’s SHA-256 (which is almost the worst choice imaginable for a proof of work scheme, yet which every single service that I know of has used), an intelligent attacker goes from being hundreds of times as powerful to millions of times as powerful.

netik a day ago

I agree with this assessment, but for many applications it's a viable approach until the attacker goes off and writes their own shader to solve the PoW. We go back to threat modeling here, weighing the amount of effort against the gain.

They're now integrating Argon2id in an attempt to squash GPU hacks, but because it is memory-hard, it places ridiculous demands on the client.

CodesInChaos a day ago

Did they evaluate good old bcrypt? I haven't looked at it in a while, but it used to be very GPU unfriendly (though still vulnerable to FPGA/ASIC).

ameliaquining a day ago

It seems like the real mechanism here is not actually proof-of-work so much as security-through-obscurity.

peeet a day ago

More advanced and targeted bots can "bypass" Proof of work as well though, e.g. using something like https://github.com/toman-tom/Incapsula-PoW

pentacent_hq a day ago

Do you have real-world experience deploying PoW captchas? I'd love to give them a try but I'm worried my forms will end up getting overwhelmed with spam if I switch away from hCaptcha.

gruez a day ago

>Systems like Altcha put an end to this argument. They don't care if the browser looks suspicious, only that the browser can perform a proof-of-work to get past a captcha designed to slow down the request rate.

That doesn't really work out in reality because bots are happy to wait 5 seconds or even 5 minutes for a PoW challenge to complete. Humans on the other hand will not, especially if they're on a mobile device with limited compute and energy.
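
The arithmetic behind this trade-off can be sketched as follows. Finding a hash with k leading zero bits takes 2^k attempts on average; the hash rates below are hypothetical placeholders chosen only to show the shape of the problem, not measurements of any real device.

```python
def expected_solve_seconds(difficulty_bits: int, hashes_per_second: float) -> float:
    """Average time to find a hash with `difficulty_bits` leading zero bits."""
    expected_attempts = 2 ** difficulty_bits
    return expected_attempts / hashes_per_second


# Hypothetical hash rates (assumptions for illustration, not benchmarks).
HYPOTHETICAL_RATES = {
    "old phone": 50_000,
    "desktop CPU": 2_000_000,
    "GPU rig": 2_000_000_000,
}


def report(difficulty_bits: int) -> dict[str, float]:
    """Expected solve time in seconds for each hypothetical device."""
    return {
        name: expected_solve_seconds(difficulty_bits, rate)
        for name, rate in HYPOTHETICAL_RATES.items()
    }
```

Whatever the real numbers are, the ratio between devices stays fixed as difficulty rises, so any setting that is tolerable on the slowest legitimate client is proportionally cheaper for the best-equipped attacker.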

CM30 a day ago

The issue is that anything that becomes a standard here automatically becomes a target. If the same sort of captcha protects everything from Gmail to Twitter to Cloudflare and Facebook, then bot creators and spammers have a huge incentive to bypass it no matter what. And if we've learnt anything about spam, it's that pretty much every system we can think of can be bypassed or automated away.

The solution is really a ton of different captcha-like systems and anti-spam solutions, all unpopular enough that an attacker may not even bother targeting them. If an attacker needs to target a few thousand different captcha-style setups to get their spam through, then many of them won't bother.

It's like centralised vs decentralised communication systems. If everything is centralised, a bad actor (like a government, corporation, criminal group, etc) can go after one target to control the narrative. If it's decentralised, then suddenly they have to go after dozens or hundreds of different targets, many of which won't cooperate with them.

ktpsns a day ago

This is the reason why I implemented rather dumb but individual, hand-crafted captchas for my own websites in the past. Things like input fields which must be left empty, silly multiple choice questions only humans could properly answer at that time, etc.
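
The "input fields which must be left empty" trick is usually called a honeypot field. A minimal server-side check might look like this, assuming the form arrives as a plain dict and that the hidden field is named `website` (both are assumptions for the sketch):

```python
def looks_like_bot(form: dict[str, str], honeypot_field: str = "website") -> bool:
    """Return True if the hidden honeypot field was filled in.

    Humans never see the field (it is hidden with CSS), so any
    non-blank value suggests a script that fills in every input.
    """
    return form.get(honeypot_field, "").strip() != ""
```

For example, `looks_like_bot({"name": "Ann", "website": ""})` is False, while a submission with `"website": "http://spam.example"` is flagged.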

I have seen communities implement their own captchas with domain knowledge. For instance, math/STEM people showing captchas with rather easy calculus tasks (such as solving a definite integral). This can be fun to solve and as a human you feel valued. It is this handmade feeling of the "old internet".

The problem with self-made captchas is that even they are rather easy to solve nowadays with LLMs and the like. Therefore I don't believe that decentralized, individualized captchas are the solution, as they tend to be rather simple.

CM30 21 hours ago

The issue is that AI bots can answer pretty much any captcha that isn't ridiculously obtuse. Your homemade captcha can be solved with an LLM, but so can the latest version of Recaptcha or whatever.

But I feel that there may be some other tools here. Spammers in general tend to be fairly predictable in their behaviour, so tracking that could help. If a member is including links to a third party site in dozens of messages, that should be a pretty big hint that their intentions are questionable. If they blast through the registration page in seconds, same thing.

Personally, I've not seen more than a single spammer on the forum I run for the last 6 months or so, so I'm still optimistic that most don't bother unless it's easy to attack and worth their time.
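
The two signals mentioned above (links in many messages, blasting through registration) can be sketched as a simple rule check. The `Activity` record, the thresholds, and the URL pattern are all hypothetical choices for illustration; a real forum would tune them and would likely look at link targets, not just link presence.

```python
import re
from dataclasses import dataclass

# Rough pattern for "this message contains a link" (an assumption, not a full URL parser).
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


@dataclass
class Activity:
    seconds_on_signup_form: float
    messages: list[str]


def suspicion_reasons(
    activity: Activity,
    min_signup_seconds: float = 5.0,   # hypothetical threshold
    max_linking_messages: int = 24,    # "dozens", also hypothetical
) -> list[str]:
    """Return the list of heuristics this member trips (empty if none)."""
    reasons = []
    if activity.seconds_on_signup_form < min_signup_seconds:
        reasons.append("signup completed too quickly")
    linking = sum(1 for m in activity.messages if URL_PATTERN.search(m))
    if linking >= max_linking_messages:
        reasons.append("links in many messages")
    return reasons
```

Returning reasons rather than a bare yes/no makes it easier to send borderline accounts to manual review instead of banning them outright.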

epgui a day ago

I thought half the point of captchas was to train vision models?

ameliaquining a day ago

Those were not "vision models" in the modern sense, but rather crude classifiers or OCR systems that were heavily dependent on human labor to handle many cases, because the vision models of the time sucked and were hardly capable of anything. The economic value of CAPTCHA-based data labeling went to zero when AlexNet (the first general-purpose vision model good enough for real-world use) was released in 2012; from then on, you could just have the machines do that work instead.

ben_w a day ago

This is in the article.

Indeed, it was half the point of reCAPTCHA. That's how Google could justify supplying reCAPTCHA for free, but it's not why people wanted to use it.

chinathrow a day ago

> That's how Google could justify supplying reCAPTCHA for free, but not why people wanted to use it

This and Pokemon Go for collecting videos: are there other examples of users doing the free work for $large_co?

ben_w a day ago

https://en.wikipedia.org/wiki/Self-checkout

curtisboortz a day ago

The Chrome extension angle is interesting here. We ship an extension that interacts with Gmail and have seen how much variance there is in what Google considers "bot-like" behavior from extensions vs. the browser tab. The line between "automated" and "assisted" is not well defined at the API level, which ends up being a similar underlying problem: distinguishing intent rather than pattern.

hombre_fatal a day ago

As TFA points out, a major change is that bot traffic now comes from honest users via their LLM sessions, so you don't even necessarily want to block automated bots anymore.

The game is shifting to a better ideal: how do you design a service knowing that any user/request might be automated?

Especially in place of the historical, easy solution/hack where you have some sort of gate that, once passed, puts the user in some trusted low-scrutiny tier, like a forum's registration page.

It's a similar question to designing a system so that it's resilient to account take-overs. (i.e. The user was a trusted human until now, and now it's a spammer)

Example: on a forum, run new posts through an LLM to classify them as spam or not. That's the magic solution we always wished we had (remember Akismet?), but the tools back then were too rudimentary.

wildzzz a day ago

You use API tokens for things intended to be machine-to-machine communication and captchas for things intended to be filled out by humans. Not every site or service wants automated input, even if it's being directed by a human. I don't want forums like HN just filled with a bunch of agents talking to each other. Where's the human connection?

GL26 a day ago

A question I've been wondering about: can't attackers record human sessions and replay them against a website to bypass Cloudflare?

bluGill a day ago

They can. According to the article, they have already figured out a lot of what Cloudflare is looking for and how to bypass it, which is why protection is moving on to something else. I suppose this is why every website wants me to log in with my Google account (which I never use).

ra0x3 a day ago

TLDR: They're promoting a product they're working on with Cloudflare under the guise of it being an "open standard" [1]. Of course, in the docs, Step 1 is "Sign in with your Cloudflare account". Comes across a bit land-grabby.

[1] https://www.browserbase.com/blog/cloudflare-browserbase-pion...

thenthenthen a day ago

Omg. I am on various VPNs, and now and again Google Auth (for YouTube) throws me a captcha. They are mostly unreadable, but there is an audio option… which is just insane and does not make any sense. Has anyone else had that? It sounds like a recording of 300 people speaking at the same time in a call center while on various dosages of LSD.

nosioptar a day ago

I've actually been in a call center with 300 intoxicated folk all talking at once. It's easier to understand than the reCAPTCHA audio.

(Only a couple of folks were on hallucinogens, most on various downers.)

moralestapia a day ago

I've got captchas that made me play a small game and I score like 3 points to go ahead, lol. For real.

willmadden a day ago

They give you that (or hieroglyphics) if you are using certain VPNs and don't leave a specific browser fingerprint.

prmoustache a day ago

There is a point where not leaving fingerprints becomes a fingerprint in itself.

willmadden 4 hours ago

The best browsers that do not leave fingerprints do so by making your fingerprint look like every other fingerprint. Cydec used to have a plugin that actually worked for randomizing everything, but support ended. Maintenance for that must have been a nightmare. The best approach now is to blend in by looking like everyone else doing the same thing.

joehabeebs a day ago

The most recent variations that force you to click the boxes containing a certain artifact are incredibly frustrating and fail half the time. The large influx of AI-generated, SEO-optimized content makes me question CAPTCHAs' efficacy today.

matteo8p a day ago

Really nice read Harsehaj!

I haven't looked deeply into Web Bot Auth, but is identification tied to the agent (one identity per agent) or is it tied to the underlying person using the agent (the user)?

Hope that question makes sense, lmk if you need clarification

peytoncasper a day ago

Hey Matt,

I would say everyone is leaning towards organization/individual right now, but I would imagine that flips as the number of agents grows.

ramify123 a day ago

Here is a fun captcha instead: https://feralui.vercel.app/#/captcha

randrus a day ago

Always reminds me of the forces that shape the mechanisms around the exchange of genetic information that powers evolution.

See: The Red Queen by Matt Ridley.

ezst a day ago

They have served to train multiple generations of ANN and ML algorithms; in that respect, I think they've been a resounding success!

giancarlostoro a day ago

I remember at one point in my teens, someone had made a web app that would snag the captcha and show you only the captcha, and you would just endlessly solve captchas while the application tried different passwords on a backend and logged any successful logins.

yieldcrv a day ago

Some of the first bitcoin faucets in 2011, 2012 were bots doing that

Users thought the captcha was antispam prevention for them to receive bitcoin

It was really just the bot forwarding a captcha so it could continue its spam once solved, paying the user in bitcoin

giancarlostoro a day ago

LOL I don't remember doing captcha, but I remember receiving bitcoin from a faucet, thought it was strange.

visiondude a day ago

Although not perfect for other reasons, a captcha built on phone motion and device attestation, like prsn.you, is harder to bypass in today's agent environments.

throw7 a day ago

Just today a website presented me a QR code captcha. I threw up.

SirMaster a day ago

What about the ones where you need to slide a piece of a puzzle into place? I don't see those mentioned at all. Are they effective?

kgwxd a day ago

They're great for keeping humans out. Tried to set up Discord on a new phone yesterday. CAPTCHAs over and over again, just trying to log in. I uninstalled instead.

solutionB a day ago

Saw this somewhere. I don't know if it can be implemented IRL, but it looks promising: https://github.com/mortspace/playcaptcha

akimbostrawman a day ago

Failed? They have very successfully pushed people towards Chromium browsers and traceable residential IPs while also training AI.

echoangle a day ago

Oh my god, I hate AI articles. Why do we have to make an interactive visualization for every single sentence? Thanks for showing me how distorted text is made in steps.

And being a cat and mouse game doesn’t mean the defenders failed.

qweqwe14 a day ago

> And being a cat and mouse game doesn’t mean the defenders failed.

It does though, in the end attackers always win. If something is a "cat and mouse game" then it's unwinnable by design from the defender side.

Sure, you can keep playing it if you feel like it, but at some point the attacker will be indistinguishable from a legitimate user and you will lose that fight.

echoangle a day ago

By that logic, every security task is doomed to fail. Spam detection and antivirus are cat and mouse games too. I wouldn’t say they fail just because they have to adapt over time.

cute_boi a day ago

It has failed because of companies like Browserbase and hackers who compromise smart devices and TVs to use as residential proxies.

jmclnx a day ago

They have been around that long? It doesn't seem so, but the timing could be correct, probably because the sites I went to had no need for CAPTCHAs until AI came around.

Zak a day ago

The name wasn't invented until 2003, but yes.

Guestbooks, contact forms, signup pages, and the like started receiving automated abuse approximately five minutes after they were invented. It didn't take long after that for people to start including a question they expected to be easy for a person and hard to automate with a script.

What's relatively new is CAPTCHAs merely to browse a site. There are few faster ways to get me to close your site, and maybe send you an unfriendly email.

nosioptar a day ago

My first guestbook asked Hagar or Roth. Answering correctly got your message added to the book. Answering Hagar got you sent to an infinite redirect loop for being either a bot or a moron.

code_duck a day ago

So in the past few years? Oh dear, no. Captchas have been in common use for much longer than that. reCAPTCHA has been around almost 20 years.

JohnFen a day ago

They were introduced in 1997, although I personally didn't start seeing them until a couple of years later.

zuzululu a day ago

So what's the solution then? Get people to turn on their camera and hold up 15 fingers?

fusslo a day ago

It sounds like the article and the company are building identity based on fingerprinting and cross-domain behavior, inferred at multiple levels, including Cloudflare's.

It's just more identity verification afaict

ranger_danger a day ago

PACT: https://news.ycombinator.com/item?id=48647360

throwawayffffas a day ago

The solution is logins and paywalls.

kgwxd a day ago

That's crazy. People aren't going to pay to be tracked and have ads shoved in their faces! The economy would collapse!