I hypothesize that these AI agents are all likely above human performance by now.
I’m sure they do better than me. Sometimes I get stuck on an endless loop of buses and fire hydrants.
Also, when they ask you to identify traffic lights, do you select the pole? And when it's motorcycles/bicycles, do you select the person riding them?
Testing those same captcha on Google Chrome improved my accuracy by at least an order of magnitude.
Either that or it was never about the buses and fire hydrants.
It's a known "issue" of reCaptcha, and many other systems like it. If it thinks you're a bot, it will "fail" the first few correct solves before it lets you through.
The worst offenders will just loop you forever, no matter how many solves you get right.
Pro tip: select a section you know is wrong, then deselect it before submitting. Seems to help prove you're not a bot.
Not sure it's your case, but I think I sometimes have to solve many of them when I'm in my daily task rush. My hypothesis is that I solve them faster than the "average human solving duration" reCAPTCHA seems to expect (I think solving it too fast triggers a bot fingerprint). More recently, when I land on a reCAPTCHA, I consciously don't rush it, and I feel I no longer have to solve more than one. I don't think I have superpowers, but as a tech guy I do a lot of computing tasks mechanically.
That's not due to accuracy, you're getting tarpitted for not looking human enough.
I haven't looked into this much, but I think the fact that humans are willing to do this for something in the "cents per thousand" range means it's really hard to drum up much interest in automating it.
While running this I looked at hundreds and hundreds of captchas, and I still get rejected on about 20% of them when I solve them myself. I truly don't understand their algorithm lol
There's a browser extension to solve them. Buster.
Would performance improve if the tiles were stitched together and fed to a vision model, and then tiles are selected based on a bounding box?
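The tile-selection half of that idea is just geometry. Here's a minimal sketch, assuming the (hypothetical) detector returns a bounding box in the stitched image's pixel coordinates, for a 3x3 grid of 100px tiles; the detector itself is out of scope:

```python
def tiles_for_bbox(bbox, grid=3, tile_px=100):
    """Map a bounding box (x1, y1, x2, y2) in stitched-image pixels
    to the indices of the grid tiles it overlaps."""
    x1, y1, x2, y2 = bbox
    selected = []
    for row in range(grid):
        for col in range(grid):
            # The tile's pixel rectangle in the stitched image.
            tx1, ty1 = col * tile_px, row * tile_px
            tx2, ty2 = tx1 + tile_px, ty1 + tile_px
            # Width/height of the intersection between box and tile.
            ix = max(0, min(x2, tx2) - max(x1, tx1))
            iy = max(0, min(y2, ty2) - max(y1, ty1))
            if ix * iy > 0:  # any overlap selects the tile
                selected.append(row * grid + col)
    return selected

# A box spanning the top-left 2x2 tiles of a 300px stitched image:
print(tiles_for_bbox((10, 10, 190, 190)))  # [0, 1, 3, 4]
```

In practice you'd probably want a minimum-overlap threshold instead of "any overlap", since detector boxes tend to bleed a few pixels into neighboring tiles.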
OK, and then what? Those models weren't trained for this purpose.
It's like the recent hype over using generative AI for trading.
You might use it for sentiment analysis, summarization, and data pre-processing, but classic forecasting models will outperform it if you feed them the right metrics.
To be honest, I'm surprised how well it holds up. I expected close-to-total collapse. It's only a matter of time, I guess, but still.
Same! As we discuss in the article, the failures were less about raw model intelligence/ability than about challenges with timing and dynamic interfaces.
Seems like Google Gemini is tied for the best and is the cheapest way to solve Google's reCAPTCHA.
Will be interesting to see how Gemini 3 does later this year.
After watching hundreds of these runs, Gemini was by far the least frustrating model to observe.
In my admittedly limited-domain tests, Gemini did _far_ better at image recognition tasks than any of the other models. (This was about 9 months ago, though, so who knows what the current state of things). Google has one of the best internal labeled image datasets, if not the best, and I suspect this is all related.
Makes sense, what do you think it was trained on?
If not today, then soon: models will only get better at solving captchas. IMHO, the real concern, however, is cheap captcha-solving services.
Interesting results. Why do reload/cross-tile challenges have worse results? Would be nice to see some examples of failures (how close did it get to solving them?)
I'm also curious what the success rates are for humans. Personally I find those two the most bothersome as well. Cross-tile because it's not always clear which parts of the object count and reload because it's so damn slow.
We have an example of a failed cross-tile result in the article - the models seem like they're much better at detecting whether something is in an image vs. identifying the boundaries of those items. This probably has to do with how they're trained - if you train on descriptions/image pairs, I'm not sure how well that does at learning boundaries.
Reloads are challenging because of how the agent-action loop works. But the models were pretty good at identifying when a tile contained an item.
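That matches the split between the two challenge types: a per-tile yes/no question plays to the models' strengths, while cross-tile selection needs boundary reasoning. A rough sketch of the per-tile approach, where `ask_vision_model(image, prompt)` is a hypothetical stand-in for whatever vision-model API you're calling:

```python
def solve_single_object_grid(tiles, target, ask_vision_model):
    """Per-tile classification: ask a yes/no question about each tile.

    `tiles` is a list of tile images, `target` is e.g. "a fire hydrant",
    and `ask_vision_model(image, prompt)` (hypothetical) returns the
    model's text reply. Returns the indices of tiles to click.
    """
    selected = []
    for i, tile in enumerate(tiles):
        prompt = f"Does this image contain {target}? Answer yes or no."
        reply = ask_vision_model(tile, prompt)
        if reply.strip().lower().startswith("yes"):
            selected.append(i)
    return selected
```

For example, with a fake model that says yes only for tiles whose label mentions a bus, `solve_single_object_grid(["sky", "bus1", "tree", "bus2"], "a bus", fake)` returns `[1, 3]`. The cross-tile variant can't be decomposed this way, which is consistent with the worse results there.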
I know people were solving CAPTCHAs with neural nets (in PHP, no less!) back in 2009.
Indeed, captcha vs captcha bot solvers has been an ongoing war for a long time. Considering all the cybercrime and ubiquitous online fraud today, it's pretty impressive that captchas have held the line as long as they have.
You could definitely do better than we do here - this was just a test of how well these general-purpose systems do out of the box.
So, when do we reach the level where AI is better than humans and we remove captchas from pages altogether? If you don't want bots to read content, don't put it online; you're just inconveniencing real people now.
They can also sign up and post spam/scams. There are a lot of those spam bots on YouTube, and there would probably be a lot more without any bot protection. Another issue is aggressive scrapers effectively DoSing a website. Some defense against bots is necessary.
3 models only, can we really call that a benchmark?
yes
In other words, reasoning calls fill the context window with crap.