> in production traffic only very few (<1%) ads are actually clickbait
That's a fascinating claim, and it does not align with my anecdotal experience using the web for many years.
In the last 6 months, I've had to buy a few things that 'normal people' tend to buy (a coffee machine, fuel, ...), for which we didn't already have trusted sellers, and so checked Google.
For fuel, Google results were 90% scams; for coffee machines, closer to 75%. The scams are fairly elaborate: they clone legitimate-looking sites, then offer very competitive prices, between 50% and 75% of market prices, which puts them at the top of the search rankings. It's only by looking closely at the contact information that you notice things that look off (one common sign is that they encourage bank transfers, since there's no buyer protection there, but that's not always the case).
A price at 75% of the market rate is not a crazy "too good to be true" thing; it's within the realm of what a legitimate business can do, and with item prices in the thousands, any hooked victim is a good catch. A particular example was a website copying that of a massive discount appliance store chain in the Netherlands. They had a similar domain name, even though the website looked different, so Google searches tied it to the legitimate business.
You really have to apply a high level of scrutiny, or understand that Google is basically a scam registry.
Scammers can outbid real stores on the same products for the advertising space simply because they have much better margins. And Google really doesn't care whether it is a scammer or a legit business that pays them; they do zero due diligence on the targets of the advertising.
didn't the parent comment cite a sentence about clickbait?
why did you change the subject to scams?
Not quite the same thing, but some non-negligible percentage of the ads I see on Facebook are outright scams which purport to be selling musical instruments at a 'markdown'. First it was guitars supposedly from the Sam Ash bankruptcy sales, linking to an obviously fake site, and more lately 'free' giveaways of high-end Gibson acoustic guitars. When I reported them, I got the feedback that they didn't violate community standards, but my Insta account got perma-banned when I posted the 1928 original of a song on YouTube in a thread which started with a cover from 30 years ago. That was considered spam.
Smart scammers should know that people know that if something is too good to be true ("free Gibson", etc.), it is probably fake. But people keep clicking, for what it's worth.
it's the opposite. scammers want the people that are gullible enough to go for "free"
This is a narrative I've heard many times, with very little evidence to back it up. An alternative and more accurate view is that, as the world came online, people became exposed to the very low-effort scams, representative of criminal elements from around the world, which befuddled most due to their child-like naivety. None of those confused individuals would fall for it but they require an explanation. Someone came up with a theory that it's actually a stroke of 4D genius and it stuck.
edit: ok, I bothered to look this up: Microsoft had a guy do a study on Nigerian scams, the guys who wrote Freakonomics did a sequel referencing that study and drew absurd, unfounded conclusions, which have been repeated over and over. Business as usual for the fig-leaf salesmen.
I had that reaction as well, but consider: clickbait is such because it takes more work (emotional or logical) to reject it than an ad which is merely not relevant to you. Thus, your (and my) recall of ads is probably biased towards clickbait, and we overestimate its prevalence.
That usually means you tend to visit trash sites. Higher quality sites have higher quality ads. In fact, for the highest quality media, people actually PAY for ads. See things like the Vogue September issue or technical shopping magazines, which earn their value from being 90% ads. People used to buy local newspapers because of the ads as well.
Specifically the September issue? Is that one special?
Ad company says ads are good, water is wet, news at 11.
> it does not align with my anecdotal experience
Given that I often see the same fraudulent ad repeated, I think anecdotal experience actually suggests there are not many of them.
I can even talk to friends about the most boring fraudulent ads and they know them, e.g. the "Elon doubling your bitcoin" scams.
For normal ads unless they are viral, there are millions out there that are never repeated or not even seen.
Because fraud ads have short lifetimes before being pulled out of 'production traffic', you can collect many of them for the training data.
I assume 'clickbait' is the safety word for 'fraud'
I’m confused by the clustering step:
> To find the most informative examples, we separately cluster examples labeled clickbait and examples labeled benign, which yields some overlapping clusters
How can you get overlapping clusters if the two sets of labelled examples are disjoint?
they cluster the examples with their model and then check the predictions against the labels.
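A toy illustration of how disjoint label sets can still produce overlapping clusters in embedding space. This is only a sketch under assumptions, not the article's pipeline: the 2-D "embeddings", the tiny deterministic k-means, and the overlap rule (centroid distance smaller than the sum of the two cluster radii) are all made up for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Tiny deterministic k-means: the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep old one if empty).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return [(c, g) for c, g in zip(centroids, groups) if g]

def radius(centroid, members):
    """Distance from the centroid to its farthest member."""
    return max(math.dist(centroid, m) for m in members)

def overlapping_pairs(clusters_a, clusters_b):
    """Hypothetical overlap rule: centroids closer than the sum of the radii."""
    pairs = []
    for ia, (ca, ga) in enumerate(clusters_a):
        for ib, (cb, gb) in enumerate(clusters_b):
            if math.dist(ca, cb) < radius(ca, ga) + radius(cb, gb):
                pairs.append((ia, ib))
    return pairs

# Made-up embeddings: the two label sets share no points,
# but both have a group of examples near (5, 5).
clickbait = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.2, 4.9), (0.1, 0.3), (4.8, 5.1)]
benign = [(5.1, 5.3), (10.0, 0.0), (4.9, 4.7), (10.2, 0.1), (5.3, 5.0), (9.9, -0.2)]

# Cluster each label set separately, then compare across the two sets.
print(overlapping_pairs(kmeans(clickbait, 2), kmeans(benign, 2)))  # → [(1, 0)]
```

The pair that overlaps is the region where differently labeled examples look alike to the embedding, which is plausibly where the labels are most worth a second look.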
Active Learning is a very tricky area to get right ... over the years I have had mixed luck with text classification, to the point that my colleague and I decided to perform a thorough empirical study [1], that normalized various experiment settings that individual papers had reported. We observed that post normalization, randomly picking instances to label is better!
[1] https://aclanthology.org/2024.emnlp-main.1240/
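For anyone unfamiliar with the comparison, here is a rough sketch of the two selection strategies being contrasted. It is not the setup from [1]: the function names, the probability pool, and the use of plain uncertainty sampling as the active learning strategy are assumptions for illustration.

```python
import random

def uncertainty_sample(probs, budget):
    """Pick the `budget` pool items whose predicted probability is closest to 0.5."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return sorted(ranked[:budget])

def random_sample(probs, budget, seed=0):
    """Baseline: pick `budget` pool items uniformly at random, ignoring the model."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(probs)), budget))

# Hypothetical model scores P(clickbait) for an unlabeled pool of 8 items.
pool_probs = [0.02, 0.48, 0.97, 0.55, 0.10, 0.51, 0.88, 0.30]

print(uncertainty_sample(pool_probs, 3))  # → [1, 3, 5]
print(random_sample(pool_probs, 3))       # depends on the seed
```

In a real comparison, each strategy's picks would be labeled, the model retrained, and the loop repeated under the same budget and evaluation settings for both.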
This reminds me of how one of the winners of Andrew Ng's 2021 Data-Centric AI competition analyzed embedding separation to choose training data: https://rensdimmendaal.com/posts/data-centric-ai