Baidu CEO: AI 'bubble' will burst 99 percent of players

(theregister.com)

52 points | by teleforce 20 hours ago

46 comments

  • maeil 16 hours ago

    > I think over the past 18 months, that problem has pretty much been solved – meaning when you talk to a chatbot, a frontier model-based chatbot, you can basically trust the answer

    Yes, I'm sure that if we ask your (Baidu) model a question about The Party, we can trust the answer.

  • xk_id 17 hours ago

    Can’t believe people still treat this kind of messaging about AI as expert opinion and not as advertising in its purest form. I guess those are the same people who see something profoundly meaningful in machine-generated strings.

  • wkat4242 16 hours ago

    Hallucinations exist because people are using LLMs as knowledge oracles, which they are not. I doubt the problem will ever be solved unless a new type of model is invented for this.

    • anonzzzies 15 hours ago

      With current models it cannot be solved. Maybe it cannot be solved at all; humans lie and make things up (as in, they may not know they are telling a non-truth) all the time, and that's the best example of intelligence we have.

  • sfmz 18 hours ago

    It takes so much money to be a player in this space; the ante went from ~$100k (GOOG, FB) to something like $4B, or 100k H100s. That's how I arrive at the statement: 99% don't have the cash.
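
    Back-of-envelope on that figure, assuming roughly $35k per H100 (an assumed mid-range price; real street prices vary widely):

        h100_count = 100_000
        price_per_gpu = 35_000             # assumed price; varies widely
        print(h100_count * price_per_gpu)  # 3_500_000_000 -> ~$3.5B, before
                                           # power, networking, and datacenter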

    • n_ary 13 hours ago

      The barrier to entry has always been the winning factor for a lot of rapid-growth businesses. In the previous era, it was the ability to afford massive cloud infra and an army of expensive SWEs to build global-scale distributed systems. Airbnb, Uber, Lyft, Stripe, PayPal, Google, Netflix, and all the “internet” tech had many competitors, but only these could afford cloud-scale growth and overtake the market.

      In the LLM era, it is the compute cost of the hardware that forms the barrier to entry for winners, and anything without a massive budget for both training and marketing is never heard of, unless it is in a self-hyped narrow field (Cursor).

      It is also sad that AI is hyped as advancing environmental and medical research, and there have been some impressive feats, but all the hype and money is literally going to chatbot shops; hence the marketing budget is yet another barrier to entry.

    • dmix 17 hours ago

      There are a lot of people building AI companies outside of LLM model development.

      • hn_throwaway_99 16 hours ago

        There are, but then the question really becomes "what is the moat"? I.e. lots of these companies are essentially just providing wrappers around the best models coupled with some type of RAG approach.
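
        A minimal sketch of that wrapper-plus-RAG pattern (purely illustrative; call_model is a hypothetical stand-in for whatever provider API such a product actually wraps):

            def retrieve(query, docs, k=2):
                # Toy retrieval: rank docs by word overlap with the query.
                # Real products use embeddings plus a vector store here.
                q = set(query.lower().split())
                return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

            def call_model(prompt):
                # Hypothetical stand-in for a call to a frontier model's API.
                return "<model answer conditioned on: %s>" % prompt

            def answer(query, docs):
                context = "\n".join(retrieve(query, docs))
                return call_model("Context:\n%s\n\nQuestion: %s" % (context, query))

            print(answer("how do refunds work?", [
                "Refunds are issued within 14 days of purchase.",
                "Shipping is free on orders over $50.",
            ]))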

        FWIW, I believe there is a defensible moat for the players that have really good UI and are really focused on end-user solutions. E.g. I pay for Cursor.sh because I believe it is an easy net win for my productivity. But I do really wonder if these "AI application" companies can support their lofty valuations. I feel like most of them will have limited pricing power, because if they try to price too high it's easy for someone to say "OK, we'll just go to a competitor, or even pull it in-house."

        • Ekaros 12 hours ago

          I am not sure there are that many moats.

          But I am starting to wonder whether you always need one. You won't win the startup lottery to a trillion, but you can still have a solid business that generates profit and sells real, working, reasonable solutions to real customers.

          Then again, this is obviously the wrong site for that...

          • red-iron-pine a minute ago

            "if it can't be a unicorn why even live?"

        • BillLucky 15 hours ago

          Truly combining technology with the needs of industry customers and finding new value gains: that is the moat.

  • kjellsbells 18 hours ago

    I'm wondering if we are now in the Low Background Steel transition, as far as Internet content goes. I already see material on the open Internet that has very obviously been generated by AI. As the next round of Common Crawl or whatever is slurped in for training, AI ends up eating its own output. Does the quality of the material degrade at that point? Maybe we'll end up searching for an Internet that existed before AI started rewriting it.

    https://en.wikipedia.org/wiki/Low-background_steel?wprov=sfl...

    • etrautmann 18 hours ago

      Yes - this insight has been repeatedly discovered by many, but remains a great analogy.

      • klipt 17 hours ago

        There are two main types of data for training intelligences (natural or artificial):

        1. Self play

        2. Data left behind by other intelligences' self play.

        The Internet is 2, but is generated by the self play of humans - who were themselves trained on the self play of previous humans.

        That's how civilization is bootstrapped.

        Once you have bootstrapped sufficiently smart AI, they can possibly bootstrap themselves further on their own self play, instead of continuing to rely on human self play data.

        • 8note 16 hours ago

          Nature has at least one more main type of training data, and it's much bigger than self play for natural training:

          Testing against the universe.

          Self play and self-play data ignore the whole of empiricism.

    • n_ary 13 hours ago

      With AI progress, we will probably finally see a slowdown of the “JS framework of the month” and other rapidly changing things.

      For the many very junior devs using LLMs to generate code and build something, speed is now the main factor; it is irrelevant whether the new Airbnb prototype, or even the prod version, was generated in Svelte 3, as long as it works.

      The fine-tuning and fixing up happen later, on an “if needed” basis.

      The same applies to other knowledge work where AI is being utilized in a similar manner: slightly outdated info is sufficient for the task.

      Also, the internet will continue to have decent content, as there will always be passionate people who want to build and share knowledge; and then there are those who need to advertise their expertise (consultants/freelancers) and still have to produce decent-quality content to be discovered.

      The maximum slop will be in the sales/marketing/social media sphere, where content quality is irrelevant and more clicks/engagement bring profit.

      Anything else will increasingly get locked behind paywalls: either exclusive contracts with AI companies to supply training data, or sales on the open market with strong copyright/DRM for general consumers.

    • dinfinity 14 hours ago

      Training data is preprocessed, you know. I don't understand why everybody making this argument ignores that.

      It's not as if the internet before LLMs did not have tons of trash content (which also includes things written by humans who sincerely think they're right but are factually incorrect). Of course the input data is preprocessed and curated/weighted. Of course newer training data will also be curated.

      Think for a second about what you said: "I already see material on the open Internet that has very obviously been generated by AI." Why would any AI company worth its salt _not_ ignore such data?
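
      A toy sketch of that curation step (the scoring functions are crude hypothetical stand-ins for real quality and AI-detection classifiers):

          def quality_score(doc):
              # Crude heuristic standing in for a learned quality classifier.
              return 0.9 if len(doc.split()) > 5 else 0.1

          def ai_likelihood(doc):
              # Crude heuristic standing in for an AI-generated-text detector.
              return 0.8 if "as an ai language model" in doc.lower() else 0.2

          def curate(docs):
              # Keep only docs that look high quality and not machine-generated.
              return [d for d in docs
                      if quality_score(d) > 0.5 and ai_likelihood(d) < 0.5]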

  • vdfs 18 hours ago

    90% was the startup failure rate even before AI.

    • dmix 17 hours ago

      After around 7 years of company life, it's about 90%.

      That doesn't mean they won't make money in the prior years and then still shut down by the 7th year, because markets change or competitors beat you.

  • hn_throwaway_99 16 hours ago

    Getting beyond the title, which I definitely agree with (https://news.ycombinator.com/item?id=41896346), there was this nugget about hallucinations:

    > I think over the past 18 months, that problem has pretty much been solved – meaning when you talk to a chatbot, a frontier model-based chatbot, you can basically trust the answer

    Can't decide if he actually believes this or he's just spewing his own hype. While I definitely agree the best models have reduced hallucinations, going from, say, 3% hallucinations to 0.7% hallucinations doesn't really improve the situation much for me, because I still need to double-check and verify the answers. Plus, I've found that models tend to hallucinate in exactly those "tricky" situations where I'm most likely to want to ask AI in the first place.

    For example, my taxes were more of a clusterfuck than usual this year, and so I was asking ChatGPT to clarify something for me, which was whether the "ordinary dividends" number reported on your 1040 and 1099s is a superset of "qualified dividends" (that is, whether the qualified dividends number is included in the ordinary dividends number), or if they were independent values. The correct answer is that the ordinary dividends number (3b on the 1040) does include qualified dividends (the 3a number), but ChatGPT originally gave me the wrong answer. Only when I dug further and asked ChatGPT to clarify did I get the typical "My mistake, you're right, it is a superset!" response from ChatGPT.
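
    Concretely, with hypothetical dollar amounts just to illustrate the relationship (the line numbers are the 1040's; the figures are made up):

        qualified = 400.00                    # Form 1040 line 3a
        non_qualified = 100.00
        ordinary = qualified + non_qualified  # Form 1040 line 3b includes 3a
        assert ordinary == 500.00             # 3b is the superset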

    Anybody who says that LLM output doesn't need to be verified is either willfully bullshitting, or they're just not asking questions beyond the basics.

    • n_ary 13 hours ago

      In your case, you ask but then verify. Most people (i.e. IRL people outside the tech circle) just want to get things done and will believe anything, which is why the “Best top 5/10/20 Y in 20xx” articles work, the top advert gets many hits on Google, and social media misinformation spreads and works.

      Before AI, people used to “google” things and vehemently believe and defend whatever content came first and second in their search results, which is probably why Google replaced all above-the-fold results with ads/sponsored content, and why businesses throw plenty of money and SEO at being on top.

  • anonzzzies 15 hours ago

    He thinks you can trust the answers of chatbots? That is disturbing. Sure, I use them all the time, but only for coding; code I can review and verify afterwards. Many other things are either a lot harder or impossible to verify, and I know it really does make stuff up with confidence, so I do not even try. As CEO of this massive company, this guy must know that is how it is?

    • lostmsu 15 hours ago

      Ironically, you can't trust the code of junior developers in exactly the same way.

      • anonzzzies 6 hours ago

        Not of many 'seniors' either; most people are just terrible and lazy. Also, see the article yesterday about the deflation of developer titles. Everyone here basically agreed the titles are nonsense, so how does 'junior' suddenly mean anything? Junior vs. what?

      • Gigachad 7 hours ago

        At least the junior devs run the code before submitting it for review.

  • v3ss0n 15 hours ago

    When did hallucinations go away? Which model?

  • more_corn 18 hours ago

    My bet is on unverified AI going the way of the dodo. I’m sorta sick of hallucinated nonsense. I also don’t see a business use case for a lot of “cool” AI products.

    • dboreham 18 hours ago

      I don't know if you're familiar with how LLMs work, but after diving a bit into that space, I'm a "realistic enthusiast" after a lifetime as a deep skeptic. For me it's like the beginning of the internet (yes... old). It's clear there's something profoundly different and potentially enormously useful going on. Exactly how it will end up being used, and who will make money from it, is unclear. We know from experience that those questions ended up having answers different from what we anticipated with the internet, and I'd expect the same to happen with AI.

      • rossjudson 17 hours ago

        The parent post is orthogonal to yours. It isn't saying that LLMs won't be useful -- it's saying that unverified LLMs are of limited value.

        This rings true. There's low economic value in performing activities where it doesn't matter if the output is true or accurate.

        Unverified LLMs can generate unlimited output that cannot be trusted; output that at best approximates truth some of the time.

        I suppose the long-term question is whether the approximation is sufficient for value-generating purposes. It clearly is in cases where it outperforms the status quo (example: summarizing customer feedback).

        • Art9681 16 hours ago

          This is also the case for unverified and verified humans, and one thing I can say with absolute confidence is that the majority of internet content has always been, and will always be, unverified opinion and ideology. As is the majority of human minds. The most valuable thing to come out of AI is the forced introspection about ourselves. It's "us" times N. For better or worse.

    • underseacables 18 hours ago

      I don't know; remember the controversy over the wildly inaccurate historical AI images? That could serve as the basis for some satirical products.

      • mc32 18 hours ago

        Weren't the majority of those induced by political manipulation rather than being a base characteristic of the technology?

    • rodgerd 18 hours ago

      In my optimistic moments, I think that.

      In my realistic moments, I think we will rewrite medical law, banking law, privacy law, and so on to accommodate the shitty AI children of billionaires. Misdiagnosed by "AI"? Too bad. Lose a job because of AI? Too bad. Sentenced to death by an AI-powered criminal investigation? Too bad.

  • monero-xmr 18 hours ago

    But YC just funded hundreds of AI companies. This would mean the world’s smartest investors are wrong. This is, frankly, impossible.

    • jrflowers 17 hours ago

      What if they aren’t the world’s smartest investors

    • conductr 18 hours ago

      Even if these younger vintage batches fail completely, the existing portfolio will benefit from AI being hyped.

    • nitwit005 15 hours ago

      If you asked, I'm sure they'd admit it is a gamble. The world's best gambler is still gambling.

    • IAmGraydon 18 hours ago

      I'll have to assume you don't understand how tech investing works. Give a dollar to 100 companies and you've spent $100. 99% fail, but one ends up returning $1,000 and you've 10x'ed.

      • candiddevmike 17 hours ago

        Why would it be 10x in this example? Isn't it 1000x? Or 900x if you remove the initial investment?

        Do 1/100 companies really experience that kind of windfall?

        • rexreed 17 hours ago

          In this example, you invested $100 to get a $1,000 return. 99 of those investments failed, but the one that succeeded made up for the losses. You still have to factor in the losses; otherwise, if you could have just picked the winner from the get-go, you would have done that and invested just $1, not $100.

        • IAmGraydon 16 hours ago

          You spent $100 and came out with $1,000. That’s 10x (you’ve returned 10 times the initial investment).
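
          Spelled out with the toy numbers from the example above:

              invested = 100 * 1            # $1 into each of 100 companies
              returned = 99 * 0 + 1 * 1000  # 99 go to zero, one returns $1,000
              print(returned / invested)               # 10.0 -> a "10x" gross multiple
              print((returned - invested) / invested)  # 9.0  -> 900% net gain, not 900x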

    • evilfred 14 hours ago

      very amused at this obvious sarcasm getting grey-texted
