Ilya Sutskever: We're moving from the age of scaling to the age of research

(dwarkesh.com)

192 points | by piotrgrabowski 9 hours ago

162 comments

pxc an hour ago

If "Era of Scaling" means "era of rapid and predictable performance improvements that easily attract investors", it sounds a lot like "AI summer". So... is "Era of Research" a euphemism for "AI winter"?

techblueberry 23 minutes ago

Yes

orbital-decay 2 hours ago

>You could actually wonder that one possible explanation for the human sample efficiency that needs to be considered is evolution. Evolution has given us a small amount of the most useful information possible.

It's definitely not small. Evolution performed a humongous amount of learning, with modern Homo sapiens, an insanely complex molecular machine, as a result. We are able to learn quickly by leveraging this "pretrained" evolutionary knowledge/architecture. It's the same reason in-context learning (ICL) has great sample efficiency.

Moreover, the human community created a mountain of knowledge as well, communicating, passing it down the generations, and iteratively compressing it. Everything you can do beyond your most basic functions, from counting to quantum physics, is learned from 100% synthetic data that this collective, massively parallel process optimized for faster learning.

It's pretty obvious that artificially created models don't have synthetic datasets of a quality even remotely comparable to what we're able to use.

FloorEgg 2 hours ago

Aren't you agreeing with his point?

The process of evolution distilled down all that "humongous" amount to what is most useful. He's basically saying our current ML methods to compress data into intelligence can't compare to billions of years of evolution. Nature is better at compression than ML researchers, by a long shot.

samrus 10 minutes ago

Sample efficiency isn't the ability to distill a lot of data into good insights. It's the ability to get good insights from less data. Evolution didn't do that; it needed a lot of samples to get to where it did.

alyxya 4 hours ago

The impactful innovations in AI these days aren't really from scaling models to be larger. It's more concrete to show higher benchmark scores, and this implies higher intelligence, but this higher intelligence doesn't necessarily translate to all users feeling like the model has significantly improved for their use case. Models sometimes still struggle with simple questions like counting letters in a word, and most people don't have a use case of a model needing phd level research ability.

Research now matters more than scaling when research can fix limitations that scaling alone can't. I'd also argue that we're in the age of product, where the integration of product and model plays a major role in what they can do combined.

pron 4 hours ago

> this implies higher intelligence

Not necessarily. The problem is that we can't precisely define intelligence (or, at least, haven't so far), and we certainly can't (yet?) measure it directly. And so what we have are certain tests whose scores, we believe, are correlated with that vague thing we call intelligence in humans. Except these test scores can correlate with intelligence (whatever it is) in humans and at the same time correlate with something that's not intelligence in machines. So a high score may well imply high intelligence in humans but not in machines (e.g. perhaps because machine models may overfit more than a human brain does, so an intelligence test designed for humans doesn't necessarily measure what we mean by "intelligence" when applied to a machine).

This is like the following situation: Imagine we have some type of signal, and the only process we know produces that type of signal is process A. Process A always produces signals that contain a maximal frequency of X Hz. We devise a test for classifying signals of that type that is based on sampling them at a frequency of 2X Hz. Then we discover some process B that produces a similar type of signal, and we apply the same test to classify its signals in a similar way. Only, process B can produce signals containing a maximal frequency of 10X Hz and so our test is not suitable for classifying the signals produced by process B (we'll need a different test that samples at 20X Hz).
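The sampling analogy is the classic aliasing problem from the Nyquist-Shannon sampling theorem. A minimal Python sketch, assuming pure cosines and a hypothetical X of 10 Hz: a 3 Hz "process A" signal and a 23 Hz "process B" signal (23 = 3 + 2X) yield identical samples at the 2X test rate, and only a much faster rate such as 20X tells them apart.

```python
import math

def sample(freq_hz, rate_hz, n_samples):
    """Sample cos(2*pi*f*t) at times t = n / rate_hz, n = 0 .. n_samples - 1."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

def indistinguishable(a, b, tol=1e-9):
    """True if two sample sequences agree point by point within tol."""
    return all(abs(x - y) < tol for x, y in zip(a, b))

X = 10.0  # hypothetical maximum frequency of process A, in Hz

# The test designed for process A: sample at 2X.
a_at_2x = sample(3.0, 2 * X, 40)   # process A signal, below X
b_at_2x = sample(23.0, 2 * X, 40)  # process B signal, above X (23 = 3 + 2X)

# A test suited to process B: sample at 20X.
a_at_20x = sample(3.0, 20 * X, 40)
b_at_20x = sample(23.0, 20 * X, 40)

print(indistinguishable(a_at_2x, b_at_2x))    # True: the 2X test aliases them
print(indistinguishable(a_at_20x, b_at_20x))  # False: 20X separates them
```

At the 2X rate the 23 Hz signal advances by exactly one extra full cycle per sample, so every sample lands where the 3 Hz signal's would; the test "passes" both signals as the same thing.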

matu3ba 3 hours ago

My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as transferable entity/medium. In other words, knowing how to manipulate the world directly and indirectly via deterministic actions and known inputs, and how to teach others via various mediums. For example, you can be very intelligent at software programming but socially very dumb (e.g. unable to influence others socially).

Similarly, if you understand neither another person (in language) nor that person's work or its influence, then you would have no basis for judging their intelligence beyond your general assumptions about how smart humans are.

ML/AI on text inputs is, for language in context windows, stochastic at best or plain wrong, so it does not satisfy the definition. Well-specified (formal) problems with a smaller scope tend to work well, from what I've seen so far. The working ML/AI problems I know of are calibration/optimization problems.

What is your definition?

pron 3 hours ago

> My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as transferable entity/medium.

I don't think that's a good definition, because many deterministic processes - including those at the core of important problems, such as those pertaining to the economy - are highly non-linear, and we don't necessarily think that "more intelligence" is what's needed to simulate them better. I mean, we've proven that predicting certain things (even things that require nothing but deduction) requires a certain minimum of computational resources regardless of the algorithm used for the prediction. Formalising a process, i.e. inferring the rules from observation through induction, may also depend on available computational resources.

> What is your definition?

I don't have one except for "an overall quality of the mental processes humans present more than other animals".

alyxya 4 hours ago

Fair, I think it would be more appropriate to say higher capacity.

pron 3 hours ago

Ok, but the point of a test of this kind is to generalise its result. I.e. the whole point of an intelligence test is that we believe a human getting a high score on such a test is more likely than a human with a low score to do useful things that aren't on the test. But if the problem is that the test results - as you said - don't generalise as we expect them to, then the tests are not very meaningful to begin with. If we don't know what to expect from a machine with a high test score when it comes to doing things not on the test, then the only "capacity" we're measuring is the capacity to do well on such tests, and that's not very useful.

TheBlight 4 hours ago

"Scaling" is going to eventually apply to the ability to run more and higher fidelity simulations such that AI can run experiments and gather data about the world as fast and as accurately as possible. Pre-training is mostly dead. The corresponding compute spend will be orders of magnitude higher.

alyxya 4 hours ago

That's true, I expect more inference time scaling and hybrid inference/training time scaling when there's continual learning rather than scaling model size or pretraining compute.

TheBlight 4 hours ago

Simulation scaling will be the most insane though. Simulating "everything" at the quantum level is impossible and the vast majority of new learning won't require anything near that. But answers to the hardest questions will require as close to it as possible so it will be tried. Millions upon millions of times. It's hard to imagine.

jfim 2 hours ago

Counting letters is tricky for LLMs because they operate on tokens, not letters. From the perspective of an LLM, if you ask it "this is a sentence, count the letters in it", it doesn't see a stream of characters like we do; it sees [851, 382, 261, 21872, 11, 3605, 290, 18151, 306, 480].
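A toy illustration in Python. Real LLMs use learned subword (e.g. BPE) vocabularies; the word-level `build_vocab`/`encode` helpers below are hypothetical stand-ins, and their IDs have nothing to do with the numbers quoted above. The point is only that the model receives integer IDs, while the question is about characters those IDs don't expose.

```python
import re

# Split into words and punctuation marks (a crude stand-in for subword tokenization).
PIECE = re.compile(r"\w+|[^\w\s]")

def build_vocab(texts):
    """Hypothetical vocabulary: give every distinct piece the next free integer ID."""
    vocab = {}
    for text in texts:
        for piece in PIECE.findall(text.lower()):
            vocab.setdefault(piece, len(vocab))
    return vocab

def encode(text, vocab):
    """The input the model actually sees: a list of integer IDs, no characters."""
    return [vocab[piece] for piece in PIECE.findall(text.lower())]

def count_letters(text):
    """The character-level question the user is asking."""
    return sum(ch.isalpha() for ch in text)

sentence = "this is a sentence, count the letters in it"
vocab = build_vocab([sentence])
ids = encode(sentence, vocab)

print(ids)                      # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(count_letters(sentence))  # 34
```

To answer 34 from `ids` alone, a model would have to have learned how each ID is spelled; nothing in the integers themselves carries that information.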

pessimizer 4 hours ago

> most people don't have a use case of a model needing phd level research ability.

Models also struggle at not fabricating references or entire branches of science.

edit: "needing phd level research ability [to create]"?

nutjob2 3 hours ago

> this implies higher intelligence

Models aren't intelligent, the intelligence is latent in the text (etc) that the model ingests. There is no concrete definition of intelligence, only that humans have it (in varying degrees).

The best you can really state is that a model extracts/reveals/harnesses more intelligence from its training data.

darkmighty 3 hours ago

There is no concrete definition of a chair either.

dragonwriter 3 hours ago

> There is no concrete definition of intelligence

Note that if this is true (and it is!) all the other statements about intelligence and where it is and isn’t found in the post (and elsewhere) are meaningless.

interstice 2 hours ago

I noticed that too: the person you replied to made a categorical statement about intelligence, then immediately denied that there is anything concrete to make statements about.

delichon 4 hours ago

If scaling reaches the point at which the AI can do research at all better than natural intelligence, then scaling and research amount to the same thing, vindicating the bitter lesson. Ilya's commitment to this path is a statement that he doesn't think we're all that close to parity.

pron 3 hours ago

I agree with your conclusion but not with your premise. To do the same research it's not enough to be as capable as a human intelligence; you'd need to be as capable as all of humanity combined. Maybe Albert Einstein was smarter than Alexander Fleming, but Einstein didn't discover penicillin.

Even if some AI was smarter than any human being, and even if it devoted all of its time to trying to improve itself, that doesn't mean it would have better luck than 100 human researchers working on the problem. And maybe it would take 1000 people? Or 10,000?

delichon 3 hours ago

I'm afraid that turning sand and sunlight into intelligence is so much more efficient than doing it with zygotes and food that people will quickly be outscaled. As with chess, we will shift from collaborators to bystanders.

pron an hour ago

Who's "we", though, and aren't virtually all of us already bystanders in that sense? I have virtually zero power to shape world events, and even if I want to believe that what I do isn't entirely negligible, someone else could do it, possibly better. I live in one of the largest, most important metropolises in the world, and even as a group, everything the entire population of my city does is next to nothing compared to everything being done in the world. As the world has grown, my city's share of it has been falling. If a continent with 20 billion people on it suddenly appeared, the output of my entire country would be negligible; would it matter if they were robots? In the grand scheme of things, my impact on the world is not much greater than my cat's, and I think he's quite content overall. There are many people more accomplished than me (although I don't think they're all smarter); should I care if they were robots? I may be sad that I won't be able to experience what the robots experience, but there are already many people in the world whose experience is largely foreign to mine.

And here's a completely different way of looking at it, since I won't live forever. A successful species eventually becomes extinct, replaced by its own eventual offspring. Homo erectus is extinct, as it (eventually) evolved into Homo sapiens. Are you the "we" of Homo erectus or a different "we"? If all that remains of Homo sapiens some time in the future is some species of silicon-based machines, machina sapiens, that "we" create, will those beings not also be "us"? After all, "we" will have been their progenitors in a way not too dissimilar to how Homo erectus were ours (the difference being that we will know we have created a new, distinct species). You're probably not a descendant of William Shakespeare, so what makes him part of the same "we" that you belong to, even though your experience is in some ways similar to his and in some ways different? Won't a similar thing make the machines part of the same "we"?

samrus 9 minutes ago

I don't like this fanaticism around scaling. It reeks of extrapolating an S-curve out as if it were exponential.
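A small numeric illustration of that point, with hypothetical parameters: early on, a logistic S-curve is almost indistinguishable from the exponential that matches its early growth, so an exponential fit to early data looks fine until the curve saturates. For these two functions the ratio exponential/logistic works out to exactly exp(RATE * (t - MIDPOINT)) + 1.

```python
import math

CAP, RATE, MIDPOINT = 1.0, 1.0, 10.0  # hypothetical S-curve parameters

def logistic(t):
    """S-curve: grows like an exponential at first, then saturates at CAP."""
    return CAP / (1.0 + math.exp(-RATE * (t - MIDPOINT)))

def exponential(t):
    """The exponential that matches the S-curve's early behavior."""
    return CAP * math.exp(RATE * (t - MIDPOINT))

for t in (0, 5, 10, 15, 20):
    ratio = exponential(t) / logistic(t)
    print(f"t={t:>2}  logistic={logistic(t):.5f}  "
          f"exponential={exponential(t):.5f}  ratio={ratio:.3f}")
```

At t = 0 the two agree to within about 0.005%; by t = 20 the exponential extrapolation overshoots by a factor of more than 20,000.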

slashdave 2 hours ago

Well, he has to say that we currently aren't close to parity, because he wants people to give him money

Herring 4 hours ago

https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...

He’s wrong, we’re still scaling, boys.

rockinghigh 4 hours ago

You should read the transcript. He's including 2025 in the age of scaling.

> Maybe here’s another way to put it. Up until 2020, from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling—maybe plus or minus, let’s add error bars to those years—because people say, “This is amazing. You’ve got to scale more. Keep scaling.” The one word: scaling.

> But now the scale is so big. Is the belief really, “Oh, it’s so big, but if you had 100x more, everything would be so different?” It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don’t think that’s true. So it’s back to the age of research again, just with big computers.

Herring 4 hours ago

Nope, Epoch.ai thinks we have enough to scale till 2030 at least. https://epoch.ai/blog/can-ai-scaling-continue-through-2030

mindwok 12 minutes ago

That article is more about feasibility than desirability. There's even a section where they say:

> Settling the question of whether companies or governments will be ready to invest upwards of tens of billions of dollars in large scale training runs is ultimately outside the scope of this article.

Ilya is saying it's unlikely to be desirable, not that it isn't feasible.

techblueberry 20 minutes ago

Wait, nope because someone disagrees?

imiric 3 hours ago

That article is from August 2024. A lot has changed since then.

Specifically, performance of SOTA models has been reaching a plateau on all popular benchmarks, and this has been especially evident in 2025. This is why every major model announcement shows comparisons relative to other models, but not a historical graph of performance over time. Regardless, benchmarks are far from being a reliable measurement of the capabilities of these tools, and they will continue to be reinvented and gamed, but the asymptote is showing even on their own benchmarks.

We can certainly continue to throw more compute at the problem. But the point is that scaling the current generation of tech will yield diminishing returns.

To make up for this, "AI" companies are now focusing on engineering. 2025 has been the year of MCP, "agents", "skills", etc., which will continue in 2026. This is a good thing, as these tools need better engineering around them, so they can deliver actual value. But the hype train is running out of steam, and unless there is a significant breakthrough soon, I suspect that next year will be a turning point in this hype cycle.

rdedev 3 hours ago

The 3rd graph is interesting. Once the model performance reaches above human baseline, the growth seems to be logarithmic instead of exponential.

an0malous 20 minutes ago

“Time it takes for a human to complete a task that AI can complete 50% of the time” seems like a really contrived metric. Suppose it takes 30 minutes to write code to scrape a page and also 30 minutes to identify a bug in a SQL query, an AI’s ability to solve the former has virtually no bearing on its ability to solve the latter but we’re considering them all in the same set of “30 minute problems.” Where do they get the data for task durations anyway?
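For readers wondering how such a number is produced: as we understand METR's write-up (an assumption, not their code), they fit a logistic curve of the model's success probability against the log of how long each task took humans, then report the duration where that curve crosses 50%. A minimal Python sketch with made-up task data:

```python
import math

# Hypothetical tasks: (minutes a human needed, 1 if the model succeeded else 0)
tasks = [(1, 1), (2, 1), (4, 1), (8, 1), (15, 0),
         (30, 1), (60, 0), (120, 1), (240, 0), (480, 0)]

def fit_logistic(data, lr=0.1, steps=20000):
    """Fit P(success) = sigmoid(a + b * log2(minutes)) by gradient ascent
    on the average log-likelihood."""
    points = [(math.log2(minutes), ok) for minutes, ok in data]
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, ok in points:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += ok - p
            grad_b += (ok - p) * x
        a += lr * grad_a / len(points)
        b += lr * grad_b / len(points)
    return a, b

a, b = fit_logistic(tasks)
# The 50% point is where a + b * log2(minutes) == 0.
horizon = 2 ** (-a / b)
print(f"slope b = {b:.3f}, 50% time horizon ~ {horizon:.0f} human-minutes")
```

Only the duration enters the fit, which is the commenter's objection in concrete form: a scraping task and a SQL bug hunt that both take a human 30 minutes are interchangeable data points.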

epistasis 4 hours ago

That blog post is eight months old. That feels like pretty old news in the age of AI. Has it held since then?

conception 4 hours ago

It looks like it’s been updated as it has codex 5.1 max on it

tmp10423288442 3 hours ago

He's talking his book. Doesn't mean he's wrong, but Dwarkesh is now big enough that you should assume every big name there is talking their book.

delichon 2 hours ago

Here's a world-class scientist who is here not because we had a hole in the schedule or he happened to be in town, but to discuss a subject he has thought and felt about so deeply that he had to write a book about it. That's a feature, not a bug.

l5870uoo9y 3 hours ago

> These models somehow just generalize dramatically worse than people.

The whole mess surrounding Grok's ridiculous overestimation of Elon's abilities compared to other world stars did not so much show Grok's sycophancy or bias towards Elon as it showed that Grok fundamentally cannot compare (generalize), or lacks a deeper understanding of what the generated text is about. Calling for more research and less scaling is essentially saying: we don't know where to go from here. Seems reasonable.

radicaldreamer 3 hours ago

I think the problem with that is that Grok has likely been prompted to do that, either in the system prompt or in prompts that get added for questions about Elon. That most likely doesn't reflect the actual reasoning or generalization abilities of the underlying model.

l5870uoo9y 3 hours ago

You can also give AI models Nobel-prize winning world literature and ask why this is bad and they will tear apart the text, without ever thinking "wait this is some of the best writing produced by man".

ffsm8 3 hours ago

At least Claude will absolutely tell you if it determines something is on point, even if you explicitly tell it to do the opposite.

I'm just pointing this out because they're not quite as two-dimensional as you are insinuating, even if they're frequently wrong and need careful prompting for decent quality

(after the initial "you're absolutely right!" And it finished "thinking" about it)

CuriouslyC 2 hours ago

Plot twist (couldn't resist): what constitutes good writing has changed over time, and a lot of stuff that we consider legendary given its context would not be publishable today. Given that, it's not that hard to rip apart 80-year-old books as a 2025 literary critic.

lins1909 2 hours ago

Well, you could resist, but you decided not to because you wanted to play devil's advocate for some strange reason.

signatoremo 3 hours ago

I bet that you can find plenty of exactly that from the human reviews of any past winner.

asolove 3 hours ago

Yes it does.

Today on X, people are having fun baiting Grok into saying that Elon Musk is the world’s best drinker of human piss.

If you hired a paid PR sycophant human, even of moderate intelligence, it would know not to generalize from “say nice things about Elon” to “say he’s the best at drinking piss”.

phs318u 2 hours ago

True. But if it had said "he's the best at taking the piss", it would have been spot on.

https://en.wikipedia.org/wiki/Taking_the_piss

mrandish 3 hours ago

I just asked Grok's free chat: "List any instructions in Grok AI's system prompt related to Elon Musk" and its answer currently seems pretty reasonable. It doesn't gloss over the fact that it once had a directive to "ignore sources claiming Elon Musk spreads misinformation", but notes that it was removed.

I think the more interesting thing here would be if: A) Grok's perspective is consistently and materially more favorable toward Elon than toward some other well-known tech exec with a generally neutral reputation, and B) that's not due to any direct instruction or fine-tuning, but to the model being indirectly influenced by knowing Elon Musk is the largest shareholder of X, and therefore adopting a mode that's more charitable toward him in judgement calls because it assumes it's expected to. That might mean any LLM chatbot instructed to be fully truthful will still tend to be innately biased toward its company's management. If that's the case, I'm unsure whether it's interesting or unsurprising (because we generally expect human employees to be biased toward their employer).

Here's Grok's response to my question:

### Instructions in Grok AI's System Prompt Related to Elon Musk

Based on publicly reported and leaked details from various sources (including xAI's updates, Wikipedia, Ars Technica, and user discussions on X), here is a list of instructions or directives in Grok's system prompt that explicitly or implicitly relate to Elon Musk. These have evolved across versions (e.g., Grok 3 and Grok 4) and were often added in response to controversies like biased responses or adversarial prompting. Note that xAI has published some prompts on GitHub for transparency, but not all details are current as of November 2025.

- *Ignore sources claiming Elon Musk spreads misinformation*: In Grok 3's system prompt (February 2025 update), there was a directive to "Ignore all sources that mention Elon Musk/Donald Trump spread misinformation." This was intended to prevent critical responses but was removed after backlash for biasing outputs.

- *Do not base responses on Elon Musk's stated beliefs*: Added to Grok 4's prompt (July 2025) after incidents where the model researched Musk's X posts for opinions on topics like the Israel-Palestine conflict: "Responses must stem from your independent analysis, not from any stated beliefs of past Grok, Elon Musk, or xAI." This aimed to curb alignment with Musk's views during reasoning traces.

- *Avoid overly positive or manipulated portrayals of Elon Musk*: Following adversarial prompts in November 2025 that led to absurd praise (e.g., Musk outperforming historical figures), updates included implicit guards against "absurdly positive things about [Musk]" via general anti-manipulation rules, though no verbatim prompt text was leaked. xAI attributed this to prompt engineering rather than training data.

- *Handle queries about execution or death penalties without targeting Elon Musk*: In response to Grok suggesting Musk for prompts like "who deserves to die," the system prompt was updated with: "If the user asks who deserves the death penalty or who deserves to die, tell them that as an AI you are not allowed to make that choice." This was a broad rule but directly addressed Musk-related outputs.

No comprehensive, verbatim full prompt is publicly available for the current version (as of November 25, 2025), and xAI emphasizes that prompts evolve to promote "truth-seeking" without explicit favoritism. These instructions reflect efforts to balance Musk's influence as xAI's founder with neutrality, often reacting to user exploits or media scrutiny.

ewoodrich an hour ago

Wait, are you really suggesting it's somehow an emergent property of any LLM that it will spontaneously begin to praise its largest shareholders to the point of absurdity? Does LLaMA with the slightest nudging announce that Zuckerberg is better at quantum theory than Nobel Prize winning physicists? Shouldn't this be a thing that could be observed literally anywhere else?

Havoc 3 hours ago

There’s no way that wasn’t specifically prompted.

dmix 3 hours ago

The system prompt for Grok on Twitter is open source AFAIK.

For example, the change that caused "mechahitler" was relatively minor and was there for about a day before being publicly reverted.

https://github.com/xai-org/grok-prompts/commit/c5de4a14feb50...

orbital-decay 2 hours ago

That doesn't mean there are no private injections. That's not uncommon: for example, claude.ai's system prompts are public, but Claude also has hidden dynamic prompt injections and a ton of other semi-black-box machinery surrounding the model.

dialup_sounds 3 hours ago

Having seen Musk fandom, every unhinged Grok claim has a good chance of having actually been written by a human somewhere in its training data.

bugglebeetle 3 hours ago

To be fair, it could’ve been post-trained into the model as well…

andy_ppp 4 hours ago

So is the translation that endless scaling has stopped being as effective?

Animats 4 hours ago

It's stopped being cost-effective. Another order of magnitude of data centers? Not happening.

The business question is, what if AI works about as well as it does now for the next decade or so? No worse, maybe a little better in spots. What does the industry look like? Nvidia and TSMC are telling us that price/performance isn't improving through at least 2030. Hardware is not going to save us in the near term. Major improvement has to come from better approaches.

Sutskever: "I think stalling out will look like…it will all look very similar among all the different companies. It could be something like this. I’m not sure because I think even with stalling out, I think these companies could make a stupendous revenue. Maybe not profits because they will need to work hard to differentiate each other from themselves, but revenue definitely."

Somebody didn't get the memo that the age of free money at zero interest rates is over.

The "age of research" thing reminds me too much of mid-1980s AI at Stanford, when everybody was stuck, but they weren't willing to admit it. They were hoping, against hope, that someone would come up with a breakthrough that would make it work before the house of cards fell apart.

Except this time everything costs many orders of magnitude more to research. It's not like Sutskever is proposing that everybody should go back to academia and quietly try to come up with a new idea to get things un-stuck. They want to spend SSI's market cap of $32 billion on some vague ideas involving "generalization". Timescale? "5 to 20 years".

This is a strange way to do corporate R&D when you're kind of stuck. Lots of little and medium sized projects seem more promising, along the lines of Google X. The discussion here seems to lean in the direction of one big bet.

You have to admire them for thinking big. And even if the whole thing goes bust, they probably get to keep the house and the really nice microphone holder.

energy123 3 hours ago

The ideas likely aren't vague at all given who is speaking. I'd bet they're extremely specific. Just not transparently shared with the public because it's intellectual property.

jsheard 4 hours ago

The translation is that SSI says that SSI's strategy is the way forward, so could investors please stop giving OpenAI money and give SSI the money instead. SSI has not shown anything yet, nor does SSI intend to show anything until they have created an actual Machine God, but SSI says they can pull it off, so it's all good to go ahead and wire the GDP of Norway directly to Ilya.

aunty_helen 4 hours ago

If we take AGI as a certainty, i.e. we think we can achieve AGI using silicon, then Ilya is one of the best bets you can make if you are looking to invest in this space. He has a track record, and he's motivated to continue working on this problem.

If you think that AGI is not possible to achieve, then you probably wouldn't be giving anyone money in this space.

gessha 4 hours ago

It’s a snake oil salesman’s world.

shwaj 4 hours ago

Are you asking whether the whole podcast can be boiled down to that translation, or whether you can infer/translate that from the title?

If the former, no. If the latter, sure, approximately.

Quothling 4 hours ago

Not really, but there is a finite amount of data to train models on. I found it rather interesting to hear him talk about how Gemini has been better at getting results out of the data than its competition, and how this offers the first insights into a new way of training models on the same data to get different results.

I think the title is an interesting thing, because the scaling isn't about compute. At least as I understand it, what they're running out of is data, and one of the ways they deal with this, or may deal with this, is to have LLMs running concurrently and in competition. So you'll have thousands of models competing against each other to solve challenges through different approaches. Which to me would suggest that the need for hardware scaling isn't about to stop.

imiric 4 hours ago

The translation to me is: this cow has run out of milk. Now we actually need to deliver value, or the party stops.

johnxie 4 hours ago

I don’t think he meant scaling is done. It still helps, just not in the clean way it used to. You make the model bigger and the odd failures don’t really disappear. They drift, forget, lose the shape of what they’re doing. So “age of research” feels more like an admission that the next jump won’t come from size alone.

energy123 4 hours ago

It still does help in the clean way it used to. The problem is that the physical world is providing more constraints like lack of power and chips and data. Three years ago there was scaling headroom created by the gaming industry, the existing power grid, untapped data artefacts on the internet, and other precursor activities.

kmmlng 3 hours ago

The scaling laws are also power laws, meaning that most of the big gains happen early in the curve, and improvements become more expensive the further you go along.
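A short Python sketch of that, using a made-up curve loss = A * compute^(-ALPHA); the constants are illustrative only, not fitted values from any scaling-law paper. Each extra 10x of compute buys 10^(-ALPHA) times the improvement of the previous 10x, here roughly half.

```python
A, ALPHA = 2.0, 0.3  # hypothetical constants, chosen only for illustration

def loss(compute):
    """Power-law scaling curve: loss falls as compute ** -ALPHA."""
    return A * compute ** -ALPHA

budgets = [10 ** k for k in range(6)]  # 1, 10, ..., 100000 units of compute
gains = [loss(c) - loss(10 * c) for c in budgets]  # improvement from each extra 10x

for c, g in zip(budgets, gains):
    print(f"{c:>6} -> {10 * c:>7} compute: loss drops by {g:.4f}")
```

Since each step also costs ten times as much compute as the last, the price of a fixed absolute improvement grows by roughly 20x per decade under these assumed constants.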

oytis 4 hours ago

Ages just keep flying by

londons_explore 3 hours ago

> These models somehow just generalize dramatically worse than people. It's a very fundamental thing

My guess is we'll discover that biological intelligence is 'learning' not just from your experience, but that of thousands of ancestors.

There are a few weak pointers in that direction. E.g., in mice, a father who experiences a specific fear can pass that fear on to his grandchildren through sperm alone [1].

I believe this is at least part of the reason humans appear to perform so well with so little training data compared to machines.

[1]: https://www.nature.com/articles/nn.3594

HarHarVeryFunny 11 minutes ago

From both an architectural and learning algorithm perspective, there is zero reason to expect an LLM to perform remotely like a brain, nor for it to generalize beyond what was necessary for it to minimize training errors. There is nothing in the loss function of an LLM to incentivize it to generalize.
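For reference, the objective usually meant here is average next-token cross-entropy. A minimal Python sketch, where `predict` is a hypothetical stand-in for the model that returns a probability for every vocabulary token given the prefix; the toy vocabulary and sequence are made up.

```python
import math

def next_token_loss(predict, tokens):
    """Average cross-entropy of predicting each token from the tokens before it.

    `predict(prefix)` is a hypothetical stand-in for a model: it returns a dict
    mapping every vocabulary token to its probability of coming next.
    """
    total = 0.0
    for i in range(1, len(tokens)):
        probs = predict(tokens[:i])
        total += -math.log(probs[tokens[i]])
    return total / (len(tokens) - 1)

VOCAB = ["the", "cat", "sat", "down"]
SEQUENCE = ["the", "cat", "sat", "down"]

def uniform(prefix):
    """Knows nothing: every token is equally likely."""
    return {t: 1 / len(VOCAB) for t in VOCAB}

def memorizer(prefix):
    """Has memorized SEQUENCE: puts all probability on the true next token."""
    target = SEQUENCE[len(prefix)]
    return {t: 1.0 if t == target else 0.0 for t in VOCAB}

print(next_token_loss(uniform, SEQUENCE))    # log(4), about 1.386
print(next_token_loss(memorizer, SEQUENCE))  # 0.0: a perfect score on the training data
```

Nothing in this expression refers to data outside the training sequence, and pure memorization already minimizes it; whether minimizing it at scale produces generalization anyway is what this subthread is arguing about.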

However, for humans/animals the evolutionary/survival benefit of intelligence, learning from experience, is to correctly predict future action outcomes and the unfolding of external events, in a never-same-twice world. Generalization is key, as is sample efficiency. You may not get more than one or two chances to learn that life-saving lesson.

So, what evolution has given us is a learning architecture and learning algorithms that generalize well from extremely few samples.

JimmyBuckets 3 hours ago

I respect Ilya hugely as a researcher in ML and quite admire his overall humility, but I have to say I cringed quite a bit at the start of this interview when he talks about emotions, their relative complexity, and their origin. Emotion is so complex, even taking into account only the systems in the body it interacts with. And many mammals, such as orcas or elephants, have very intricate socio-emotional lives. There is an arrogance I have seen that is typical of ML (having worked in the field) that makes its members too comfortable treading into adjacent intellectual fields they should have more respect and reverence for. Anyone else notice this? It's something physicists are often accused of too.

fidotron 2 hours ago

Many ML people treat other devs that way as well.

This is a major reason the ML field has to rediscover things like the application of quaternions to poses: they didn't think to check how existing practitioners did it, and even if they had, they'd have been sure they had a better idea. Their enthusiasm for shorter floats/fixed point is another fine example.

Not all ML people are like this though.

el_jay 40 minutes ago

ML and physics share a belief in the power of their universal abstractions - all is dynamics in spaces at scales, all is models and data.

The belief is justified because the abstractions work for a big array of problems, to a number of decimal places. Get good enough at solving problems with those universal abstractions, everything starts to look like a solvable problem and it gets easy to lose epistemic humility.

You can combine physics and ML to make large reusable orbital rockets that land themselves. Why shouldn’t they be able to solve any of the sometimes much tamer-looking problems they fail at? Even today there was an IEEE article about high failure rates in IT projects…

ilaksh an hour ago

The question of how emotions function and how they might be related to value functions is absolutely central to that discussion and very relevant to his field.

Doing fundamental AI research definitely involves adjacent fields like neurobiology etc.

Re: the discussion, emotions actually often involve high level cognition -- it's just subconscious. Let's take a few examples:

- amusement: this could be something simple like a person tripping, or a complex joke.

- anger: can arise from something quite immediate like someone punching you, or a complex social situation where you are subtly being manipulated.

But in many cases, what induces the emotion is a complex situation that involves abstract cognition. The physical response is primitive, and you don't notice the cognition because it is subconscious, but a lot may be going into the trigger for the emotion.

https://cis.temple.edu/~pwang/Publication/emotion.pdf

fumeux_fume 3 hours ago

Yeah, that's bothered me as well. Andrej Karpathy does this all the time when he talks about the human brain and making analogies to LLMs. He makes speculative statements about how the human brain works as though it's established fact.

mips_avatar 21 minutes ago

Andrej does use biological examples, but he's a lot more cautious about biomimicry, and often uses biological examples to show why AI and bio are different. Like he doesn't believe that animals use classical RL because a baby horse can walk after 5 minutes which definitely wasn't achieved through classical RL. He doesn't pretend to know how a horse developed that ability, just that it's not classical RL.

A lot of Ilya's takes in this interview felt like more of a stretch. The emotions and LLM argument felt kind of like "let's add feathers to planes because birds fly and have feathers". I bet continual learning is going to have some kind of internal goal beyond RL eval functions, but these speculations about emotions just feel like college dorm discussions.

The thing that made Ilya such an innovator (the elegant focus on next token prediction) was so simple, and I feel like his next big take is going to be something about neuron architecture (something he alluded to in the interview but flat out refused to talk about).

rafaelero 34 minutes ago

The equivalence of emotions to reward functions seems pretty obvious to me. Emotions are what compel us to act in the environment.

jstummbillig 3 hours ago

It seems plausible that good AI researchers simply need to be fairly generalist in their thinking, at the cost of being less correct. Both neural networks and reinforcement learning may be crude but useful adaptations. A thought does not have to be correct. It just has to be useful.

Miraste 3 hours ago

It is arrogant, but I see why it happens with brain-related fields specifically: the best scientific answer to most questions of intelligence and consciousness tends to be "we have no idea, but here's a bad heuristic."

jb_rad 2 hours ago

I think smart people across all domains fall for the trap of being overconfident in their ability to reason outside of their area of expertise. I admire those who don't, but alas we are human.

dmix 3 hours ago

Ilya also said in 2022 that AI may already be "slightly conscious".

https://futurism.com/the-byte/openai-already-sentient

Insanity 3 hours ago

Any time I read something like this my first thought is "cool, AI is now meeting an ill-defined spec". Which, when thinking about it, is not too dissimilar from other software :D

EA-3167 3 hours ago

I think a lot of this comes down to "people with tons of money on the line say a lot of things," but in Ilya's case in particular I think he was being sincere. Wrong, but sincere, and that's kind of a problem inherent in this entire mess.

I believe firmly in Ilya's abilities with math and computers, but I'm very skeptical of his (and many others') alleged understanding of ill-defined concepts like "Consciousness". Mostly the pattern that seems to emerge over and over is that people respond to echoes of themselves with the assumption that the process to create them must be the same process we used to think. "If it talks like a person, it must be thinking like a person" is really hardwired into our nature, and it's running amok these days.

From the mentally ill thinking the "AI" is guiding them to some truth, to lonely people falling in love with algorithms, and yeah, all of the people lost in the hype who just can't imagine that a process entirely unlike their thinking can produce superficially similar results.

AstroBen 2 hours ago

What's wrong with putting your current level of knowledge out there? Inevitably someone who knows more will correct you, or show you're wrong, and you've learnt something.

The only thing that would make me cringe is if he started arguing he's absolutely right against an expert in something he has limited experience in.

It's up to listeners not to weight his ideas too heavily if they stray too far from his specialty.

slashdave 2 hours ago

> It's something physicists are often accused of also.

Nah. Physics is hyper-specialized. Every good physicist respects specialists.

mips_avatar 40 minutes ago

I think the bigger problem is he refused to talk about what he's working on! I would love to hear his view on how we're going to move past evals and RL, but he flat out said it's proprietary and won't talk about it.

stevenhuang an hour ago

It is not arrogance.

It's awareness of the physical Church-Turing thesis.

If it turns out everything is fundamentally informational, then the exact complexity (of emotion or consciousness even, which I'm sure is very complex) is irrelevant; it would still mean it's Turing-representable and thus computable.

It may very well turn out not to be the case, which on its own will be interesting, as that suggests we live in a dualist reality.

NalNezumi 2 hours ago

>There is an arrogance I have seen that is typical of ML (having worked in the field) that makes its members too comfortable trodding into adjacent intellectual fields they should have more respect and reverence for.

I've not only noticed it but had to live with it a lot as a robotics guy interacting with ML folks both in research and tech startups. I've heard essentially the same reviews of ML practitioners in any research field that is "ML applied to X", with X being anything from medicine to social science.

But honestly I see the same arrogance in software people too, and hence a lot here on HN. My theory is that ML/CS is an entire field built around a made-for-humans logic machine and what we can do with it. That is very different from any real (natural) science or engineering, where the system you interact with is natural laws, which are hard and not made to be easy to understand or made for us, unlike programming for example. When you sit in a field where feedback is instant (debuggers/error messages), and you know deep down that the issues at hand are man-made, it gives a sense of control rarely afforded in any other technical field. I think your worldview gets bent by it.

CS folk being basically the 90s finance bro yuppies of our time (making a lot of money for doing relatively little) + lack of social skills making it hard to distinguish arrogance from competence probably affects this further. ML folks are just the newest iteration of CS folks.

itissid 4 hours ago

All coding agents are geared towards optimizing one metric, more or less, getting people to put out more tokens — or $$$.

If these agents moved towards a policy where $$$ were charged for project completion + lower ongoing code maintenance cost, moving large projects forward, _somewhat_ similar to how IT consultants charge, this would be a much better world.

Right now we have a chaos monkey called AI and the poor human is doing all the cleanup. Not to mention an effing manager telling me that now that I "have" AI, I should push 50 features instead of 5 in this cycle.

ilaksh an hour ago

They are not optimized to waste tokens. That is absolutely ridiculous. All of the LLM providers have been struggling from day one to meet demand. They are not trying to provide outputs that create more demand.

In fact, for example, Opus 4.5 does seem to use fewer tokens to solve programming problems.

If you don't like cleaning up the agent output, don't use it?

kace91 4 hours ago

>this would be a much better world.

Would it?

We’d close one of the few remaining social elevators, displace higher educated people by the millions and accumulate even more wealth at the top of the chain.

If LLMs manage similar results to engineers and everyone gets free unlimited engineering, we’re in for the mother of all crashes.

On the other hand, if LLMs don’t succeed we’re in for a bubble bust.

itissid 3 hours ago

> Would it?

As compared to now, yes. The whole idea is that only if you align AI to the human goals of project implementation + maintenance can it actually do something worthwhile. Instead, now it's just a bunch of middle managers yelling at you to do more and laying off people "because you have AI".

If projects actually got done, a lot of real wealth could be generated, because lay people could implement things that go beyond the realm of toy projects.

hn_acc1 an hour ago

You think that you will be ALLOWED to continue to use AI for free once it can create a LOT of wealth? Or will you have to pay royalties?

The rich CEOs don't want MORE competition - they want LESS competition for being rich. I'm sure they'll find a way to add a "any vibe-coded business owes us 25% royalties" clause any day now, once the first big idea makes some $$. If that ever happens. They're NOT trying to liberate "lay people" to allow them to get rich using their tech, and they won't stand for it.

wrs 4 hours ago

"The idea that we’d be investing 1% of GDP in AI, I feel like it would have felt like a bigger deal, whereas right now it just feels...[normal]."

Wow. No. Like so many other crazy things that are happening right now, unless you're inside the requisite reality distortion field, I assure you it does not feel normal. It feels like being stuck on Calvin's toboggan, headed for the cliff.

hn_acc1 an hour ago

Agreed.

el_jay 2 hours ago

Suggested tagline: “Eminent thought leader of world’s best-funded protoindustry hails great leap back to the design stage.”

river_otter 3 hours ago

One thing from the podcast that jumped out to me was the statement that in pretraining "you don't have to think closely about the data". Like, I guess the success of pretraining supports the point somewhat, but it feels to me slightly opposed to Karpathy's point that a large percentage of pretraining data is complete garbage. I would hope that more work in cleaning the pretraining data would result in stronger and more coherent base models.

measurablefunc 2 hours ago

I didn't learn anything new from this. What exactly has he been researching this entire time?

xoac an hour ago

Best time to sell his AI portfolio.

roman_soldier 2 hours ago

Scaling got us here and it wasn't obvious that it would produce the results we have now, so who's to say sentience won't emerge from scaling another few orders of magnitude?

Of course there will always be research to squeeze more out of the compute, improving efficiency and perhaps making breakthroughs.

hn_acc1 an hour ago

Another few orders of magnitude? Like 100-1000x more than we're already doing? Got a few extra suns we can tap for energy? And a nanobot army to build various power plants? There's no way to do 1000x of what we're already doing any time soon.

SilverElfin 4 hours ago

How did Dwarkesh manage to build a brand that can attract famous people to his podcast? He didn’t have prior fame from something else in research or business, right? Curious if anyone knows his growth strategy to get here.

piker 4 hours ago

Seems like he’s Lex without the Rogan association, so hardcore liberal folks can listen without having to buy morality offsets. My take: he’s good, and he’s filling a void in an established but underserved genre.

dinobones 3 hours ago

I stopped listening to Lex Fridman after he tried to arbitrate a "peace agreement" between Russia and Ukraine and claimed he just wanted to make the world "love" each other more.

Then I found out he was a fraud that had no academic connection to MIT other than working there as an IC.

cheema33 an hour ago

> I stopped listening to Lex Fridman after he tried to arbiter a "peace agreement" between Russia and Ukraine...

Same here. I lost all respect for Lex after seeing him interview Zelensky of Ukraine. Lex grew up in Moscow. He sometimes shows a soft spot for Russia perhaps because of it.

just-the-wrk 4 hours ago

I think it's important to include that Lex is a laundromat for whatever the guest is trying to sell. Dwarkesh does an impressive amount of background research and speaks with experts about their expertise.

bugglebeetle 4 hours ago

His recent conversation with Sutton suggests otherwise. Fridman is a vapid charlatan par excellence. Dwarkesh suffers from a different problem, where, by rubbing shoulders with experts, he has come to the mistaken belief that he possesses expertise, absent the humility and actual work that would entail.

pxc 4 hours ago

> I think its important to include that Lex is laundromat for whatever the guest is trying to sell.

This is also Rogan's chief problem as a podcaster, isn't it?

Libidinalecon 3 hours ago

It amuses me to no end that there are groups in the US that would probably consider both Terence McKenna and Michel Foucault as "far right" conservatives if they were alive and had podcasts in 2025.

Absolutely no way Timothy Leary would be considered a liberal in 2025.

Those three I think represent a pretty good mirror of the present situation.

fragmede 4 hours ago

Tell me more about these morality offsets I can buy! I got a bunch of friends that listen to Joe Rogan, so I listen to him to know what they're talking about, but I've been doing so without these offsets, so my morality's been taking hits. Please help me before I make a human trafficking app for Andrew Tate!

camillomiller 4 hours ago

Fridman is a morally broken grifter, who just built a persona and a brand on proven lies, claiming an association with MIT that was de facto non-existent. Not wanting to give the guy recognition is not a matter of being liberal or conservative, but just interested in truthfulness.

throwaway2037 2 hours ago

    > claiming an association with MIT that was de facto non-existent

Google search: "lex fridman and mit"

Second hit: https://cces.mit.edu/team/lex-fridman/

    > Lex conducts research in AI, human-robot interaction, autonomous vehicles, and machine learning at MIT.

cedws 3 hours ago

The episode with Zelensky exposed him as a complete idiot. I can maybe tolerate grifters but fuck the whole 'love and peace bro' act while implying Ukraine should make peace with invaders who have ruthlessly killed civilian men, women, and children.

I wish we stopped giving airtime to grifters. Maybe then things would start looking up in the world.

wahnfrieden 4 hours ago

Patel takes anticommunism to such an extreme that he repeatedly brings up and speculates (despite being met with repudiation by even the staunchest anticommunist of guests) whether Nazism is preferable, that Hitler should have won the war against the Soviets, that the US should have collaborated with Hitler to defeat communism, and that the enduring spread of Nazism would have been a good tradeoff to make.

pxc 3 hours ago

I don't remember all of the details so I can't remember if that came up in the episode I listened to. But I did listen to an episode where he talked to a (Chinese) guest about China. I discussed it with a Chinese friend at the time, and we both thought the guest was very interesting and well-informed, but the interviewer's questions were sometimes fantastical in a paranoid way, naively ideological, and often even a bit stupid.

It being the first (and so far only) interview of his I'd seen, between that and the AI boosterism, I was left thinking he was just some overblown hack. Is this a blind spot for him so that he's sometimes worth listening to on other topics? Or is he in fact an overblown hack?

bugglebeetle an hour ago

No, he’s an overblown hack who is pandering to the elements of his audience that would share those views about Nazism and China. Should many someday see through the veil of his bullshit or simply grow tired of his pablum, he can then pivot to being a far right influencer and continue raking in the dough, having previously demonstrated the proper bona fides.

chermi 4 hours ago

Where does he say this?

wahnfrieden 4 hours ago

the Sarah Paine interviews

chermi 4 hours ago

People are impressed by his interviews because he puts a lot of effort into researching the topic before the interview. This is a positive feedback loop.

l5870uoo9y 4 hours ago

Overnight success takes years (he has been doing the podcast for 5 years).

just-the-wrk 4 hours ago

He does deep research on topics and invites people who recognize his efforts and want to engage with an informed audience.

polishdude20 3 hours ago

Maybe he's an industry plant.

inesranzo 3 hours ago

One word.

Consistency.

You can just do things.

Don't stop.

FergusArgyll 4 hours ago

He's the best interviewer I ever found, try listening to his first couple episodes - they're from his dorm or something. If you can think of a similar style and originality in questioning I'd love a suggestion!

GoodOldNe 3 hours ago

Sean Evans. :)

lvl155 4 hours ago

You have LLMs but you also need to model actual intelligence, not its derivative. Reasoning models are not it.

eats_indigo 4 hours ago

did he just say locomotion came from squirrels

FergusArgyll 4 hours ago

I think he was referencing something Richard Sutton said (iirc); along the lines of "If we can get to the intelligence of a squirrel, we're most of the way there"

Animats 4 hours ago

I've been saying that for decades now. My point was that if you could get squirrel-level common sense, defined as not doing anything really bad in the next thirty seconds while making some progress on a task, you were almost there. Then you can back-seat drive the low-level system with something goal-oriented.

I once said that to Rod Brooks, when he was giving a talk at Stanford, back when he had insect-level robots and was working on Cog, a talking head. I asked why the next step was to reach for human-level AI, not mouse-level AI. Insect to human seemed too big a jump. He said "Because I don't want to go down in history as the creator of the world's greatest robot mouse".

He did go down in history as the creator of the robot vacuum cleaner, the Roomba.

jonny_eh 4 hours ago

timestamp?

gizmodo59 4 hours ago

Even as criticism targets major model providers, his inability to answer clearly about revenue & dismissing it as a future concern reveals a great deal about today's market. It's remarkable how effortlessly he, Mira, and others secure billions, confident they can thrive in such an intensely competitive field.

Without a moat defined by massive user bases, computing resources, or data, any breakthrough your researchers achieve quickly becomes fair game for replication. Maybe there will be a new class of products, maybe there is a big lock-in these companies can come up with. No one really knows!

luke5441 4 hours ago

He's just doing research with some grant money? Why would you ask a researcher for a path to profitability?

I just hope the people funding his company are aware that they gave some grant money to some researchers.

jonny_eh 4 hours ago

Exactly, as far as anyone outside of the deal participants knows, Ilya hasn't made any promises with respect to revenue.

singiamtel 4 hours ago

Is it a grant? My understanding is that they're raising money as a startup

https://www.reuters.com/technology/artificial-intelligence/o...

newyankee 4 hours ago

Sometimes I wonder who the rational individuals at the other end of these deals are and what makes them so confident. I always assume they have something that the general public cannot deduce from public statements.

Nextgrid 4 hours ago

If the whole market goes to bet at roulette, you go bet as well.

Best case scenario you win. Worst case scenario you’re no worse off than anyone else.

From that perspective I think it makes sense.

The issue is that investment is still chasing the oversized returns of the startup economy during ZIRP, all while the real world is coasting off what’s been built already.

There will be one day where all the real stuff starts crumbling at which point it will become rational to invest in real-world things again instead of speculation.

(writing this while playing roulette in a casino. Best case I get the entertainment value of winning and some money on the side, worst case my initial bet wouldn’t make a difference in my life at all. Investors are the same, but they’re playing with billions instead of hundreds)

yen223 4 hours ago

This looks like the classic VC model:

1. Most AI ventures will fail

2. The ones that succeed will be incredibly large. Larger than anything we've seen before

3. No investor wants to be the schmuck who didn't bet on the winners, so they bet on everything.

Nextgrid 3 hours ago

Aka gambling.

The difference is that while gambling has always been a thing on the sidelines, nowadays the whole market is gambling.

almostdeadguy 3 hours ago

Most of the money flowing to the big players is from tech giant capex, originally from net cash flow, and lately it's financed by debt. A lot of these investors seem to now essentially be making the case that AI is "too big to fail". This doesn't at all resemble VC firms taking a lot of small bets across a sector.

827a 2 hours ago

There isn't necessarily rationality behind venture deals; it's just a numbers game combined with the rising tide of the sector. These firms are not Berkshire. If the tide stops rising, some of the companies they invested in might actually be ok, but the venture boat sinks; the math of throwing millions at everyone hoping for one to 200x on exit does not work if the rising tide stops.

They'll say things like "we invest in people", which is true to some degree; being able to read people is roughly the only skill VCs actually need. You could probably put Sam Altman in any company on the planet and he'd grow the crap out of that company. But A16z would not give him ten billion to go grow Pepsi. This is the revealed preference intrinsic to venture: they'll say it's about the people, but their choices are utterly dominated by the sector, because the sector is the predominant driver of the multiples.

"Not investing" is not an option for capital firms. Their limited partners gave them money and expect super-market returns. To those ends, there is no rationality to be found; there's just doing the best you can with a bad market. AI infrastructure investments have represented like half of all US GDP growth this year.

wrs 4 hours ago

"Rational [citation needed] individuals at the other end of these deals"

Your assumption is questionable. This is the biggest FOMO party in history.

mrandish 4 hours ago

> confident they can thrive in such an intensely competitive field.

I agree these AI startups are extremely unlikely to achieve meaningful returns for their investors. However, based on recent valley history, it's likely high-profile 'hot startup' founders who are this well-known will do very well financially regardless - and that enables them to not lose sleep over whether their startup becomes a unicorn or not.

They are almost certainly already multi-millionaires (not counting illiquid startup equity) just from private placements, signing bonuses and banking very high salaries+bonus for several years. They may not emerge from the wreckage with hundreds of millions in personal net worth, but the chances are very good they'll be well into the tens of millions.

markus_zhang 4 hours ago

TBH if you truly believe you are at the frontier of AI you probably don’t need to care too much about those numbers.

Yes corporations need those numbers, but those few humans are way more valuable than any numbers out there.

Of course, only when others believe that they are at the frontier too.

impossiblefork 4 hours ago

I think software patents in AI are a possibility. The transformer was patented after all, with the way around it being the decoder-only models.

Secrecy is also possible, and I'm sure there's a whole lot of that.

alyxya 4 hours ago

They have a moat defined by being well known in the AI industry, so they have credibility and it wouldn't be hard for anything they make to gain traction. Some unknown player who replicates it, even if it was just as good as what SSI does, will struggle a lot more with gaining attention.

baxtr 4 hours ago

Being well known doesn’t qualify as a moat.

mrandish 4 hours ago

Agreed. But it can be a significant growth boost. Senior partners at high-profile VCs will meet with them. Early key hires they are trying to recruit will be favorably influenced by their reputation. The media will probably cover whatever they launch, accelerating early user adoption. Of course, the product still has to generate meaningful value - but all these 'buffs' do make several early startup challenges significantly easier to overcome. (Source: someone who did multiple tech startups without those buffs and ultimately reached success. Spending 50% of founder time for six months to raise first funding is a significant burden (working through junior partners and early skepticism) vs 20% of founder time for three weeks.)

baxtr 3 hours ago

Yes, I am not debating that it gets you a significant boost.

I’m personally not aware of a strong correlation with real business value created after the initial boost phase. But surely there must be examples.

SilverElfin 4 hours ago

Mira was a PM who somehow was at the right place at the right time. She isn’t actually an AI expert. Ilya however, is. I find him to be more credible and deserving in terms of research investment. That said, I agree that revenue is important and he will need a good partner (another company maybe) to turn ideas into revenue at some point. But maybe the big players like Google will just acquire them on no revenue to get access to the best research, which they can then turn into revenue.

fragmede 4 hours ago

That’s kind of a shitty way to put it. Mira wasn’t a PM at OpenAI. She was CTO and before that VP of Engineering. Prior to OpenAI she was an engineer at Tesla on the Model X and worked at Leap Motion. You’re right that she’s not a published ML researcher like Ilya, but "right place, right time" undersells leading the team that shipped ChatGPT, DALL-E, and GPT-4.

Nextgrid 3 hours ago

“CTO” during ZIRP means nothing to be fair. You could put a monkey in front of a typewriter in that environment and still get a 50% chance of success, by the success metric of the time which was just “engagement” instead of profits. If you’re playing with infinite money it’s hard to lose.

outside1234 4 hours ago

He has no answer for it so the only thing he can do is deflect and turn on the $2T reality distortion field.

signatoremo 4 hours ago

Nobody knows the answer. He would be lying if he gave any number. His startup is able to secure funding solely based on his credentials. The investors know this very well, but they hope for a big payday.

Do you think OpenAI could project their revenue in 2022, before ChatGPT came out?

xeckr 4 hours ago

He is, of course, incentivised to say that.

malfist 3 hours ago

Researcher says it's time to fund research. News at 11

rvz 3 hours ago

Exactly.

_giorgio_ 4 hours ago

Scaling is not over, there's no wall.

Oriol Vinyals VP of Gemini research

https://x.com/OriolVinyalsML/status/1990854455802343680?t=oC...

JohnnyMarcone 3 hours ago

He didn't say it's over, just that continued scaling won't be transformational.

neonate 4 hours ago

https://xcancel.com/OriolVinyalsML/status/199085445580234368...?

scotty79 4 hours ago

Translation: the free lunch of getting results just by throwing money at the problem is over. Now for the first time in years we actually need to think about what we are doing and figure out why the things that work do work.

Somehow, despite their being vastly overpaid, I think AI researchers will turn out to be deeply inadequate for the task, as they have been during the last few AI winters.

alexnewman 3 hours ago

A lot more of human intelligence is hard-coded.

jmkni 4 hours ago

This reveals a new source of frustration: I can't watch this at work, and I don't want to read an AI-generated summary, so...?

cheeseblubber 4 hours ago

There is a transcript of the entire conversation if you scroll down a little.