LeCun has been giving the same talk with literally the exact same slides for the past 3 years. JEPA still hasn't delivered despite FAIR's substantial backing.
LeCun seems like an extremely smart person who suffers from an overgrown ego. I got that strong impression from seeing his Twitter feed - basically "smarter than thou".
Generative world models seem to be doing ok. Dreamer V4 looks promising. I’m not 100% sold on the necessity of EBMs.
Also I’m skeptical that self-supervised learning is sufficient for human level learning. Some of our ability is innate. I don’t believe it’s possible for statistical methods to learn language from raw audiovisual data the way children can.
Human DNA has under 1 GB of information content in it, most of which isn't even used in the brain. And the brain doesn't have a mechanism to read data out of the DNA efficiently.
This puts a severe limit on how much "innate knowledge" a human can possibly have.
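For concreteness, here's the back-of-envelope version of that claim; the ~3.1 billion base-pair figure is the usual ballpark for the human genome, and real compressed sizes come out even smaller because the genome is highly repetitive:

    # Rough information content of the human genome (illustrative figures).
    base_pairs = 3.1e9          # approximate size of the human genome
    bits = base_pairs * 2       # 4 possible bases (A, C, G, T) -> 2 bits each
    gigabytes = bits / 8 / 1e9
    print(f"~{gigabytes:.2f} GB raw")   # ~0.78 GB, before any compression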
Sure, the human brain has a strong inductive bias. It also has a developmental plan, and it follows that plan. It guides its own learning, and ends up being better at self-supervised learning than even the very best of our AIs. But that guidance, that sequencing and that bias must all be created by the rules encoded in the DNA, and there's only so much data in the DNA.
It's quite possible that the human brain has a bunch of simple and clever learning tricks that, if pried out and applied to our AIs, would give us 100x the learning rate and 1000x the sample efficiency. Or it could be that a single neuron in the human brain is worth 10000 neurons in an artificial neural network, and thus, the biggest part of the "secret" of human brain is just that it's hilariously overparameterized.
The complexity of the human body surely weighs in at over 1 GB.
I think of DNA analogously to the rules of cellular automata. The entropy of the rules is much less than the entropy of the dynamical system the rules describe.
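A minimal sketch of that point, using elementary cellular automaton Rule 110 (the same rule mentioned further down the thread): the entire rule table fits in a single byte, yet the dynamics it unfolds from one seed cell are rich enough to be Turing complete.

    # Rule 110: a one-byte rule table driving a complex dynamical system.
    RULE = 110
    WIDTH, STEPS = 64, 32

    row = [0] * WIDTH
    row[WIDTH // 2] = 1        # a single "seed" cell

    for _ in range(STEPS):
        print("".join("#" if c else "." for c in row))
        # Each cell's next state is one bit of the rule, indexed by its
        # 3-cell neighborhood read as a binary number.
        row = [(RULE >> (row[(i - 1) % WIDTH] * 4 + row[i] * 2 + row[(i + 1) % WIDTH])) & 1
               for i in range(WIDTH)]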
The body is filled with innate knowledge. The organs all know what to do. The immune system learns to detect intruders (without synapses). Even a single-celled organism is capable of complex and fluid goal-oriented behavior, as Michael Levin attests.
I think the assumption that all knowledge exists in the brain, and all knowledge in the brain is encoded by neuronal weights, is probably too simplistic.
Regarding language and vision, I think the cognitive scientists are right: it is better to view these as organs or “modules” suited to a function. Damage Broca’s area and you get Broca’s aphasia. Damage your lung and you get trouble breathing. Neither of these looks like the result of statistical learning from randomly initialized parameters.
Damage Broca’s area early in brain development and... nothing happens?
The human brain has specialized regions, but there's still a lot of flexibility in it. It isn't a hard fixed-function system at all. A lung can't just start pumping blood to compensate for a heart issue, but similar things do happen to brain regions. Regions can end up repurposed, and an impressive amount of damage can be routed around.
A lot of the "brain damage" studies seem to point at a process not too dissimilar to ablation in artificial neural networks. You can null out some of the weights in a pretrained neural network, and that can fuck it up. But if you start fine-tuning the network afterwards, or train from scratch, with those weights still pinned to zero? The resulting performance can end up quite similar to a control case.
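Something like the following toy sketch of that masking mechanic; the model, data, and the 30% ablation rate are made up for illustration, and it's not a reproduction of any particular ablation study:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

    # "Damage": zero out ~30% of the first layer's weights and remember which.
    with torch.no_grad():
        mask = (torch.rand_like(model[0].weight) > 0.3).float()
        model[0].weight.mul_(mask)

    # "Fine-tuning after the damage": train while keeping the ablated
    # weights pinned to zero, so the surviving weights route around them.
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))
    for _ in range(100):
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        model[0].weight.grad.mul_(mask)   # no updates for ablated weights
        opt.step()
        with torch.no_grad():
            model[0].weight.mul_(mask)    # keep them exactly at zero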
A major difference is that human brain doesn't separate training from inference. Both are always happening - but the proportion varies. It may be nigh-impossible to fully "undo" some types of damage if it happens after a certain associated development window has closed, but easy enough if the damage happens beforehand.
> Damage Broca’s area early in brain development and... nothing happens?
Citation needed.
Cerebral plasticity is a thing, but it's not magic either.
It's not magic, but it's just magic enough to conclusively disprove "brain regions are fixed function" - if information-theoretic reasons somehow weren't enough for you.
Way too much weird re-routing and re-purposing can happen in the brain for that to be the case.
Human brain implements a learning algorithm of some kind - neuroscientists disagree on the specifics, but not on the core notion. It doesn't ship with all the knowledge it needs, or anywhere close. To work well, it has to learn, and it has to learn by getting information from the environment.
> It's not magic, but it's just magic enough to conclusively disprove "brain regions are fixed function"
You cannot confidently disprove anything unless you can back your statement.
> information-theoretic reasons somehow weren't enough for you.
Your “information-theoretic reasoning” is completely pointless though.
> Human brain implements a learning algorithm of some kind - neuroscientists disagree on the specifics, but not on the core notion. It doesn't ship with all the knowledge it needs, or anywhere close. To work well, it has to learn, and it has to learn by getting information from the environment
Nobody said otherwise. But that doesn't mean everything is being learned either. There are many things a human is born with that it doesn't have to learn. (It's pretty obvious when you have kids: as primates humans are naturally attracted to climbing trees, and they will naturally collect stones and sticks, which is what primitive tools are made of).
And all of that "innate knowledge" still fits into under 1 gigabyte of compressed DNA.
1 gigabyte. That's the absolute limit of how much "innate knowledge" a human brain can have in it! Every single instinct, every learning algorithm, every innate behavior and every little cue a brain uses to build itself has to fit into a set of data just 1 gigabyte in size.
Clearly, nature must have found some impressively large levers to be able to build and initialize a brain with 90 billion connected neurons off something this small.
> all of that "innate knowledge" still fits into under 1 gigabyte of compressed DNA.
Yes, the same way Turing completeness fits in 8 bits, which is both perfectly true (see Rule 110) and perfectly useless for deriving any conclusions about the limits of innate knowledge.
Similarly, just because you can encode the number pi in just two bytes (the ASCII codes for the letters “p” and “i”) doesn't mean the number contains only two bytes of entropy.
Your comment is completely nonsensical. Are you disagreeing just to disagree?
Applying information theory out of its domain is nonsensical, yes. That's the point.
And for that reason, your argument about 1GB of data makes absolutely no sense at all.
Bullshit. We're talking about information, and we were always talking about information.
The problem is that you claim that you can quantify it based on bad use of irrelevant tools.
> Or it could be that a single neuron in the human brain is worth 10000 neurons in an artificial neural network, and thus, the biggest part of the "secret" of human brain is just that it's hilariously overparameterized.
Being overparameterized alone doesn't explain how fast we learn things compared to deep neural nets though. Quite the opposite actually.
Only 1 GB of code, maybe, but dependent on the universe as the runtime environment.
I don't know why people really dislike the idea of innate knowledge so much, it's obvious other animals have tons of it, why would we be any different.
The problem with assuming tons of innate knowledge is that it needs to be stored somewhere. DNA certainly contains enough information to determine the development of various different neuron types and which kinds of other neurons they connect to, but it certainly cannot specify weights for every individual synapse, except for animals with very low neuron counts.
So the existence of a sensorimotor feedback loop for a basic behavior is innate (e.g. moving forward to seek food), but the fine-tuning for reliably executing this behavior while adapting to changing conditions (e.g. moving over difficult terrain with an injured limb after spotting a tasty plant) needs to be learned through interacting with the environment. (Stumbling around eating random stuff to find out what is edible.)
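To make the storage gap concrete, a rough calculation under common ballpark estimates (the neuron and synapse counts are order-of-magnitude figures, not exact):

    # How much storage a literal "weight table" for the brain would need.
    neurons = 86e9                 # ~86 billion neurons (ballpark)
    synapses_per_neuron = 1e3      # conservative; often quoted as 1e3-1e4
    synapses = neurons * synapses_per_neuron

    bytes_per_synapse = 1          # even a single byte per weight
    needed_gb = synapses * bytes_per_synapse / 1e9
    print(f"~{needed_gb:,.0f} GB needed vs under 1 GB of genome")
    # ~86,000 GB vs <1 GB: the genome can't be a synapse-level weight table.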
>certainly cannot specify weights for every individual synapse
That's not the only way one could encode innate knowledge. Besides, we have demonstrated experimentally many times that animals have innate knowledge; the only reason we can't do this with humans is that it would be horrifically unethical.
>Stumbling around eating random stuff to find out what is edible
Plenty of animals have innate knowledge about what is and isn't edible: it's why, for example, tasty things generally speaking smell good and why things that are bad (rotting meat) smell horrific.
I'm not saying that there's no innate knowledge. This entire list of reflexes https://en.wikipedia.org/wiki/List_of_reflexes is essentially a list of innate knowledge in humans, many of which have been demonstrated in newborns, apparently without considering such experiments unethical.
I'm saying that there are limits to how much knowledge can be inherited. I.e. the question isn't "Where could innate knowledge be encoded other than in synapses?" but "Considering the extremely large number of synapses involved in complex behavior far exceeds genetic storage capacity, how are their weights determined?" And since we know that in addition to having innate behaviors, animals are also capable of learning (e.g. responding to artificial stimuli not found in nature), it stands to reason that most synapse weights must be set by a dynamic learning process.
Yeah but the point was that people are uncomfortable with positing any innate knowledge at all.
> That's not the only way one could encode innate knowledge.
Maybe sections could be read from DNA and broadcast as action potentials?
There are already ribosomes that walk along RNA. You'd need a variant which, instead of stringing amino acids into proteins, would read out the bases and produce something that triggers action potentials based on the contents.
Various reasons
Some people just believe there is no innate knowledge, or that we don't need it if we just scale/learn better (in the direction of the Bitter Lesson).
(ML) academia is also heavily biased against it, for two main reasons:
- It's harder to publish: if you learn task X with innate knowledge, it's not as general, so reviewers can claim it's just (feature) engineering, which hurts acceptance. That's why people always try to frame their work as generally as possible.
- Historical reasons, stemming from the conflict with the symbolic community (which relies heavily on innate knowledge).
You’d have to explain where that innate knowledge is stored though. The entire human genome is less than a GB if I remember correctly. Some of that being allocated to ”priors” for neural circuit development seems reasonable, but it can’t be very detailed across everything a brain does. The rest of the body needs some bytes too.
Not really - that 1GB is the seed for a procedural generation mechanism that has been finely tuned to its unfolding in an environment over 4 billion years.
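A toy version of the "small seed, huge deterministic output" idea; this is only an illustration of the compression argument, not a model of how development actually works:

    import random

    random.seed("1GB-genome")      # a tiny, arbitrary seed
    blob = bytes(random.getrandbits(8) for _ in range(1_000_000))
    print(len(blob), "bytes unfolded deterministically from a ~10-byte seed")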
DNA is the ultimate demoscene exe
Sure. But that’s just compression, right? I guess you could argue that some information is stored outside the genome, in the structure of proteins etc. But the counter argument is that that information is quickly lost in cell divisions. Only DNA has the error correcting mechanisms needed to reliably store information, is my impression.
But generative models are always going to seem like they are doing ok. That's how they work. They are good at imitating and producing misleading demos.
Agree with LeCun that current AI doesn’t exhibit anything close to actual intelligence.
I think the solution lies in cracking the core algorithms used by nature to build the brain. Too bad it’s such an inscrutable hairball of analog spaghetti code.
The mistake you and many others are making is assuming that it is algorithmic.
Humans are not intrinsically machines. Through the education system and so on, humans are taught to somewhat behave as such.
This seems like the same exact talk LeCun has been giving for years, basically pushing JEPA, world models, and attacking contemporary LLMs. Maybe he’s right but it also seems like he’s wrong in terms of timing or impact. LLMs have been going strong for longer than he expected, and providing more value than expected.
This is also my read; JEPA is a genuinely interesting concept, but he's been hawking it for several years, and nothing has come of it in the domains in which LLMs are successful. Hoping that changes at some point!
>LLMs have been going strong for longer than he expected
Have they? They still seem to be a dead end toward AGI.
Yeah, he was quite vocal in his opinion that they would plateau earlier than they did and that little value would be derived from them because they're just stochastic parrots. Agree with him that they're probably not sufficient for AGI, but, at least in my experience, they're adding a lot of value and they're continuously performing better in a range of tasks that he wasn't expecting them to.
2 more years bro.
To quote Zuck:
> Some people were technical, but they didn't do technical work for many months, or longer, and now are no longer technical, they fell behind, but still think they are.
Where is this from?
https://youtu.be/WuTJkFvw70o?t=2340
I think that LeCun has correctly identified that LLMs are only one type of intelligence and that AGI/AMI needs to combine multiple other types … hierarchical goal setting, attention/focus management, and so on.
Seems that he is able to garner support for his ideas and to make progress at the leading edge. Yes, the “I know better” style is a little hard to take, but then many innovations are driven by narcissism.
There is a lot of "transformer LLMs are flawed" going around, and a lot of alternative architectures being proposed, or even trained and demonstrated. But so far? There's nothing that would actually outperform transformer LLMs at their strengths. Most alternatives are sidegrades at best.
For how "naive" transformer LLMs seem, they sure set a high bar.
Saying "I know better" is quite easy. Backing that up is really hard.
> There is a lot of "transformer LLMs are flawed" going around, and a lot of alternative architectures being proposed, or even trained and demonstrated. But so far? There's nothing that would actually outperform transformer LLMs at their strengths. Most alternatives are sidegrades at best.
That's kind of awkward timing to say that, as alternatives to transformers have flourished over the past few weeks (Qwen3-Next, Granite 4).
But IIRC Le Cun's criticism applies to more than just transformers and to next-token predictors as a whole.
Both are still transformer LLMs at their core, and perform as such. They don't show a massive jump in capabilities over your average transformer.
Improvements in long context efficiency sure are nice, and I do think that trying to combine transformers with architectures that aren't cursed with O(n^2) on sequence length is a promising approach. But it's promising as an incremental improvement, not a breakthrough that completely redefines the way AIs are made, the way transformer LLMs themselves did.
> They don't show a massive jump in capabilities over your average transformer
Long context is a massive capability improvement.
> But it's promising as an incremental improvement, not a breakthrough that completely redefines the way AIs are made, the way transformer LLMs themselves did.
Transformers themselves were an incremental improvement over RNNs with attention, and in terms of capabilities they weren't immediately superior to their predecessors.
What changed the game was that they were vastly cheaper to train, which made it possible to train massive models on phenomenal amounts of data.
Linear attention models being much more compute-efficient than transformers on longer context may result in a similar breakthrough.
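For a rough sense of why context length matters here, a small comparison under the usual cost estimates (self-attention scales like n^2 * d per layer, linear attention like n * d^2); the model width d and the context lengths are illustrative:

    d = 4096                               # model width (assumed)
    for n in (2_000, 32_000, 1_000_000):   # context lengths
        quadratic = n * n * d              # softmax attention cost
        linear = n * d * d                 # linear-attention-style cost
        print(f"n={n:>9}: quadratic/linear ~ {quadratic / linear:.1f}x")
    # At short contexts the two are comparable; at million-token contexts
    # the quadratic term dominates by a couple of orders of magnitude.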
It's very hard to tell in advance what will be a marginal improvement and what will be a game changer.
excellent point
Give it up Yann…LLMs won.
They won for now...