"Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind."
Modelling text describing the world is not modelling (some aspect of) the world?
Modelling the probability that a reader likes or dislikes a piece of text is not modelling (some aspect of) a reader's state of mind?
No? There's no model involved. It's all just probabilistic. LLMs understand what you're thinking as well as a mood ring.
It isn't possible to have "just probabilistic" (maybe a philosophical exception could be made for a uniform random distribution or whatever provides the little dose of randomness required to get nondeterministic results). Probabilities are always in context of a model. LLMs model language but language itself is a model of something else. My money would have been on language modelling nonsense, but that is quite clearly not the case. Turns out it models the world and so do LLMs.
The literal definition of a model is "an informative representation of an object, person, or system". I think you mean something else though, what are you trying to express exactly?
Nothing about an LLM is “just”. In what precise sense do you mean it is probabilistic?
This is the first submission in a year that gives me some hope for humanity. It shows that linguistics is not obsolete. Maybe the last people capable of thinking will be linguists.
What a hill to die on.
It would have been nice to see some version of “I am very surprised by how far LLMs have come since I wrote the stochastic parrots paper, here is how I have revised my thinking.” But there is nothing like that and the author is just doubling down or trying to correct perceived “misinterpretations” of her work.
Meanwhile you have multiple Fields Medalists (Tao, Gowers) saying they’re very impressed by LLMs’ mathematical reasoning, something that the stochastic parrots thesis (if it has any empirically-predictive content at all) would predict was impossible. I doubt Tao and Gowers thought much of LLMs a few years ago either. But they changed their minds. Who do you want to listen to?
I think it’s time to retire the Stochastic Parrots metaphor. A few years ago a lot of us didn’t think LLMs would ever be capable of doing what they can do now. I certainly didn’t. But new methods of training (RLVR) changed the game and took LLMs far beyond just reducing cross entropy on huge corpuses of text. And so we changed our opinions. Shame Emily Bender hasn’t too.
Sigh.
She says explicitly it's not an empirical hypothesis. It's just a label for how they function. Which hasn't really changed even as they've gotten more useful. I haven't followed the full drama but this post is her saying the term has been frequently misapplied and she's basically distancing herself from some critiques that were misinterpreting her intent.
> stochastic parrots thesis (if it has any empirically-predictive content at all)
Did you read TFA? This is precisely one of the non-questions that she answers.
Yes, she addresses this by denying that she's made any empirical hypothesis, but in a way that's some combination of disingenuous and confused.
She also says:
> What I am trying to do... is to help people understand what these systems actually are
Can a phrase that has no empirical content aid people in understanding an empirical phenomenon?
> the astonishing willingness of so many to... turn to synthetic text... for all kinds of weighty decisions.
Why is this astonishing, if the nature of these models as "stochastic parrots" places no limitations whatsoever on their empirical capabilities, reliability, etc?
> the field of linguistics is particularly relevant in this moment, as a linguist’s eye view on language technology is desperately needed to help make wise decisions about how we do and don’t use these products
Is it wise to make decisions about a product on the basis of information that has no relevance to how it is actually likely to behave?
(It may be, if one has ethical concerns with "data theft, the exploitative labor practices", etc -- but one could have such concerns about any kind of product, not just a "stochastic parrot", and linguists are not generally academia's experts on, e.g., labor practices.)
Gowers, Tao and Lichtman are especially impressed by the funding of math.inc and the AI for Math Fund, a joint venture of Renaissance Philanthropies and XTX Markets.
Renaissance Philanthropies is a front for VC companies.
They never publish allocated computational resources, prior art or any novel algorithm that is used in the LLMs. For all we know, all accounts that are known to work on math stunts get 20% of total compute.
In other words, they ignore prior art, do not investigate and just celebrate if they get a vibe math result. It isn't science, it is a disgrace.
Is your justification in dismissing Fields medalists that they are impressed by funding? Not even receiving it (I assume you say this because Tao is not funded by AI for Math, but rather an advisor for it)?
Not only would it be a leap to suggest that people automatically lose their integrity by taking funds for projects they believe are useful, especially after involvement with adjacent fields, but you are suggesting merely being impressed by a fund is enough to dismiss their views?
You also have no evidence that Renaissance Philanthropies is a front for VC companies. All news coverage indicates that they seek to be an alternative for high net worth individuals engaging in philanthropy.
Many people discovering Erdős results, engaging in Olympiads etc, are doing so with publicly available models and publish the resources used in the process.
Renaissance "Philanthropy" brainwashes children with AI, which is child abuse:
https://www.renaissancephilanthropy.org/insights/renaissance...
https://www.renaissancephilanthropy.org/insights/embedding-a...
It promotes "agentic science", which will destroy science further:
https://www.renaissancephilanthropy.org/insights/open-source...
No one publishes. Please show me papers about the math proof logic in ChatGPT that are as detailed as those from Boyer/Moore/Kaufmann for prior work.
If they are on arxiv.org with 50 authors in a sea of slop, I didn't find them. If they exist, they are certainly not from Gowers, Tao or Lichtman.
You have all the upper hand because your AI shills back you up here, but nothing of substance.
This is getting insane. You have no evidence for your initial claims and didn't respond to a thing I said, and are now claiming using AI for education is "child abuse". Please get help.
Lovely article well worth attention by virtue of its regard for the cultural traits of terminology and its inflections, while also debunking the pervasive lore that "AI" devices are doing anything but the merest resemblance of thinking.
It's rare to read an author who can directly face Brandolini's Law of misinformation asymmetry and not only hold her own against the bullshit but overcome it.
TIL that the "merest resemblance of thinking" is enough to take gold at IMO.
Automated theorem provers are not new; in fact, they are very old. One of the most automated is ACL2, which uses the well-studied waterfall method (unrelated to waterfall development).
LLMs certainly use something similar, except they understand text as input. LLMs, especially when used for marketing stunts, have way more computing power available than any theorem prover ever had. They probably do random restarts if a proof fails, which amounts to partially brute forcing.
Lawrence Paulson correctly complained about some of the hype that Lean/LLMs are getting.
ACL2 even uses formulaic text output that describes the proof in human language, despite being all in Common Lisp and not a mythical clanker.
They do not think; they use old and well-established algorithms, or perhaps novel ones that were added.
Proof search isn't new, but I don't think that captures the value of LLMs.
They act as a learned proposal mechanism on top of hard search. Things like suggesting relevant lemmas, tactics, turning intent into formal steps, and ranking branches based on trained knowledge.
Maybe a kind of learned "intuition engine", from a large corpus of mathematical text, that still has to pass a formal checker. This is not really something we've had to this extent before.
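To make the "proposal mechanism on top of hard search, gated by a checker" idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the "tactics" are just integer rewrites, `propose` is a hand-written heuristic standing in for a learned model, and `check` stands in for a formal checker. It is not how any real prover or LLM pipeline is implemented.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy "tactics": each one rewrites an integer state.
TACTICS: Dict[str, Callable[[int], int]] = {
    "double": lambda n: n * 2,
    "add_three": lambda n: n + 3,
    "sub_one": lambda n: n - 1,
}

def check(start: int, goal: int, steps: List[str]) -> bool:
    """Stand-in for the formal checker: replay every step, accept only an exact match."""
    state = start
    for name in steps:
        if name not in TACTICS:
            return False
        state = TACTICS[name](state)
    return state == goal

def propose(state: int, goal: int) -> List[Tuple[int, str]]:
    """Stand-in for the learned model: rank tactics, lower score = more promising."""
    return sorted((abs(goal - fn(state)), name) for name, fn in TACTICS.items())

def search(start: int, goal: int, max_expansions: int = 1000) -> Optional[List[str]]:
    """Best-first search: the proposer orders branches, the checker has the final say."""
    frontier: List[Tuple[int, int, List[str]]] = [(abs(goal - start), start, [])]
    seen = {start}
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, state, steps = heapq.heappop(frontier)
        if check(start, goal, steps):  # nothing is returned unless it verifies
            return steps
        for score, name in propose(state, goal):
            nxt = TACTICS[name](state)
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score, nxt, steps + [name]))
    return None

proof = search(2, 11)
print(proof, check(2, 11, proof or []))
```

The point of the split is that a bad proposer only costs search time. It cannot make a wrong "proof" pass, because the checker replays every step independently.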
> They do not think
That claim seems less useful, unless “think” is defined in a way that predicts some difference in capability. If the objection is that LLMs are not conscious, fine, but that doesn't say much about whether they can help produce correct formal proofs.
And also create novel math proofs.
Perhaps actual thinking is not automatically necessary for that either! - and the LLM is proof.