I am in academia and worked in NLP although I would describe myself as NLP adjacent.
I can confirm LLMs have essentially consigned a good chunk of historical research to the bin. I suspect there are probably still a few PhD students working on traditional methods knowing full well a layman can do better using the mobile ChatGPT app.
That said traditional NLP has its uses.
Using the VADER model for sentiment analysis, while flawed, is vastly cheaper than an LLM for getting a general idea. Traditional NLP is suitable for many tasks people are now spending a lot of money asking GPT to do just because they know GPT.
I recently did an analysis on a large corpus and VADER was essentially free, while the cloud cost to run a Llama-based sentiment model was about $1,000. I ran both because VADER costs nothing but minimal CPU time.
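For context, the VADER pass is only a few lines - a minimal sketch, assuming the standard vaderSentiment package (corpus loading and the Llama pipeline aren't shown):

    # Minimal sketch of the cheap baseline pass (assumes the vaderSentiment
    # package; corpus loading and the Llama-based model are not shown).
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()

    docs = [
        "The service was fantastic, will definitely come back.",
        "Cold food and a rude waiter. Never again.",
    ]

    for doc in docs:
        scores = analyzer.polarity_scores(doc)  # keys: neg, neu, pos, compound
        # conventional VADER thresholds on the compound score
        if scores["compound"] >= 0.05:
            label = "positive"
        elif scores["compound"] <= -0.05:
            label = "negative"
        else:
            label = "neutral"
        print(label, scores["compound"])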
NLP can be wrong but it can’t be jailbroken and it won’t make stuff up.
It currently costs around $2200 to run Gemini Flash Lite on all of English Wikipedia. It would probably cost around 10x that much to run sentiment analysis on every Yelp review ever posted. It's true that LLMs still cost a lot for some use cases, but for essentially any business case it's not worth using traditional NLP any more.
That's because VADER is just a dictionary mapping each word to a single sentiment weight and adding it up with some basic logic for negations and such. There's an ocean of smaller NLP ML between that naive approach and LLMs. LLMs are trained to do everything. If all you need is a model trained to do sentiment analysis, using VADER over something like DistilBERT is NLP malpractice in 2025.
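For reference, the middle ground is about as easy to use - a sketch assuming the Hugging Face transformers pipeline and the widely used SST-2 DistilBERT checkpoint:

    # Sketch of a task-specific encoder model for sentiment (assumes the
    # transformers library; the checkpoint name is the common SST-2 one).
    from transformers import pipeline

    clf = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    reviews = ["Absolutely loved it.", "This product ruined my week."]
    print(clf(reviews))  # list of {'label': 'POSITIVE'/'NEGATIVE', 'score': ...}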
Sorry, I side with GP. Even if you don't want to use Llama/GPT because of cost, the middle ground of DistilBERT etc. (which can run on a single CPU) is a much more sensible cost/benefit tradeoff than VADER's decade-old lexicon-based approach.
I can't really think of many NLP things that are a decade old and don't have a better / faster / cheaper alternative.
I must have explained myself extremely poorly. I spent a fair bit of money (~$1,000 USD) running a near-SOTA fine-tuned Llama model on cloud GPUs for this very particular task.
I think people do understand, but think that your argument on price/performance uses two data points that are both far from a perceived better third option.
It's like saying: I chose to walk barefoot to the next town, and while admittedly it was a painful and unpleasant experience, it was free. I did try a helicopter service, but that was very expensive for my use case.
People are pointing out you could have used a bicycle instead.
Maybe I misinterpreted what he wrote, but sanity-checking the shiny new tech against fossilized tech of yesteryear to make sure the new tech actually justifies its higher cost doesn't sound like malpractice to me?
I mean, he did use the state of the art for his work; he just checked how much better it actually was compared to a much simpler algorithm and found the cost/benefit ratio questionable... At least that's what I read from his comments.
Curious how big your dataset was if you used $1000 of GPU credits on DistilBERT. I've run BERT on CPU on moderate cloud instances no problem for datasets I've worked with, but which admittedly are not huge.
Price isn't a real issue in almost every imaginable use case either. Even a small open-source model would outperform it, and you're going to get a lot of tokens per dollar with one.
VADER made me sad when it couldn’t do code mixed analyses in 2020. I’m thinking of dusting off that project, but then I dread the thought of using LLMs to do the same sentiment analysis.
I’d love to hear your thoughts on BERTs - I’ve dabbled a fair bit, fairly amateurishly, and have been astonished by their performance.
I’ve also found them surprisingly difficult and non-intuitive to train, eg deliberately including bad data and potentially a few false positives has resulted in notable success rate improvements.
Do you consider BERTs to be the upper end of traditional - or, dunno, transformer architecture in general to be a duff? Am sure you have fascinating insight on this!
That is a really good question, I am not sure where to draw the line.
I think it would be safe to say BERT is/was firmly in the non-traditional side of NLP.
A variety of task specific RNN models preceded BERT, and RNN as a concept has been around for quite a long time, with the LSTM being more modern.
Maybe word2vec ushered in the end of traditional NLP and was simultaneously also the beginning of non-traditional NLP? Much like Newton has been said to be both the first scientist and also the last magician.
I find discussing these kind of questions with NLP academics to be awkward.
Sentiment analysis using traditional means is really lacking. I can't talk about the current project I'm working on, but I needed more nuanced sentiment. Think of something like people commenting on the Uber Eats app versus people commenting on a certain restaurant.
a) you can save costs on llama by running it locally
b) compute costs are plummeting. cloud inference costs have dropped over 80% in 1 year
c) similar to a), spending a little more and having a beefy enough machine is functionally cheaper after just a few projects
d) everyone trying to do sentiment analysis is trying to make waaaay more money anyway
so I don't see NLP's even lower costs as being that relevant. It's like pointing out that I could use assembly instead of 10 layers of abstraction. It doesn't really matter
As an NLP professor, yes, I think we're mostly screwed - saying LLMs are a dead end or not a big deal, like some of the interviewees say, is just wishful thinking. A lot of NLP tasks that were the subject of active research for decades have just been wiped out.
Ironically, the tasks that still aren't solved well by LLMs and can still have a few years of life in them are the most low-level ones, that had become unfashionable in the last 15 years or so - part-of-speech tagging, syntactic parsing, NER. Of course, they have lost a lot of importance as well: you no longer need them for user-oriented downstream tasks. But they may still get some use: for example NER for its own sake is used in biomedical domains, and parsing can be useful for scientific studies of language (say, language universals related to syntax, language evolution, etc.). Which is more than you can say about translation or summarization, which have been pretty much obsoleted by LLMs. Still, even with these tasks, NLP will go from broad applicability to niche.
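For readers outside the field, those low-level tasks look roughly like this - a sketch using spaCy purely as an illustration (any comparable toolkit would do, and the small English model has to be downloaded first):

    # POS tagging, dependency parsing, and NER on one sentence.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The committee in Brussels approved the new funding in March.")

    for token in doc:
        # part-of-speech tag, dependency relation, and syntactic head
        print(token.text, token.pos_, token.dep_, token.head.text)

    print([(ent.text, ent.label_) for ent in doc.ents])  # named entities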
I'm not too worried for my livelihood at the moment (partly because I have tenure, and partly because the academic system works in such a way that zombie fields keep walking for quite long - there are still journals and conferences on the semantic web, which has been a zombie for who knows how long). But it's a pity: I got into this because it was fun and made an impact and now it seems most of my work is going to be irrelevant, like those semantic web researchers I used to look down at. I guess my consolation is that computers that really understand human language was the dream that got me into this in the first place, and it has been realized early. I can play with it and enjoy it while I sink into irrelevant research, I guess :/ Or try to pivot into discrete math or something.
Let's admit it - overnight it became much, much harder to be a convincing professor, given each student can use GPTs of all sorts to contradict or otherwise intimidate you. Only a seasoned professor knows the feeling of being bullied by a smart-ass student. Which brings down the total value of, and the incentive to, teach - and that's before you account for the avoidance GPTs silently imprint in students. I mean - why write a program/paper/research project when GPT can do it for you and save you the suffering?
The whole area of algorithms suddenly became more challenging, as now you also have to understand folding multi-dimensional spaces, and retell this all as a nice story for students to remember.
We are very likely heading into some dark uncharted era for academia, which will very likely lead to academia shrinking massively. And given the talk of 'all science now happens in big corpos'... I can expect the universities to go back to the original state they started from - monasteries.
Saying this all having spent 20+ years as a part-time contributor to one such monastery.
Yes, my comment focused on NLP research but the importance of university teaching has also taken a hit - not that I fear bullying, but now students can have a dedicated custom teacher with infinite time and patience and that can answer questions at 3 AM, and obviously that reduces the relevance of the professor. While the human interaction in in-person teaching still provides some exclusive value, demand logically should go down. Although don't underestimate the power of inertia - one could also think that small noname universities would go out of business when the likes of MIT began offering their courses to everyone online, and it didn't happen. I do think LLMs bring higher risk than that, and a shrinking will indeed happen, but maybe not so dramatic. Let's see.
Regarding science, if we leave it exclusively to corporations we won't get very far, because most corporations aren't willing to do the basic/foundational science work. The Transformers and most of the relevant followup work that led to LLMs were developed in industry, but they wouldn't have been possible without the academics that kept working on neural networks while that field was actively scorned during the 90s-2000s AI winter. So I think research universities still should have a role to play. Of course, convincing the funders that this is indeed the case might be a different story.
>but they wouldn't have been possible without the academics that kept working on neural networks while that field was actively scorned during the 90s-2000s AI winter
The bottleneck was on compute power. Industry would have also worked on neural networks once the compute power for it existed.
Not totally related but I have wondered how someone who thinks they are an expert in a field may deal with contradictions presented by GPT.
For example, you may consider yourself an expert on some niche philosophy like say Orientalism. A student can now contradict any theories you can come up with using GPT and the scary thing is that they will be sensible contradictions.
I feel like the bar to consider yourself an expert is much higher - you not only have to beat your students but also know enough to beat GPT.
Why in this story students use GPT and professors don't?
If you are an expert -- sit down and start working with GPT on your own. See what it can and cannot do. See where it helps. See where it hands-down loses.
You are right - this is a new activity professors have to pick up. A latent point in my previous comment was that maybe some professors have not as much expertise as may be required. Now that this expertise is sort of democratised there's more pressure on professors to get better.
I used to study NLP but before transformers, and now I don't work with NLP/ML/LLMs at all. Can you explain to me this view?
LLMs are NLP? We have a model that works, and that works great for many people and many different usages, so shouldn't NLP be at its top now? LLMs are not conceptually that different from other models?
I worked with GIZA/MOSES statistical MT back in the day during my studies, it's at the end of the day just matrices that you don't really understand, same as with LLMs?
It's not the same at all. An LLM is big, yes, and that's part of it. But an LLM is small compared to an equivalent-performance machine built on something like n-gram statistical models. You'd need the whole universe, or something prohibitive like that, and it'd still be worse. People don't like it, but LLMs 'understand' text in a very real meaning of the word, because that's the most compressive way to do the task they're trained on. Is it the same as human understanding? Most likely not, but that complaint is cheating.
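Rough numbers to make that point concrete - purely illustrative, assuming a 50k subword vocabulary and a raw 5-gram table versus a small dense model:

    # Back-of-the-envelope comparison: possible 5-gram contexts vs. parameters
    # of a small open-weights LLM. Real n-gram models are sparse, so this is an
    # upper bound, but it shows why the table-based route blows up.
    vocab = 50_000            # typical subword vocabulary size (assumed)
    order = 5                 # modest n-gram order
    ngram_cells = vocab ** order
    llm_params = 7e9          # e.g. a 7B-parameter model

    print(f"possible {order}-gram contexts: {ngram_cells:.1e}")  # ~3.1e23
    print(f"dense model parameters: {llm_params:.1e}")           # 7.0e9
    print(f"ratio: {ngram_cells / llm_params:.1e}")              # ~4.5e13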
NLP is indeed at its top, NLP professors aren't :)
Imagine if you had stayed in academia and kept working in MT for the last two decades. First of all, now you would see how LLMs render all your work pretty much obsolete. That's already hard for many people. Not so much for me, as I consider myself to be rather adaptable, and maybe not for you either - you can embrace the new thing and start working on it yourself, right?
But the problem is that you can't. The main issue is not lack of explainability, but the sheer resources needed. In academia we can't run, let alone train, anything within even one or two orders of magnitude of ChatGPT. We can work with toy models, knowing that we won't get even remotely near the state of the art (many are now doing this, with the excuse of "green AI", sustainability and such, but that won't even hold much longer). Or we can work with the likes of ChatGPT as plain users, but then we are studying the responses of an "oracle" that we don't even have control of, and it starts looking more like religion than science.
Ten years ago an academic could beat the state of the art in MT and many other NLP tasks, now for many tasks that's just impossible (unless we count coming up with some clever prompt engineering, but again, religion). For those of us who were in this field because we liked it, not only to make a living, research has become quite unfulfilling.
Ah, okay. The scale is just too big for academia, makes sense.
I remember that in my study days (just masters, not PhD) we were already unable to beat Google Translate for general tasks, but we could beat it on weird language pairs, just because we could get more training data (mostly by downloading some ebooks online) and tinker with the algorithm a bit.
But the scale argument - more data will be better - was true even back then... it was just easy to get better training data than Google for weird subtasks.
(Actually one of my professors is now Big Guy of that "European Commission Approved LLM" research project that was in news few months ago. I am ... interested how that turns out. https://openeurollm.eu/ )
Unless we intend to surrender everything about human symbolic manipulation (all math, all proving, all computation, all programming) to LLMs in the near future, we still need some formal representations for engineering.
The major part of traditional NLP was about formal representations. We have yet to see efficient mining techniques for extracting formal representations and analyses back out of LLMs.
How would we solve the traditional NLP problems, such as, for example, formalizing the law corpus of a given country, with an LLM?
As an approximation we can look at non-natural language processing, e.g. compiler technologies. How do we write an optimizing compiler on LLM technologies? How do we ensure stability, correctness and price?
In a sense, the traditional NLP field has just doubled, not died. In addition to humans as language-capable entities, who cannot really explain how they use the language, we now also have LLMs as another kind of language-capable entity, which in fact also cannot explain anything. The only benefit is that it is cheaper to ask an LLM the same question a million times than a human.
As someone deeply involved in NLP, I’ve observed the field’s evolution: from decades of word counting and statistical methods to a decade of deep learning enabling “word arithmetic.” Now, with Generative AI, we’ve reached a new milestone, a universal NLP engine.
IMHO, the path to scalability often involves using GPT models for prototyping and cold starts. They are incredible at generating synthetic data, which is invaluable for bootstrapping datasets and for labelling existing ones. Once a sufficient dataset is available, training a transformer model becomes feasible for high-intensity data applications where the cost of using GPT would be prohibitive (a rough sketch of this flow follows below).
GPT’s capabilities in data extraction and labeling are to me the killer applications, making it accessible for downstream tasks.
This shift signifies that NLP is transitioning from a data science problem to an engineering one, focusing on building robust, scalable systems.
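A rough sketch of that bootstrap flow - assuming the OpenAI Python client for the labelling step; the model name and prompt are placeholders, and the downstream fine-tune is only indicated in a comment:

    # Cold-start flow: use a general LLM to label a sample, then hand the
    # labelled data to a cheap task-specific model for the high-volume traffic.
    # Assumes the openai package (v1 API) and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    def llm_label(text: str) -> str:
        """Ask a general-purpose model for a coarse sentiment label."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any capable model works
            messages=[
                {"role": "system",
                 "content": "Label the review as positive, negative or neutral. "
                            "Answer with one word."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content.strip().lower()

    sample = ["Great pizza, terrible parking.", "Arrived cold, never again."]
    labelled = [(text, llm_label(text)) for text in sample]
    print(labelled)

    # `labelled` then becomes training data for a small transformer classifier
    # (e.g. fine-tuning DistilBERT), which serves the high-intensity workload.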
Thanks for the link, just read it, and the Chomsky transcript. Chomsky wanted deep structure, Norvig bet on stats, but maybe Turing saw it coming: kids talk before they know grammar, and so did the machines. It turns out we didn't need to understand language to automate it.
CNNs were outperforming traditional methods on some tasks before 2017.
The problem was that all of the low-level tasks, like part-of-speech tagging, parsing, named entity recognition, etc., never resulted in a good summarization or translation system.
Probabilistic graphical models worked a bit but not much.
Transformers were a leap, where none of the low level tasks had to be done for high level ones.
Pretty sure that equivalent leap happened in computer vision a bit before.
People were fiddling with low-level pattern matching and filters, and then it was all obliterated by an end-to-end CNN.
There was no leap in research. Everything had to do with availability of compute.
Neural nets are quite old, and everyone knew that they were universal function approximators. The reason why models never took off was because it was very expensive to train a model even of a limited size. There was no real available hardware to do this on short of supercomputer clusters, which were just all cpus, and thus wildly inefficient. But any researcher back then would have told you that you can figure anything out with neural nets.
Sometime in 2006, Nvidia realized that a lot of graphics compute was just generic parallel compute and released CUDA. People started using graphics cards for compute. Then someone figured out you can actually train deep neural nets with decent speed.
Transformers weren't even that big of a leap. The paper makes it sound like it's some sort of novel architecture - in essence, instead of input × weights to the next layer, you do input × matrix1, input × matrix2, input × matrix3, and multiply them together. And as you can guess, to train it you need more hardware because now you have to train 3 matrices rather than just one.
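For the record, the three matrices are usually identified with the query/key/value projections of attention; a minimal single-head NumPy sketch (toy sizes, random weights, no masking):

    # Scaled dot-product attention for one head, to make the "three matrices"
    # concrete. Everything here is toy-sized and randomly initialized.
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model, d_head = 4, 8, 8

    X  = rng.normal(size=(seq_len, d_model))   # token representations
    Wq = rng.normal(size=(d_model, d_head))    # query projection
    Wk = rng.normal(size=(d_model, d_head))    # key projection
    Wv = rng.normal(size=(d_model, d_head))    # value projection

    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d_head)                  # token-token affinities
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    out = weights @ V                                   # mixed representations

    print(out.shape)  # (4, 8)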
If we ever get something like ASICs for ML, at a certain point we will be able to iterate on the architecture itself. The optimal LLM may be a combination of CNN, RNN, and Transformer blocks, all intertwined.
In NLP, transformers replaced RNNs. In computer vision, CNNs replaced previous methods (e.g. feature descriptors), and recently got replaced by visual transformers, though modern CNNs are still pretty good.
If Chomsky was writing papers in 2020 his paper would’ve been “language is all you need.”
That is clearly not true, and as the article points out, large-scale forecasting models defeat the hypothesis that you need an actual foundational structure for language in order to demonstrate intelligence - in fact, it's exactly the opposite.
I’ve never been convinced by that hypothesis if for no other reason that we can demonstrate in the real world that intelligence is possible without linguistic structure.
As we’re finding: solving the markov process iteratively is the foundation of intelligence
out of that process emerge novel state transition processes - in some cases that's novel communication methods that have structured mappings to state encodings inside the actor
communication happens across species with various levels of fidelity, but it is not the underlying mechanism of intelligence; it is an emergent behavior that allows for shared mental mapping and storage
> As we’re finding: solving the markov process iteratively is the foundation of intelligence
No, the markov process allows LLMs to make connections between existing representations of human intelligence. LLMs are nothing without a data set developed by an existing intelligence.
Some people will never be convinced that a machine demonstrates intelligence. This is because for a lot of people, intelligence exists as a subjective experience that they have, and the belief that others have it too extends only insofar as others appear to be like the self.
> The author Pamela McCorduck writes: "It's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, 'that's not thinking'."
The flip side of that is that every time a new AI approach becomes popular even more people proclaim "this is what thinking is", believing that the new technology reflects the underlying process of human intelligence. This phenomenon goes back further than AI as a formal discipline, to early computers, and even during the age of mechanical computers. There are parallels with robotics, where for centuries anything that could seemingly move like a human was perceived to be imbued with human-like qualities.[1] The human instinct to anthropomorphize is deep-seated and powerful.
I keep returning to this insight by a researcher of the Antikythera Mechanism, which in the context of ML seems even more apropos today than in 1986:
> I would like to conclude by telling a cautionary tale. Let us try and place the Antikythera Mechanism within the global context of ancient Greek thought. Firstly came the astronomers observing the motions of the heavenly bodies and collecting data. Secondly came the mathematicians inventing mathematical notation to describe the motions and fit the data. Thirdly came the technicians making mechanical models to simulate those mathematical constructions, like the Antikythera Mechanism. Fourthly came generations of students who learned their astronomy from these machines. Fifthly came scientists whose imagination had been so blinkered by generations of such learning that they actually believed that this was how the heavens worked. Sixthly came the authorities who insisted upon the received dogma. And so the human race was fooled into accepting the Ptolemaic system for a thousand years.
> Today we are in danger of making the same mistake over computers. Our present generation is able to view them with an appropriate skepticism when necessary. But our children's children may be brought up within a society dominated by computers, that they may actually believe this is how our brains work. We do not want the human race to be fooled again for another thousand years.
I also regularly return to Richard Stallman's admonition regarding the use of the term, intellectual property. He deeply disliked that term and argued it was designed to obfuscate, through self-serving[2] equivocations, the legal principles behind the laws of copyright, patent, trademark, trade secret, etc.
Contemporary machine learning may rightly be called artificial intelligence, but to conflate it with human intelligence is folly. It's clearly not human intelligence. It's something else. The same way dolphin intelligence isn't human intelligence, or a calculator isn't human intelligence. These things may be able to tell us something about the contours and limits of human intelligence, especially in contrast, but equivocations or even simple direct comparisons only serve to obfuscate and constrain how we think of intelligence.
[1] See, e.g., the 1927 film Metropolis, which played off prevailing beliefs and fears about the progress and direction of actuated machines.
[2] Serving the interests of those who profit the most from expanding the scope and duration of these legal regimes by obfuscating the original intent and design behind each regime, replacing them with concepts and processes that favored expansion.
I don't think argument by assertion is appropriate where there's a lot of people who "clearly" believe that it's a good approximation of human intelligence. Given we don't understand how human intelligence works, asserting that one plausible model (a continuous journey through an embedding space held in neurons) that works in machines isn't how humans do it seems too strong.
It is demonstrably true that artificial neurons have nothing to do with cortical neurons in mammals[1] so even if this model of human intelligence is useful, transformers/etc aren't anywhere close to actually implementing the model faithfully. Perhaps by Turing completeness o3 or whatever has stumbled into a good implementation of this model, but that is implausible. o3 still wildly confabulates worse than any dementia patient, still lacks the robust sense of folk physics we see in infants, etc. (This is even more apparent in video generators, Veo2 is SOTA and it still doesn't understand object permanence or gravity.) It does not make sense to say something is a model of human intelligence if it can do PhD-level written tasks but is outsmarted by literal babies (also chimps, dogs, pigeons...)
AI people toss around the term "neuron" way too freely.
> somebody figured out how to make a computer do something
Well, I would argue that in most deterministic AI systems the thinking was all done by the AI researchers and then encoded for the computer. That’s why historically it’s been easy to say, “No, the machine isn’t doing any thinking, but only applying thinking that’s embedded within.” I think that line of argument becomes less obvious when you have learning systems where the behavior is training dependent. It’s still fairly safe to argue that the best LLMs today are not yet thinking, at least not in a way a human does. But in another generation or two? It will become much harder to deny.
> It’s still fairly safe to argue that the best LLMs today are not ... thinking
I agree completely.
> But in another generation or two? It will become much harder to deny.
Unless there is something ... categorically different about what an LLM does and in a generation or two we can articulate what that is (30 years of looking at something makes it easier to understand ... sometimes).
> It’s still fairly safe to argue that the best LLMs today are not yet thinking, at least not in a way a human does. But in another generation or two? It will become much harder to deny.
Current LLMs have a hard division between training and inference time; human brains don’t-we train as we infer (although we probably do a mix of online/offline training: you build new connections while awake, but then pruning and consolidation happens while you sleep). I think softening the training-vs-inference division is a necessary (but possibly not sufficient) condition for closing the artificial-vs-human intelligence gap. But that softening is going to require completely different architectures from current LLMs, and I don’t think anyone has much of an idea what those new architectures will look like, or how long it will take for them to arrive
In many ways LLMs are a regression compared to what was before. They solve a huge class of problems quickly and cheaply, but they also have severe limitations that older methods didn't have.
So no, it's not a linear progress story like in a sci-fi story.
Do LLMS think? Of course they do. But thinking =/= intelligence.
It's pretty easy to define what an AI would actually look like:
A human coder sits down and writes an algorithm. In that algorithm, there is no reference to any specific piece of information on ANYTHING (including human words), whether it's manually written in code or derived through training a neural net on that information and the code is just a bunch of matrix multiplies.
The algorithm has 2 interfaces - a terminal for a human to interact with, and an api to a tcp socket over which it can communicate to the world wide web.
A human could give this algorithm an instruction, like for example, "Design and build me a flying car and put it in my driveway and do not spend a single cent of my money, and do everything legally".
Provided there are no limits on communication that would result in the algorithm being perma-banned off the internet, the algorithm, prior to even tackling the task at hand, will have to do at least the following:
- figure out how to properly structure HTTP communication to be able to talk to servers, and essentially build an internal API.
- figure out what the words you typed mean - i.e. map them to information collected from the web
- start running internal simulations to figure out what the best course of action is
- figure out how to deal with ambiguity and ask you questions (like "how far do you want to fly"), and figure out how to deal with dead ends.
- start executing actions with preplanned risk (figuring out what risk is in the process) and learn from mistakes.
And that's just the short start.
But the key factor is that the same process it uses to figure out basic functionality is the same process (at least at the lowest level) that it would use to start designing a flying car once it has all the information it needs to "understand" the task.
And there isn't anything even remotely close on the horizon with any of the current AI research that indicates that we have any idea what that process looks like. The only claims that we can make is that its definitely recursive, not fully forward like LLMs, and its essentially a search algorithm. But what its searching and what the guidance metric is for search direction is the mystery.
> every time somebody figured out how to make a computer do something
Well, there’s the clue it is not really thinking if somebody told the machine how to do things. My Roomba isn’t intelligent because it’s been programmed to clean the floor, now is it?
Wake me up when machines learn to do something on their own. I know everybody is on the AI hype train, but please show your extraordinary evidence to your extraordinary claims first.
It doesn’t mean they tie intelligence to subjective experience. Take digestion. Can a computer simulate digestion, yes. But no computer can “digest” if it’s just silicon in the corner of an office. There are two hurdles. The leap from simulating intelligence to intelligence, and the leap from intelligence to subjective experience. If the computer gets attached to a mechanism that physically breaks down organic material, that’s the first leap. If the computer gains a first person experience of that process, that’s the second.
You can’t just short-circuit from simulates to does to has subjective experience.
And the claim that other humans don't have subjective experience is such a non-starter.
> And the claim that other humans don't have subjective experience is such a non-starter.
There is no empirical test for the subjective experience of consciousness. You can't even prove to anybody else that you have it. We assume other people experience as we ourselves do as a basic decency we extend to other humans. This is a good thing, but it's essentially faith not science.
As for machines not having it, I'm fine with that assumption, but until there can be some sort of empirical test for it, it's not science. Thankfully, it's also not relevant to any engineering matter. Whether the machines have a subjective experience in any way comparable to our own doesn't touch any question about what demonstrable capabilities or limitations they have. We don't need to know if the computer has a ""soul"" to know if the computer can be a solution to any given engineering problem. Whether machines can have subjective experience shouldn't be considered an important question to engineers; let theologians waste their time fruitlessly debating that.
I think you're talking about consciousness rather than intelligence. While I do see people regularly distinguishing between simulation and reality for consciousness, I don't often see people make that distinction for intelligence.
> And the claim that other humans don't have subjective experience is such a non-starter.
What about other primates? Other mammals? The smarter species of cephalopods?
Certainly, many psychopaths seem to act as if they have this belief.
This is why I want the field to go straight to building indistinguishable agents- specifically, you should be able to video chat with an avatar that is impossible to tell from a human.
Then we can ask "if this is indistinguishable from a human, how can you be sure that anybody is intelligent?"
Personally I suspect we can make zombies that appear indistinguishable from humans (limited to video chat; making a robot that appears human to a doctor would be hard) but that don't have self-consciousness or any subjective experience.
"There is considerable overlap between the intelligence of the smartest bears and the dumbest tourists."
LLMs are not artificial intelligence but artificial stupidity.
LLMs will happily hallucinate. LLMs will happily tell you total lies with complete confidence. LLMs will give you grammatically perfect completely vapid content. etc.
And yet that is still better than what most humans could do in the same situation.
We haven't proved that machines can have intelligence, but instead we are happily proving that most people, most of the time just aren't very intelligent at all.
> LLMs will happily hallucinate. LLMs will happily tell you total lies with complete confidence.
Perhaps we should avoid anthropomorphizing them too much. LLMs don't inhabit a "real world" where they can experiment and learn. Their training data is their universe, and it's likely filled with conflicting, peculiar, and untestable information.
Yes, the output is sometimes "a lie" if we apply it to our world, but in "their world" stuff might just be strangely different. And it's not like the real world has only "hard simple truths" - quantum mechanics comes to mind for how strange stuff can be.
> yet that is still better than what most humans could do in the same situation
Yup. A depressing takeaway from LLMs is most humans don’t demonstrate a drive to be curious and understand, but instead, to sort of muddle through most (economic) tasks.
Humans are basically incapable of recognizing that there’s something that’s more powerful than them
They’re never going to actively collectively admit that that’s the case, because humans collectively are so so systematically arrogant and self possessed that they’re not even open to the possibility of being lower on the intelligence totem pole
The only possible way forward for AI is to create the thing that everybody is so scared of so they can actually realize their place in the universe
> Humans are basically incapable of recognizing that there’s something that’s more powerful than them
> They’re never going to actively collectively admit that that’s the case, because humans collectively are so so systematically arrogant and self possessed that they’re not even open to the possibility of being lower on the intelligence totem pole
For most of human history, the clear majority of humans have believed in God(s), spirits, angels, bodhisattvas, etc - beings which by definition are above us on the “totem pole” - and although atheism is much more widespread today, I think it almost certainly remains a minority viewpoint at the global level.
So I’m sceptical of your idea humans have some inherent unwillingness to believe in superhuman entities. From an atheist perspective, one might say that the (globally/historically) average human is so eager to believe in such entities, that if they don’t actually exist, they’ll imagine them and then convince themselves that their imaginings are entirely real. (Whereas, a theist might argue that the human eagerness to believe in such entities is better explained by their existence.)
I’m curious as to how you derive your position of “misanthropic”
The answer to your question, though, is that I do not believe humans have at any point in history prevented an existential threat before it was actually realized
I will not be convinced a machine demonstrates intelligence until someone demonstrates a robot that can navigate 3D space as intelligently as, say, a cockroach. AFAICT we are still many years away from this, probably decades. A bunch of human language knowledge and brittle heuristics doesn't convince me at all.
This ad hominem is really irritating. People have complained since Alan Turing that AI research ignores simpler intelligence, instead trying to bedazzle people with fancy tricks that convey the illusion of human intelligence. Still true today: lots of talk about o3's (exaggerated) ability to do fancy math, little talk about its appallingly bad general quantitative reasoning. The idea of "jagged AI" is unscientific horseshit designed to sweep this stuff under the rug.
In the natural world, intelligence requires embodiment. And, depending on your point of view, consciousness. Modern AI exhibits neither of those characteristics.
It is until proven otherwise because modern science still doesn’t have a consensus or standards or biological tests which can account for it. As in, highly “intelligent” people often lack “common sense” or fall prey to con artists. It’s pompous as shit to assert a black box mimicry constitutes intelligence. Wake me up when it can learn to play a guitar and write something as good as Bob Dylan and Tom Petty. Hint: we’ll both be dead before that happens.
This to me is a weak argument. You have the ability to appreciate and judge something as good as Bob Dylan and Tom Petty. That's what makes you intelligent.
> This to me is a weak argument. You have the ability to appreciate and judge something as good as Bob Dylan and Tom Petty. That's what makes you intelligent.
What if you don't? Do you think that makes someone not intelligent?
Great seeing Ray Mooney (who I took a graduate class with) and Emily Bender (a colleague of many at the UT Linguistics Dept., and a regular visitor) sharing their honest reservations with AI and LLMs.
I try to stay as far away from this stuff as possible because when the bottom falls out, it's going to have devastating effects for everyone involved. As a former computational linguist and someone who built similar tools at reasonable scale for largeish social media organizations in the teens, I learned the hard way not to trust the efficacy of these models or their ability to get the sort of reliability that a naive user would expect from them in practical application.
How exactly is the bottom going to fall out? And are you really trying to present that you have practical experience building comparable tools to an LLM prior to the Transformer paper being written?
Now, there does appear to be some shenanigans going on with circular financing involving MSFT, NVIDIA, and SMCI (https://x.com/DarioCpx/status/1917757093811216627), but the usefulness of all the modern LLMs is undeniable. Given the state of the global economy and the above financial engineering issues, I would not be surprised if at some point there is a contraction and the AI hype settles down a bit. With that said, LLMs could be made illegal and people would still continue running open source models indefinitely and organizations would build proprietary models in secret, b/c LLMs are that good.
Since we are throwing out predictions, I'll throw one out. Demand for LLMs to be more accurate will bring methods like formal verification to the forefront, and I predict models/agents will eventually be able to formalize solved problems into proofs, using formal verification techniques to guarantee correctness. At that point you will be able to trust the outputs for things the model "knows" (i.e. has proved) and use the probably-correct answers the model spits out as we currently do today.
Probably something like the following flow (a toy sketch follows the list):
1) Users enter prompts
2) Model answers questions and feeds those conversations to another model/program
3) Offline this other model uses formal verification techniques to try and reduce the answers to a formal proof.
4) The formal proofs are fed back into the first model's memory and then it uses those answers going forward.
5) Future questions that can be mapped to these formalized proofs can now be answered with almost no cost and are guaranteed to be correct.
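A toy sketch of that loop, with the hard parts (the model call and the verifier) deliberately stubbed out - this is speculation about a future system, not a description of anything that exists:

    # Toy structure of the proposed flow; llm_answer() and try_formalize() are
    # deliberate stubs, since those are exactly the unsolved parts.
    from typing import Optional

    proof_cache: dict[str, str] = {}   # question -> answer backed by a proof

    def llm_answer(question: str) -> str:
        raise NotImplementedError("steps 1-2: query whatever model you use")

    def try_formalize(question: str, answer: str) -> Optional[str]:
        raise NotImplementedError("step 3: attempt a formal proof offline")

    def ask(question: str) -> str:
        if question in proof_cache:              # step 5: verified -> ~free
            return proof_cache[question]
        answer = llm_answer(question)            # steps 1-2: normal LLM answer
        proof = try_formalize(question, answer)  # step 3: offline verification
        if proof is not None:
            proof_cache[question] = answer       # step 4: feed back into memory
        return answer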
> And are you really trying to present that you have practical experience building comparable tools to an LLM prior to the Transformer paper being written?
I believe (could be wrong) they were talking about their prior GOFAI/NLP experience when referencing scaling systems.
In any case, is it really necessary to be so harsh about over-confidence and then go on to predict the future of solving hallucinations with your formal verification ideas?
Over my years in academia, I noticed that the linguistics departments were always the most fiercely ideological. Almost every comment of a talk would be get contested by somebody from the audience.
It was annoying, but as a psych guy I was also jealous of them for having such clearly articulated theoretical frameworks. It really helped them develop cohesive lines of research to delineate the workings of each theory
> Don't try and say anything pro-linguistics here, (...)
Shit-talking LLMs without providing any basis or substance is not what I would call "pro-linguistics". It just sounds like petty spiteful behavior, lashing out out of frustration for rendering old models obsolete.
From a scientific, explanatory perspective, the old models are not obsolete, because they are explanatory, whereas LLMs do not explain anything about human linguistic behaviour.
What do you mean "work"? Their goal is not to be some general purpose AI. (To be clear I'm talking narrowly about computational linguistics, not old fashioned NLP more broadly).
The interesting question is whether just a gigantic set of probabilities somehow captures things about language and cognition that we would not expect ...
Curious what you are expecting when you say "bottom falls out". Are you expecting significant failures of large-scale systems? Or more a point where people recognize some flaw that you see in LLMs?
> learned the hard way not to trust the efficacy of these models or their ability to get the sort of reliability that a naive user would expect from them in practical application
But…they work. Linguistics as a science is still solid. But as a practical exercise, it seems to be moot other than for finding niches where LLMs are too pricey.
Lots of great quotes in this piece but this one stuck out for me:
> TAL LINZEN: It’s sometimes confusing when we pretend that there’s a scientific conversation happening, but some of the people in the conversation have a stake in a company that’s potentially worth $50 billion.
I think the AI hype (much of which is justified) detracts from the fact that we have Actually Good NLP at last. I've worked on NL2SQL in both the before and after times, and it's still not a solved problem, but it's frustrating to talk to AI startup people who have never really thought deeply about named entity recognition, disambiguation etc. The tools are much, much better. The challenges and pitfalls remain much the same.
As someone who dropped out of NLP during the chaos, all this stuff honestly feels way too familiar - progress is cool but watching your work become pointless overnight stings hard.
This paper is awful. They bizarrely argue the fact that transformers are not very sensitive to word order as a positive of transformers despite the fact that's not how languages work. There's also this absurd passage.
>However, a closer look at the statistical structure of language use reveals that word order contains surprisingly little information over and above lexical information. To see this intuitively, imagine we give you a set of words {dogs, bones, eat} without telling you the original order of the words. You can still reconstruct the meaning based entirely on (1) the meanings of the words in isolation and (2) your knowledge of how the world works—dogs usually eat bones; bones rarely eat dogs. Indeed, many languages show a high level of nondeterminism in word order (Futrell et al., 2015b; Koplenig et al., 2017), and word order cues are often redundant with meaning or case markers (Pijpops and Zehentner, 2022; Mahowald et al., 2023). The fact that word order is relatively uninformative in usage also partly explains why bag-of-words methods dominated NLP tasks until around 2020, consistently outperforming much more sophisticated approaches: it turns out that most of the information in sentences is in fact present in the bag of words.
While it is certainly possible to guess that your interlocutor meant "dogs eat bones", the sentence "bones eat dogs" is entirely possible (if unlikely)! For example, imagine a moving skeleton in a video game or something. The idea that word order isn't vital to meaning is deeply unserious. (Of course there are languages where word order matters less, but there are still important rules about constituent structure, clausal embedding etc, which constrain word order).
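The word-order objection is easy to make concrete: under a bag-of-words encoding the two sentences are literally the same vector - a small example with scikit-learn's CountVectorizer:

    # "dogs eat bones" and "bones eat dogs" collapse to identical rows under a
    # bag-of-words encoding, which is exactly the word-order objection above.
    from sklearn.feature_extraction.text import CountVectorizer

    vec = CountVectorizer()
    X = vec.fit_transform(["dogs eat bones", "bones eat dogs"])

    print(vec.get_feature_names_out())  # ['bones' 'dogs' 'eat']
    print(X.toarray())                  # [[1 1 1]
                                        #  [1 1 1]]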
It tends to happen for any prompt that calls for generating a piece of output for which there are many valid answers, but one is highly weighted and you want variety. Do you remember that meme a few years ago where people were asked to generate a color and then a hand tool, and most people immediately responded "erqunzzrebengyrnfgbarbsgubfrgjb"? (rot13+padding for those who haven't done this)
This particular example is too small to regularly trip AIs, but as a general rule I do not consider it tricky to try to textually negative-prompt to remove a commonly-valid-but-not-currently-wanted response. (Obviously, if you manually tweak the weights to forbid something rather than using the word "not", this fails.)
From my very rough observations, for models that fit on a local device, it typically starts to happen maybe 10% of the time when the prompt reaches 300 characters or so (specifying other parts of what you want); bigger models just need a bit more input before they fail. Reasoning models might be better, but we can watch them literally burn power running nonsensical variations through the thought pane so they're far from a sure answer.
This happens in any context you can think of: from scratch or extending an existing work; single or list; information lookup, prose generation, code generation (consider "Extend this web app to do lengthy-description-of-some-task. Remember I am not using React you stupid piece of shit AI!").
I wonder whether tenure is causing inefficiencies in the market? You might be encouraging someone to work on an outdated field without the correct incentives.
Just like having employees with experience, I guess.
But tenured researchers are supposed to have some more protection specifically because they do research (and reach conclusions) on topics that people in leadership positions in society might not like.
I was contrasting FiNER, GliNER, and Smolagents in a recent blog post on my substack, and while the first two are fast and provide somewhat good results, running an LLM locally is easily 10x better.
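For anyone curious, zero-shot GliNER usage looks roughly like this - a sketch assuming the gliner package; the checkpoint name and label set are examples, not a recommendation:

    # Zero-shot entity extraction with GLiNER; the label set is arbitrary and
    # the checkpoint name is an assumption based on the project's examples.
    from gliner import GLiNER

    model = GLiNER.from_pretrained("urchade/gliner_base")

    text = "Acme Corp hired Jane Doe as CFO in Toronto last March."
    labels = ["person", "organization", "location", "job title"]

    for ent in model.predict_entities(text, labels):
        print(ent["text"], "->", ent["label"])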
Would love to read that post - we’re considering using GliNER for discrete parts of our ingestion pipeline where we assumed it would be a great perf/$ drop-in for larger models.
For me as a lay-person, the article is disjointed and kinda hard to follow. It's fascinating that all the quotes are emotional responses or about academic politics. Even now, they are suspicious of transformers and are bitter that they were wrong. No one seems happy that their field of research has been on an astonishing rocketship of progress in the last decade.
The way I see this is that for a long time there was an academic field that was working on parsing natural human language and it was influenced by some very smart people who had strong opinions. They focused mainly on symbolic approaches to parsing, rather than probabilistic. And there were some fairly strong assumptions about structure and meaning. Norvig wrote about this: https://norvig.com/chomsky.html and I think the article bears repeated, close reading.
Unfortunately, because ML models went brr some time ago (Norvig was at the leading edge of this when he worked on the early google search engine and had access to huge amounts of data), we've since seen that probabilistic approaches produce excellent results, surpassing everything in the NLP space in terms of producing real-world systems, without addressing any of the issues that the NLP folks believe are key (see https://en.wikipedia.org/wiki/Stochastic_parrot and the referenced paper). Personally I would have preferred if the parrot paper hadn't also discussed environmental costs of LLMs, and focused entirely on the semantic issues associated with probabilistic models.
I think there's a huge amount of jealousy in the NLP space that probabilistic methods worked so well, so fast (with transformers being the key innovation that improved metrics). And it's clear that even state-of-the-art probabilistic models lack features that NLP people expected.
Repeatedly we have seen that probabilistic methods are the most effective way to make forward progress, provided you have enough data and good algorithms. It would be interesting to see the NLP folks try to come up with models that did anything near what a modern LLM can do.
This is pretty much correct. I'd have to search for it, but I remember an article from a couple of years back that detailed how LLMs blew up the field of NLP overnight.
Although I'd also offer a slightly different lens through which to look at the reaction of other researchers. There's jealousy, sure, but overnight a ton of NLP researchers basically had to come to terms with the fact that their research was useless, at least from a practical perspective.
For example, imagine you just got your PhD in machine translation, which took you 7 years of laboring away in grad/post grad work. Then something comes out that can do machine translation several orders of magnitude better than anything you have proposed. Anyone can argue about what "understanding" means until they're blue in the face, but for machine translation, nobody really cares that much - people just want to get text in another language that means the same thing as the original language, and they don't really care how.
The majority of research leads to "dead ends", but most folks understand that's the nature of research, and there is usually still value in discovering "OK, this won't work". Usually, though, this process is pretty incremental. With LLMs, all of a sudden you had lots of folks whose life work was pretty useless (again, from a practical perspective), and that'd be tough for anyone to deal with.
Note that the author has a background spanning a lot of the timespans/topics discussed - much work in multilingual NLP, translation, and more recently at DeepMind, Cohere, and Meta (in other words, someone with a great perspective on everything in the top article).
Re: Machine Translation, note that Transformers were introduced for this task, and built on one of the earlier notions of attention in sequence models: https://arxiv.org/abs/1409.0473 (2014, 38k citations)
That's not to say there weren't holdouts or people who really were "hurt" by a huge jump in MT capability - just that this is a logical progression in language understanding methods as seen by some folks (though who could have predicted the popularity of chat interfaces).
I wouldn't say NLP as a field was resistant to probabilistic approaches or even neural networks. From maybe 2000-2018 almost all the papers were about using probabilistic methods to figure out word sense disambiguation or parsing or sentiment analysis or whatever. What changed was that these tasks turned out not to be important for the ultimate goal of making language technologies. We thought things like parsing were going to be important because we thought any system that can understand language would have to do so on the basis of the parse tree. But it turns out a gigantic neural network text generator can do nearly anything we ever wanted from a language technology, without dealing with any of the intermediate tasks that used to get so much attention. It's like the whole field got short-circuited.
The way I have experienced this, starting from circa 2018, it was a bit more incremental. First, LSTMs and then transformers lead to new heights on the old tasks, such as syntactic parsing and semantic role labelling, which was sad for the previous generation, but at least we were playing the same game. But then not only the old tools of NLP, but the research questions themselves became irrelevant because we could just ask a model nicely and get good results on very practical downstream tasks that didn't even exist a short while ago. NLP suddenly turned into general document/information processing field, with a side hustle in conversational assistants. Already GPT2 essentially mastered the grammar of English, and what difficulties remain are super-linguistic and have more to do with general reasoning. I would say that it's not that people are bitter that other people make progress, it's more that there is not much progress to be had in the old haunts at all.
I think you greatly understate the impact as EVERYONE is freaking the fuck out about AI, not just NLP researchers.
AI is obliterating the usefulness of all mental work. Look at the high percentage of HN articles trying to figure out whether LLMs can eliminate software developers. Or professional writers. Or composers. Or artists. Or lawyers.
Focusing on the NLP researchers really understates the scope of the insecurity induced by AI.
All of this matches my understanding. It was interesting taking an NLP class in 2017, the professors said basically listen, this curriculum is all historical and now irrelevant given LLMs, we’ll tell you a little about them but basically it’s all cutting edge sorry.
Even 15-ish years ago when I was in school, the NLP folks viewed probabilistic models with suspicion. NLP treated everyone from our Math department with suspicion and gave them a hard time. It created so many politics that some folks who wanted to do statistical approaches would call themselves CS so that the NLP old guard wouldn't give them a hard time.
On the contrary, to some of us (who have focused on probability, big data, algorithms, and HPC, while eschewing complex theories that require geniuses to understand) the bitter lesson is incredibly sweet.
Very much like when I moved from tightly coupled to "embarrassing" parallelism. A friend said "don't call it embarrassing... it's pleasant not to have to think about hard distributed computing problems".
powerful response but.. "fit for what purposes" .. All of human writings are not functionally equivalent. This has been discussed at length. e.g. poetry versus factual reporting or summation..
I agree with criticism of Noam Chomsky as a linguist. I was raised in the typological tradition which has its very own kind of beef with Chomsky due to other reasons (his singular focus on English for constructing his theories amongst other things), but his dislike of statistical methods was of course equally suspect.
Nevertheless there is something to be said for classical linguistic theory in terms of constituent (or dependency) grammars and various other tools. They give us much simpler models that, while incomplete, can still be fairly useful at a fraction of the cost and size of transformer architectures (e.g. 99% of morphology can be modeled with finite state machines). They also let us understand languages better - we can't really peek into a transformer to understand structural patterns in a language or to compare them across different languages.
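As a toy illustration of the finite-state point - a deliberately tiny analyzer for English "-s" plurals; real systems use proper FST toolkits and full lexicons:

    # Finite-state-style morphological analysis in miniature: try the bare stem,
    # then strip plural suffixes and check the stem against a lexicon.
    LEXICON = {"dog", "bone", "cat", "fox", "bus"}

    def analyze(word: str) -> str:
        if word in LEXICON:                      # state 1: bare stem
            return f"{word}+SG"
        for suffix in ("es", "s"):               # state 2: plural suffixes
            stem = word.removesuffix(suffix)
            if stem != word and stem in LEXICON:
                return f"{stem}+PL"
        return f"{word}+?"                       # no analysis found

    print([analyze(w) for w in ["dog", "dogs", "foxes", "buses", "ran"]])
    # ['dog+SG', 'dog+PL', 'fox+PL', 'bus+PL', 'ran+?']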
That is simply false about UG only being based on English. Maybe in 1950 but any modern generativist theory uses data from many, many languages and English has been re-analysed in light of other languages (see here for an example of quantifiers being analysed in English on the basis of data in a Salish language https://philpapers.org/rec/MATQAT )
And the more general version, “Humanity progresses one funeral at a time.” Which is why the hyper-longevity people are basically trying to freeze all human progress.
Or Effective Altruism's long-termism that effectively makes everyone universally poor now. Interestingly, Guillaume Verdon (e/acc) is friends with Bryan Johnson and seems to be pro-longevity.
It's a truly bitter pill to swallow when your whole area of research goes redundant.
I have a bit of background in this field so it's nice to see even people who were at the top of the field raise concerns that I had. That comment about LHC was exactly what I told my professor. That the whole field seems to be moving in a direction where you need a lot of resources to do anything. You can have 10 different ideas on how to improve LLMs but unless you have the resources there is barely anything you can do.
NLP was the main reason I pursued an MS degree, but by the end of my course I was no longer interested in it, mostly because of this.
> That the whole field seems to be moving in a direction where you need a lot of resources to do anything. You can have 10 different ideas on how to improve LLMs but unless you have the resources there is barely anything you can do.
I think you're confusing problems, or you're not realizing that improving the efficiency of a class of models is a research area on its own. Look at any field that involves expensive computational work: model reduction strategies dominate research.
> No one seems happy that their field of research has been on an astonishing rocketship of progress in the last decade.
Well, they're unhappy that an unrelated field of research more-or-less accidentally solved NLP. All the specialized NLP techniques people spent a decade developing were obviated by bigger deep learning models.
My view is that "traditional" NLP will get re-incorporated into LLMs (or their successors) over time. We just didn't get to it yet. Appropriate inductive biases will only make LLMs better, faster and cheaper.
There will always be trouble in LLM "paradise" and a desire to take it to the next level. Use a raw-accessed (highest-performing) LLM intensely for coding and you will rack up a $10-$20/hr bill. China is not supposed to have adequate GPUs at its disposal - so they will come up with smaller and more efficient models. Etc, etc, etc...
I am in academia and worked in NLP although I would describe myself as NLP adjacent.
I can confirm LLMs have essentially confined a good chunk of historical research into the bin. I suspect there are probably still a few PhD students working on traditional methods knowing full well a layman can do better using the mobile ChatGPT app.
That said traditional NLP has its uses.
Using the VADER model for sentiment analysis while flawed is vastly cheaper than LLMs to get a general idea. Traditional NLP is suitable for many tasks people are now spending a lot of money asking GPT to do just because they know GPT.
I recently did an analysis on a large corpus and VADER was essentially free while the cloud costs to run a Llama based sentiment model was about $1000. I ran both because VADER costs nothing but minimal CPU time.
NLP can be wrong but it can’t be jailbroken and it won’t make stuff up.
It currently costs around $2200 to run Gemini flash lite on all of Wikipedia English. It would probably cost around 10x that much to run sentiment analysis on every Yelp review ever posted. It's true that LLMs still cost a lot for some use cases, but for essentially any business case it's not worth using traditional NLP any more
idk why you're changing targets for the comparison?
it's like:
"does apple cure cancer in monkeys?" vs "does blueberry cure diabetes in pigs?"
That's because VADER is just a dictionary mapping each word to a single sentiment weight and adding it up with some basic logic for negations and such. There's an ocean of smaller NLP ML between that naive approach and LLMs. LLMs are trained to do everything. If all you need is a model trained to do sentiment analysis, using VADER over something like DistilBERT is NLP malpractice in 2025.
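For readers following along, here is a minimal, illustrative sketch of the two approaches being contrasted above - VADER's lexicon lookup versus a small fine-tuned transformer. It assumes the vaderSentiment and transformers packages and the stock distilbert-base-uncased-finetuned-sst-2-english checkpoint; it is not code from anyone in this thread.

    # Lexicon approach vs. small fine-tuned transformer, side by side.
    # Assumes: pip install vaderSentiment transformers torch
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    from transformers import pipeline

    texts = ["The delivery was late, but the food was honestly great."]

    # VADER: sums per-word valence scores with heuristics for negation, caps, etc.
    vader = SentimentIntensityAnalyzer()
    for t in texts:
        print("VADER:", vader.polarity_scores(t))        # {'neg': ..., 'pos': ..., 'compound': ...}

    # DistilBERT fine-tuned on SST-2: a middle ground that runs fine on CPU.
    clf = pipeline("sentiment-analysis",
                   model="distilbert-base-uncased-finetuned-sst-2-english")
    for t in texts:
        print("DistilBERT:", clf(t))                      # [{'label': 'POSITIVE', 'score': ...}]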
> using VADER over something like DistilBERT is NLP malpractice in 2025.
Ouch. Was that necessary?
I used $1000 worth of GPU credits and threw in VADER because it’s basically free both in time and credits.
I usually do this on large datasets out of pure interest in how it correlates with expensive methods on English-language text.
I am well aware of how VADER works and its limitations, I am also aware of the limitations of all sentiment analysis.
Sorry, I side with GP. Just because you don't want to use Llama/GPT because of cost, the middle-ground of DistilBERT etc (which can run on a single CPU) is a much more sensible cost/benefit tradeoff than VADER's decade old lexicon-based approach.
I can't really think of many NLP things that are one-decade old and don't have a better / faster / cheaper alternative.
I must have explained myself extremely poorly. I spent a fair bit of money ~$1,000 USD running a near SOTA fine-tuned llama model on cloud GPUs for this very particular task.
I think people do understand, but think you that your argument on price/performane uses two dataoint that are both far from a perceived better third option.
It's like saying I chose barefoot walking to get to the next town and while admittedly it was a painfull and not pleasant experience, it was free. I did try a helicopter service but that was very expensive for my use case.
People are pointing out you could have used a bicycle instead.
This was clear both other times you explained it, the other commenters seem to want to nitpick despite it.
Maybe I misinterpreted what he wrote, but sanity checking the shiny new tech against fossilized tech of yesteryear to assure the new tech actually justifies it's higher cost doesn't sound like malpractice to me?
I mean he did use the state of the art for his work, he just checked how much better it actually was in comparison to a much simpler algorithm and thought the cost/benefit ratio to be questionable... At least that's what I read from his comments
Curious how big your dataset was if you used $1000 of GPU credits on DistilBERT. I've run BERT on CPU on moderate cloud instances no problem for datasets I've worked with, but which admittedly are not huge.
If I'm reading correctly, they used $1000 running a Llama model, not DistilBERT.
You read it correctly. I obviously didn't explain myself well.
Price isn't a real issue in almost every imaginable use case either. Even a small open source model would outperform and you're going to get a lot of tokens per dollar with that.
> dictionary mapping each word to a single sentiment weight
That seems to me like it would flat out fail on sarcasm. How is that still considered a usable method today?
*consigned a good chunk of historical research into the bin
To be fair it is still stuck there, so
VADER made me sad when it couldn’t do code mixed analyses in 2020. I’m thinking of dusting off that project, but then I dread the thought of using LLMs to do the same sentiment analysis.
Does it work for sarcasm and typos, which real-world people tend to produce?
So... What were the results? How did the Llama based model compare to VADER?
I’d love to hear your thoughts on BERTs - I’ve dabbled a fair bit, fairly amateurishly, and have been astonished by their performance.
I’ve also found them surprisingly difficult and non-intuitive to train, eg deliberately including bad data and potentially a few false positives has resulted in notable success rate improvements.
Do you consider BERTs to be the upper end of traditional - or, dunno, transformer architecture in general to be a duff? Am sure you have fascinating insight on this!
That is a really good question, I am not sure where to draw the line.
I think it would be safe to say BERT is/was firmly in the non-traditional side of NLP.
A variety of task specific RNN models preceded BERT, and RNN as a concept has been around for quite a long time, with the LSTM being more modern.
Maybe word2vec ushered in the end of traditional NLP and was simultaneously also the beginning of non-traditional NLP? Much like Newton has been said to be both the first scientist and also the last magician.
I find discussing these kind of questions with NLP academics to be awkward.
Sentiment analysis using traditional means is really lacking. I can’t talk about the current project I’m working on. But I needed a more nuanced sentiment. Think of something like people commenting on the Uber Eats app versus people commenting on a certain restaurant.
a) you can save costs on llama by running it locally
b) compute costs are plummeting: cloud inference costs have dropped over 80% in one year
c) similar to a), spending a little more and having a beefy enough machine is functionally cheaper after just a few projects
d) everyone trying to do sentiment analysis is trying to make waaaay more money anyway
so I don't see NLP's even lower costs as being that relevant. It's like pointing out that I could use assembly instead of 10 layers of abstraction. It doesn't really matter
As an NLP professor, yes, I think we're mostly screwed - saying LLMs are a dead end or not a big deal, like some of the interviewed say, is just wishful thinking. A lot of NLP tasks that were subject of active research for decades have just been wiped out.
Ironically, the tasks that still aren't solved well by LLMs and can still have a few years of life in them are the most low-level ones, that had become unfashionable in the last 15 years or so - part-of-speech tagging, syntactic parsing, NER. Of course, they have lost a lot of importance as well: you no longer need them for user-oriented downstream tasks. But they may still get some use: for example NER for its own sake is used in biomedical domains, and parsing can be useful for scientific studies of language (say, language universals related to syntax, language evolution, etc.). Which is more than you can say about translation or summarization, which have been pretty much obsoleted by LLMs. Still, even with these tasks, NLP will go from broad applicability to niche.
I'm not too worried for my livelihood at the moment (partly because I have tenure, and partly because the academic system works in such a way that zombie fields keep walking for quite long - there are still journals and conferences on the semantic web, which has been a zombie for who knows how long). But it's a pity: I got into this because it was fun and made an impact and now it seems most of my work is going to be irrelevant, like those semantic web researchers I used to look down at. I guess my consolation is that computers that really understand human language was the dream that got me into this in the first place, and it has been realized early. I can play with it and enjoy it while I sink into irrelevant research, I guess :/ Or try to pivot into discrete math or something.
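As a concrete illustration of those low-level tasks - POS tagging, dependency parsing, NER - here is a minimal sketch using spaCy and its small English model; the example sentence and model choice are mine, not the commenter's.

    # POS tagging, dependency parsing and NER with an off-the-shelf pipeline.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Novartis reported strong results for its asthma drug in Basel.")

    for tok in doc:
        print(tok.text, tok.pos_, tok.dep_, tok.head.text)   # tag + dependency arc

    for ent in doc.ents:
        print(ent.text, ent.label_)                           # e.g. Novartis ORG, Basel GPE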
Let's admit it - overnight it became much, much harder to be a convincing professor, given that each student can use GPTs of all sorts to contradict or otherwise intimidate you. Only a seasoned professor knows the feeling of being bullied by a smart-ass student. This brings down the total value of, and the incentive for, teaching - as does the avoidance GPTs silently imprint in students. I mean - why write a program/paper/research piece when the GPT can do it for you and save you the suffering?
The whole area of algorithms suddenly became more challenging, as now you also have to understand folding multi-dimensional spaces, and retell this all as a nice story for students to remember.
We are very likely heading into some dark uncharted era for academia, which will very likely lead to academia shrinking massively. And given the talk of 'all science now happens in big corpos'... I can expect the universities to go back to the original state they started from - monasteries.
Saying this all having spent 20+ years as part-time contributor to one such monastery.
Yes, my comment focused on NLP research but the importance of university teaching has also taken a hit - not that I fear bullying, but now students can have a dedicated custom teacher with infinite time and patience and that can answer questions at 3 AM, and obviously that reduces the relevance of the professor. While the human interaction in in-person teaching still provides some exclusive value, demand logically should go down. Although don't underestimate the power of inertia - one could also think that small noname universities would go out of business when the likes of MIT began offering their courses to everyone online, and it didn't happen. I do think LLMs bring higher risk than that, and a shrinking will indeed happen, but maybe not so dramatic. Let's see.
Regarding science, if we leave it exclusively to corporations we won't get very far, because most corporations aren't willing to do the basic/foundational science work. The Transformers and most of the relevant followup work that led to LLMs were developed in industry, but they wouldn't have been possible without the academics that kept working on neural networks while that field was actively scorned during the 90s-2000s AI winter. So I think research universities still should have a role to play. Of course, convincing the funders that this is indeed the case might be a different story.
>but they wouldn't have been possible without the academics that kept working on neural networks while that field was actively scorned during the 90s-2000s AI winter
The bottleneck was on compute power. Industry would have also worked on neural networks once the compute power for it existed.
Not totally related but I have wondered how someone who thinks they are an expert in a field may deal with contradictions presented by GPT.
For example, you may consider yourself an expert on some niche philosophy like say Orientalism. A student can now contradict any theories you can come up with using GPT and the scary thing is that they will be sensible contradictions.
I feel like the bar to consider yourself an expert is much higher - you not only have to beat your students but also know enough to beat GPT.
Why in this story students use GPT and professors don't?
If you are an expert -- sit down and start working with GPT on your own. See what it can and what it cannot do. See where it helps. See where it loses, hands down.
You are right - this is a new activity professors have to pick up. A latent point in my previous comment was that maybe some professors have not as much expertise as may be required. Now that this expertise is sort of democratised there's more pressure on professors to get better.
I used to study NLP but before transformers, and now I don't work with NLP/ML/LLMs at all. Can you explain to me this view?
LLMs are NLP? We have a model that works, and that works great for many people and many different usages, so shouldn't NLP be at its top now? LLMs are not conceptually that different from other models?
I worked with GIZA/MOSES statistical MT back in the day during my studies, it's at the end of the day just matrices that you don't really understand, same as with LLMs?
It's not the same at all. An LLM is big, yes, and that's part of it. But an LLM is small compared to an equivalent-performance machine built on something like n-gram statistical models. You'd need the whole universe, or something prohibitive like that, and it'd still be worse. People don't like it, but LLMs 'understand' texts in a very real meaning of the word, because that's the most compressive way to do the task they're trained on. Is it the same as human understanding? Most likely not, but that complaint is cheating.
NLP is indeed at its top, NLP professors aren't :)
Imagine if you had stayed in academia and kept working in MT for the last two decades. First of all, now you would see how LLMs render all your work pretty much obsolete. That's already hard for many people. Not so much for me, as I consider myself to be rather adaptable, and maybe not for you either - you can embrace the new thing and start working on it yourself, right?
But the problem is that you can't. The main issue is not lack of explainability, but the sheer resources needed. In academia we can't run, let alone train, anything within even one or two orders of magnitude of ChatGPT. We can work with toy models, knowing that we won't get even remotely near the state of the art (many are now doing this, with the excuse of "green AI", sustainability and such, but that won't even hold much longer). Or we can work with the likes of ChatGPT as plain users, but then we are studying the responses of an "oracle" that we don't even have control of, and it starts looking more like religion than science.
Ten years ago an academic could beat the state of the art in MT and many other NLP tasks, now for many tasks that's just impossible (unless we count coming up with some clever prompt engineering, but again, religion). For those of us who were in this field because we liked it, not only to make a living, research has become quite unfulfilling.
Ah, okay. The scale is just too big for academia, makes sense.
I remember that in my study days (just masters, not PhD) we have been unable to beat Google Translate already for general tasks, but we could beat it on weird language pairs, just because we could get more training data (mostly by downloading some ebooks online) and tinker with the algorithm a bit.
But the scale argument - more data will be better - was true even back then... it was just easy to get better training data than Google for weird subtasks.
(Actually one of my professors is now Big Guy of that "European Commission Approved LLM" research project that was in news few months ago. I am ... interested how that turns out. https://openeurollm.eu/ )
Unless we intend to surrender everything about human symbolic manipulation (all math, all proving, all computation, all programming) to LLMs in the near future, we still need some formal representations for engineering.
The major part of traditional NLP was about formal representations. We have yet to see efficient mining techniques to extract formal representations and analyses back out of an LLM.
How would we solve the traditional NLP problems with an LLM - for example, the formalization of the law corpus of a given country?
As an approximation we can look at non-natural language processing, e.g. compiler technologies. How do we write an optimizing compiler on LLM technologies? How do we ensure stability, correctness and price?
In a sense, the traditional NLP field has just doubled, not died. In addition to humans as language-capable entities, who cannot really explain how they use the language, we now also have LLMs as another kind of language-capable entity, who in fact also cannot explain anything. The only benefit is that it is cheaper to ask an LLM the same question a million times than to ask a human.
As someone deeply involved in NLP, I’ve observed the field’s evolution: from decades of word counting and statistical methods to a decade of deep learning enabling “word arithmetic.” Now, with Generative AI, we’ve reached a new milestone, a universal NLP engine.
IMHO, the path to scalability often involves using GPT models for prototyping and cold starts. They are incredible at generating synthetic data, which is invaluable for bootstrapping datasets and also data labelling of a given dataset. Once a sufficient dataset is available, training a transformer model becomes feasible for high-intensity data applications where the cost of using GPT would be prohibitive.
GPT’s capabilities in data extraction and labeling are to me the killer applications, making it accessible for downstream tasks.
This shift signifies that NLP is transitioning from a data science problem to an engineering one, focusing on building robust, scalable systems.
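A rough sketch of the bootstrap-then-distill workflow described above: use a GPT model to label raw text, then use the resulting dataset to fine-tune a small task-specific model for the high-volume path. The openai client usage is standard; the model name and prompt are illustrative assumptions, not anything from this comment.

    # Bootstrap labels with a GPT model, then distill into a cheaper model later.
    # Assumes: pip install openai, and an OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    def gpt_label(text):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",   # illustrative choice; any capable chat model works
            messages=[
                {"role": "system",
                 "content": "Label the sentiment of the user text as POSITIVE, NEGATIVE or NEUTRAL. Reply with the label only."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content.strip()

    raw_texts = ["Great service!", "The app keeps crashing.", "Delivered on time."]
    labeled = [(t, gpt_label(t)) for t in raw_texts]
    # `labeled` can now seed fine-tuning of a small transformer classifier for
    # the high-intensity path where per-call GPT pricing would be prohibitive.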
Reminds me of the whole Chomsky vs. Norvig debate - https://norvig.com/chomsky.html
Thanks for the link, just read it, and the Chomsky transcript. Chomsky wanted deep structure, Norvig bet on stats, but maybe Turing saw it coming: kids talk before they know grammar, and so did the machines. It turns out we didn't need to understand language to automate it.
CNNs were outperforming traditional methods on some tasks before 2017.
The problem was that all of the low-level tasks, like part-of-speech tagging, parsing, named entity recognition, etc., never resulted in a good summarizing system or translating system.
Probabilistic graphical models worked a bit but not much.
Transformers were a leap, where none of the low level tasks had to be done for high level ones.
Pretty sure an equivalent leap happened in computer vision a bit before.
People were fiddling with low-level pattern matching and filters, and then it was all obliterated with an end-to-end CNN.
There was no leap in research. Everything had to do with availability of compute.
Neural nets are quite old, and everyone knew that they were universal function approximators. The reason why models never took off was because it was very expensive to train a model even of a limited size. There was no real available hardware to do this on short of supercomputer clusters, which were just all cpus, and thus wildly inefficient. But any researcher back then would have told you that you can figure anything out with neural nets.
Sometime in 2006, Nvidia realized that a lot of the graphics compute was just generic parallel compute and released Cuda. People started using graphics cards for compute. Then someone figured out you can actually train deep neural nets with decent speed.
Transformers weren't even that big of a leap. The paper makes it sound like it's some sort of novel architecture - in essence, instead of input × weights to the next layer, you do input × matrix1, input × matrix2, input × matrix3, and multiply them together. And as you can guess, to train it you need more hardware, because now you have to train three matrices rather than just one.
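For what it's worth, here is a toy numpy sketch of that "three matrices" idea - single-head scaled dot-product self-attention, with multi-head projections, masking and everything else omitted. It is a minimal illustration, not the full architecture.

    # Single-head scaled dot-product self-attention over a toy input sequence.
    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    seq_len, d_model = 4, 8
    rng = np.random.default_rng(0)
    X = rng.normal(size=(seq_len, d_model))            # token embeddings
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

    Q, K, V = X @ W_q, X @ W_k, X @ W_v                # input times each learned matrix
    scores = Q @ K.T / np.sqrt(d_model)                # pairwise attention scores
    output = softmax(scores) @ V                       # weighted mix of value vectors
    print(output.shape)                                # (4, 8)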
If we ever get something like ASICs for ML, then at a certain point we will be able to iterate on the architecture itself. The optimal LLM may be a combination of CNN, RNN, and Transformer blocks, all intertwined.
> ever get something like ASICs for ML
Is this what you're mentioning?
[0] https://linearmicrosystems.com/using-asic-chips-for-artifici...
> never resulted in a good ... translating system
that seems too broad
> all obliterated with an end-to-end CNN
you mixed your nouns.. what you were saying about transformers was about transformers.. that specifically replaced cnn. So,no
In NLP, transformers replaced RNNs. In computer vision, CNNs replaced previous methods (e.g. feature descriptors), and recently got replaced by visual transformers, though modern CNNs are still pretty good.
yikes! yes typing too fast..
If Chomsky was writing papers in 2020 his paper would’ve been “language is all you need.”
That is clearly not true, and as the article points out, very large forecasting models have beaten the hypothesis that you need an actual foundational structure for language in order to demonstrate intelligence - in fact, it's exactly the opposite.
I’ve never been convinced by that hypothesis if for no other reason that we can demonstrate in the real world that intelligence is possible without linguistic structure.
As we’re finding: solving the markov process iteratively is the foundation of intelligence
out of that process emerge novel state transition processes - in some cases those are novel communication methods that have a structured mapping to state encoding inside the actor
communication happens across species at various levels of fidelity, but it is not the underlying mechanism of intelligence; it is an emergent behavior that allows for shared mental mapping and storage
> As we’re finding: solving the markov process iteratively is the foundation of intelligence
No, the markov process allows LLMs to make connections between existing representations of human intelligence. LLMs are nothing without a data set developed by an existing intelligence.
Some people will never be convinced that a machine demonstrates intelligence. This is because for a lot of people, intelligence exists as a subjective experience that they have, and the belief that others have it too extends only inasmuch as others appear to be like the self.
It's called the AI effect: https://en.wikipedia.org/wiki/AI_effect
> The author Pamela McCorduck writes: "It's part of the history of the field of artificial intelligence that every time somebody figured out how to make a computer do something—play good checkers, solve simple but relatively informal problems—there was a chorus of critics to say, 'that's not thinking'."
The flip side of that is that every time a new AI approach becomes popular even more people proclaim "this is what thinking is", believing that the new technology reflects the underlying process of human intelligence. This phenomenon goes back further than AI as a formal discipline, to early computers, and even during the age of mechanical computers. There are parallels with robotics, where for centuries anything that could seemingly move like a human was perceived to be imbued with human-like qualities.[1] The human instinct to anthropomorphize is deep-seated and powerful.
I keep returning to this insight by a researcher of the Antikythera Mechanism, which in the context of ML seems even more apropos today than in 1986:
> I would like to conclude by telling a cautionary tale. Let us try and place the Antikythera Mechanism within the global context of ancient Greek thought. Firstly came the astronomers observing the motions of the heavenly bodies and collecting data. Secondly came the mathematicians inventing mathematical notation to describe the motions and fit the data. Thirdly came the technicians making mechanical models to simulate those mathematical constructions, like the Antikythera Mechanism. Fourthly came generations of students who learned their astronomy from these machines. Fifthly came scientists whose imagination had been so blinkered by generations of such learning that they actually believed that this was how the heavens worked. Sixthly came the authorities who insisted upon the received dogma. And so the human race was fooled into accepting the Ptolemaic system for a thousand years.
> Today we are in danger of making the same mistake over computers. Our present generation is able to view them with an appropriate skepticism when necessary. But our children's children may be brought up within a society dominated by computers, that they may actually believe this is how our brains work. We do not want the human race to be fooled again for another thousand years.
-- E.C. Zeeman, Gears from the Greeks, January 1986, http://zakuski.utsa.edu/~gokhman/ecz/gears_from_the_greeks.p...
I also regularly return to Richard Stallman's admonition regarding the use of the term, intellectual property. He deeply disliked that term and argued it was designed to obfuscate, through self-serving[2] equivocations, the legal principles behind the laws of copyright, patent, trademark, trade secret, etc.
Contemporary machine learning may rightly be called artificial intelligence, but to conflate it with human intelligence is folly. It's clearly not human intelligence. It's something else. The same way dolphin intelligence isn't human intelligence, or a calculator isn't human intelligence. These things may be able to tell us something about the contours and limits of human intelligence, especially in contrast, but equivocations or even simple direct comparisons only serve to obfuscate and constrain how we think of intelligence.
[1] See, e.g., the 1927 film Metropolis, which played off prevailing beliefs and fears about the progress and direction of actuated machines.
[2] Serving the interests of those who profit the most from expanding the scope and duration of these legal regimes by obfuscating the original intent and design behind each regime, replacing them with concepts and processes that favored expansion.
> It's clearly not human intelligence
I don't think argument by assertion is appropriate where there's a lot of people who "clearly" believe that it's a good approximation of human intelligence. Given we don't understand how human intelligence works, asserting that one plausible model (a continuous journey through an embedding space held in neurons) that works in machines isn't how humans do it seems too strong.
It is demonstrably true that artificial neurons have nothing to do with cortical neurons in mammals[1] so even if this model of human intelligence is useful, transformers/etc aren't anywhere close to actually implementing the model faithfully. Perhaps by Turing completeness o3 or whatever has stumbled into a good implementation of this model, but that is implausible. o3 still wildly confabulates worse than any dementia patient, still lacks the robust sense of folk physics we see in infants, etc. (This is even more apparent in video generators, Veo2 is SOTA and it still doesn't understand object permanence or gravity.) It does not make sense to say something is a model of human intelligence if it can do PhD-level written tasks but is outsmarted by literal babies (also chimps, dogs, pigeons...)
AI people toss around the term "neuron" way too freely.
[1] https://www.quantamagazine.org/how-computationally-complex-i...
> somebody figured out how to make a computer do something
Well, I would argue that in most deterministic AI systems the thinking was all done by the AI researchers and then encoded for the computer. That’s why historically it’s been easy to say, “No, the machine isn’t doing any thinking, but only applying thinking that’s embedded within.” I think that line of argument becomes less obvious when you have learning systems where the behavior is training dependent. It’s still fairly safe to argue that the best LLMs today are not yet thinking, at least not in a way a human does. But in another generation or two? It will become much harder to deny.
> It’s still fairly safe to argue that the best LLMs today are not ... thinking
I agree completely.
> But in another generation or two? It will become much harder to deny.
Unless there is something ... categorically different about what an LLM does and in a generation or two we can articulate what that is (30 years of looking at something makes it easier to understand ... sometimes).
Intelligence requires agency
> It’s still fairly safe to argue that the best LLMs today are not yet thinking, at least not in a way a human does. But in another generation or two? It will become much harder to deny.
Current LLMs have a hard division between training and inference time; human brains don’t-we train as we infer (although we probably do a mix of online/offline training: you build new connections while awake, but then pruning and consolidation happens while you sleep). I think softening the training-vs-inference division is a necessary (but possibly not sufficient) condition for closing the artificial-vs-human intelligence gap. But that softening is going to require completely different architectures from current LLMs, and I don’t think anyone has much of an idea what those new architectures will look like, or how long it will take for them to arrive
In many ways LLMs are a regression compared to what was before. They solve a huge class of problems quickly and cheaply, but they also have severe limitations that older methods didn't have.
So no, it's not a linear progress story like in a sci-fi story.
> 'that's not thinking'."
Do LLMS think? Of course they do. But thinking =/= intelligence.
It's pretty easy to define what an AI actually would look like:
A human coder sits down and writes an algorithm. In that algorithm, there is no reference to any specific piece of information on ANYTHING (including human words), whether it's manually written in code or derived through training a neural net on that information, where the code is just a bunch of matrix multiplies.
The algorithm has 2 interfaces - a terminal for a human to interact with, and an api to a tcp socket over which it can communicate to the world wide web.
A human could give this algorithm an instruction, like for example, "Design and build me a flying car and put it in my driveway and do not spend a single cent of my money, and do everything legally".
Provided there are no limits on communication that would result in the algorithm being perma-banned from the internet, the algorithm, prior to even tackling the task at hand, will have to do at least the following:
- figure out how to properly structure HTTP communication to be able to talk to servers, and essentially build an internal API.
- figure out what the words you typed mean - i.e. map them to information collected from the web
- start running internal simulations to figure out what the best course of action is
- figure out how to deal with ambiguity and ask you questions (like "how far do you want to fly"), and figure out how to deal with dead ends.
- start executing actions with preplanned risk (figuring out what risk is in the process) and learn from mistakes.
And that's just the short start.
But the key factor is that this same process that it uses to figure basic functionality is the same process (at least on the lowest level) that it would use to start designing a flying car once it has all the information it needs to "understand" the task.
And there isn't anything even remotely close on the horizon in any of the current AI research that indicates we have any idea what that process looks like. The only claims we can make are that it's definitely recursive, not fully forward like LLMs, and that it's essentially a search algorithm. But what it's searching, and what the guidance metric is for the search direction, is the mystery.
> every time somebody figured out how to make a computer do something
Well, there’s the clue it is not really thinking if somebody told the machine how to do things. My Roomba isn’t intelligent because it’s been programmed to clean the floor, now is it?
Wake me up when machines learn to do something on their own. I know everybody is on the AI hype train, but please show your extraordinary evidence to your extraordinary claims first.
Who's making extraordinary claims here?
I think they referred to the claim that AIs playing checkers should be considered thinking.
> Wake me up when machines learn to do something on their own.
Be careful what you wish for if only just this once
It doesn't mean they tie intelligence to subjective experience. Take digestion. Can a computer simulate digestion? Yes. But no computer can "digest" if it's just silicon in the corner of an office. There are two hurdles. The leap from simulating intelligence to intelligence, and the leap from intelligence to subjective experience. If the computer gets attached to a mechanism that physically breaks down organic material, that's the first leap. If the computer gains a first-person experience of that process, that's the second.
You can’t just short-circuit from simulates to does to has subjective experience.
And the claim that other humans don't have subjective experience is such a non-starter.
> And the claim that other humans don't have subjective experience is such a non-starter.
There is no empirical test for the subjective experience of consciousness. You can't even prove to anybody else that you have it. We assume other people experience as we ourselves do as a basic decency we extend to other humans. This is a good thing, but it's essentially faith not science.
As for machines not having it, I'm fine with that assumption, but until there can be some sort of empirical test for it, it's not science. Thankfully, it's also not relevant to any engineering matter. Whether the machines have a subjective experience in any way comparable to our own doesn't touch any question about what demonstrable capabilities or limitations they have. We don't need to know if the computer has a ""soul"" to know if the computer can be a solution to any given engineering problem. Whether machines can have subjective experience shouldn't be considered an important question to engineers; let theologians waste their time fruitlessly debating that.
I think you're talking about consciousness rather than intelligence. While I do see people regularly distinguishing between simulation and reality for consciousness, I don't often see people make that distinction for intelligence.
> And the claim that other humans don't have subjective experience is such a non-starter.
What about other primates? Other mammals? The smarter species of cephalopods?
Certain many psychopaths seem to act as if they have this belief.
This is why I want the field to go straight to building indistinguishable agents- specifically, you should be able to video chat with an avatar that is impossible to tell from a human.
Then we can ask "if this is indistinguishable from a human, how can you be sure that anybody is intelligent?"
Personally I suspect we can make zombies that appear indistinguishable from humans (limited to video chat; making a robot that appears human to a doctor would be hard) but that don't have self-consciousness or any subjective experience.
Thats not really intelligence though.
"There is considerable overlap between the intelligence of the smartest bears and the dumbest tourists."
LLMs are not artificial intelligence but artificial stupidity.
LLMs will happily hallucinate. LLMs will happily tell you total lies with complete confidence. LLMs will give you grammatically perfect completely vapid content. etc.
And yet that is still better than what most humans could do in the same situation.
We haven't proved that machines can have intelligence, but instead we are happily proving that most people, most of the time just aren't very intelligent at all.
> LLMs will happily hallucinate. LLMs will happily tell you total lies with complete confidence.
Perhaps we should avoid anthropomorphizing them too much. LLMs don't inhabit a "real world" where they can experiment and learn. Their training data is their universe, and it's likely filled with conflicting, peculiar, and untestable information.
Yes, the output is sometimes "a lie" if we apply it to our world, but in "their world" stuff might just be strangely different. And it's not like the real world has only "hard simple truths" - quantum mechanics comes to mind for how strange stuff can be.
> yet that is still better than what most humans could do in the same situation
Yup. A depressing takeaway from LLMs is most humans don’t demonstrate a drive to be curious and understand, but instead, to sort of muddle through most (economic) tasks.
Humans are basically incapable of recognizing that there’s something that’s more powerful than them
They’re never going to actively collectively admit that that’s the case, because humans collectively are so so systematically arrogant and self possessed that they’re not even open to the possibility of being lower on the intelligence totem pole
The only possible way forward for AI is to create the thing that everybody is so scared of so they can actually realize their place in the universe
> Humans are basically incapable of recognizing that there’s something that’s more powerful than them
> They’re never going to actively collectively admit that that’s the case, because humans collectively are so so systematically arrogant and self possessed that they’re not even open to the possibility of being lower on the intelligence totem pole
For most of human history, the clear majority of humans have believed in God(s), spirits, angels, bodhisattvas, etc - beings which by definition are above us on the “totem pole” - and although atheism is much more widespread today, I think it almost certainly remains a minority viewpoint at the global level.
So I’m sceptical of your idea humans have some inherent unwillingness to believe in superhuman entities. From an atheist perspective, one might say that the (globally/historically) average human is so eager to believe in such entities, that if they don’t actually exist, they’ll imagine them and then convince themselves that their imaginings are entirely real. (Whereas, a theist might argue that the human eagerness to believe in such entities is better explained by their existence.)
Are you really that misanthropic that you think we need AI to show us how meaningless we are?
I’m curious as to how you derive your position of “misanthropic”
The answer to your question though is I do not believe that humans have at any point in history prevented an existential threat from actually being realized before it was realized
I will not be convinced a machine demonstrates intelligence until someone demonstrates a robot that can navigate 3D space as intelligently as, say, a cockroach. AFAICT we are still many years away from this, probably decades. A bunch of human language knowledge and brittle heuristics doesn't convince me at all.
This ad hominem is really irritating. People have complained since Alan Turing that AI research ignores simpler intelligence, instead trying to bedazzle people with fancy tricks that convey the illusion of human intelligence. Still true today: lots of talk about o3's (exaggerated) ability to do fancy math, little talk about its appallingly bad general quantitative reasoning. The idea of "jagged AI" is unscientific horseshit designed to sweep this stuff under the rug.
In the natural world, intelligence requires embodiment. And, depending on your point of view, consciousness. Modern AI exhibits neither of those characteristics.
How do they convince themselves that other people have intelligence too?
It is until proven otherwise because modern science still doesn’t have a consensus or standards or biological tests which can account for it. As in, highly “intelligent” people often lack “common sense” or fall prey to con artists. It’s pompous as shit to assert a black box mimicry constitutes intelligence. Wake me up when it can learn to play a guitar and write something as good as Bob Dylan and Tom Petty. Hint: we’ll both be dead before that happens.
I can't write something as good as Bob Dylan and Tom Petty. Ergo I'm not intelligent.
You have achieved enlightenment.
Now you no longer need to post here.
This to me is a weak argument. You have the ability to appreciate and judge something as good as Bob Dylan and Tom Petty. That's what makes you intelligent.
> This to me is a weak argument. You have the ability to appreciate and judge something as good as Bob Dylan and Tom Petty. That's what makes you intelligent.
What if you don't? Do you think that makes someone not intelligent?
Think about it for a second.
Yes. If you do not possess the potential ability to judge other human beings and/or their work, you lack intelligence.
Great seeing Ray Mooney (who I took a graduate class with) and Emily Bender (a colleague of many at the UT Linguistics Dept., and a regular visitor) sharing their honest reservations with AI and LLMs.
I try to stay as far away from this stuff as possible because when the bottom falls out, it's going to have devastating effects for everyone involved. As a former computational linguist and someone who built similar tools at reasonable scale for largeish social media organizations in the teens, I learned the hard way not to trust the efficacy of these models or their ability to get the sort of reliability that a naive user would expect from them in practical application.
How exactly is the bottom going to fall out? And are you really trying to present that you have practical experience building comparable tools to an LLM prior to the Transformer paper being written?
Now, there does appear to be some shenanigans going on with circular financing involving MSFT, NVIDIA, and SMCI (https://x.com/DarioCpx/status/1917757093811216627), but the usefulness of all the modern LLMs is undeniable. Given the state of the global economy and the above financial engineering issues, I would not be surprised if at some point there is a contraction and the AI hype settles down a bit. With that said, LLMs could be made illegal and people would still continue running open source models indefinitely, and organizations would build proprietary models in secret, b/c LLMs are that good.
Since we are throwing out predictions, I'll throw one out. Demand for LLMs to be more accurate will bring methods like formal verification to the forefront, and I predict that eventually models/agents will start to be able to formalize solved problems into proofs, using formal verification techniques to guarantee correctness. At that point you will be able to trust the outputs for things the model "knows" (i.e. has proved) and use the probably-correct answers the model spits out as we currently do today.
Probably something like the following flow:
1) Users enter prompts
2) Model answers questions and feeds those conversations to another model/program
3) Offline this other model uses formal verification techniques to try and reduce the answers to a formal proof (see the sketch after this list).
4) The formal proofs are fed back into the first model's memory and then it uses those answers going forward.
5) Future questions that can be mapped to these formalized proofs can now be answered with almost no cost and are guaranteed to be correct.
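Purely as an illustration of step 3 above, here is what a toy external check might look like with an SMT solver; the claim being checked, the z3-solver dependency, and the flow are all assumptions made for the sake of the sketch, not part of the commenter's proposal.

    # Toy version of step 3: formalize a claim the model produced and check it,
    # rather than trusting the model. Assumes: pip install z3-solver
    from z3 import Int, Solver, Not, sat

    n = Int("n")
    claim = n * n >= n            # model's claim: "for every integer n, n*n >= n"

    s = Solver()
    s.add(Not(claim))             # search for a counterexample
    if s.check() == sat:
        print("claim is false, counterexample:", s.model())
    else:
        print("claim holds for all integers; cache it as a verified fact")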
I have argued the same theme elsewhere. Formal reasoning over LLM output is the next step for AI. Where do we go for funding?
> And are you really trying to present that you have practical experience building comparable tools to an LLM prior to the Transformer paper being written?
I believe (could be wrong) they were talking about their prior GOFAI/NLP experience when referencing scaling systems.
In any case, is it really necessary to be so harsh about over-confidence and then go on to predict the future of solving hallucinations with your formal verification ideas?
Talk is cheap. Show me the code.
When you see the code it’s going to be too late.
Don't try and say anything pro-linguistics here, people are weirdly hostile if you think it's anything but probabilities.
Over my years in academia, I noticed that the linguistics departments were always the most fiercely ideological. Almost every comment in a talk would get contested by somebody from the audience.
It was annoying, but as a psych guy I was also jealous of them for having such clearly articulated theoretical frameworks. It really helped them develop cohesive lines of research to delineate the workings of each theory
Are there really all that many parallels between linguistics (the study of language) and computational-linguistics/NLP (the subject of discussion)?
Computational linguistics, yes - it is the application of linguistics to computers.
Modern NLP, not really - it's all based around statistical modeling with very little linguistics.
Maybe they could’ve tried to say something pro-linguistics, but the comment was entirely anti-LLM.
God forbid someone not bow before the almighty LLM
> Don't try and say anything pro-linguistics here, (...)
Shit-talking LLMs without providing any basis or substance is not what I would call "pro-linguistics". It just sounds like petty spiteful behavior, lashing out out of frustration for rendering old models obsolete.
From a scientific explanatory perspective, the old models are not obsolete because they are explanatory whereas LLMs do not explain anything about human linguistic behaviour.
On one hand you have models which you argue are explanatory, but arguably do not work.
On the other hand, you have models that not only work but took the world by storm, and may or may not be made explanatory.
You either invest more work getting one explanatory model to work, or invest more work getting a working model to become explanatory.
What do you think is the fruitful research path?
What do you mean "work"? Their goal is not to be some general purpose AI. (To be clear I'm talking narrowly about computational linguistics, not old fashioned NLP more broadly).
The interesting question is whether just a gigantic set of probabilities somehow captures things about language and cognition that we would not expect ...
They are far far more capable than anything your fellow computational linguists have come up with.
As the saying goes, 'every time I fire a linguist, the performance of the speech recognizer goes up'
Curious what you are expecting when you say "bottom falls out". Are you expecting significant failures of large-scale systems? Or more a point where people recognize some flaw that you see in LLMs?
> learned the hard way not to trust the efficacy of these models or their ability to get the sort of reliability that a naive user would expect from them in practical application
But…they work. Linguistics as a science is still solid. But as a practical exercise, it seems to be moot other than for finding niches where LLMs are too pricey.
Lots of great quotes in this piece but this one stuck out for me:
> TAL LINZEN: It’s sometimes confusing when we pretend that there’s a scientific conversation happening, but some of the people in the conversation have a stake in a company that’s potentially worth $50 billion.
I think the AI hype (much of which is justified) detracts from the fact that we have Actually Good NLP at last. I've worked on NL2SQL in both the before and after times, and it's still not a solved problem, but it's frustrating to talk to AI startup people who have never really thought deeply about named entity recognition, disambiguation etc. The tools are much, much better. The challenges and pitfalls remain much the same.
As someone who dropped out of NLP during the chaos, all this stuff honestly feels way too familiar - progress is cool but watching your work become pointless overnight stings hard.
I’m curious how have large language models impacted linguistics and particularly the idea of a universal grammar?
There's a lot of debate about it. Here's one view: https://arxiv.org/abs/2501.17047
This paper is awful. They bizarrely present the fact that transformers are not very sensitive to word order as a positive of transformers, despite the fact that that's not how languages work. There's also this absurd passage.
>However, a closer look at the statistical structure of language use reveals that word order contains surprisingly little information over and above lexical information. To see this intuitively, imagine we give you a set of words {dogs, bones, eat} without telling you the original order of the words. You can still reconstruct the meaning based entirely on (1) the meanings of the words in isolation and (2) your knowledge of how the world works—dogs usually eat bones; bones rarely eat dogs. Indeed, many languages show a high level of nondeterminism in word order (Futrell et al., 2015b; Koplenig et al., 2017), and word order cues are often redundant with meaning or case markers (Pijpops and Zehentner, 2022; Mahowald et al., 2023). The fact that word order is relatively uninformative in usage also partly explains why bag-of-words methods dominated NLP tasks until around 2020, consistently outperforming much more sophisticated approaches: it turns out that most of the information in sentences is in fact present in the bag of words.
While it is certainly possible to guess that your interlocutor meant "dogs eat bones", the sentence "bones eat dogs" is entirely possible (if unlikely)! For example, imagine a moving skeleton in a video game or something. The idea that word order isn't vital to meaning is deeply unserious. (Of course there are languages where word order matters less, but there are still important rules about constituent structure, clausal embedding etc, which constrain word order).
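The point about word order is easy to demonstrate: a bag-of-words representation literally cannot distinguish the two readings. A minimal sketch using scikit-learn's CountVectorizer (the example sentences echo the quoted passage; the tooling choice is mine):

    # "dogs eat bones" and "bones eat dogs" map to the same bag-of-words vector.
    # Assumes: pip install scikit-learn
    from sklearn.feature_extraction.text import CountVectorizer

    vec = CountVectorizer()
    X = vec.fit_transform(["dogs eat bones", "bones eat dogs"])
    print(vec.get_feature_names_out())   # ['bones' 'dogs' 'eat']
    print(X.toarray())                   # [[1 1 1]
                                         #  [1 1 1]]  -- identical rows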
Has there been an LLM that reliably does not ignore the word "not"?
Because I'm pretty sure that's a regression compared to most prior NLP.
> Has there been an LLM that reliably does not ignore the word "not"?
Curious. I would expect most of them to get that right, unless it's an intentionally tricky question. Do you have an example?
It tends to happen for any prompt that calls for generating a piece of output for which there are many valid answers, but one is highly weighted and you want variety. Do you remember that meme a few years ago where people were asked to generate a color and then a hand tool, and most people immediately responded "erqunzzrebengyrnfgbarbsgubfrgjb"? (rot13+padding for those who haven't done this)
This particular example is too small to regularly trip AIs, but as a general rule I do not consider it tricky to try to textually negative-prompt to remove a commonly-valid-but-not-currently-wanted response. (Obviously, if you manually tweak the weights to forbid something rather than using the word "not", this fails.)
From my very rough observations, for models that fit on a local device, it typically starts to happen maybe 10% of the time when the prompt reaches 300 characters or so (specifying other parts of what you want); bigger models just need a bit more input before they fail. Reasoning models might be better, but we can watch them literally burn power running nonsensical variations through the thought pane so they're far from a sure answer.
This happens in any context you can think of: from scratch or extending an existing work; single or list; information lookup, prose generation, code generation (consider "Extend this web app to do lengthy-description-of-some-task. Remember I am not using React you stupid piece of shit AI!").
"It helps to have tenure when something like this happens."
I wonder whether tenures are causing inefficiencies in the market? You might be encouraging someone to work on an outdated field without the correct incentives.
Just like having employees with experience, I guess.
But tenured researchers are supposed to have some more protection specifically because they do research (and reach conclusions) on topics that people in leadership positions in society might not like.
I was contrasting FiNER, GliNER, and Smolagents in a recent blog post on my Substack, and while the first two are fast and provide somewhat good results, running an LLM locally is easily 10x better.
Would love to read that post - we’re considering using GliNER for discrete parts of our ingestion pipeline where we assumed it would be a great perf/$ drop-in for larger models.
This looks like the post: https://jdsemrau.substack.com/p/finer-gliner-and-smolagents-...
Thank you
For me as a lay-person, the article is disjointed and kinda hard to follow. It's fascinating that all the quotes are emotional responses or about academic politics. Even now, they are suspicious of transformers and are bitter that they were wrong. No one seems happy that their field of research has been on an astonishing rocketship of progress in the last decade.
The way I see this is that for a long time there was an academic field that was working on parsing natural human language and it was influenced by some very smart people who had strong opinions. They focused mainly on symbolic approaches to parsing, rather than probabilistic. And there were some fairly strong assumptions about structure and meaning. Norvig wrote about this: https://norvig.com/chomsky.html and I think the article bears repeated, close reading.
Unfortunately, because ML models went brr some time ago (Norvig was at the leading edge of this when he worked on the early google search engine and had access to huge amounts of data), we've since seen that probabilistic approaches produce excellent results, surpassing everything in the NLP space in terms of producing real-world systems, without addressing any of the issues that the NLP folks believe are key (see https://en.wikipedia.org/wiki/Stochastic_parrot and the referenced paper). Personally I would have preferred if the parrot paper hadn't also discussed environmental costs of LLMs, and focused entirely on the semantic issues associated with probabilistic models.
I think there's a huge amount of jealousy in the NLP space that probabilistic methods worked so well, so fast (with transformers being the key innovation that improved metrics). And it's clear that even state-of-the-art probabilistic models lack features that NLP people expected.
Repeatedly we have seen that probabilistic methods are the most effective way to make forward progress, provided you have enough data and good algorithms. It would be interesting to see the NLP folks try to come up with models that did anything near what a modern LLM can do.
This is pretty much correct. I'd have to search for it but I remember an article from a couple years back that detailed how LLMs blew up the field of NLP overnight.
Although I'd also offer a slightly different lens through which to look at the reaction of other researchers. There's jealousy, sure, but overnight a ton of NLP researchers basically had to come to terms with the fact that their research was useless, at least from a practical perspective.
For example, imagine you just got your PhD in machine translation, which took you 7 years of laboring away in grad/post grad work. Then something comes out that can do machine translation several orders of magnitude better than anything you have proposed. Anyone can argue about what "understanding" means until they're blue in the face, but for machine translation, nobody really cares that much - people just want to get text in another language that means the same thing as the original language, and they don't really care how.
The majority of research leads to "dead ends", but most folks understand that's the nature of research, and there is usually still value in discovering "OK, this won't work". Usually, though, this process is pretty incremental. With LLMs, all of a sudden you had lots of folks whose life's work was pretty useless (again, from a practical perspective), and that'd be tough for anyone to deal with.
You might be thinking of this article by Sebastian Ruder: https://www.ruder.io/nlp-imagenet/
Note that the author has a background spanning a lot of the timespans/topics discussed - much work in multilingual NLP, translation, and more recently at DeepMind, Cohere, and Meta (in other words, someone with a great perspective on everything in the top article).
Re: Machine Translation, note that Transformers were introduced for this task, and built on one of the earlier notions of attention in sequence models: https://arxiv.org/abs/1409.0473 (2014, 38k citations)
That's not to say there weren't holdouts or people who really were "hurt" by a huge jump in MT capability - just that this is a logical progression in language understanding methods as seen by some folks (though who could have predicted the popularity of chat interfaces).
Yes, I think a lot of NLP folks must’ve had their “God does not play dice with the univers(al grammar)” moment.
The majority of NLP people were not into universal grammar at all.
I wouldn't say NLP as a field was resistant to probabilistic approaches or even neural networks. From maybe 2000-2018 almost all the papers were about using probabilistic methods to figure out word sense disambiguation or parsing or sentiment analysis or whatever. What changed was that these tasks turned out not to be important for the ultimate goal of making language technologies. We thought things like parsing were going to be important because we thought any system that can understand language would have to do so on the basis of the parse tree. But it turns out a gigantic neural network text generator can do nearly anything we ever wanted from a language technology, without dealing with any of the intermediate tasks that used to get so much attention. It's like the whole field got short-circuited.
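For readers who never saw the old pipeline up close, here is a minimal sketch of the kind of intermediate artifact being described: a parse tree produced from a hand-written grammar. The toy grammar and sentence are made up purely for illustration; NLTK's chart parser is just one convenient way to show the idea.

    # Toy illustration of the parse-tree-as-prerequisite view of language understanding.
    # The grammar and sentence below are invented for this example.
    import nltk

    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'the'
    N -> 'model' | 'sentence'
    V -> 'parses'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the model parses the sentence".split()):
        print(tree)
    # (S (NP (Det the) (N model)) (VP (V parses) (NP (Det the) (N sentence))))

Stacks of components like this (tokenizer, tagger, parser, then the downstream task) are exactly what the end-to-end text generators short-circuited.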
The way I have experienced this, starting circa 2018, was a bit more incremental. First, LSTMs and then transformers led to new heights on the old tasks, such as syntactic parsing and semantic role labelling, which was sad for the previous generation, but at least we were playing the same game. But then not only the old tools of NLP but the research questions themselves became irrelevant, because we could just ask a model nicely and get good results on very practical downstream tasks that didn't even exist a short while ago. NLP suddenly turned into a general document/information-processing field, with a side hustle in conversational assistants. GPT-2 already essentially mastered the grammar of English, and the difficulties that remain are super-linguistic, having more to do with general reasoning. I would say it's not that people are bitter that other people made progress; it's more that there is not much progress to be had in the old haunts at all.
I think you greatly understate the impact as EVERYONE is freaking the fuck out about AI, not just NLP researchers.
AI is obliterating the usefulness of all mental work. Look at the high percentage of HN articles trying to figure out whether LLMs can eliminate software developers. Or professional writers. Or composers. Or artists. Or lawyers.
Focusing on the NLP researchers really understates the scope of the insecurity induced by AI.
As someone in NLP who lived through this experience: there's something uniquely ironic and cruel about building the wave that washes yourself away.
All of this matches my understanding. It was interesting taking an NLP class in 2017; the professors said, basically: listen, this curriculum is all historical and now irrelevant given LLMs. We'll tell you a little about them, but basically it's all cutting edge, sorry.
Same for my NLP class of 2021. It just went directly into talking about transformers after a brief intro to the old stuff.
Even 15-ish years ago when I was in school, the NLP folks viewed probabilistic models with suspicion. The NLP group treated everyone from our Math department with suspicion and gave them a hard time. It created so much politics that some folks who wanted to do statistical approaches would call themselves CS so that the NLP old guard wouldn't give them a hard time.
Sounds like the bitter lesson is bitter indeed!
On the contrary, to some of us (who have focused on probability, big data, algorithms, and HPC, while eschewing complex theories that require geniuses to understand) the bitter lesson is incredibly sweet.
Very much like when I moved from tightly coupled to "embarrassing" parallelism. A friend said "don't call it embarrassing... it's pleasant not to have to think about hard distributed computing problems".
The progression reminds me of how brute force won out in the chess AI game long ago with Deep Blue. Custom VLSI and FPGA acceleration and all.
> most effective way to make forward progress
A powerful response, but "fit for what purposes?" All human writing is not functionally equivalent; this has been discussed at length, e.g. poetry versus factual reporting or summarization.
https://www.amazon.com/dp/B0DYDGZTMV makes the case that DeepSeek is a poet.
At least the author is upfront that the poetry is a showcase of AI.
I agree with the criticism of Noam Chomsky as a linguist. I was raised in the typological tradition, which has its own kind of beef with Chomsky for other reasons (his singular focus on English when constructing his theories, amongst other things), but his dislike of statistical methods was of course equally suspect.
Nevertheless there is something to be said for classical linguistic theory in terms of constituent (or dependency) grammars and various other tools. They give us much simpler models that, while incomplete, can still be fairly useful at a fraction of the cost and size of transformer architectures (e.g. 99% of morphology can be modeled with finite state machines). They also let us understand languages better - we can't really peek into a transformer to understand structural patterns in a language or to compare them across different languages.
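As a concrete (and deliberately toy) illustration of the finite-state point: a few ordered regular rewrite rules already cover a chunk of English plural morphology. The rules, words, and tags here are invented for the sketch; real analyzers compile full transducers with toolkits like HFST or foma.

    # Toy finite-state-style morphological analyzer: ordered regular rewrite rules.
    # Illustrative only -- coverage and tag names are made up for this sketch.
    import re

    RULES = [
        (r"(.+)ies$", r"\1y"),     # studies -> study
        (r"(.+[sxz])es$", r"\1"),  # boxes   -> box
        (r"(.+)s$", r"\1"),        # models  -> model
    ]

    def analyze(token):
        # Return the first matching lemma plus a crude number tag.
        for pattern, repl in RULES:
            if re.match(pattern, token):
                return re.sub(pattern, repl, token) + "+N+Pl"
        return token + "+N+Sg"

    for w in ["studies", "boxes", "models", "cat"]:
        print(w, "->", analyze(w))
    # studies -> study+N+Pl, boxes -> box+N+Pl, models -> model+N+Pl, cat -> cat+N+Sg

Nothing here competes with a transformer, but it runs in microseconds and you can read off exactly which rule fired, which is the "we can actually understand the structure" argument above.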
It is simply false that UG is based only on English. Maybe in 1950, but any modern generativist theory uses data from many, many languages, and English has been re-analysed in light of other languages (see here for an example of quantifiers in English being analysed on the basis of data from a Salish language: https://philpapers.org/rec/MATQAT ).
Do transformers not use both a symbolic and a probabilistic approach?
Well, if you’ve built a career on something, you will usually actively resist anything that threatens to destroy it.
In other words, what is progress for the field might not be progress for you!
This reminds me of Thomas Kuhn's excellent book 'The Structure of Scientific Revolutions': https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Re...
It reminds me much more of Paul Feyerabend's even better book "Against Method" https://en.wikipedia.org/wiki/Against_Method
Or Planck's principle - "Science progresses one funeral at a time".
And the more general version, “Humanity progresses one funeral at a time.” Which is why the hyper-longevity people are basically trying to freeze all human progress.
Or Effective Altruism's long-termism that effectively makes everyone universally poor now. Interestingly, Guillaume Verdon (e/acc) is friends with Bryan Johnson and seems to be pro-longevity.
Can you elaborate?
It's a truly bitter pill to swallow when your whole area of research becomes redundant.
I have a bit of background in this field, so it's nice to see even people who were at the top of the field raise concerns that I had. That comment about the LHC was exactly what I told my professor. That the whole field seems to be moving in a direction where you need a lot of resources to do anything. You can have 10 different ideas on how to improve LLMs but unless you have the resources there is barely anything you can do.
NLP was the main reason I pursued an MS degree, but by the end of my course I was no longer interested in it, mostly because of this.
> That the whole field seems to be moving in a direction where you need a lot of resources to do anything. You can have 10 different ideas on how to improve LLMs but unless you have the resources there is barely anything you can do.
I think you're confusing problems, or not realizing that improving the efficiency of a class of models is a research area in its own right. Look at any field that involves expensive computational work: model reduction strategies dominate research.
> No one seems happy that their field of research has been on an astonishing rocketship of progress in the last decade.
Well, they're unhappy that an unrelated field of research more-or-less accidentally solved NLP. All the specialized NLP techniques people spent a decade developing were obviated by bigger deep learning models.
Have we already forgotten what AlexNet did to Computer Vision as a research domain?
The field is natural language processing.
I think we can squeeze it in there. Thanks!
My view is that "traditional" NLP will get re-incorporated into LLMs (or their successors) over time; we just haven't gotten to it yet. Appropriate inductive biases will only make LLMs better, faster and cheaper.
There will always be trouble in LLM "paradise" and a desire to take it to the next level. Use a raw-access (highest-performing) LLM intensely for coding and you will rack up a $10-$20/hr bill. China is not supposed to have adequate GPUs at its disposal, so they will come up with smaller and more efficient models. Etc, etc, etc...