The Timmy Trap

(jenson.org)

132 points | by metadat 7 hours ago

115 comments

  • hackyhacky 6 hours ago

    > LLMs mimic intelligence, but they aren’t intelligent.

    I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?

    I see two possibilities:

    1. We define "intelligence" as definitionally unique to humans. For example, maybe intelligence depends on the existence of a human soul, or specific to the physical structure of the human brain. In this case, a machine (perhaps an LLM) could achieve "quacks like a duck" behavioral equality to a human mind, and yet would still be excluded from the definition of "intelligent." This definition is therefore not useful if we're interested in the ability of the machine, which it seems to me we are. LLMs are often dismissed as not "intelligent" because they work by inferring output based on learned input, but that alone cannot be a distinguishing characteristic, because that's how humans work as well.

    2. We define "intelligence" in a results-oriented way. This means there must be some specific test or behavioral standard that a machine must meet in order to become intelligent. This has been the default definition for a long time, but the goal posts have shifted. Nevertheless, if you're going to disparage LLMs by calling them unintelligent, you should be able to cite a specific results-oriented failure that distinguishes them from "intelligent" humans. Note that this argument cannot refer to the LLMs' implementation or learning model.

    • libraryofbabel 4 hours ago

      Agree. This article would have been a lot stronger if it had just concentrated on the issue of anthropomorphizing LLMs, without bringing “intelligence” into it. At this point LLMs are so good at a variety of results-oriented tasks (gold on the Mathematical Olympiad, for example) that we should either just call them intelligent or stop talking about the concept altogether.

      But the problem of anthropomorphizing is real. LLMs are deeply weird machines - they’ve been fine-tuned to sound friendly and human, but behind that is something deeply alien: a huge pile of linear algebra that does not work at all like a human mind (notably, they can’t really learn from experience at all after training is complete). They don’t have bodies or even a single physical place where their mind lives (each message in a conversation might be generated on a different GPU in a different datacenter). They can fail in weird and novel ways. It’s clear that anthropomorphism here is a bad idea. Although that’s not a particularly novel point.

    • dkdcio 5 hours ago

      > I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?

      the ability for long-term planning and, more cogently, actually living in the real world where time passes

      • hackyhacky 5 hours ago

        > the ability for long-term planning and, more cogently, actually living in the real world where time passes

        1. LLMs seem to be able to plan just fine.

        2. LLMs clearly cannot be "actually living" but I fail to see how that's related to intelligence per se.

        • dkdcio 4 hours ago

          if it’s not actually living it’s not making intelligent decisions. if I make a grocery list, and go to my store, and the store isn’t there, what do I do? I make an intelligent decision about what to do next (probably investigating wtf happened, then going to the second nearest store)

          my genuine question is how does a LLM handle that situation? and as you point out, it’s an absurd comparison

          • hackyhacky 4 hours ago

            So your definition of "living" means going to the grocery store?

            If you want to know how an LLM would handle that situation, why don't you ask it?

            • dkdcio 3 hours ago

              to answer your first question, no, that is not what I’m saying. for your second question, you’re entirely missing the point of mine

              a LLM cannot actually be intelligent if it cannot operate in a temporal context ;)

              • hackyhacky 3 hours ago

                I agree that I am missing your point. Can you please clarify?

                > a LLM cannot actually be intelligent if it cannot operate in a temporal context ;)

                When I have a conversation with an LLM, that conversation happens in time. It has a beginning, a middle, and an end. The conversation can refer to earlier parts of the conversation. How is that not a "temporal context"?

                Furthermore, can you explain why a temporal context is necessary for intelligence? For example, if a human being could download their entire brain into a computer and exist there, as if they were an LLM, would they cease to be intelligent, in your view?

                • dkdcio 3 hours ago

                  > It has a beginning, a middle, and an end. The conversation can refer to earlier parts of the conversation. How is that not a "temporal context"?

                  This is not what I mean for a few reasons:

                  1. This context literally has limits; we'll get back to the grocery store

                  2. This is a point-in-time conversation

                  On the latter point: you can have the same conversation tomorrow. The LLM has not "learned" anything; it has not adapted in any way. Yes, you are experiencing time, and the conversation is happening over time, but the LLM is neither experiencing nor aware of time and is not intelligently adapting to it. Yes, they get trained and "updated" in that way, but it's not the same thing.

                  If you don't respond for an hour, then do, the LLM is not aware of that unless its system injects a "datetime.now()" somewhere in the prompt. Point of this being: an LLM is not an adaptable system. Now you can play the "What if?" game ad infinitum -- make it aware of the current time, current location, etc. etc.
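
                  To be concrete, the kind of clock-injection I mean is just the harness pasting the time into the prompt, something like this sketch (call_llm is a made-up placeholder here, not any particular API):

                    from datetime import datetime, timezone

                    def call_llm(system, messages):
                        # Stand-in for whatever chat API is actually in use.
                        raise NotImplementedError

                    def answer(conversation):
                        # The model only "knows" the time because we inject it on every call;
                        # nothing about the hour that passed is experienced or remembered.
                        system = f"Current time (UTC): {datetime.now(timezone.utc).isoformat()}"
                        return call_llm(system=system, messages=conversation)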

                  Hence my grocery store example. If I go out into the real world, I experience real things, and I make intelligent decisions based on those experiences. An LLM cannot do that, just full stop. And again, you can go "well what if I put the LLM in a robot body, and give it a system, then it can go grocery shopping". And only at this point are we kinda-sorta-close to having a discussion about intelligence. If this mythical creature can go to the grocery store, notice it's not there, look up what happened, maybe ask some friends who live in the same city if they know, maybe make some connection months later to some news article...a LLM or system we build on an LLM cannot do this. It cannot go into the store and think "ya know, if I buy all this ice cream and eat it, that could be bad" and connect it to the million other things a real person is doing and considering in their day to day life

                  The actual world is practically infinitely complex. Talking about "a LLM writing a list is planning and that shows intelligence" is a frightening attenuation of intelligence in the real world, and anthropomorphization to a very high degree. Reframing as "intelligence needs to be able to adapt to the world around it over time" is a much better starting point IMO

        • Applejinx 36 minutes ago

          No, they're echoing previous examples of people planning, by framing prompts and recursively designed prompts to incorporate what, in fairness, is a large database including the text of people planning.

          It still matters that there's nobody in there. You're figuring out better ways to tap into the history of language-users having represented planning in language. As such, this seems a brittle way to represent 'planning'.

        • aDyslecticCrow 5 hours ago

          Is making a list the act of planning?

          • hackyhacky 3 hours ago

            > Is making a list the act of planning?

            Depends on the content of the list.

            A list of the names of the seven dwarfs: no, not an act of planning.

            A list of steps necessary to build a web site: yes, an act of planning.

      • libraryofbabel 4 hours ago

        > actually living in the real world where time passes

        sure, but it feels like this is just looking at what distinguishes humans from LLMs and calling that “intelligence.” I highlight this difference too when I talk about LLMs, but I don’t feel the need to follow up with “and that’s why they’re not really intelligent.”

        • dkdcio 3 hours ago

          well the second part (implied above, I didn’t actually write it) is “and operate intelligently in that world”. talking about “intelligence” in some abstract form where “does this text output constitute intelligence” is hyper silly to me. the discussion should anchor on real-world consequences, not the endless hypotheticals we end up with in these discussions

    • card_zero 5 hours ago

      It may be the case that the failures of the ability of the machine (2) are best expressed by reference to the shortcomings of its internal workings (1), and not by contrived tests.

      • hackyhacky 5 hours ago

        It might be the case, but if those shortcomings are not visible in the results of the machine (and therefore not interpretable by a test), why do its internal workings even matter?

        • card_zero 5 hours ago

          I'm saying best expressed. Like, you see the failures in the results, but trying to pin down exactly what's the matter with the results means you resort to a lot of handwaving and abstract complaints about generalities. So if you knew how the internals had to be that would make the difference, you could lean on that.

  • sobiolite 7 hours ago

    The article says that LLMs don't summarize, only shorten, because...

    "A true summary, the kind a human makes, requires outside context and reference points. Shortening just reworks the information already in the text."

    Then later says...

    "LLMs operate in a similar way, trading what we would call intelligence for a vast memory of nearly everything humans have ever written. It’s nearly impossible to grasp how much context this gives them to play with"

    So, they can't summarize, because they lack context... but they also have an almost ungraspably large amount of context?

    • usefulcat 5 hours ago

      I think "context" is being used in different ways here.

      > "It’s nearly impossible to grasp how much context this gives them to play with"

      Here, I think the author means something more like "all the material used to train the LLM".

      > "A true summary, the kind a human makes, requires outside context and reference points."

      In this case I think that "context" means something more like actual comprehension.

      The author's point is that an LLM could only write something like the referenced summary by shortening other summaries present in its training set.

    • jchw 6 hours ago

      I think the real takeaway is that LLMs are very good at tasks that closely resemble examples they have in their training. A lot of things written (code, movies/TV shows, etc.) are actually pretty repetitive, so you don't really need super intelligence to be able to summarize them and break them down, just good pattern matching. But this can fall apart pretty wildly when you have something genuinely novel...

      • strangattractor 4 hours ago

        Is anyone here aware of LLMs demonstrating an original thought? Something truly novel.

        My own impression is something more akin to a natural language search query system. If I want a snippet of code to do X it does that pretty well and keeps me from having to search through poor documentation of many OSS projects. Certainly doesn't produce anything I could not do myself - so far.

        Ask it about something that is currently unknown and it lists a bunch of hypotheses that people have already proposed.

        Ask it to write a story and you get a story similar to one you already know but with your details inserted.

        I can see how this may appear to be intelligent but likely isn't.

        • Earw0rm 2 hours ago

          If I come up with something novel while using an LLM, which I wouldn't have come up with had I not had the LLM at my bidding, where did the novelty really come from?

          • kylebyte 2 hours ago

            If I came up with something novel while watching a sunrise, which I wouldn't have come up with had I not been looking at it, where did the novelty really come from?

            • Earw0rm 2 hours ago

              If the sunrise might have inspired the same idea in someone else, that's a good question.

              I've no priors as to how original you are, nor how humble, so I mean this in a general, rather than personal, sense.

        • jchw 4 hours ago

          Well that's the tricky part: what is novel? There are varying answers. I think we're all pretty unoriginal most of the time, but at the very least we're a bit better than LLMs at mashing together and synthesizing things based on previous knowledge.

          But seriously, how would you determine if an LLM's output was novel? The training data set is so enormous for any given LLM that it would be hard to know for sure that any given output isn't just a trivial mix of existing data.

      • gus_massa 6 hours ago

        Humans too. If I were too creative writing the midterm, most of my students would fail and everyone would be very unhappy.

        • BobaFloutist 5 hours ago

          That's because midterms are specifically supposed to assess how well you learned the material presented (or at least directed to), not your overall ability to reason. If you teach a general reasoning class, getting creative with the midterm is one thing, but if you're teaching someone how to solve differential equations, they're learning at the very edge of their ability in a given amount of time, and if you present them with an example outside of what's been described, it kind of makes sense that they can't just already solve it. I mean, that's kind of the whole premise of education, that you can't just present someone with something completely outside of their experience and expect them to derive from first principles how it works.

          • throwway120385 2 hours ago

            I would argue that on a math midterm it's entirely reasonable to show a problem they've never seen before and test whether they've made the connection between that problem and the problems they've seen before. We did that all the time in upper division Physics.

            • BobaFloutist 2 hours ago

              A problem they've never seen before, of course. A problem that requires a solving strategy or tool they've never seen before (above and beyond synthesis of multiple things they have seen before) is another matter entirely.

              It's like the difference between teaching kids rate problems and then putting ones with negative values or nested rates on a test versus giving them a continuous compound interest problem and expecting them to derive e, because it is fundamentally about rates of change, isn't it?

        • card_zero 5 hours ago

          That's exams, not humanity.

        • jchw 5 hours ago

          I honestly think that reflects more on the state of education than it does human intelligence.

          My primary assertion is that LLMs struggle to generalize concepts and ideas, hence why they need petabytes of text just to often fail basic riddles when you muck with the parameters a little bit. People get stuck on this for two reasons: one, because they have to reconcile this with what they can see LLMs are capable of, and it's just difficult to believe that all of this can be accomplished without at least intelligence as we know it; I reckon the trick here is that we simply can't even conceive of how utterly massive the training datasets for these models are. We can look at the numbers but there's no way to fully grasp just how vast it truly is. The second thing is definitely the tendency to anthropomorphize. At first I definitely felt like OpenAI was just using this as an excuse to hype their models and come up with reasons for why they can never release weights anymore; convenient. But also, you can see even engineers who genuinely understand how LLMs work coming to the conclusion that they've become sentient, even though the models they felt were sentient now feel downright stupid compared to the current state-of-the-art.

          Even less sophisticated pattern matching than what humans are able to do is still very powerful, but it's obvious to me that humans are able to generalize better.

    • btown 6 hours ago

      It's an interesting philosophical question.

      Imagine an oracle that could judge/decide, with human levels of intelligence, how relevant a given memory or piece of information is to any given situation, and that could verbosely describe which way it's relevant (spatially, conditionally, etc.).

      Would such an oracle, sufficiently parallelized, be sufficient for AGI? If it could, then we could genuinely describe its output as "context," and phrase our problem as "there is still a gap in needed context, despite how much context there already is."

      And an LLM that simply "shortens" that context could reach a level of AGI, because the context preparation is doing the heavy lifting.

      The point I think the article is trying to make is that LLMs cannot add any information beyond the context they are given - they can only "shorten" that context.

      If the lived experience necessary for human-level judgment could be encoded into that context, though... that would be an entirely different ball game.
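
      A crude sketch of that split, with every interesting part stubbed out (relevance, describe, and shorten_with_llm are hypothetical stand-ins for the oracle and the LLM):

        def prepare_context(situation, memories, relevance, describe, top_k=20):
            # The hypothetical oracle scores each memory against the situation
            # (parallelizable in principle; a plain loop here) and verbalizes why it matters.
            scored = sorted(memories, key=lambda m: relevance(situation, m), reverse=True)
            return "\n".join(describe(situation, m) for m in scored[:top_k])

        # The LLM's remaining job would only be to "shorten" the prepared context:
        #   reply = shorten_with_llm(prepare_context(situation, memories, relevance, describe),
        #                            question)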

      • entropicdrifter 5 hours ago

        I agree with the thrust of your argument.

        IMO we already have the technology for sufficient parallelization of smaller models with specific bits of context. The real issue is that models have weak/inconsistent/myopic judgement abilities, even with reasoning loops.

        For instance, if I ask Cursor to fix the code for a broken test and the fix is non-trivial, it will often diagnose the problem incorrectly almost instantly, hyper-focus on what it imagines the problem is without further confirmation, implement a "fix", get a different error message while breaking more tests than it "fixed" (if it changed the result for any tests), and then declare the problem solved simply because it moved the goalposts at the start by misdiagnosing the issue.

    • tovej 6 hours ago

      You can reconcile these points by considering what specific context is necessary. The author specifies "outside" context, and I would agree. The human context that's necessary for useful summaries is a model of semantic or "actual" relationships between concepts, while the LLM context is a model of a single kind of fuzzy relationship between concepts.

      In other words the LLM does not contain the knowledge of what the words represent.

    • ratelimitsteve 6 hours ago

      I think the differentiator here might not be the context it has, but the context it has the ability to use effectively in order to derive more information about a given request.

    • kayodelycaon 6 hours ago

      They can’t summarize something that hasn’t been summarized before.

      • timmg 6 hours ago

        About a year ago, I gave a film script to an LLM and asked for a summary. It was written by a friend and there was no chance it or its summary was in the training data.

        It did a really good -- surprisingly good -- job. That incident has been a reference point for me. Even if it is anecdotal.

        • pc86 6 hours ago

          I'm not as cynical as others about LLMs but it's extremely unlikely that script had multiple truly novel things in it. Broken down into sufficiently small pieces, it's very likely every story element was present multiple times in the LLM's training data.

          • Spivak 6 hours ago

            I'm not sure I understand the philosophical point being made here. The LLM has "watched" a lot of movies and so understands the important parts of the original script it's presented with. Are we not describing how human media literacy works?

            • BobaFloutist 5 hours ago

              The point is that if you made a point to write a completely novel script, with (content-wise, not semantically) 0 DNA in it from previous movie scripts, with an unambiguous but incoherent and unstructured plot, your average literate human would be able to summarize what happened on the page, for all that they'd be annoyed and likely distressed by how unusual it was; but that an LLM would do a disproportionately bad job compared to how well they do at other things, which makes us reevaluate what they're actually doing and how they actually do it.

              It feels like they've mastered language, but it's looking more and more like they've actually mastered canon. Which is still impressive, but very different.

              • throwway120385 2 hours ago

                This tracks, because the entire system reduces to a sophisticated regression analysis. That's why we keep talking about parameters and parameter counts. They're literally talking about the number of parameters that they're weighting during training. Beyond that there are some mathematical choices in how you interrelate the parameters that yield some interesting emergent phenomena, and there are architecture choices to be made there. But the whole thing boils down to regression, and regression is at its heart a development of a canon from a representative variety of examples.

                We are warned in statistics to be careful when extrapolating from a regression analysis.
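
                As a toy illustration of that last warning (plain polynomial least squares, nothing transformer-specific):

                  import numpy as np

                  # Fit a cubic to noisy sine data on [0, 3].
                  rng = np.random.default_rng(0)
                  x = np.linspace(0, 3, 50)
                  y = np.sin(x) + rng.normal(0, 0.05, x.size)
                  coeffs = np.polyfit(x, y, deg=3)

                  # Inside the range of the examples, the fit looks great...
                  print(np.polyval(coeffs, 1.5), np.sin(1.5))   # close to each other
                  # ...but extrapolate to x = 6 and the "canon" gives a confident wrong answer.
                  print(np.polyval(coeffs, 6.0), np.sin(6.0))   # far apart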

            • pc86 5 hours ago

              I'm not making a philosophical point. The earlier comment is "I uploaded a new script and it summarized it"; I was simply saying the odds of that script actually being new is very slim. Even though obviously that script or summaries of it do not exist in their entirety in the training data, its individual elements almost certainly do. So it's not really a novel (pun unintended?) summarization.

      • originalcopy 2 hours ago

        I'd like to see some examples of when it struggles to do summaries. There were no real examples in the text, besides one hypothetical which ChatGPT made up.

        I think LLMs do great summaries. I am not able to come up with anything where I could criticize it and say "any human would come up with a better summary". Are my tasks not "truly novel"? Well, then I am not able, as a human, to come up with anything novel either.

      • naikrovek 6 hours ago

        they can, they just can't do it well. at no point does any LLM understand what it's doing.

        • kblissett 6 hours ago

          If you think they can't do this task well I encourage you to try feeding ChatGPT some long documents outside of its training cutoff and examining the results. I expect you'll be surprised!

        • kayodelycaon 6 hours ago

          It can produce something that looks like a summarization based on similarly matching texts.

          How unique the text is determines how accurate the summarization is likely to be.

  • Joeri 4 hours ago

    LLMs mimic intelligence, but they aren’t intelligent.

    They aren’t just intelligence mimics, they are people mimics, and they’re getting better at it with every generation.

    Whether they are intelligent or not, whether they are people or not, it ultimately does not matter when it comes to what they can actually do, what they can actually automate. If they mimic a particular scenario or human task well enough that the job gets done, they can replace intelligence even if they are “not intelligent”.

    If by now someone still isn’t convinced that LLMs can indeed automate some of those intelligence tasks, then I would argue they are not open to being convinced.

    • shafoshaf 4 hours ago

      They can mimic well documented behavior. Applying an LLM to a novel task is where the model breaks down. This obviously has huge implications for automation. For example, most businesses do not have unique ways of handling accounting transactions, yet each company has a litany of AR and AP specialists who create seemingly unique SOPs. LLMs can easily automate those workers since they are simply doing a slight variation at best on a very well documented system.

      Asking an LLM to take all this knowledge and apply it to a new domain? That will take a whole new paradigm.

  • intalentive 3 hours ago

    Good point about the Turing Test:

    >The original Turing Test was designed to compare two participants chatting through a text-only interface: one AI and one human. The goal was to spot the imposter. Today, the test is simplified from three participants to just two: a human and an LLM.

    By the original meaning of the test it's easy to tell an LLM from a human.

  • nojs 5 hours ago

    Even stronger than our need to anthropomorphize seems to be our innate desire to believe our species is special, and that “real intelligence” couldn’t ever be replicated.

    If you keep redefining real intelligence as the set of things machines can’t do, then it’s always going to be true.

    • safetytrick 5 hours ago

      Yes, I agree, we seem to need to feel "special".

      Language is really powerful, I think it's a huge part of our intelligence.

      The interesting part of the article to me is the focus on fluency. I have not seen anything that LLMs do well that isn't related to powerful utilization of fluency.

  • ArnavAgrawal03 5 hours ago

    > They had known him for only 15 seconds, yet they still perceived the act of snapping him in half as violent.

    This is right out of Community

  • umanwizard 6 hours ago

    The article claims (without any evidence, argument or reason) that LLMs are not intelligent, then simply refuses to define intelligence.

    How do you know LLMs aren't intelligent, if you can't define what that means?

    • energy123 6 hours ago

      It's strange seeing so many takes like this two weeks after LLMs won gold medals at IMO and IOI. The cognitive dissonance is going to be wild when it all comes to a head in two years.

      • oytis an hour ago

        I've seen these claims, and Google even published the texts of the solutions, but it still didn't publish the full log of interaction between the model and the operator.

      • aprilthird2021 6 hours ago

        IBM Watson won Jeopardy years ago, was it intelligent?

        • perching_aix 6 hours ago

          > Rather than being given questions, contestants are instead given general knowledge clues in the form of answers and they must identify the person, place, thing, or idea that the clue describes, phrasing each response in the form of a question. [0]

          Doesn't sound like a test of intelligence to me, so no.

          [0] https://en.wikipedia.org/wiki/Jeopardy!

          • aprilthird2021 6 hours ago

            Why? Computers also won chess years ago, but they're not intelligent either? Why is winning a math competition intelligent but a trivia competition or a chess competition not intelligent?

            • umanwizard 5 hours ago

              Math and chess are similar in the sense that for humans, both require creativity, logical problem solving, etc.

              But they are not at all similar for computers. Chess has a constrained small set of rules and it is pretty straightforward to make a machine that beats humans by brute force computation. Pre-Leela chess programs were just tree search, a hardcoded evaluation function, and lots of pruning heuristics. So those programs are really approaching the game in a fundamentally different way from strong humans, who rely much more on intuition and pattern-recognition rather than calculation. It just turns out the computer approach is actually better than the human one. Sort of like how a car can move faster than a human even though cars don’t do anything much like walking.

              Math is not analogous: there’s no obvious algorithm for discovering mathematical proofs or solving difficult problems that could be implemented in a classical, pre-Gen AI computer program.
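
              For a sense of scale, that pre-neural chess recipe really is about this small (a toy sketch; evaluate, moves, and apply_move stand in for the hand-written heuristics and game rules):

                def alphabeta(state, depth, alpha, beta, maximizing, evaluate, moves, apply_move):
                    # Leaf: fall back to the hand-written evaluation (e.g. material count).
                    if depth == 0 or not moves(state):
                        return evaluate(state)
                    if maximizing:
                        best = float("-inf")
                        for m in moves(state):
                            child = apply_move(state, m)
                            best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                                       evaluate, moves, apply_move))
                            alpha = max(alpha, best)
                            if alpha >= beta:  # prune: the opponent will never allow this line
                                break
                        return best
                    best = float("inf")
                    for m in moves(state):
                        child = apply_move(state, m)
                        best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                                   evaluate, moves, apply_move))
                        beta = min(beta, best)
                        if alpha >= beta:
                            break
                    return best

              Scaling that up (plus opening books, endgame tables, and faster hardware) got to superhuman play; there was no analogous recipe for producing proofs.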

              • aDyslecticCrow 4 hours ago

                > there’s no obvious algorithm for discovering mathematical proofs or solving difficult problems that could be implemented in a classical, pre-Gen AI computer program.

                Fundamentally opposite. Computer algorithms have been part of math research since they where invented, and mathematical proof algorithms are widespread and excellent.

                The llms that are now "intelligent enough to do maths" are just trained to rephrase questions into prolog code.

                • umanwizard 4 hours ago

                  > The llms that are now "intelligent enough to do maths" are just trained to rephrase questions into prolog code.

                  Do you have a source that talks about this?

            • perching_aix 5 hours ago

              I don't wish to join you in framing intelligence as a step function.

              I think winning a Go or a chess competition does demonstrate intelligence. And winning a math competition does even more so.

              I do not think a trivia competition like Jeopardy demonstrates intelligence much at all, however. Specifically because it reads like it's not about intelligence, but about knowledge: it tests for association and recall, not for performing complex logical transformations.

              This isn't to say I consider these completely independent. Most smart people are both knowledgeable and intelligent. It's just that they are distinct dimensions in my opinion.

              You wouldn't say something tastes bad because its texture feels weird in your mouth, would you?

              • tjr 4 hours ago

                I might even think that a symbolic chess program is in some sense more intelligent than a modern LLM. It has a concrete model of the world it operates in, along with a representation of what it can, cannot, and is trying to do. When LLMs get the right answer, it seems more like... highly-optimized chance, rather than coming from any sort of factual knowledge.

              • aDyslecticCrow 4 hours ago

                > I think winning a Go or a chess competition does demonstrate intelligence.

                Chess is a simple alpha-beta pruned minimax search tree. If that's intelligent, then a drone flight controller or a calculator is as well.

                > association and recall, not for performing complex logical transformations.

                By that definition humans doing chess aren't as intelligent as a computer doing chess, since high level chess is heavily reliant on memory and recall of moves and progressions.

                So your definition falls apart.

                • perching_aix 4 hours ago

                  > So your definition falls apart.

                  I did not share any definitions, only vague opinions. Not that I'd know what it means for a definition to "fall apart".

                  And the specific bit you cite is barely even a vague opinion; it is my interpretation of the show "Jeopardy!" based on the Wiki article (I've never seen a single episode, wasn't really a thing where I'm from):

                  > Specifically because it reads like it's about (...) knowledge: it tests for association and recall (...)

                  Also:

                  > By that definition humans doing chess aren't as intelligent as a computer doing chess, since high level chess is heavily reliant on memory and recall of moves and progressions.

                  Yes, I did find this really quite disappointing and disillusioning when I first learned about it. A colleague of mine even straight up quit competitive chess over it.

                  • aDyslecticCrow 3 hours ago

                    > it is my interpretation of the show "Jeopardy!" based on the Wiki article

                    You are spot on though. I mostly wanted to argue that no decent distinction can be made here.

                    > I did find this really quite disappointing and disillusioning when I first learned about it

                    ye... same here.

                    ---

                    I'm personally in the camp that "intelligence" is a human concept. A metric to compare humans. Applying it to computers makes us anthropomorphize computers and think of them as people. Thinking of LLMs as people makes us trust them with bad things.

                    So we should call them impressive, fluent, fast, useful, good at tasks. Computers already beat us at most math, statistics, searching for information, spatial visualization, information recollection, lossless communication. LLMs just add to that list, but do nothing new to make the word "intelligent" applicable. Even if we reach the AGI singularity, thinking of them as humans or using human terminology to describe them is a fatal error.

                    (Destroying earth to make paperclips is arguably the least intelligent thing you could ever do.)

                  • umanwizard 3 hours ago

                    FWIW you can get quite good at chess with minimal opening prep. (Just not to the very top of the elite.)

            • im3w1l 5 hours ago

              None of these things is enough by itself. It's rather that they have now solved so many things that the sum total has (arguably) crossed the threshold.

              As for solving math problems, that is an important part of recursive self-improvement. If it can come up with better algorithms and turn them into code, that will translate into raising its own intelligence.

    • 4 hours ago
      [deleted]
      • umanwizard 4 hours ago

        Despite its title, that section does not contain a definition of intelligence.

    • krapp 6 hours ago

      Why do critics of LLM intelligence need to provide a definition when people who believe LLMs are intelligent only take it on faith, not having such a definition of their own?

      • hackyhacky 5 hours ago

        > Why do critics of LLM intelligence need to provide a definition when people who believe LLMs are intelligent only take it on faith, not having such a definition of their own?

        Because advocates of LLMs don't use their alleged intelligence as a defense, but opponents of LLMs do use their alleged non-intelligence as an attack.

        Really, whether or not the machine is "intelligent", by whatever definition, shouldn't matter. What matters is whether it is a useful tool.

        • tjr 4 hours ago

          This seems reasonable. Much AI research has historically been about building computer systems to do things that otherwise require human intelligence to do. The question of "is the computer actually intelligent" has been more philosophical than practical, and many such practically useful computer systems have been developed, even before LLMs.

          On the other hand, one early researcher said something to the effect of, Researchers in physics look at the universe and wonder how it all works. Researchers in biology look at living organisms and wonder how they can be alive. Researchers in artificial intelligence wonder how software can be made to wonder such things.

          I feel like we are still way off from having a working solution there.

        • aDyslecticCrow 4 hours ago

          The entire argument is that thinking it's intelligent or a person makes us misuse the tool in dangerous ways. The point isn't to make us feel better; it's to keep us from doing stupid things with them.

          As a tool it's useful, yes; that is not the issue:

          - they're used as psychologists and life coaches.

          - judges of policy and law documents

          - writers of life affecting computer systems.

          - Judges of job applications.

          - Sources of medical advice,

          - legal advisors

          - And increasingly as a thing to blame when any of the above goes awry.

          If we think of LLMs as very good text-writing tools, the responsibility to make "intelligent" decisions, and more crucially to take responsibility for those decisions, remains on real people rather than dice.

          But if we think of them as intelligent humans, we're making a fatal misjudgement.

      • hnfong 4 hours ago

        It's actually very weird to "believe" LLMs are "intelligent".

        Pragmatic people see news like "LLMs achieve gold in Math Olympiad" and think "oh wow, it can do maths at that level, cool!" This gets misinterpreted by so-called "critics of LLMs" who scream "NO THEY ARE JUST STOCHASTIC PARROTS" at every opportunity yet refuse to define what intelligence actually is.

        The average person might not get into that kind of specific detail, but they know that LLMs can do some things well but there are tasks they're not good at. What matters is what they can do, not so much whether they're "intelligent" or not. (Of course, if you ask a random person they might say LLMs are pretty smart for some tasks, but that's not the same as making a philosophical claim that they're "intelligent")

        Of course there's also the AGI and singularity folks. They're kinda loony too.

  • ticulatedspline 4 hours ago

    - LLMs don't need to be intelligent to take jobs, bash scripts have replaced people.

    - Even if CEOs are completely out of touch and the tool can't do the job, you can still get laid off in an ill-informed attempt to replace you. Then, when the company doesn't fall over because the leftover people, desperate to keep covering rent, fill the gaps, it just looks like efficiency to the top.

    - I don't think our tendency anthropomorphize LLMs is really the problem here.

  • pbw 6 hours ago

    LLMs can shorten, and maybe tend to if you just say "summarize this", but you can trivially ask them to do more. I asked for a summary of Jenson's post and then a reflection; GPT-5 said, "It's similar to the Plato’s Cave analogy: humans see shadows (the input text) and infer deeper reality (context, intent), while LLMs either just recite shadows (shorten) or imagine creatures behind them that aren’t there (hallucinate). The “hallucination” behavior is like adding “ghosts”—false constructs that feel real but aren’t grounded."

    That ain't shortening because none of that was in his post.

    • pitpatagain 5 hours ago

      I can't decide how to read your last sentence.

      That reflection seems totally off to me: fluent, and flavored with elements of the article, but also not really what the article is about and a pretty weird/tortured use of the elements of the allegory of the cave, like it doesn't seem anything like Plato's Cave to me. Ironically demonstrates the actual main gist of the article if you ask me.

      But maybe you meant that you think that summary is good and not textually similar to that post so demonstrating something more sophisticated than "shortening".

      • pbw 5 hours ago

        Yes, GPT-5's response above was not shortening because there was nothing in the OP about Plato's Cave. I agree that Plato's cave analogy was confusing here. Here's a better one from GPT-5, which is deeply ironic:

        A New Yorker book review often does the opposite of mere shortening. The reviewer:

        * Places the book in a broader cultural, historical, or intellectual context.

        * Brings in other works—sometimes reviewing two or three books together.

        * Builds a thesis that connects them, so the review becomes a commentary on a whole idea-space, not just the book’s pages.

        This is exactly the kind of externalized, integrative thinking Jenson says LLMs lack. The New Yorker style uses the book as a jumping-off point for an argument; an LLM “shortening” is more like reading only the blurbs and rephrasing them. In Jenson’s framing, a human summary—like a rich, multi-book New Yorker review—operates on multiple layers: it compresses, but also expands meaning by bringing in outside information and weaving a narrative. The LLM’s output is more like a stripped-down plot synopsis—it can sound polished, but it isn’t about anything beyond what’s already in the text.

        • pitpatagain 4 hours ago

          Ah ok, you meant the second thing.

          I don't think the Plato's Cave analogy is confusing, I think it's completely wrong. It's "not in the article" in the sense that it is literally not conceptually what the article is about and it's also not really what Plato's Cave is about either, just taking superficial bits of it and slotting things into it, making it doubly wrong.

          • pbw 14 minutes ago

            And you think the comparison to book reviews is equally bad? Both are from GPT-5.

        • pbw 5 hours ago

          Essentially, Jenson's complaint is "When I ask an LLM to 'summarize' it interprets that differently from how I think of the word 'summarize' and I shouldn't have to give it more than a one-word prompt because it should infer what I'm asking for."

  • Isamu 5 hours ago

    You can compare the current state of LLMs to the days of chess machines when they first approached grandmaster level play. The machine approach was very brute force, and there was a lot of work done to improve the sheer amount of look ahead that was required to compete at the grandmaster level.

    As opposed to what grandmasters actually did, which was less look ahead and more pattern matching to strengthen the position.

    Now LLMs successfully leverage pattern matching, but interestingly it is still a kind of brute force pattern matching, requiring the statistical absorption of all available texts, far more than a human absorbs in a lifetime.

    This enables the LLM to interpolate an answer from the structure of the absorbed texts with reasonable statistical relevance. This is still not quite “what humans do” as it still requires brute force statistical analysis of vast amounts of text to achieve pretty good results. For example training on all available Python sources in github and elsewhere (curated to avoid bad examples) yields pretty good results, not how a human would do it, but statistically likely to be pertinent and correct.

  • stefanv 6 hours ago

    What if the problem is not that we overestimate LLMs, but that we overestimate intelligence? Or to express the same idea for a more philosophically inclined audience, what if the real mistake isn’t in overestimating LLMs, but in overestimating intelligence itself by imagining it as something more than a web of patterns learned from past experiences and echoed back into the world?

    • justinlivi 4 hours ago

      I think AI skeptics have a strong bias to assume that human intelligence fundamentally functions differently from LLMs. They may be correct, but we don't have a strong enough understanding of human cognition to make the claim in terms as certain as those in which the skeptical argument is usually made. The training methods of human learning and machine learning are obviously vastly different, as are the infrastructure-level mechanics. These elements are likely never going to align, though with time the machine infrastructure may start to increasingly resemble human bio hardware.

      I bring this up because these known vast differences may account for a significant portion of the differences in expected output from human and machine processing. We don't understand the fundamental conceptual "black box" portions of either form of processing well enough to state definitively what is similar or dissimilar about those hazy areas. Somewhere within that not-well-understood area is what we have collectively and vaguely defined as "intelligence." But also within that area are all the other aspects that both humans and now machines are quite good at - prediction, fluency, translation.

      The challenge of lexicon and definition is potentially as difficult a task as sharpening the focus of our understanding of the hazy black-box portion of both machine and human processing. Until all those are better defined, I don't think we have a good measure for answering the question of machine intelligence either way.

  • xg15 5 hours ago

    I feel this article should be paired with this other one [1] that was on the frontpage a few days ago.

    My impression is, there is currently one tendency to "over-anthropomorphize" LLMs and treat them like conscious or even superhuman entities (encouraged by AI tech leaders and AGI/Singularity folks) and another to oversimplify them and view them as literal Markov chains that just got lots of training data.

    Maybe those articles could help guarding against both extremes.

    [1] https://www.verysane.ai/p/do-we-understand-how-neural-networ...

    • mattgreenrocks 5 hours ago

      Previously when someone called out the tendency to over-anthropomorphize LLMs, a lot of the answers amounted to, “but I like doing it, therefore we should!”

      I’ll be the first to say one should pick their battles. But hearing that over and over from a crowd like this that can be quite pedantic is very telling.

      • tempodox an hour ago

        This very comment thread demonstrates how utterly hopeless it is trying to educate the believers. It has developed into a full-blown religion by now.

  • kbaker 5 hours ago

    Seems like this is close to the Uncanny Valley effect.

    LLM intelligence is in the spot where it is simultaneously genius-level but also just misses the mark a tiny bit, which really sticks out for those who have been around humans their whole lives.

    I feel that, just like more modern CGI, this will slowly fade with certain techniques and you just won't notice it when talking to or interacting with AI.

    Just like in his post during the whole Matrix discussion.

    > "When I asked for examples, it suggested the Matrix and even gave me the “Summary” and “Shortening” text, which I then used here word for word. "

    He switches in AI-written text and I bet you were reading along just the same until he pointed it out.

    This is our future now I guess.

  • vcarrico 5 hours ago

    I might be mixing the concepts of intelligence and consciousness etc., but the human mind is more than language and data; it's also experience. LLMs have all the data and can express anything around that context, but will never experience anything, which is singular for each of us, and it's part of what makes up what we call intelligence (?). So they will never replicate the human mind; they can just mimic it.

    I heard from Miguel Nicolelis that language is a filter for the human mind, so you can never build a mind from language. I interpreted this like trying to build an orange from its juice.

    • hackyhacky 5 hours ago

      > LLMs have all the data and can express anything around that context, but will never experience anything,

      On the contrary, all their training data is their "experience".

  • ChrisMarshallNY 6 hours ago

    That's a great article.

    Scott Jenson is one of my favorite authors.

    He's really big on integrating an understanding of basic human nature into design.

  • andoando 4 hours ago

    This, along with a ton of commentary on LLMs, seems like it's written by someone who has no technical understanding of LLMs.

  • 0x457 4 hours ago

    > A philosophical exploration of free will and reality disguised as a sci-fi action film about breaking free from systems of control.

    How is that a summary? It reads as a one-liner review I would leave on Letterboxed or something I would say, trying to be pretentious and treating the movie as a work of art. It is a work of art, because all movies are art, but that's an awful summary.

    • pwdisswordfishz 4 hours ago

      Review? Where does it say anything about the quality of the film?

  • jhanschoo 3 hours ago

    I disagree with the author in a big way. '25's LLMs are designed to echo material already out there on the Internet if it exists because we value that more. If I want a summary of The Matrix, I prefer a summary that agrees with the zeitgeist, rather than a novel, unorthodox summary that requires a justification as to its deviation.

    In fact, the example provided by the author is a great illustration of this:

    > A philosophical exploration of free will and reality disguised as a sci-fi action film about breaking free from systems of control.

    The words here refer back to the notions "free will" that is prominent in Western discourse from St. Augustine through Descartes and thereafter and similarly of "sci-fi". These are notions an uneducated East-Asian with limited Internet use and pop culture fluency will simply not understand. They would in fact prefer the latter description. The author and this hypothetical East-Asian live in very different zeitgeists, and correspondingly experience the movie differently and value different summaries of the film. They each prefer a summary that agrees with the zeitgeist, rather than a novel, unorthodox summary (relative to their zeitgeist) that requires a justification as to its deviation.

    On the other hand, if you ask LLMs to explain material and concepts, one modality in which they do so is to use formulaic, novel, and unorthodox analogies to explain it to you. By formulaic and novel, I mean that the objects and scenarios in the analogies are frequently of a certain signature kind that the model has been trained with, but they are novel in that the analogies are not found in the wild on the internet.

    If you have frequently used LLMs for the purpose of explaining concepts, you will have encountered these analogies and know what I mean by this. The analogies are frequently not too accurate, but they round out the response by giving an ELI5-style answer.

    Ironically, the author may have succumbed to LLM sycophancy.

  • foobarian 5 hours ago

    The LLMs are like a Huffman codec except the context is infinite and lossy

  • AndrewKemendo 5 hours ago

    Who are you going to lodge your complaint to that the set of systems and machines that just took your job isn’t “intelligent?”

    Humans seem to get wrapped around these concepts like intelligence, consciousness, etc., because they seem to be the only thing differentiating us from every other animal, when in fact it's all a mirage.

  • codeulike 6 hours ago

    Well I, for one, can't believe what that guy did to poor Timmy

    • beezle 5 hours ago

      When I saw the post title I immediately thought of Timmy from South Park lol

  • snozolli 6 hours ago

    Regarding Timmy, the Companion Cube from the game Portal is the greatest example of induced anthropomorphism that I've ever experienced. If you know, you know, and if you don't, you should really play the game, since it's brilliant.

    • andrewla 4 hours ago

      It is a brilliant game and the empathy you develop for the cube is a great concept.

      But arguably much deeper is the fact that nothing in this game, or any single-player game, is a living thing in any form. Arguably the game's characterization of GLaDOS hits even harder on the anthropomorphism angle.

    • generationP 6 hours ago

      The cube doesn't work, or at least it didn't for me. The googly eyes really do make a difference.

    • aidenn0 3 hours ago

      Floyd in Planetfall was that for those of us that are older.

    • ChrisMarshallNY 6 hours ago

      I'm on a Mac, and would love to see Portal 2 (at least) ported to M-chips.

      I would love Portal 3, even more.

    • bitwize 5 hours ago

      That's a matter of informed anthropomorphism. A lot of people don't become attached to the Companion Cube, but are informed that their player character is so attached.

  • tovej 6 hours ago

    Good article, it's been told before but it bears repeating.

    Also I got caught on this one kind of irrelevant point regarding the characterization of the Matrix: I would say The Matrix is not just disguised as a story about escaping systems of control; it's quite clearly about oppressive systems in society, with specific reference to gender expression. Lilly Wachowski has explicitly stated that it was supposed to be an allegory for gender transition.

    • xg15 5 hours ago

      It wasn't. Switch was intended to be genderfluid, but the Matrix itself, or "logging out" of it, was apparently not meant as an allegory for transitioning (though she doesn't mind the interpretation):

      https://www.them.us/story/lilly-wachowski-work-in-progress-s...

    • kayodelycaon 6 hours ago

      The character Switch was supposed to have a different gender in the matrix vs real life. It’s really a shame that didn’t happen.

    • altruios 6 hours ago

      There is nothing more free than the freedom to be who you really are.

      Going to rewatch the Matrix tonight.

  • naikrovek 6 hours ago

    I've mentioned this to colleagues at work before.

    LLMs give a very strong appearance of intelligence, because humans are super receptive to information provided via our native language. We often have to deal with imperfect speakers and writers, and we must infer context and missing information on our own. We do this so well that we don't know we're doing it. LLMs have perfect grammar and we subtly feel that they are extremely smart because subconsciously we recognize that we don't have to think about anything that's said; it is all syntactically perfect.

    So, LLMs sort of trick us into masking their true limitations and believing that they are truly thinking; there are even models that call themselves thinking models, but they don't think, they just predict what the user is going to complain about and say that to themselves as an additional, dynamic prompt on top of the one you actually enter.
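
    A caricature of the loop I mean, to be clear that it's extra prompt text rather than actual deliberation (call_llm here is a made-up placeholder, not a claim about any vendor's internals):

      def thinking_answer(user_prompt, call_llm):
          # Pass 1: generate "thoughts" -- anticipated objections, pitfalls, sub-steps.
          notes = call_llm(
              "List the likely objections and pitfalls in answering this, step by step:\n"
              + user_prompt
          )
          # Pass 2: those notes are just more prompt text layered on top of the user's.
          return call_llm(user_prompt + "\n\nPrivate notes to consider:\n" + notes + "\n\nAnswer now.")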

    LLMs are very good at fooling us into the idea that they know anything at all; they don't. And humans are very bad at being discriminate about the source of the information presented to them if it is presented in a friendly way. The combination of those things is what has resulted in the insanely huge AI hype cycle that we are currently living in the middle of. Nearly everyone is overreacting to what LLMs actually are, and the few of us that believe that we sort of see what's actually happening are ignored for being naysayers, buzz-kills, and luddites. Shunned for not drinking the Kool-Aid.

    • hnfong 4 hours ago

      You're ignored not because you're right or wrong, but because your unsolicited advice is not useful.

      For example, I can spin up any LLM and get it to translate some English text into Japanese with maybe 99% accuracy. I don't need to believe whether it "really knows" English or Japanese, I only need to believe the output is accurate.

      Similarly I can ask a LLM to code up a function that does a specific thing, and it will do it with high accuracy. Maybe there'll be some bugs, but I can review the code and fix them, which in some cases boosts my productivity. I don't need to believe whether it "really knows" C++ or Rust, I only need it to write something good enough.

      I mean, just by these two examples, LLMs are really great tools, and I'm personally hyped for these use cases alone. Am I fooled by the LLM? I don't think so, I don't have any fantasy about it being extremely intelligent or always being right. I doubt most reasonable people these days would either.

      So basically you're going about assuming people are fooled by LLMs (which they might not be), and wondering why you're unpopular when you're basically telling everyone they're gullible and foolish.

  • nataliste 5 hours ago

    The author's argument is built on fallacies that always pop up in these kinds of critiques.

    The "summary vs shortening" distinction is moving the goalposts. The author makes the empirical claim that LLMs fail at summarizing novel PDFs without any actual evidence. For a model trained on a huge chunk of the internet, the line between "reworking existing text" and "drawing on external context" is so blurry it's practically meaningless.

    Similarly, can we please retire the ELIZA and Deep Blue analogies? Comparing a modern transformer to a 1960s if-then script or a brute-force chess engine is a category error. It's a rhetorical trick to make LLMs seem less novel than they actually are.

    And blaming everything on anthropomorphism is an easy out. It lets you dismiss the model's genuinely surprising capabilities by framing it as a simple flaw in human psychology. The interesting question isn't that we anthropomorphize, but why this specific technology is so effective at triggering that response from humans.

    The whole piece basically boils down to: "If we define intelligence in a way that is exclusively social and human, then this non-social, non-human thing isn't intelligent." It's a circular argument.