56 comments

  • jebarker 3 hours ago

    > Do they merely memorize training data and reread it out loud, or are they picking up the rules of English grammar and the syntax of C language?

    This is a false dichotomy. Functionally the reality is in the middle. They "memorize" training data in the sense that the loss curve is fit to these points but at test time they are asked to interpolate (and extrapolate) to new points. How well they generalize depends on how well an interpolation between training points works. If it reliably works then you could say that interpolation is a good approximation of some grammar rule, say. It's all about the data.
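    A toy illustration of this point (my own sketch, not from the parent comment): fit a curve to sampled points, then compare error on held-out points *between* the training samples (interpolation) against points beyond their range (extrapolation).

```python
import numpy as np

# Fit a degree-9 polynomial to samples of sin(x): the fit "memorizes" the
# training points, but interpolation between them works because the curve
# between points happens to approximate the true function well. Outside
# the training range, the same fit breaks down.
x_train = np.linspace(0, 2 * np.pi, 40)          # training points
model = np.poly1d(np.polyfit(x_train, np.sin(x_train), deg=9))

x_interp = (x_train[:-1] + x_train[1:]) / 2      # held-out midpoints
x_extrap = np.linspace(2.5 * np.pi, 3 * np.pi, 20)  # beyond training range

err_interp = np.max(np.abs(model(x_interp) - np.sin(x_interp)))
err_extrap = np.max(np.abs(model(x_extrap) - np.sin(x_extrap)))
print(f"interpolation error: {err_interp:.5f}")  # tiny
print(f"extrapolation error: {err_extrap:.5f}")  # blows up
```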

    • mjburgess 3 hours ago

      This only applies to intra-distribution "generalisation", which is not the meaning of the term we've come to associate with science. Here generalisation means across all environments (i.e., something generalises if it's valid and reliable, where valid = measures the property, and reliable = does so under causal permutation of the environment).

      Since an LLM does not change in response to changes in the meaning of terms (e.g., consider how the meaning of "the war in Ukraine" has changed over the last 10 years), it isn't reliable in the scientific sense. Explaining why it isn't valid would take much longer, but it's not valid either.

      In any case: the notion of 'generalisation' used in ML just means we assume there is a single stationary distribution of words, and we want to randomly sample from that distribution without bias towards oversampling from points identical to the data.

      Not only is this assumption false (there is no stationary distribution), it is also irrelevant to generalisation in the traditional sense, since whether we are biased towards the data or not isn't what we're interested in. We want output to be valid (the system to use words to mean what they mean) and reliable (to do so across all environments in which they mean something).

      This does not follow from, nor is it even related to, the ML sense of generalisation. Indeed, if LLMs generalised only in this ML sense, they would be very bad at usefully generalising, since the assumptions here are false.

      • jebarker 2 hours ago

        I don't really follow what you're saying here. I understand that the use of language in the real world is not sampled from a stationary distribution, but it also seems plausible that you could relax that assumption in an LLM, e.g. by conditioning the distribution on time, and then intra-distribution generalization would still make sense as a way to study how well the LLM works on held-out test samples.

        Intra-distribution generalization seems like the only rigorously defined kind of generalization we have. Can you provide any references that describe this other kind of generalization? I'd love to learn more.

        • ericjang 2 hours ago

          intra-distribution generalization is also not well posed in practical real world settings. suppose you learn a mapping f : x -> y. informally, intra-distribution generalization implies that f generalizes for "points from the same data distribution p(x)". Two issues here:

          1. In practical scenarios, how do you know if x' is really drawn from p(x)? Even if you could compute log p(x') under the true data distribution, you can only verify that the support for x' is non-zero. one sample is not enough to tell you whether x' is drawn from p(x).

          2. In high dimensional settings, x' that is not exactly equal to an example within the training set can have arbitrarily high generalization error. here's a criminally under-cited paper discussing this: https://arxiv.org/abs/1801.02774
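          A minimal illustration of issue 1 (my sketch, not the commenter's): a single sample with non-zero density under p(x) is equally consistent with a quite different distribution, so one draw verifies nothing about its source.

```python
import math

# Both a standard normal and a uniform on [-3, 3] assign x' = 0.5 non-zero
# density, so computing (log) p(x') cannot tell you which one produced it.
def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def uniform_pdf(x, lo=-3.0, hi=3.0):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

x_prime = 0.5
print(round(normal_pdf(x_prime), 3))   # 0.352: plausible under N(0, 1)
print(round(uniform_pdf(x_prime), 3))  # 0.167: just as plausible under U(-3, 3)
```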

  • burnt-resistor 4 hours ago

    I think they learn how to become salespeople, politicians, lawyers, and résumé consultants with fanciful language lacking in facts, truth, and honesty.

    • 01HNNWZ0MV43FF 3 hours ago

      If we can put salespeople out of work it will be a great boon to humankind

      • natpalmer1776 3 hours ago

        I suddenly have a vision of an AI driven sales pipeline that uses millions of invasive datapoints about you to create the most convincing sales pitch mathematically possible.

    • vonneumannstan 4 hours ago

      [flagged]

  • pvg 5 hours ago
    • randcraw 3 hours ago

      Thanks. Now, after almost two years of incomparably explosive growth in LLMs since that paper, it's remarkable to realize that we still don't know if Scarecrow has a brain. Or if he'll forever remain just a song and dance man.

  • javaunsafe2019 4 hours ago

    Idk, when even is this article from? For me, LLMs currently are broken, and the majority is already aware of this.

    Copilot fails to cleanly refactor complex Java methods, to the point that I'm better off writing that stuff on my own, since I have to understand it anyway.

    And the news that they don't scale as predicted is pretty bad, given how weak they currently perform…

    • lxgr 4 hours ago

      Why does an LLM have to be better than you to be useful to you?

      Personally, I use them for the things they can do, and for the things they can't, I just don't, exactly as I would for any other tool.

      People assuming they can do more than they are actually capable of is a problem (compounded by our tendency to attribute intelligence to entities with eloquent language, which might be more of a surface level thing than we used to believe), but that's literally been one for as long as we had proverbial hammers and nails.

      • lou1306 3 hours ago

        > Why does an LLM have to be better than you to be useful to you?

        If

        ((time to craft the prompt) + (time required to fix LLM output)) ~ (time to achieve the task on my own)

        it's not hard to see that working on my own is a very attractive proposition. It drives down complexity, does not require me to acquire new skills (i.e., prompt engineering), does not require me to provide data to a third party nor to set up an expensive rig to run a model locally, etc.

        • lxgr 3 hours ago

          Then they might indeed not be the right tool for what you're trying to do.

          I'm just a little bit tired of sweeping generalizations like "LLMs are completely broken". You can easily use them as a tool in a process that then ends up being broken (because it's the wrong tool!), yet that doesn't disqualify them from all tool use.

    • vonneumannstan 4 hours ago

      If you can't find a use for the best LLMs it is 100% a skill issue. If the only way you can think to use them is refactoring complex Java codebases, you're ngmi.

      • lxgr 3 hours ago

        So far I haven't found one that does my dishes and laundry. I really wish I knew how to properly use them.

        My point being: Why would anyone have to find a use for a new tool? Why wouldn't "it doesn't help me with what I'm trying to do" be an acceptable answer in many cases?

        • Workaccount2 3 hours ago

          I have found more often than not that people in the "LLMs are useless" camp are actually in the "I need LLMs to be useless" camp.

          • mdp2021 3 hours ago

            Do not forget the very real frustration of those people who shout "The car does not work!" precisely because they would gladly use a car.

          • exe34 3 hours ago

            nice example of poisoning the well!

  • not2b 2 hours ago

    They are learning a grammar, finding structure in the text. In the case of Othello, the rules for what moves are valid are quite simple, and can be represented in a very small model. The slogan is "a minute to learn, a lifetime to master". So "what is a legal move" is a much simpler problem than "what is a winning strategy".

    It's similar to asking a model to only produce outputs corresponding to a regular expression, given a very large number of inputs that match that regular expression. The RE is the most compact representation that matches them all and it can figure this out.

    But we aren't building a "world model", we're building a model of the training data. In artificial problems with simple rules, the model might be essentially perfect, never producing an invalid Othello move, because the problem is so limited.

    I'd be cautious about generalizing from this work to a more open-ended situation.
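    To make "what is a legal move is a much simpler problem" concrete, here is a minimal legality check for Othello (my sketch, assuming the standard rules): a move is legal iff it flanks at least one line of opponent discs.

```python
# Minimal Othello legality check: a move is legal iff the square is empty
# and, in at least one of the 8 directions, a contiguous run of opponent
# discs ends in one of the mover's own discs.
EMPTY, BLACK, WHITE = ".", "B", "W"
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def is_legal(board, row, col, player):
    if board[row][col] != EMPTY:
        return False
    opponent = WHITE if player == BLACK else BLACK
    for dr, dc in DIRS:
        r, c = row + dr, col + dc
        seen_opponent = False
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
            seen_opponent = True
            r, c = r + dr, c + dc
        if seen_opponent and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            return True
    return False

# Standard opening position: the four central discs.
board = [[EMPTY] * 8 for _ in range(8)]
board[3][3], board[3][4] = WHITE, BLACK
board[4][3], board[4][4] = BLACK, WHITE

print(is_legal(board, 2, 3, BLACK))  # True: flanks the white disc at (3, 3)
print(is_legal(board, 2, 2, BLACK))  # False: no flanking line
```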

    • og_kalu 2 hours ago

      I don't think the point is that Othello-GPT has somehow modelled the real world by training only on games, but that tasking it with predicting the next move forces it to model its data in a deep way. There's nothing special about Othello games vs internet text, except that the latter will force it to model many more things.

  • dboreham 4 hours ago

    It turns out our word for "surface statistics" is "world model".

    • mdp2021 3 hours ago

      World model based interfaces have an internal representation and when asked, describe its details.

      Surface statistics based interfaces have an internal database of what is expected, and when asked, they give a conformist output.

      • naasking 3 hours ago

        The point is that "internal database of statistical correlations" is a world model of sorts. We all have an internal representation of the world featuring only probabilistic accuracy after all. I don't think the distinction is as clear as you want it to be.

        • mdp2021 2 hours ago

          > "internal database of statistical correlations" [would be] a world model of sorts

          Not in the sense used in the article: «memorizing “surface statistics”, i.e., a long list of correlations that do not reflect a causal model of the process generating the sequence».

          A very basic example: when asked "two plus two", would the interface reply "four" because it memorized a correlation of the two ideas, or because it counted at some point (many points in its development) and in that way assessed reality? That is a dramatic difference.
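          The contrast can be made literal (my sketch, not the commenter's): a memorized correlation only answers what it has stored, while a procedure that models the generating process also answers novel cases.

```python
# A lookup table of memorized correlations vs. a model of the process.
memorized = {"two plus two": "four"}       # surface statistics
def computed(a, b):                        # models the generating process
    return a + b

print(memorized.get("two plus two"))       # four
print(memorized.get("three plus five"))    # None: correlation never stored
print(computed(3, 5))                      # 8: the rule covers novel cases
```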

      • exe34 2 hours ago

        > and when asked, describe its details.

          so humans don't typically have world models then. ask most people how they arrived at their conclusions (outside of very technical fields) and they will confabulate, just like an LLM.

        the best example is phenomenology, where people will grant themselves skills that they don't have, to reach conclusions. see also heterophenomenology, aimed at working around that: https://en.wikipedia.org/wiki/Heterophenomenology

        • mdp2021 2 hours ago

          That the descriptive is not the prescriptive should not be a surprise.

          That random people will largely have suboptimal skills should not be a surprise.

          Yes, many people can't think properly. Proper thinking remains there as a potential.

          • exe34 an hour ago

            > Yes, many people can't think properly. Proper thinking remains there as a potential.

            that's a matter of faith, not evidence. by that reasoning, the same can be said about LLMs. after all, they do occasionally get it right.

    • marcosdumay 3 hours ago

      Well, for some sufficiently platonic definition of "world".

      • mdp2021 3 hours ago

        In a way the opposite, I'd say: the archetypes in Plato are the most stable reality and are akin to the logos that the past and future tradition hunted - knowing it is to know how things are (how things work), hence knowledge of the state of things, hence a faithful world model.

        To utter conformist statements spawned from surface statistics would be "doxa" - repeating "opinions".

        • marcosdumay an hour ago

          It has a profound and extensive knowledge about something. But that "something" is how words follow each other on popular media.

          LLMs are very firmly stuck inside the Cave Allegory.

  • maximus93 2 hours ago

    Honestly, I think it’s somewhere in between. LLMs are great at spotting patterns in data and using that to make predictions, so you could say they build a sort of "world model" for the data they see. But it’s not the same as truly understanding or reasoning about the world; it’s more like they’re really good at connecting the dots we give them.

    They don’t do science or causality; they’re just working with the shadows on the wall, not the actual objects casting them. So yeah, they’re impressive, but let’s not overhype what they’re doing. It’s pattern matching at scale, not magic. Correct me if I am wrong.

  • foobarqux 4 hours ago

    Lots of problems with this paper including the fact that, even if you accept their claim that internal board state is equivalent to world model, they don't appear to do the obvious thing which is display the reconstructed "internal" board state. More fundamentally though, reifying the internal board as a "world model" is absurd: otherwise a (trivial) autoencoder would also be building a "world model".

    • sebzim4500 4 hours ago

      >More fundamentally though, reifying the internal board as a "world model" is absurd: otherwise a (trivial) autoencoder would also be building a "world model".

      The point is that they aren't directly training the model to output the grid state, like you would an autoencoder. It's trained to predict the next action and learning the state of the 'world' happens incidentally.

      It's like how LLMs learn to build world models without directly being trained to do so, just in order to predict the next token.

      • foobarqux 4 hours ago

        By the same reasoning if you train a neural net to output next action from the output of the autoencoder then the whole system also has a "world model", but if you accept that definition of "world model" then it is extremely weak and not the intelligence-like capability that is being implied.

        And as I said in my original comment, they are probably not even able to extract the board state very well; otherwise they would depict some kind of direct representation of the state, not all the other figures of board-move causality etc.

        Note also that the board state is not directly encoded in the neural network: they train another neural network to find weights to approximate the board state if given the internal weights of the Othello network. It's a bit of fishing for the answer you want.

      • optimalsolver 4 hours ago

        >It's like how LLMs learn to build world models without directly being trained to do so, just in order to predict the next token

        That's the whole point under contention, but you're stating it as fact.

    • IanCal 3 hours ago

      > they don't appear to do the obvious thing which is display the reconstructed "internal" board state.

      I'm very confused by this, because they do. They then manipulate the internal board state and see what move it makes. That's the entire point of the paper. Figure 4 is literally displaying the reconstructed board state.

      • foobarqux 2 hours ago

        I replied to a similar comment elsewhere: They aren't comparing the reconstructed board state with the actual board state which is the obvious thing to do.

    • og_kalu 2 hours ago

      >they don't appear to do the obvious thing which is display the reconstructed "internal" board state

      This is literally Figure 4.

      This post also reconstructs the board state of a chess-playing LLM:

      https://adamkarvonen.github.io/machine_learning/2024/01/03/c...

      • foobarqux 2 hours ago

        Unless I'm misunderstanding something they are not comparing the reconstructed board state to the actual state which is the straightforward thing you would show. Instead they are manipulating the internal state to show that it yields a different next-action, which is a bizarre, indirect way to show what could be shown in the obvious direct way.

        • og_kalu 2 hours ago

          Figure 4 is showing both things. Yes, there is manipulation of the state, but they also clearly show what the predicted board state is before any manipulations (alongside the actual board state).

          • foobarqux an hour ago

            The point is not to show only a single example; it is to show how well the recovered internal state reflects the actual state in general, i.e., to analyze the performance (this is particularly tricky due to the discrete nature of board positions). That's ignoring all the other more serious issues I raised.

            I haven’t read the paper in some time so it’s possible I’m forgetting something but I don’t think so.

            • og_kalu an hour ago

              >That’s ignoring all the other more serious issues I raised.

              The only other issue you raised doesn't make any sense. A world model is a representation/model of your environment that you use for predictions. Yes, an auto-encoder learns to model its data to some degree. To what degree is not well known. If we found out that it learned things like "city x in country a is approximately distance b from city y, so let's just store where y is and unpack everything else when the need arises", then that would certainly qualify as a world model.

              • foobarqux 44 minutes ago

                Linear regression also learns to model data to some degree. Using the term “world model” that expansively is intentionally misleading.

                Besides that, and the big red flag of not directly analyzing the performance of the predicted board state, I also said training a neural network to return a specific result is fishy, but that is a more minor point than the other two.

                • og_kalu 24 minutes ago

                  The degree matters. If we find auto-encoders learning surprisingly deep models, then I have no problem saying they have a world model. It's not the gotcha you think it is.

                  >the big red flag of not directly analyzing the performance of the predicted board state I also said training a neural network to return a specific result is fishy

                  The idea that probes are some red flag is ridiculous. There are some things to take into account, but statistics is not magic. There's nothing fishy about training probes to inspect a model's internals. If the internals don't represent the state of the board, then the probe won't be able to learn to reconstruct it. The probe only has access to the internals. You can't squeeze blood out of a rock.
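                  For reference, the probing methodology under discussion can be sketched like this (my illustration with synthetic activations, not the paper's code): freeze the hidden states, train a small classifier only on them, and check held-out accuracy. If the internals don't encode the board property, the probe can't succeed.

```python
import numpy as np

# Synthetic stand-in for frozen hidden states: a 32-d activation vector
# that linearly encodes one binary board property (plus noise). A linear
# probe trained only on these activations recovers the property on
# held-out data, which is the evidence probing papers rely on.
rng = np.random.default_rng(0)
n, d = 2000, 32
labels = rng.integers(0, 2, size=n)            # e.g. "this square is occupied"
direction = rng.normal(size=d)                 # how the net encodes it
hidden = rng.normal(size=(n, d)) + np.outer(2.0 * labels - 1.0, direction)

train, test = slice(0, 1500), slice(1500, n)
w = np.zeros(d)
for _ in range(500):                           # plain logistic regression
    p = 1.0 / (1.0 + np.exp(-hidden[train] @ w))
    w -= 0.1 * hidden[train].T @ (p - labels[train]) / 1500

acc = np.mean((hidden[test] @ w > 0) == labels[test])
print(f"held-out probe accuracy: {acc:.2f}")   # far above the 0.5 baseline
```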

  • tyronehed 2 hours ago

    [dead]

  • mjburgess 5 hours ago

    This is irrelevant, and it's very frustrating that computer scientists think it is relevant.

    If you give a universal function approximator the task of approximating an abstract function, you will get an approximation.

    Eg.,

        def circle(radius): ...  # abstract function: returns points on the circle
        approx_circle = neuralnetwork(samples(circle))

        if is_model_of(samples(approx_circle), circle): print("OF COURSE!")
    
    
    This is irrelevant: games, rules, shapes, etc. are all abstract. So any model of samples of these is a model of them.
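    A runnable version of that pseudocode (my sketch, using a least-squares fit as a stand-in for the network): fit samples of an abstract circle and confirm the fit models the circle itself. For abstract targets, the data just is the target.

```python
import numpy as np

# Sample points on the unit circle and fit them with a least-squares
# model (the stand-in for neuralnetwork()). Because the circle is an
# abstract object, modelling the samples just is modelling the circle.
theta = np.linspace(0, 2 * np.pi, 200)
samples = np.column_stack([np.cos(theta), np.sin(theta)])  # circle(radius=1)

basis = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
coeffs, *_ = np.linalg.lstsq(basis, samples, rcond=None)
approx = basis @ coeffs                                    # approx_circle

radii = np.hypot(approx[:, 0], approx[:, 1])               # all ~1.0
print(bool(np.allclose(radii, 1.0)))  # True: "OF COURSE!"
```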

    The "world model" in question is a model of the world. Here "data" is not computer-science data, i.e., numbers; it is measurements of the world, i.e., the state of a measuring device causally induced by the target of measurement.

    Here there is no "world" in the data: you have to make strong causal assumptions about which properties of the target cause the measures. This is not in the data. There is no "world model" in measurement data. Hence the entirety of experimental science.

    No result based on one mathematical function succeeding in approximating another is relevant to whether measurement data "contains" a theory of the world which generates it: it does not. And of course if your data is abstract, and hence constitutes the target of modelling (this only applies to pure maths), then there is no gap: a model of the "measures" (i.e., the points on a circle) is the target.

    No model of actual measurement data, ie., no model in the whole family we call "machine learning", is a model of its generating process. It contains no "world model".

    Photographs of the night sky are compatible with all theories of the solar system in human history (including, eg., stars are angels). There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.

    The sense in which any model of measurement data is "surface statistics" is the same. Consider plato's cave: pots, swords, etc. on the outside project shadows inside. Modelling the measurement data is taking cardboard and cutting it out so it matches the shadows. Modelling the world means creating clay pots to match the ones passing by.

    The latter is science: you build models of the world and compare them to data, using the data to decide between them.

    The former is engineering (or pseudoscience): you take models of measures and replay these models to "predict" the next shadow.

    If you claim the latter is just a "surface shortcut" you're an engineer. If you claim it's a world model, you're a pseudoscientist.

    • naasking 2 hours ago

      > Here there is no "world" in the data, you have to make strong causal assumptions about what properties of the target cause the measures. This is not in the data. There is no "world model" in measurement data.

      That's wrong. Whatever your measuring device, it fundamentally produces a projection of some underlying reality, e.g. a function m in m(r(x)) mapping real values to real values, where r is the function governing reality.

      As you've acknowledged that neural networks can learn functions, the neural network here is learning m(r(x)). Clearly the world is in the model here, and if m is invertible, then we can directly extract r.

      Yes, the domain of x and range of m(r(x)) is limited, so the inference will be limited for any given dataset, but it's wrong to say the world is not there at all.
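      A toy version of this argument (my sketch, with a hypothetical r and an invertible m): a model fit only to measurements learns the composite m(r(x)), and inverting m recovers r on the measured range.

```python
import numpy as np

# Hypothetical reality r(x) = sin(x), observed through an invertible
# measuring device m(y) = 2y + 1. A model fit to the measurements learns
# the composite m(r(x)); applying m's inverse then extracts r.
def r(x): return np.sin(x)            # "the function governing reality"
def m(y): return 2.0 * y + 1.0        # the measurement projection
def m_inv(z): return (z - 1.0) / 2.0  # m is invertible

x = np.linspace(0, np.pi, 100)
learned = np.poly1d(np.polyfit(x, m(r(x)), deg=7))  # learns m(r(x)) from data
recovered = m_inv(learned(x))                       # directly extract r

print(bool(np.max(np.abs(recovered - r(x))) < 1e-3))  # True: r recovered
```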

    • HuShifang 3 hours ago

      I think this is a great explanation.

      The "Ladder of Causation" proposed by Judea Pearl covers similar ground: "Rung 1" reasoning is the purely predictive work of ML models, "Rung 2" is the interactive optimization of reinforcement learning, and "Rung 3" is the counterfactual and causal reasoning, and DGP construction, of science. LLMs can parrot Rung 3 understanding from ingested texts, but they can't generate it.

    • bubblyworld 3 hours ago

      > There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.

      You're stating this as fact but it seems to be the very hypothesis the authors (and related papers) are exploring. To my mind, the OthelloGPT papers are plainly evidence against what you've written - summarising patterns in the sky really does seem to give you information about the world above and beyond the patterns themselves.

      (to a scientist this is obvious, no? the precession of Mercury, a pattern observable in these photographs, was famously not compatible with known theories until fairly recently)

      > Modelling the measurement data is taking cardboard and cutting it out so it matches the shadows. Modelling the world means creating clay pots to match the ones passing by.

      I think these are matters of degree. The former is simply a worse model than the latter of the "reality" in this case. Note that our human impressions of what a pot "is" are shadows too, on a higher-dimensional stage, and from a deeper viewpoint any pot we build to "match" reality will likely be just as flawed. Turtles all the way down.

      • mjburgess 3 hours ago

        Well it doesn't; see my other comment below.

        It is exactly this non-sequitur which I'm pointing out.

        Approximating an abstract discrete function (a game), with a function approximator has literally nothing to do with whether you can infer the causal properties of the data generating process from measurement data.

        To equate the two is just rank pseudoscience. The world is not made of measurements. Summaries of measurement data aren't properties of the world; they're just the states of the measuring device.

        If you sample all game states from a game, you define the game. This is the nature of abstract mathematical objects, they are defined by their "data".

        Actual physical objects are not defined by how we measure them: the solar system isn't made of photographs. This is astrology: attributing to the patterns of light hitting the eye some actual physical property of the universe which corresponds to those patterns. No such property exists.

        It is impossible, and always has been, to treat patterns in measurements as properties of objects. This is maybe one of the most prominent characteristics of pseudoscience.

        • bubblyworld 2 hours ago

          The point is that approximating a distribution causally downstream of the game (text-based descriptions, in this case) produces a predictive model of the underlying game mechanics itself. That is fascinating!

          Yes, the one is formally derivable from the other, but the reduction costs compute, and to a fixed epsilon of accuracy this is the situation with everything we interact with on the day to day.

          The idea that you can learn underlying mechanics from observation and refutation is central to formal models of inductive reasoning like Solomonoff induction (and idealised reasoners like AIXI, if you want the AI spin). At best this is well-established scientific method, at worst a pretty decent epistemology.

          Talking about sampling all of the game states is irrelevant here; that wouldn't be possible even in principle for many games and in this case they certainly didn't train the LLM on every possible Othello position.

          > This is astrology: to attribute to the patterns of light hitting the eye some actual physical property in the universe which corresponds to those patterns. No such exists.

          Of course not - but they are highly correlated in functioning human beings. What do you think our perception of the world grounds out in, if not something like the discrepancies between (our brain's) observed data and its predictions? There's even evidence in neuroscience that this is literally what certain neuronal circuits in the cortex are doing (the hypothesis being that so-called "predictive processing" is more energy efficient than alternative architectures).

          Patterns in measurements absolutely reflect properties of the objects being measured, for the simple reason that the measurements are causally linked to the object itself in controlled ways. To think otherwise is frankly insane - this is why we call them measurements, and not noise.

    • sebzim4500 4 hours ago

      I don't understand your objection at all.

      In the example, the 'world' is the grid state. Obviously that's much simpler than the real world, but the point is to show that even when the model is not directly trained to input/output this world state, it is still learned as a side effect of predicting the next token.

      • mjburgess 4 hours ago

        There is no world. The grid state is not a world; there is no causal relationship between the grid state and a physical board. No one in this debate denies that NNs approximate functions. Since a game is just a discrete function, no one denies an NN can approximate it. Showing this is entirely irrelevant and shows a profound misunderstanding of what's at issue.

        The whole debate is about whether surface patterns in measurement data can be inverted by NNs to recover their generating process, i.e., the world. If the "data" isn't actual measurements of the world, no one is arguing about it.

        If there is no gap between the generating algorithm and the samples, e.g., between a "circle" and "the points on a circle", then there is no "world model" to learn. The world is the data. To learn "the points on a circle" is to learn the circle.

        By taking cases where "the world" and "the data" are the same object (in the limit of all samples), you're just showing that NNs model data. That's already obvious, no ones arguing about it.

        That a NN can approximate a discrete function does not mean it can do science.

        The whole issue is that the cause of pixel distributions is not in those distributions. A model of pixel patterns is just a model of pixel patterns, not of the objects which cause those patterns. A TV is not made out of pixels.

        The "debate" insofar as there is one, is just some researchers being profoundly confused about what measurement data is: measurements are not their targets, and so no model of data is a model of the target. A model of data is just "surface statistics" in the sense that these statistics describe measurements, not what caused them.

    • pkoird 3 hours ago

      > Photographs of the night sky are compatible with all theories of the solar system in human history (including, eg., stars are angels). There is no summary of these photographs which gives information about the world over and above just summarising patterns in the night sky.

      This is blatantly incorrect. Keep in mind that much of modern physics was discovered via observation. Kepler's laws, and ultimately the law of gravitation and general relativity, came from these "photographs" of the night sky.

      If you are talking about the fact that these theories only ever summarize what we see and maybe there's something else behind the scenes that's going on, then this becomes a different discussion.

    • 5 hours ago
      [deleted]