Do LLMs Break the Sapir-Whorf Hypothesis?

(dnhkng.github.io)

14 points | by dot_treo 2 days ago

7 comments

  • throw310822 a day ago

    Beautiful article. One thing to note, though, is that humans are usually trained in exactly one language (additional ones, if any, are mostly acquired later). This might be relevant for the decoupling between meaning and language: if meaning is expressed in a single language, the two could end up more strongly coupled than in an LLM that was trained across tens of languages at the same time and was under stronger pressure to abstract its representations.

    If that were the case, it's funny that the argument "it's just a language processor manipulating words" might apply more strongly to humans than to LLMs.

  • dot_treo 2 days ago

    The linguistic argument is fascinating.

    One particular thing, unrelated to the linguistic argument itself, stood out to me. In the PCA visualisation, we can see that some sequences of layers have particularly tight and stationary clusters. Incidentally, those are also exactly the layers that the previous RYS post identified as the most useful to repeat to improve performance on the probes.

    I wonder if that correlation could be used to identify good candidates for layer repetition.
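
    The heuristic suggested above could be sketched roughly like this: project each layer's token representations onto their top principal components and rank layers by cluster tightness. Everything here is a stand-in (the synthetic `fake` array replaces real hidden states, and `layer_cluster_tightness` is a hypothetical helper, not anything from the RYS posts):

    ```python
    import numpy as np

    def layer_cluster_tightness(hidden_states):
        """For each layer, project token representations onto their top-2
        principal components and measure cluster spread (mean distance to
        the centroid). Tighter layers would be repetition candidates.

        hidden_states: array of shape (n_layers, n_tokens, d_model)
        Returns an array of per-layer spreads (smaller = tighter).
        """
        spreads = []
        for layer in hidden_states:
            centered = layer - layer.mean(axis=0)
            # Top-2 principal components via SVD (classic PCA)
            _, _, vt = np.linalg.svd(centered, full_matrices=False)
            proj = centered @ vt[:2].T  # (n_tokens, 2)
            spreads.append(np.linalg.norm(proj - proj.mean(axis=0), axis=1).mean())
        return np.array(spreads)

    # Synthetic stand-in for real hidden states: layer 1 is made much tighter.
    rng = np.random.default_rng(0)
    fake = np.stack([rng.normal(0, s, size=(64, 32)) for s in (1.0, 0.1, 1.0)])
    spread = layer_cluster_tightness(fake)
    candidates = np.argsort(spread)  # tightest layers first
    ```

    With real models you'd get `hidden_states` from a forward pass (e.g. `output_hidden_states=True` in transformers) rather than synthetic noise.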

    • PaulHoule 2 days ago

      "This isn’t just correlation. It’s a complete structural reorganisation of the representation space."

      • dot_treo 2 days ago

        I don't care too much about the article being written with LLM support. There is actual work being showcased here. I'd rather read an LLM-assisted version of it than not learn about those things at all.

        • dnhkng 13 hours ago

          Yes, dammit.

          Author here.

          I drafted it before I left for holiday, and it's not ready to publish.

          It wasn't supposed to be officially posted yet, but I ran out of time before my flight.

          My apologies!

  • widdershins a day ago

    This is fantastic. I don't know if it tells us anything about the human brain, but it's so cool to be able to do empirical science on large neural networks like this.

  • vibe42 2 days ago

    One thing to benchmark is whether LLMs are better at solving complex problems when the problems are described in one language versus another.

    There's SWE-bench Multilingual for example, but translating a problem into multiple natural languages before passing it to the LLM has not been benchmarked afaik.

    If there's some residual of the natural language left when the middle layers execute, that would in part validate Sapir-Whorf.
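
    A minimal sketch of the benchmark shape being proposed: score the same problem set posed in different natural languages and compare pass rates. The `solve` stub and the problem sets are purely illustrative stand-ins, not a real model or benchmark API:

    ```python
    def pass_rate(solve, problems_by_lang):
        """Score the same problems posed in different natural languages.

        solve: callable taking a prompt and returning True on success
        problems_by_lang: dict mapping language code -> list of prompts
        Returns a dict mapping language code -> fraction solved.
        """
        return {lang: sum(map(solve, probs)) / len(probs)
                for lang, probs in problems_by_lang.items()}

    # Toy stub "model" that only succeeds on prompts containing an English
    # keyword, illustrating the kind of language gap a benchmark would surface.
    stub = lambda prompt: "sort" in prompt
    problems = {
        "en": ["sort a list", "sort a dict"],
        "de": ["eine Liste ordnen", "ein Dict ordnen"],
    }
    rates = pass_rate(stub, problems)
    ```

    A real run would swap the stub for an actual model call and use translated versions of an existing suite (e.g. SWE-bench tasks), holding the underlying problem fixed across languages.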