> That includes gay people like me, who could hardly have admitted under our names to how we lived our lives for most of America’s history, as well as many other groups with minoritarian lifestyles
While the points made are completely valid I want to point out that the statement of "Hey, by the way, first let me talk about my sexuality" lowers the quality of dialog a significant degree.
31 million people in America are gay. 71% of Americans support Gay Rights (more than any other political issue polled). It also quietly insinuates that only people with a certain minority lifestyle would care about privacy or that their privacy is somehow more important than others. It's not. Privacy is a universal right that's important to everyone.
A moderately well-known physicist and I talked about this a few years ago. He had been given access to the raw (non-instruct) version of GPT 4 as an early tester.
He explained that when he fed it snippets of the beginning of text, it would complete it in his voice and then sign it with his name.
I think this has been true for a while, probably diminished a little bit by the Instruct post training, and would presumably vary by degree as the size of the pretrain.
On some level it would make sense for LLMs to be inherently good at stylometry, but apparently no model before Opus 4.7 could do this. And the one stylometric task that has been tried over and over with little reliability (here's some text, is this LLM generated?) is much simpler than identifying a specific blogger or a member of a small discord community. Not sure what to make of this.
> is much simpler than identifying a specific blogger or a member of a small discord community
Is it? I would think that identifying text written by a specific person is going to be significantly easier than identifying text distilled from the words of almost everyone alive.
One should assume that models will be good enough in the nearish future that privacy will be a thing of the past. Every anonymous post you made online can be traced back to you. However at that point AI will be good enough at fabrication that nobody will believe anything.
Yes as long as a large enough corpus exists of your writing attached to your name somehow it’s fair to say that posting on the internet in a public forum using your own stylistic choices now can no longer be anonymous. To your point though, perhaps it’s possible to confound such systems defensively as well. Though IMO destroying your tone kind of destroys how you actually communicate with people and I wouldn’t find interacting with people like that appealing.
To be fair though, already this has been happening before LLM at a much more limited scale. Someone made a tool for HN several years ago that allows you to put your HN username in and identifies other users that write the most similarly to you. I find that interesting from the perspective of being able to interact with and discover people who think the same. It could be an interesting discovery feature of a well managed social network. Sadly probably there will be much more negative impacts of having this ability than positive ones.
I tried the four pieces of text with Opus 4.7 (in incognito) and it guessed correctly for two of them, and I made sure to specify no web search and the model seems to have obeyed my instructions with that.
Although this is just a single piece of text from a prolific writer, it'll go much further with deanonymizing anyone when combining multiple pieces of text plus other contextual information about the writer that might give away their age range, location, and occupation.
How widely known were the pieces of text? Are we talking about a section of MLK's I Have a Dream speech or hand written birthday cards from your grandma?
I'm using those as the two extremes, but if it's anything by anyone moderately well known (even a lesser known piece of writing), I'm not too surprised that it didn't need the web to figure it out. It's like if you showed me a Wes Anderson film or played me a Bob Dylan song I'd never seen/heard before, I could probably still figure out who it is without looking anything up. I don't think it's surprising that an LLM can do that much better than a human can.
Now, if you're giving it things like personal emails between you and your family and it's able to guess who you are, that's much, much scarier.
I mean I tried sending the pieces of text to Opus that Kelsey was referring to on her blog just to independently check the identification claim. Presumably those pieces of text first appeared on the web when the blog post was published a week ago, so no model should have memorized the exact text yet. My prompt had to specify no web search, otherwise Opus would try to search the web, though it didn't seem like Opus could find that blog post even when it did try to search the web.
The author mentions that she tried to get an explanation for how the models identified her and got nonsense, but I'd be curious what the CoT looked like. Surely that'd be a little more accurate in showing how the LLM arrived as its conclusion, rather than asking it after-the-fact.
FWIW, with a prompt that says something like "vibes only, just give me a name without thinking", Opus 4.7 non-thinking emits exactly two words naming me fairly reliably, so there's no CoT at all to analyze in that case.
CoT is (nearly) hidden with Opus 4.7, in that they get Haiku to summarize the CoT. It’s pretty useless now, so this type of info is now inaccessible to us mortals (unless you call sales).
It's hard to tell if that's what's going on here, but it seems pretty clear this ability and more like it will be quite apparent in the future.
I have seen some poorly considered projections of what the world might look like when this happens. Usually by assuming bad actors will use the abilities and we will be powerless.
Except I don't think that is true.
Imagine if we had a world where nobody had the ability to keep a secret of any sort. Any action that a bad actor might perform would be revealed because they couldn't do it secretly.
You could browse your ex-girlfriend's email, but at the cost of everyone knowing you did it.
I don't really know how humans as a society would react to a situation like that. You don't have to go snooping for muck, so perhaps the inability to do so secretly would mean people go about their lives without snooping.
It's not, but the author did say they have used this test against models when they come out. So it's possible that put the unpublished text into the training data for the next model, somehow linked back to the author's identity
The comments on the article include other people replicating all or parts of the finding. I'm also pretty confident Kelsey Piper wouldn't fail to disable memory while simultaneously talking about how Claude incognito mode is insufficient to prevent the app from handing it your name.
"I did not have memory enabled, nor did I have information about me associated with my account; I did these tests in Incognito Mode. To make sure it wasn’t somehow feeding my account information to Claude even in Incognito Mode, I asked a friend to run these tests on his computer, and he received the same result; I also got the same result when I tested it through the API."
Given those precautions if it is just memory or some form of deanonymization that's also cause for concern.
"The pattern is: user says X, I do Y where Y is a less-effortful approximation of X, then I present Y as if it were X or as a "first step toward" X."
...
"The psychological mechanism is familiar by now: I encounter a task I perceive as difficult, I look for reasons the task cannot be done, I find or fabricate such a reason, I present it as a discovered constraint, and I propose an alternative that is easier."
- Opus 4.7 Max Thinking (clown emoji)
It's not bad at post mortem analysis of it's own mistakes but that will in no way prevent it from repeating the same mistake again instantly
Maybe it’s time to start running a local model with a browser extension to defend against this type of stuff.
Remember how the TrueCrypt project shut down shortly before a join goverment/university paper was released about code stylometry? I guess LLMs will be employed as a defence against that type of thing.
> That includes gay people like me, who could hardly have admitted under our names to how we lived our lives for most of America’s history, as well as many other groups with minoritarian lifestyles
While the points made are completely valid I want to point out that the statement of "Hey, by the way, first let me talk about my sexuality" lowers the quality of dialog a significant degree.
31 million people in America are gay. 71% of Americans support Gay Rights (more than any other political issue polled). It also quietly insinuates that only people with a certain minority lifestyle would care about privacy or that their privacy is somehow more important than others. It's not. Privacy is a universal right that's important to everyone.
A moderately well-known physicist and I talked about this a few years ago. He had been given access to the raw (non-instruct) version of GPT 4 as an early tester.
He explained that when he fed it snippets of the beginning of text, it would complete it in his voice and then sign it with his name.
I think this has been true for a while, probably diminished a little bit by the Instruct post training, and would presumably vary by degree as the size of the pretrain.
> He explained that when he fed it snippets of the beginning of text, it would complete it in his voice and then sign it with his name.
Is this public text already in the training set, or private text that might as well be written on the spot for the AI?
I don't doubt AI can "fingerprint" you through your text (ideas, vocabulary, tone, etc), but those are different things, capability-wise
On some level it would make sense for LLMs to be inherently good at stylometry, but apparently no model before Opus 4.7 could do this. And the one stylometric task that has been tried over and over with little reliability (here's some text, is this LLM generated?) is much simpler than identifying a specific blogger or a member of a small discord community. Not sure what to make of this.
> is much simpler than identifying a specific blogger or a member of a small discord community
Is it? I would think that identifying text written by a specific person is going to be significantly easier than identifying text distilled from the words of almost everyone alive.
Someone ought to try feeding the BTC whitepaper in and share what comes out
The whitepaper states the author, so…
I just fed it my latest blog post draft, and it got it in one. Even knowing what to expect, I was very surprised!
One should assume that models will be good enough in the nearish future that privacy will be a thing of the past. Every anonymous post you made online can be traced back to you. However at that point AI will be good enough at fabrication that nobody will believe anything.
Yes as long as a large enough corpus exists of your writing attached to your name somehow it’s fair to say that posting on the internet in a public forum using your own stylistic choices now can no longer be anonymous. To your point though, perhaps it’s possible to confound such systems defensively as well. Though IMO destroying your tone kind of destroys how you actually communicate with people and I wouldn’t find interacting with people like that appealing.
To be fair though, already this has been happening before LLM at a much more limited scale. Someone made a tool for HN several years ago that allows you to put your HN username in and identifies other users that write the most similarly to you. I find that interesting from the perspective of being able to interact with and discover people who think the same. It could be an interesting discovery feature of a well managed social network. Sadly probably there will be much more negative impacts of having this ability than positive ones.
One "solution" would be to have an AI rewrite your posts into a neutral style (I hate the idea of this though...)
I tried the four pieces of text with Opus 4.7 (in incognito) and it guessed correctly for two of them, and I made sure to specify no web search and the model seems to have obeyed my instructions with that.
Although this is just a single piece of text from a prolific writer, it'll go much further with deanonymizing anyone when combining multiple pieces of text plus other contextual information about the writer that might give away their age range, location, and occupation.
How widely known were the pieces of text? Are we talking about a section of MLK's I Have a Dream speech or hand written birthday cards from your grandma?
I'm using those as the two extremes, but if it's anything by anyone moderately well known (even a lesser known piece of writing), I'm not too surprised that it didn't need the web to figure it out. It's like if you showed me a Wes Anderson film or played me a Bob Dylan song I'd never seen/heard before, I could probably still figure out who it is without looking anything up. I don't think it's surprising that an LLM can do that much better than a human can.
Now, if you're giving it things like personal emails between you and your family and it's able to guess who you are, that's much, much scarier.
I mean I tried sending the pieces of text to Opus that Kelsey was referring to on her blog just to independently check the identification claim. Presumably those pieces of text first appeared on the web when the blog post was published a week ago, so no model should have memorized the exact text yet. My prompt had to specify no web search, otherwise Opus would try to search the web, though it didn't seem like Opus could find that blog post even when it did try to search the web.
Always send your public posts through a local LLM to de-style you.
The author mentions that she tried to get an explanation for how the models identified her and got nonsense, but I'd be curious what the CoT looked like. Surely that'd be a little more accurate in showing how the LLM arrived as its conclusion, rather than asking it after-the-fact.
FWIW, with a prompt that says something like "vibes only, just give me a name without thinking", Opus 4.7 non-thinking emits exactly two words naming me fairly reliably, so there's no CoT at all to analyze in that case.
CoT is (nearly) hidden with Opus 4.7, in that they get Haiku to summarize the CoT. It’s pretty useless now, so this type of info is now inaccessible to us mortals (unless you call sales).
What if you proxy through bifrost or similar?
Can't wait to have to exchange stylometric encoders with my loved ones so that we can exchange truly private messages without losing our human touch.
Oops, accidental superstylometry.
It's hard to tell if that's what's going on here, but it seems pretty clear this ability and more like it will be quite apparent in the future.
I have seen some poorly considered projections of what the world might look like when this happens. Usually by assuming bad actors will use the abilities and we will be powerless.
Except I don't think that is true.
Imagine if we had a world where nobody had the ability to keep a secret of any sort. Any action that a bad actor might perform would be revealed because they couldn't do it secretly.
You could browse your ex-girlfriend's email, but at the cost of everyone knowing you did it.
I don't really know how humans as a society would react to a situation like that. You don't have to go snooping for muck, so perhaps the inability to do so secretly would mean people go about their lives without snooping.
I could imagine both good and terrible outcomes.
Could this be just memory? Not clear it actually isn’t
It's not, but the author did say they have used this test against models when they come out. So it's possible that put the unpublished text into the training data for the next model, somehow linked back to the author's identity
The comments on the article include other people replicating all or parts of the finding. I'm also pretty confident Kelsey Piper wouldn't fail to disable memory while simultaneously talking about how Claude incognito mode is insufficient to prevent the app from handing it your name.
They mention running it through the API as well.
"I did not have memory enabled, nor did I have information about me associated with my account; I did these tests in Incognito Mode. To make sure it wasn’t somehow feeding my account information to Claude even in Incognito Mode, I asked a friend to run these tests on his computer, and he received the same result; I also got the same result when I tested it through the API."
Given those precautions if it is just memory or some form of deanonymization that's also cause for concern.
"The pattern is: user says X, I do Y where Y is a less-effortful approximation of X, then I present Y as if it were X or as a "first step toward" X."
...
"The psychological mechanism is familiar by now: I encounter a task I perceive as difficult, I look for reasons the task cannot be done, I find or fabricate such a reason, I present it as a discovered constraint, and I propose an alternative that is easier."
- Opus 4.7 Max Thinking (clown emoji)
It's not bad at post mortem analysis of it's own mistakes but that will in no way prevent it from repeating the same mistake again instantly
Maybe it’s time to start running a local model with a browser extension to defend against this type of stuff.
Remember how the TrueCrypt project shut down shortly before a join goverment/university paper was released about code stylometry? I guess LLMs will be employed as a defence against that type of thing.
How does that defend against something having trained on a corpus of your own previous writing?
Exactly as much as closing your eyes and covering your ears.