These watermarks are not robust to paraphrasing attacks: ROC AUC falls from 0.95 to 0.55 (barely better than guessing) for a 100-token passage.
The existing impossibility results imply that these attacks are essentially unavoidable (https://arxiv.org/abs/2311.04378) and not very costly, so this line of inquiry into LLM watermarking seems like a dead end.
I spent the last five years doing PhD research into steganography, with a particular focus on how to embed messages into LLM outputs. Watermarking is basically one-bit steganography.
The first serious investigations into "secure" steganography were about 30 years ago and it was clearly a dead end even back then. Sure, watermarking might be effective against lazy adversaries--college students, job applicants, etc.--but can be trivially defeated otherwise.
All this time I'd been lamenting my research area as unpopular and boring when I should've been submitting to Nature!
This article goes into it a little bit, but an interview with Scott Aaronson goes into some detail about how watermarking works[0].
He's a theoretical computer scientist, but he was recruited by OpenAI to work on AI safety. He has a very practical view on the matter and is focusing his efforts on leveraging the probabilistic nature of LLMs to provide an undetectable digital watermark. It nudges certain words to be paired together slightly more often than chance, and you can mathematically derive, with some level of certainty, whether an output (or even a section of an output) was generated by the LLM. It's really clever, and apparently he has a working prototype in development.
One workaround he hasn't figured out yet: asking for an output in language X and then translating it into language Y. But that may still be figured out eventually.
I think watermarking would be a big step forward to practical AI safety and ideally this method would be adopted by all major LLMs.
That part starts around 1 hour 25 min in.
> Scott Aaronson: Exactly. In fact, we have a pseudorandom function that maps the N-gram to, let’s say, a real number from zero to one. Let’s say we call that real number ri for each possible choice i of the next token. And then let’s say that GPT has told us that the ith token should be chosen with probability pi.
https://axrp.net/episode/2023/04/11/episode-20-reform-ai-ali...
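For anyone curious what that looks like concretely, here is a minimal sketch of the scheme as I read that quote (not OpenAI's actual implementation; the PRF, key handling, and detection threshold are all made up for illustration): a keyed pseudorandom function maps the current n-gram and each candidate token i to a value r_i in [0, 1), and instead of sampling from p_i directly you pick the token that maximizes r_i^(1/p_i). If the r_i were truly i.i.d. uniform this reproduces the model's distribution exactly, but the key holder can later test the text for the tell-tale correlation.

    import hashlib
    import numpy as np

    def prf(key: bytes, ngram: tuple, token_id: int) -> float:
        # Keyed pseudorandom function: (context n-gram, candidate token) -> r in [0, 1).
        digest = hashlib.sha256(key + repr((ngram, token_id)).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def watermarked_sample(probs: np.ndarray, ngram: tuple, key: bytes) -> int:
        # Pick argmax_i r_i^(1/p_i); with truly uniform r_i this selects token i
        # with probability p_i, so the output distribution is (nearly) unchanged.
        r = np.array([prf(key, ngram, i) for i in range(len(probs))])
        scores = np.where(probs > 0, r ** (1.0 / np.maximum(probs, 1e-12)), -1.0)
        return int(np.argmax(scores))

    def detection_score(token_ids: list, key: bytes, n: int = 4) -> float:
        # Average of -log(1 - r) over generated tokens. Unwatermarked text averages
        # about 1 per token; watermarked text scores noticeably higher.
        total = 0.0
        for t in range(n, len(token_ids)):
            r = prf(key, tuple(token_ids[t - n:t]), token_ids[t])
            total += -np.log(1.0 - r + 1e-12)
        return total / max(len(token_ids) - n, 1)

The appeal is that it only touches the sampling step; the flip side is that anything that disturbs the n-grams (paraphrasing, translation, re-tokenization) degrades the signal quickly.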
I don't think that provable watermarking is possible in practice. The method you mention is clever, but before it can work, you would need to know the output probabilities of every other source that could also have been used to generate the text for the same purpose. Only if you can show that this model assigns the output a much higher probability than any other source, including humans, does the watermark give a stronger indication.
You would also need to define how confidence scales with output length. The longer the output, the more certain you can be. What is the smallest number of tokens below which nothing can be proved at all?
You would also need to include humans. Can you define such a probability for a human? And all LLMs would have to use the same system uniformly.
Otherwise, watermarking is doomed to be misused and will never be reliable enough. False accusations will take place.
I agree. I'd add that not only could human-written content fail the test -- it's also the case that humans will detect the word pairing, just as they detected "delve" and various other LLM tells.
In time most forms of watermarking along those lines will seem like elements of an LLM's writing style, and will quickly be edited out by savvy users.
>So it nudges certain words to be paired together slightly more than random and you can mathematically derive with some level of certainty whether an output or even a section of an output was generated by the LLM.
hah, every single LLM already watermarks its output by starting the second paragraph with "It is important/essential to remember that..." followed by inane gibberish, no matter what question you ask.
I've always felt you'd be able to tell someone uses Reddit because they'll reply to a comment starting the sentence with "The problem is that..."
Now LLMs are trained on Reddit users.
Sounds like LLMs are trained on my posts because i tend to use both of those phrases :-P
Or just check whether text contains the word delve and it's most likely AI generated. I fucking hate that word now.
Sounds interesting, but it also sounds like something that could very well be circumvented by using a technique similar to speculative decoding: you use the censored model like you'd use the fast llm in speculative decoding, and you check whether the other model agrees with it or not. But instead of correcting the token every time both models disagree like you'd do with speculative decoding, you just need to change it often enough to mess with the watermark detection function (maybe you'd change every other mismatched token, or maybe one every 5 tokens would be enough to reduce the signal-to-noise ratio below the detection threshold).
You wouldn't even need to have access to an unwatermarked model; the “correcting model” could even be watermarked itself, as long as it's not the same watermarking function applied to both.
Or am I misunderstanding something?
No you've got it right. Watermarks like this are trivial to defeat, which means they are only effective against lazy users like cheating college students and job applicants.
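A rough sketch of the attack described above, under those assumptions (gpt2 and distilgpt2 are just stand-ins for the watermarked model and the "correcting" model; they share a tokenizer, which keeps the example simple, and no real watermark is involved):

    # Sketch only: greedy decoding, no sampling.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    primary = AutoModelForCausalLM.from_pretrained("gpt2")          # pretend: watermarked model
    corrector = AutoModelForCausalLM.from_pretrained("distilgpt2")  # pretend: differently/un-watermarked model

    def scrub_generate(prompt: str, max_new_tokens: int = 60, k: int = 5) -> str:
        ids = tok(prompt, return_tensors="pt").input_ids
        disagreements = 0
        for _ in range(max_new_tokens):
            with torch.no_grad():
                p_tok = int(primary(ids).logits[0, -1].argmax())
                c_tok = int(corrector(ids).logits[0, -1].argmax())
            # On every k-th disagreement, take the corrector's token instead of the
            # primary model's; this perturbs the statistical signature the detector
            # relies on without rewriting most of the text.
            if p_tok != c_tok:
                disagreements += 1
                next_tok = c_tok if disagreements % k == 0 else p_tok
            else:
                next_tok = p_tok
            ids = torch.cat([ids, torch.tensor([[next_tok]])], dim=-1)
        return tok.decode(ids[0], skip_special_tokens=True)

    print(scrub_generate("Watermarking LLM output is"))

How small k can be before the detector stops firing depends on how strong the watermark signal is, which is exactly the robustness-versus-quality trade-off being discussed.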
"An LLM generates text one token at a time. These tokens can represent a single character, word or part of a phrase. To create a sequence of coherent text, the model predicts the next most likely token to generate. These predictions are based on the preceding words and the probability scores assigned to each potential token.
For example, with the phrase “My favorite tropical fruits are __.” The LLM might start completing the sentence with the tokens “mango,” “lychee,” “papaya,” or “durian,” and each token is given a probability score. When there’s a range of different tokens to choose from, SynthID can adjust the probability score of each predicted token, in cases where it won’t compromise the quality, accuracy and creativity of the output.
This process is repeated throughout the generated text, so a single sentence might contain ten or more adjusted probability scores, and a page could contain hundreds. The final pattern of scores for both the model’s word choices combined with the adjusted probability scores are considered the watermark. This technique can be used for as few as three sentences. And as the text increases in length, SynthID’s robustness and accuracy increases."
Better link: https://deepmind.google/technologies/synthid/
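To make the quoted description concrete, here is a minimal sketch of the general "nudge the probability scores" idea. This is the simpler green-list style of watermark, not SynthID's actual tournament sampling (discussed further down); DELTA, GAMMA, and the hashing scheme are arbitrary choices for illustration.

    import hashlib
    import numpy as np

    DELTA = 2.0   # logit bonus for "green" tokens
    GAMMA = 0.5   # fraction of the vocabulary that is green at each step

    def green_mask(prev_token: int, vocab_size: int, key: bytes) -> np.ndarray:
        # A keyed hash of the previous token decides which tokens are "green" this step.
        seed = int.from_bytes(hashlib.sha256(key + int(prev_token).to_bytes(4, "big")).digest()[:8], "big")
        return np.random.default_rng(seed).random(vocab_size) < GAMMA

    def biased_sample(logits: np.ndarray, prev_token: int, key: bytes) -> int:
        # Nudge green tokens up slightly, then sample as usual.
        biased = logits + DELTA * green_mask(prev_token, len(logits), key)
        probs = np.exp(biased - biased.max())
        probs /= probs.sum()
        return int(np.random.default_rng().choice(len(probs), p=probs))

    def detect_z(token_ids: list, vocab_size: int, key: bytes) -> float:
        # z-score of the observed green-token count vs. what chance (GAMMA) predicts.
        hits = sum(green_mask(prev, vocab_size, key)[cur]
                   for prev, cur in zip(token_ids, token_ids[1:]))
        n = max(len(token_ids) - 1, 1)
        return (hits - GAMMA * n) / np.sqrt(n * GAMMA * (1 - GAMMA))

The "in cases where it won't compromise the quality" caveat in the quote matters: when the model is very confident (one token holds nearly all the probability mass), a small logit nudge changes nothing, so low-entropy passages carry almost no watermark signal.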
I'm fascinated that this approach works at all, but that said, I don't believe watermarking text will ever be practical. Yes, you can do an academic study where you have exactly 1 version of an LLM in exactly 1 parameter configuration, and you can have an algorithm that tweaks the logits of different tokens in a way that produces a recognizable pattern. But you should note that the pattern will be recognizable only when the LLM version is locked and the parameter configuration is locked. Which they won't be in the real world. You will have a bunch of different models, and people will use them with a bunch of different parameter combinations. If your "detector" has to be able to recognize AI-generated text from a variety of models and a variety of parameter combinations, it's no longer going to work. Even if you imagine someone brute-forcing all these different combos, the trouble is that some of the combos will produce false positives just because you tested so many of them. Want to get rid of those false positives? Go ahead, make the pattern stronger. And now you're visibly altering the generated text to an extent where that is a quality issue.
In summary, this will not work in practice. Ever.
In practice, every programmer or writer who takes LLM output does a lot of rewriting against already existing code or already existing text. Stitching together parts of many LLM outputs is the only way to use an LLM effectively, even stitching together parts from different LLMs, which I do all the time.
Recognizing only parts of a watermark, and many watermarked parts scattered all around doesn't seem possible at all, in my mind.
They can, however, develop software to sell very expensively to universities, schools, etc., and it will occasionally catch a particularly careless person who uses the LLM all the time, doesn't even try to improve the answer, and always hands in the LLM output in one piece.
At the end of the day, it will lead to so many false accusations that people will stop trusting it. In chess, false accusations of cheating have happened all the time for 15 years or more. Right now, former world chess champion Kramnik has accused over 50 top players of cheating in the span of two months, including five-time US champion Nakamura.
If a software like that gets applied to schools and universities, we are gonna have the fun of our lives.
Even with temperature = 0, LLMs are still non-deterministic, as their internal, massively parallelized calculations are done with floating point arithmetic, which is order-dependent. Running the same LLM with the exact same parameters multiple times might still yield slightly different probabilities in the output, making this watermarking scheme even less robust.
This isn't necessarily true, it just depends on the implementation. I can say that because I've published research which embeds steganographic text into the output of GPT-2 and we had to deal with this. Running everything locally was usually fine--the model was deterministic as long as you had the same initial conditions. The problems occurred when trying to run the model on different hardware.
That's not my experience unless LLM providers are caching results. It's frustratingly difficult to get it to output substantially different text for a given prompt. It's like internally it always follows mostly the same reasoning for step 1, then step 2 applies light fudging of the output to give the appearance of randomness, but the underlying structure is generally the same. That's why there's so much blog spam that all pretty much read the same, but while one "delves" into a topic another "dives" into it.
How long until they can write genuinely unique output without piles of additional prompting?
Couldn’t this be easily disrupted as a watermark system by simply changing the words to interfere with the relative checksum?
I suspect sentence structure is also being used, or more likely is the primary “watermark”. Similar to how you can easily identify that something is at least NOT a Yoda quote based on it having the wrong structure. Combine that with other negative patterns, like the quote containing Harry Potter references instead of Star Wars, and you can start to build up a profile of tells like these.
By rewriting the sentence structure and altering usual wording instead of directly copying the raw output, it seems like you could defeat any current raw watermarking.
Though this hasn’t stopped Google and others in the past from using bad science and statistics to make overconfident claims, like when they added CAPTCHA problems everybody said would be “literally impossible” for bots to solve.
What a surprise that they turned out to be trivial to automate, and that the data they produce can be sold for profit at the expense of massive amounts of consumer time.
In principle, it seems like you could have semantic watermarking. For instance, suppose I want a short story. There are lots of different narrative and semantic aspects of it that each carry some number of bits of information: setting, characters, events, and those lay on a probability distribution like anything else. You just subtly shift the probability distribution of those choices, and then it's resistant to word choice, reordering, and any transformation that maintains its semantic meaning.
Much simpler: make every sentence contain an even number of words. The chance of 10 sentences in a row all being even by accident is only about 0.1%.
Some of the watermarking is really obvious. If you write song lyrics in ChatGPT, watch for phrases like "come what may" and "I stand tall."
It's not just that they are (somewhat) unusual phrases, it's that ChatGPT comes up with those phrases so very often.
It's quite like how earlier versions always had a "However" in between explanations.
Coheed and Cambria were using ChatGPT this whole damn time, smh
ChatGPT does not have a watermark.
It has a rich tapestry of watermarks.
A delicate dance of them, perhaps
It's like a symphony of watermarks, all playing in harmony.
I suggest we "delve" deeper into this problem.
What makes you sure about that?
I think we just need to give up on this. What’s the harm? It’s not like some ground truth is fabricated.
I’m far, far more concerned about photo, video, and audio verification. We need a camera that can guarantee a recording is real.
I've been thinking about this for a while. Digital signatures can guarantee that a piece of data is authentic, if the author wishes to sign it.
Why do we need that for photo, video and audio? If it's about the general public believing something false, they're not going to check the watermarks of random internet content or trust anyone who says they checked it. If they really want to know, they can go to the source, and if they trust that person or organization, they can also trust the content it published. If it's about use in court, we already have a system for that: the person who recorded it appears in court as a witness and promises that they didn't alter it; if it turns out they did, they can go to prison.
Google is branding this in a positive light but this is just AI text DRM.
It's likely more about preventing model incest than digital rights management
Like all things a computer can or can't do, DRM isn't inherently bad: it's how it's used that's the problem.
I.e., DRM can't change people's motivations. It's useful for things like national security secrets and trade secrets, where the people who have access to the information have very clear motivations to protect that information, and very clear consequences for violating the rules the DRM is in place to protect.
In this case, the big question of whether AI watermarking will work or fail has more to do with people's motivations: will the general public accept AI watermarking because it fits our motivations and the consequences we set up for AI masquerading as a real person, or AI being used for misinformation? That's a big question that I can't answer.
This is not a “good deed for the public” done by Google, this is just a self serving tool to enforce their algorithms and digital property. There is nothing “bad” here for the public but it’s certainly not good either.
I for one am glad we might have a path forward to filtering out LLM-generated sludge.
> we
If by "we" you mean anyone else than Google and the select few other LLM provider they choose to associate with, I'm afraid you're going to be disappointed.
If there is a detectable fingerprint, we can detect it too. Probably don't even need a Bletchley Park.
> the team tested it on 20 million prompts given to Gemini. Half of those prompts were routed to the SynthID-Text system and got a watermarked response, while the other half got the standard Gemini response. Judging by the “thumbs up” and “thumbs down” feedback from users, the watermarked responses were just as satisfactory to users as the standard ones.
Three comments here:
1. I wonder how many of the 20M prompts got a thumbs up or down. I don't think people click that a lot. Unless the UI enforces it. I haven't used Gemini, so I might be unaware.
2. Judging a single response might not be enough to tell whether the watermarking is acceptable. For instance, imagine the watermarking is adding "However," to the start of each paragraph. In a single GPT interaction you might not notice it. Once you get 3 or 4 responses it might stand out.
3. Since when is Google happy with measuring by self-declared satisfaction? Aren't they the kings of A/B testing and high-volume analysis of usage behavior?
> I don't think people click that a lot.
I sometimes do, but I almost always give the wrong answer or the opposite answer where possible.
I suspect your account's feedback would be easily filtered out
but why? what for?
My timesheet SaaS constantly asks for feedback, which I give 0/10, as constantly asking for feedback really annoys me.
They then contact me and ask why; I tell them, and they say there is nothing they can do. A week later I'll get a pop-up asking for feedback and we go round the same loop again.
Because companies like Google are a cancer and I don't want to give them data they didn't pay for.
hm
reminds me of "what have the romans ever done for us?"
but thx for elaborating.
Google are obviously pushing this as a way to root out AI blog spam.
If they can just get other providers to use it, in the name of 'safety' or something, they won't have to change their indexer much. Otherwise PageRank is dead due to the ease of creating content farms.
The academic paper: https://www.nature.com/articles/s41586-024-08025-4
They use the last N prefix tokens, hash them (with a keyed hash), and use the random value to sample the next token via an 8-wise tournament: assign random bits to each of the top 8 preferred tokens, make pairwise comparisons, and keep the token with the larger bit. (Yes, it seems complicated, but apparently it increases the watermarking accuracy compared to straightforward nucleus sampling.)
The negative of this approach is that you need to rerun the LLM, so you must keep all versions of all LLMs that you trained, forever.
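For the curious, here is a minimal sketch of a tournament step as I read that description (an 8-candidate, one-bit-per-layer toy version; the paper's real construction is larger and more careful, and as the reply below points out, detection only needs the keyed hash and the tokenizer, not the model):

    import hashlib
    import numpy as np

    def g_bit(key: bytes, context: tuple, token_id: int, layer: int) -> int:
        # Keyed hash of (recent prefix tokens, candidate token, tournament layer) -> one pseudorandom bit.
        return hashlib.sha256(key + repr((context, token_id, layer)).encode()).digest()[0] & 1

    def tournament_sample(probs: np.ndarray, context: tuple, key: bytes, m: int = 8) -> int:
        # Draw m candidates from the model's distribution, then run pairwise
        # knockouts, keeping whichever candidate has the larger pseudorandom bit.
        rng = np.random.default_rng()
        candidates = list(rng.choice(len(probs), size=m, p=probs, replace=True))
        layer = 0
        while len(candidates) > 1:
            survivors = []
            for a, b in zip(candidates[0::2], candidates[1::2]):
                survivors.append(a if g_bit(key, context, a, layer) >= g_bit(key, context, b, layer) else b)
            candidates = survivors
            layer += 1
        return int(candidates[0])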
They actually run a 2^30-way tournament (they derive an equivalent form that doesn't require ~2 billion operations). You do not need to run the LLM; it only depends on the tokenizer.
Why do you need to rerun the LLM? Watermark detection only requires the hash functions (equation (1) from the paper).
This strikes me as potentially a bad thing for regular people. For example, corporations can still use AI filtering to force job seekers to jump through hoops, but job seekers won't be able to use AI to generate the cover letters and resumes that those hoops demand.
Who is going to pay for watermarked output?
Correct me if I’m wrong, but wouldn’t it simply drive people to use LLMs that are not watermarking their content?
I think your idea is basically right, but there are two points to consider:
- Your hypothesis only holds if the alternative LLM is also "sufficiently good". If Gemini does not stay competitive with other LLMs, Google's AI plans have a much more serious problem.
- Your hypothesis assumes that many people will be capable of detecting the watermarks (both of Gemini and other LLMs) so that they can make a conscious choice for another LLM. But the idea behind good watermarking is that it is not that easy to detect.
According to the article, you can just have another LLM summarise Gemini's watermarked output and that will "likely" defeat the watermark detection.
But, if all the good models can only be trained by large mega corps with close connections to the government, it's only a matter of time until that other LLM will just add its own watermark.
People use Google Search despite it being littered with adverts and tracking. Maybe Google are counting on either being better than the competition despite watermarking, or simply accepting that people who don't care are enough of a market that it's still worth adding.
Agreed, that's the obvious prediction. They're also going to perform worse on 3p benchmarks right?
If Google locks in enterprise clients using Google Workspace to Gemini then they won't really have a choice. It is selling it as an "add-on" already: https://workspace.google.com/solutions/ai/#plan
Suffice to say, it is evident that no other LLM will come close to Gemini's integration with Google Docs and the other Workspace apps.
Correct me if I'm wrong, but watermarking is only possible if the model has a limited set of inputs you can provide (which affect the output) and a limited set of outputs it produces, and it would have to be completely deterministic. And you would have to pre-calculate all possible combinations.
And this would also have to be the case for every possible LLM; then you could compare which LLMs could produce which outputs from which inputs. Only then is there some certainty that this output was produced by this LLM, or that another LLM might produce it as well given these inputs.
So... impossible?
Why does the user care if its watermarked? Surely there are only some use cases for this stuff where it matters. Most of the time isn't it just people having ephemeral chats where this wouldn't matter?
Using LLMs to write your essays and reports for school or uni, in a way that could get you punished if caught, is a reasonably big use case.
Agreed, it's probably a big use case in general, but token for token I bet it's relatively small! How many big papers do you have to write in a semester? Even if it's four, that's nothing compared to the everyday use you'll make of it.
I see no scenario where there won't be an LLM deliberately tailored for that purpose, possibly even built by an "intel" agency for the very purpose of having blackmail material on someone who may become useful later in their career.
AIs and LLMs have an extremely uphill PR battle to fight right now. Anything that is deemed AI-generated is assumed to be borderline trash (lots of exceptions, but you get the point). So I can see that if someone uses an LLM to generate text, they don't want it marked as "low-effort content".
There are definitely exceptions, and the fact that there are maybe proves it is less anti-AI prejudice at play and more just a reaction to things that are indeed trashy. It just so happens that a lot of it today is from AI, I think (for, I hope, obvious reasons).
Just to say, maybe give it a little time, but a watermark like this is not going to be the thing that decides someone's reaction in the near future; what the text says will. (I am just betting here.)
But it's going to be an uphill battle either way if you are really getting the model to write everything; I do not envy that kind of project.
People made this same argument about DRM escalations, about increasing privacy violations in the browser, and about Google's donations to support climate change misinformation. Even about Facebook interface redesigns. Every variation of "people will be driven to do X" I've ever heard assumes some coherence and unity of collective purpose that rarely matches the reality of how people behave.
There are counter examples, e.g. Unity. But catching that lightning in a bottle is rare and merits special explanation rather than being assumed.
Using LLMs for exams and homework has a different driver. Getting caught results in punishment, so using an alternative would be better. None of the aforementioned examples have a "stick" aspect when you stick with Google.
Do LLMs always pick the most probable next word? I would have thought this would lead to the same output for every input. How does this square with the randomness you get from prompting the same thing over and over?
It depends. If we use beam search we pick the most likely sequence of tokens rather than the most likely token at each point in time. This process is deterministic though.
We can also sample from the distribution, which introduces randomness. Basically, if word1 should be chosen 75% of the time and word2 25% of the time, it will do that.
The randomness you’re seeing can also be due to implementation details.
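A tiny sketch of the difference, with made-up numbers (temperature 0 collapses to greedy argmax; any positive temperature samples from the softmax distribution):

    import numpy as np

    def sample_next(logits: np.ndarray, temperature: float = 1.0, rng=None) -> int:
        rng = rng or np.random.default_rng()
        if temperature == 0.0:
            return int(np.argmax(logits))            # greedy: same token every time
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))  # stochastic: ~75% word1, ~25% word2 below

    # Toy two-word vocabulary where word1 should be picked 75% of the time at temperature 1.
    logits = np.log(np.array([0.75, 0.25]))
    draws = [sample_next(logits) for _ in range(10_000)]
    print(np.bincount(draws, minlength=2) / len(draws))   # roughly [0.75, 0.25]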
https://community.openai.com/t/a-question-on-determinism/818...
There is at least a parameter called temperature, which decides how much randomness to include in the output.
Setting it to 0 doesn't get you perfectly deterministic output though, per https://medium.com/google-cloud/is-a-zero-temperature-determ... , as you don't have perfect control over which floating-point approximations are made in your operations.
The most typical reason argmax (temp 0) is non-deterministic is that your request is running batched with other people's requests. The number and size of these affect the matrix sizes and thus the tiling decisions. Then you get a different floating-point order and thus different results.
Nvidia gives some guarantees about deterministic results of their kernels, but that only applies when you have exactly the same input data, which is not the case with in-flight batching.
I think people are already doing that. I frequently hear people watermarking their speeches with phrases like "are we aligned on this?", or "let's circle back" and similar.
I can’t tell if this is satire but that’s just corp-speak. I imagine those people also occasionally suggest “touching base” and “taking this offline”.
The phrases usually mean something useful, if one knows the meaning, but it is amusing how much people seem to stick with the same ones, even across companies.
I am not sure whether it was satire. I personally don't like corp speak - it feels like people talking like that are not humans. I am not sure I would welcome our AI overlords speaking like this, either.
But I find the idea that people will subconsciously start copying AI speech patterns (perhaps as a signal of submission) amusing. I think it's gonna throw a wrench into the idea.
IMHO LLMs either should help us communicate more clearly and succinctly, or we can use them as tools for creativity ("rephrase this in 18th century English"). Watermarking speech sabotages both of these use cases.
I really want to be able to try Gemini without the AI watermark. IIRC they've used SynthID from the start and it makes me wonder if it's the source of all of Gemini's issues.
Obviously Google claims that it doesn't cause any issues but I'd think that OpenAI and other competitors would have something similar to SynthID if it didn't impact performance.
> IIRC they've used SynthID from the start
Is that not at odds with what's presented in the article here?
OT: The publication (Spectrum by IEEE) has some really good content.
It's starting to become a common destination for when I want to read about interesting things.
"I hope this message finds you well." --- busted!
How is this supposed to work? By inserting special unicode characters?
How can you watermark text?
I haven't read how Google is doing it, but one way it could be done is to nudge which tokens get sampled. For example, every other token could be forced to have an odd-numbered id (where each token is assigned an id from 0 to ~32,000, or however many the vocabulary has). Then, to detect the watermark, you just tokenize the text and check whether the pattern is there. A problem with this approach is that it harms accuracy and coherency: if you ask "What is 2+2" and the token "4" is token #102 but the model has to pick an odd-numbered token, it may respond with a wrong answer or yap on strangely due to its limited selection of tokens (like "The accurate answer to your mathematical query is the number Four").
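A toy version of that parity idea (definitely not how SynthID works, just to show the mechanics and why it hurts quality):

    import numpy as np

    def parity_constrained_sample(probs: np.ndarray, position: int, rng=None) -> int:
        # At even positions, only odd-numbered token ids are allowed; renormalize and sample.
        rng = rng or np.random.default_rng()
        if position % 2 == 0:
            allowed = (np.arange(len(probs)) % 2 == 1)
            masked = np.where(allowed, probs, 0.0)
            return int(rng.choice(len(probs), p=masked / masked.sum()))
        return int(rng.choice(len(probs), p=probs))

    def parity_score(token_ids: list) -> float:
        # Fraction of even positions holding an odd token id:
        # ~0.5 for ordinary text, ~1.0 if the constraint above was applied.
        evens = token_ids[0::2]
        return sum(t % 2 == 1 for t in evens) / max(len(evens), 1)

The quality cost is exactly the 2+2 example: whenever the single best token happens to fall on the "wrong" side of the constraint, the model is forced into a worse choice.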
You can insert known spelling errors, choose certain phrasings, and more. It doesn't have to be new characters added to the text. Government security services have done stuff like this for decades to weed out moles.
moles should know better than to utilize mountweazels! https://en.wikipedia.org/wiki/Fictitious_entry
We've been studying unintentional watermarks for years.
https://en.wikipedia.org/wiki/Stylometry
You do not even need extra characters (although they help). You can use spaces, missing punctuation, upper/lower case in particular places, using or omitting conjunctions, word substitution, common misspellings, transposed letters, etc. How many extra spaces or tabs can you add to the end of a paragraph? At the beginning? Between sentences? Inside them? Then you have an AI agent design the scheme and train another one to detect it.
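As a toy illustration of how little it takes, here is one of those channels (trailing spaces between sentences) carrying a few bits. The function names and the one-bit-per-gap encoding are made up, and it obviously breaks as soon as an editor or CMS normalizes whitespace:

    import re

    def embed_bits(sentences: list, bits: str) -> str:
        # Hide one bit in each gap between sentences: '1' -> two spaces, '0' -> one.
        # Assumes len(bits) <= len(sentences) - 1; leftover gaps default to one space.
        gaps = ["  " if b == "1" else " " for b in bits]
        gaps += [" "] * (len(sentences) - 1 - len(gaps))
        text = sentences[0]
        for gap, sentence in zip(gaps, sentences[1:]):
            text += gap + sentence
        return text

    def extract_bits(text: str) -> str:
        # Recover the payload from the whitespace run after each sentence-ending period.
        # (Toy assumption: periods only appear at sentence ends.)
        return "".join("1" if len(gap) >= 2 else "0" for gap in re.findall(r"\.( +)", text))

    msg = embed_bits(["I saw the report.", "It looks fine.", "Ship it."], "10")
    print(extract_bits(msg))   # -> "10"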
> SynthID-Text works by discreetly interfering in the generation process: It alters some of the words that a chatbot outputs to the user in a way that’s invisible to humans but clear to a SynthID detector. “Such modifications introduce a statistical signature into the generated text,” [...] “During the watermark detection phase, the signature can be measured to determine whether the text was indeed generated by the watermarked LLM.”
As stated in the article, it alters the probabilities that the network produces in a predictable way, so that a different (but still correct-sounding) word is picked. It subtly alters the wording from what it would have output normally, in such a way that you can detect it while it still sounds correct to the user.
There's an article from ieee that explains it:
https://spectrum.ieee.org/watermark#:~:text=How%20Google%E2%...
> It has also open-sourced the tool and made it available to developers and businesses, allowing them to use the tool to determine whether text outputs have come from their own large language models (LLMs), the AI systems that power chatbots. However, only Google and those developers currently have access to the detector that checks for the watermark.
These two sentences next to each other don't make much sense. Or are misleading.
Yeah. I know. Only the client is open source and it calls home.
Is there significant throttling to prevent us from training a classification model against it?
This is information-theoretically guaranteed to make LLM output worse.
My reasoning is simple: the only way to watermark text is to inject some relatively low-entropy signal into it, which can be detected later. This has to a) work for "all" output for some values of all, and b) have a low false positive rate on the detection side. The amount of signal involved cannot be subtle, for this reason.
That signal has a subtractive effect on the predictive-output signal. The entropy of the output is fixed by the entropy of natural language, so this is a zero-sum game: the watermark signal will remove fidelity from the predictive output.
This is impossible to avoid or fix.
you are correct if we suppose we are at a global optimum. however, consider this example:
i have two hands
i have 2 hands
these sentences communicate the same thing, but one could be a watermarked result. we can apply this kind of equivalent-meaning word/phrase change many times over and be confident something is watermarked while having avoided any semantic shifts.
You're not wrong, but natural language has a lot of stylistic "noise" which can be utilized as a subliminal channel without noticeably degrading the semantic signal.
I want AI to use just the right word when it’s writing for me. If it’s going to nerf itself to not choose the perfect word so it can be watermarked, then why would I use that product? I’ll go somewhere else. And if it does use just the right word, then how is that different from a great human writer?
There is the 'loser's litigation' method of getting all of your non-watermarked competitors banned, usually involving some combination of magical rights-removing brain-hacks like national security or 'the children'.