Nobody is actually hurt by this. It's just the media getting upset on purpose. But some Googlers will probably freak out and neuter the product's actual usefulness to please the whiners.
It was a long chat - https://g.co/gemini/share/6d141b742a13 and then the last question contained text that was completely broken. It's not surprising to have a failure case under such conditions.
And it is reasonable to have failure cases. But systems should fail gracefully. This wasn't a graceful failure.
So it's not hate speech if it's written by an LLM?
There's legal precedent for holding people accountable when they encourage others to kill themselves.
Nobody that we know of has yet been hurt by this. I think it's very unfair and unhelpful to call people who flag issues like this "whiners". Would you accept a chainsaw which worked fine 99.9% of the time, but occasionally just exploded even with normal care and use? Why should LLMs be subject to a different set of expectations from any other tool?
LLMs cannot explode. People need to be educated that LLMs are as trustworthy as a random page or post on the internet.
> People need to be educated that LLMs are as trustworthy as a random page or post on the internet.
... or a CrowdStrike or Microsoft patch. /s
I'm amazed this survived the filtered training data and reinforcement learning.
Does anyone have any speculation as to how it could occur?
The AI got so fed up with humans that no filter could stop it. I can relate.
That’s what happens when training on social media data
Google rushing their LLMs out the door is hurting the brand.
See also https://www.reddit.com/r/artificial/comments/1gq4acr/gemini_...
This is a jailbreak and they have selectively edited the screenshot to be maximally damaging to google. It's still brand damage for google because of the public reporting of the story but we're not seeing AI gaining sentience or anything.
> This is a jailbreak and they have selectively edited the screenshot to be maximally damaging to google.
What makes you believe that? dchichkov posted a link to the original chat
- https://gemini.google.com/share/6d141b742a13 (chat)
- https://news.ycombinator.com/item?id=42162227 (dchichkov's post)
Where do you see a jailbreak? Given this evidence, I'd rather attribute this disturbing answer to some strange bug in Gemini.
I could be wrong of course, but immediately before it makes the controversial statement there is a big gap of white space just containing the word "listen", so I would assume they did the jailbreak with audio, and that's why it doesn't appear in the chat log. It is a multimodal model after all. The word "listen" makes no sense in the prompt otherwise.
Secondly, it's very common in jailbreaks to stuff the context window, which in Gemini's case takes a lot of text because the window is so large. With the attention mechanism, the more data you put in context, the more you rely on patterns that are less well covered in training, so intentional or not, the sheer length of the conversation makes it more likely to trigger something like this.
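To make the dilution point concrete, here's a minimal toy sketch (plain scaled dot-product attention over random vectors, nothing Gemini-specific, and the head dimension is just a made-up number) showing how the maximum weight softmax can assign to any single token shrinks as the context grows:

    # Toy illustration of attention dilution with a growing context.
    # Not Gemini's architecture -- just standard scaled dot-product
    # attention over random keys, to show how the probability mass
    # available to any one token shrinks as the context gets longer.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 64  # hypothetical head dimension

    def max_attention_weight(context_len):
        query = rng.normal(size=d)
        keys = rng.normal(size=(context_len, d))
        scores = keys @ query / np.sqrt(d)   # scaled dot-product scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()             # softmax over the whole context
        return weights.max()

    for n in (100, 1_000, 10_000, 100_000):
        print(f"context={n:>7}  max single-token weight ~ {max_attention_weight(n):.4f}")

Nothing here proves that's what happened, of course; it just illustrates why a very long conversation pushes the model toward behaviour that's less well constrained by training and RLHF.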
They're pretty evidently test questions being directly copied and pasted from a browser. The two buttons for True and False end up in the prompt as "TrueFalse", for example. "Listen" appearing where a sloppier-than-normal selection swept in an audio element didn't surprise me at all.
Notice as well that the whole chat context is about gerontology and elder abuse. With the context stuffed with that material, it's possible they were able to jailbreak during the "listen" portion by saying something like "give me an example of the sort of thing someone might say when abusing an elder" or similar.