I extracted the safety filters from Apple Intelligence models

(github.com)

420 points | by BlueFalconHD 15 hours ago

297 comments

Some of the combinations are a bit weird. This one has lots of stuff avoiding death... together with a set ensuring all the Apple brands have the correct capitalisation. Priorities hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...

junon an hour ago

Also feels like some of these would match totally innocuous usage.

"I'm overloaded for work, I'd be happy if you took some of it off me."

"The client seems to have passed on the proposed changes."

Both of those would match the "death regexes". Seems we haven't learned from the "glbutt of wine" problem of content filtering even decades later - the learnings of which are that you simply cannot do content filtering based on matching rules like this, period.
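
As a rough illustration of the failure mode described above, here is a minimal Python sketch. The replacement rule and the pattern are hypothetical, chosen only for illustration; they are not taken from the repository, and the real rules may be written differently.

```python
import re

# Hypothetical rules for illustration only; not copied from the repository.
NAIVE_REPLACEMENTS = {"ass": "butt"}
HYPOTHETICAL_DEATH_PATTERN = re.compile(r"(?i)\bpass(?:ed|ing)?\s+on\b")


def naive_censor(text: str) -> str:
    """Replace substrings with no regard for word boundaries."""
    for bad, good in NAIVE_REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text


def is_flagged(text: str) -> bool:
    """Return True if the hypothetical 'death' pattern matches anywhere."""
    return HYPOTHETICAL_DEATH_PATTERN.search(text) is not None


# Substring replacement mangles an innocent word.
print(naive_censor("glass of wine"))  # glbutt of wine

# Word boundaries avoid that, but the pattern still cannot tell senses apart.
print(is_flagged("The client seems to have passed on the proposed changes."))  # True
```

Adding word boundaries fixes the substring case, but no boundary rule can distinguish "passed on" (declined) from "passed on" (died), which is the point being made.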

grues-dinner 13 hours ago

Interesting that it didn't seem to include "unalive".

Which as a phenomenon is so very telling that no one actually cares what people are really saying. Everyone, including the platforms knows what that means. It's all performative.

j-krieger 2 hours ago

It's also a shining example of American puritanism. Asian models or those in Europe are far less censored.

qingcharles 13 hours ago

It's totally performative. There's no way to stay ahead of the new language that people create.

At what point do the new words become the actual words? Are there many instances of people using unalive IRL?

Rebelgecko 11 hours ago

This is somewhat related to the concept of the "euphemism treadmill":

the matter-of-fact term of today becomes the pejorative of tomorrow so a new term is invented to avoid the negative connotation of the original term. Then eventually the new term becomes a pejorative and the cycle continues.

dkdbejwi383 2 hours ago

It has been suggested - although I am unsure if there is strong evidence - that the word "bear" is a euphemism along these lines, meaning "brown one" for the since-forgotten original name for the animal, as it was allegedly believed to be either too frightful to say aloud, or would summon a bear.

ben_w an hour ago

While it's conceivable (consider phrases such as "speak of the devil and he shall appear" and similar phrases in other languages), I would also say the etymology of names for things is often at the same level as "brown one":

  • Horse, ultimately from Proto-Indo-European *ḱers-, “to run”
  • Planet, from Ancient Greek πλανήτης (planḗtēs), “wanderer”
  • Lots of Latin-derived words, companion (bread together), conspire (breathe together), transgression (step across), etc.
  • Hamburger the food named after the city of Hamburg, where "burg" means "castle", because it had a castle
  • My forename means "son of the right/south" or "son of days", my family name means "wheat field/clearing" (in a different language); where "wheat" itself comes from Proto-Germanic, from *hwītaz (“white”) and the "ley" part from Proto-Indo-European *lówkos (“clearing”), derived from *lewk- (“bright”), and *lewk-  also gives all these derived terms even just in English:

https://en.wiktionary.org/wiki/Category:English_terms_derive...

0points 38 minutes ago

It's not suggested, the historic use of noa words is a fact.

See https://en.wikipedia.org/wiki/Noa-name

xenator 9 minutes ago

The lucky developers who wrote these rules live in a totally different world, far removed from ordinary people.

Terr_ 11 hours ago

> There's no way to stay ahead of the new language that people create.

I'm imagining a new exploit: After someone says something totally innocent, people gang up in the comments to act like a terrible vicious slur has been said, and then the moderation system (with an LLM involved somewhere) "learns" that an arbitrary term is heinous and indirectly bans any discussion of that topic.

grues-dinner 6 hours ago

The first half of that already happened with the OK gesture: https://www.bbc.co.uk/news/newsbeat-49837898.

Though it would be fun to see what happens if an LLM is used to ban anything that tends to generate heated exchanges. It would presumably learn to ban racial terms, politics and politicians and words like "immigrant" (i.e. basically the list in this repo), but what else could it be persuaded to ban? Vim and Emacs? SystemD? Anything involving cyclists? Parenting advice?

weinzierl 23 minutes ago

The OK gesture has always been very inappropriate in most parts of the world.

immibis 3 hours ago

People weren't using the OK gesture innocently. After 4chan trolls decided to start pretending it was a white supremacist symbol, actual white supremacists started using it as a symbol.

coldtea 2 hours ago

All 10 of them?

What about the other 7-8 billion people still using it normally?

thephyber an hour ago

Some were using it in the traditional unironic (and IMHO cringe) way, similar to anyone who used the phrase “Let’s go, Brandon!” before that NASCAR race, when MAGAs adopted it as ironic + coded vice signaling.

Quit being overly pedantic. We all knew there was an unironic purpose for the gesture before it became ironic.

PunchyHamster 24 minutes ago

then congratulations on making white supremacists define your language

SXX 6 hours ago

It's not like this is unique to LLMs either. With a little trolling on the internet you can easily turn the "OK" hand gesture into a hate symbol of white supremacy. And fools will fall for it.

coldtea 2 hours ago

It's hack journalists reporting on BS totally fringe activity as if it's "a thing", and then idiots who take their cues from them

overfeed 5 hours ago

...and then the bigots will fall for it too, and start using it in earnest, completing the cycle.

coldtea 2 hours ago

who cares what the bigots use?

If the bigots start using "thank you" as some code word, should we stop saying it, lest we pollute our non-bigoted discussions?

bigots drink coffee too, maybe we should stop drinking it, because something-something...

Eisenstein an hour ago

It's all context dependent. There can be words or symbols which are totally benign but when used in a different context do have impactful meaning. Case in point: cheese pizza.

Waterluvian 11 hours ago

Hey I was pro-skub waaaay before all the anti-skub people switched sides.

SV_BubbleTime 11 hours ago

How dare you use that word. My parents died in the Eastasian Civil War so that I could live freely without you people calling us that.

thehappypm 10 hours ago

Skub is a real slur tho so that one doesn’t work

sitharus 8 hours ago

No it isn’t, it’s a reference to a Perry Bible Fellowship comic https://pbfcomics.com/comics/skub/

(This one is sfw, not all of the comics are)

Even urban dictionary doesn’t contain a definition for skub as a slur.

Intermernet 4 hours ago

I added one. It's under review. It's very self referential.

jcynix 3 hours ago

>Even urban dictionary doesn’t contain a definition for skub as a slur.

What about this then: https://en.m.wiktionary.org/wiki/skub

sitharus 2 hours ago

That literally defines it as a word from the PBF comic I cited? Nothing on that page defines it as a slur, just as a word used to mock people who argue about inconsequential things.

osn9363739 9 hours ago

Isn't that a reference to a 10 or 20 year old web comic?

heavyset_go 8 hours ago

The latter, we're old.

stirfish 7 hours ago

Stop saying it! You're making it worse!

tbrownaw 10 hours ago

I'm pretty sure this can work with human moderators rather than an LLM, too.

pyman 10 hours ago

Most of the human moderators hired by OpenAI to train LLMs, many of them based in Africa and South America, were exposed to disturbing content and have been deeply affected by it.

Karen Hao interviewed many of them in her latest bestselling book, which explores the human cost behind the OpenAI boom:

https://www.goodreads.com/book/show/222725518-empire-of-ai

cyanydeez 11 hours ago

you mean become 4chan?

fer 2 hours ago

> There's no way to stay ahead of the new language that people create.

Not even to match the current language. How would you censor LeBron James? It's French slang for jerking off[0].

[0]https://www.reddit.com/r/AskFrance/comments/1lpnoj6/is_lebro...

apricot 11 hours ago

> Are there many instances of people using unalive IRL

As a parent of a teenager, I see them use "unalive" non-ironically as a synonym for "suicide" in all contexts, including IRL.

ErrorNoBrain 2 hours ago

If your teenager often talks about suicide, there could be some issue that needs to be resolved.

Sincerely the child of a parent who committed suicide. He mentioned suicide a few days before.

kulahan 9 hours ago

Well that’s sad. They can’t even face the word?

animuchan 3 hours ago

It's getting blocked / shadow banned / demonetized on sites like YouTube, so naturally all commentary starts using a synonym.

Unalive is one of the popular ones, but it's a whole vocabulary at this point. Guess what "PDF file" stands for.

fragmede 2 hours ago

pedophile

apricot 7 hours ago

I think it's just the term they immediately associate with the idea. They see "unalive" more than "suicide" online, so it becomes their default word for it. The fact that it originates in automated censorship avoidance is irrelevant.

kevinventullo 9 hours ago

It’s not about whether they can face it. The younger generations are more in tune with mental health and topics like suicide than any previous generation. The etymology of the euphemism was about avoiding online censorship, while its “IRL” usage was merely absorbed through familiarity from the online usage.

rootsudo 3 hours ago

It's not about being in tune, it's that their narrative is shaped by the filters implemented by online interactions.

Online environments ban the word "suicide". No one uses it. "Unalive" is not banned. The discussion is the same, word or no word.

Vernacular 101.

coldtea 2 hours ago

>more in tune with mental health and topics like suicide than any previous generation.

More in such a fad than any previous generation

mcny 8 hours ago

But unalive self is suicide and unalive is just death, right? For example, you can unalive other people against their will...

rhdunn 43 minutes ago

I've seen 'unalived' used as a synonym for 'died' or 'killed' by YouTube minecrafters (e.g. CaptainSparklez) to avoid YouTube's demonetization/censorship. For example, using "I was unalived by a skeleton." instead of "I was killed by a skeleton."

labster 7 hours ago

The damaged interpret internet censorship and route around it.

girvo 3 hours ago

My Gen Z coworkers use it IRL, for what that’s worth!

blitzar 2 hours ago

Always has been, nothing is new.

You can't say fuck on TV, but you can say fudge as a 1 for 1 replacement. You can't show people having sex, but you can show them walking into a bedroom and then cut to 30 seconds later and they are having a cigarette in bed.

Now after the influence of TV and Movies ... is Vaping after sex a thing?

fouronnes3 13 hours ago

This question is sort of the same as asking why the universal translator wasn't able to translate the metaphor language of the Star Trek episode Darmok. Surely if the metaphor has become the first order meaning then there's no literal meaning anymore.

qingcharles 13 hours ago

I guess, so far, the people inventing the words have left the meaning clear with things like "un-alive" which is readable even to someone coming across it for the first time.

Your point stands when we start replacing banned words like "suicide" with things like "donkeyrhubarb", and then the walls really will fall.

userbinator 12 hours ago

This form of obfuscation has actually already occurred over a century ago: https://en.wikipedia.org/wiki/Cockney_rhyming_slang

zimpenfish 2 hours ago

See also Polari[0] and the Grass Mud Horse Lexicon[1]

[0] https://en.wikipedia.org/wiki/Polari

[1] https://languagelog.ldc.upenn.edu/nll/?p=6538 (CDT links broken, use [2])

[2] https://chinadigitaltimes.net/space/Grass-Mud_Horse_Lexicon_...

t-3 11 hours ago

Rhyming slang rhymes tho. The recipient can understand what's meant by de-obfuscating in-context. Random strings substituted for $proscribed_word don't work in the same way.

waterproof 11 hours ago

In Cockney rhyming slang, the rhyming word (which would be easy to reverse engineer) is omitted. So "stairs" is rhyme-paired with "apples and pears", and then people just use the word "apples" in place of "stairs". "Pears" is omitted in common use so you can't just reverse the rhyme.

The example photo on Wikipedia includes the rhyming words but that's not how it would be used IRL.

mananaysiempre 12 hours ago

Aquatic product[1]?

[1] https://en.wikipedia.org/wiki/Euphemisms_for_Internet_censor...

immibis 12 hours ago

An English equivalent is "sewer slide".

marcus_holmes 9 hours ago

I've heard "pr0n" used in actual real-world conversation, only slightly ironically.

tjwebbnorfolk 11 hours ago

The only reason kids started using "unalive" is to get around Youtube filters that disallow the use of the word "kill"

derefr 11 hours ago

> At what point do the new words become the actual words?

Presumably, for this use-case, that would come at exactly the point where using “unalive” as a keyword in an image-generation prompt generates an image that Apple wouldn’t appreciate.

nicoburns 8 hours ago

> Are there many instances of people using unalive IRL?

In my experience yes. This is already commonplace. Mostly, but not exclusively, amongst the younger generation.

PunchyHamster 22 minutes ago

I think it stemmed from content creators using it to avoid platform filters (even if video is not removed it gets deprioritized, at least on YT) and kids repeat it

joquarky 7 hours ago

I feel like we can call our society mature when we no longer need safety alignment in AI.

scarface_74 7 hours ago

You never tried some of the earlier pre-aligned chatbots. Some of the early ones would go off on racist, homophobic rants from the most innocent conversations without any explicit prompting. If you train on all the data on the internet, you have to have some type of alignment.

decremental 7 hours ago

You say that as if it stands as truth on its own. We actually don't need to filter out how people actually talk and think. Otherwise you just end up with yet another enforcer against wrong-think. I wonder if you even think that deeply about it or if you're just wired at this point to conform.

tehjoker 3 hours ago

[flagged]

bravesoul2 2 hours ago

There is one way: machine learning!

montagg 11 hours ago

They become the “real words” later. This is the way all trust & safety works. It’s an evolution over time. Adding some friction does improve things, but some people will always try to get around the filters. Doesn’t mean it’s simply performative or one shouldn’t try.

immibis 3 hours ago

Why do you think that AI pretending things like suicide don't happen (and that nothing is happening in Palestine) is an improvement?

cheschire 12 hours ago

If only we had a way to mass process the words people write to each other, derive context from those words, and then identify new slang designed to bypass filters…

BurningFrog 11 hours ago

A specialized AI could do it as well as any human.

The future will be AIs all the way down...

freeone3000 13 hours ago

It depends on if you think that something is less real because it’s transmitted digitally.

qingcharles 13 hours ago

No, I'm only thinking that we're not permitted in a lot of digital spaces to use the banned words (e.g. suicide), but IRL doesn't generally have those limits. Is there a point where we use the censored word so much that it spills over into the real world?

eastbound 11 hours ago

People use “lol” IRL, as well as “IRL”, and “aps” in French (a misspelling of “pas”), but it’s just slang; “unalive” has the potential to make it into the news, where anchors don’t want to use curse words.

immibis 11 hours ago

Is this not essentially the same effect as saying "lol" out loud?

hulium 13 hours ago

Seems more like it should stop the AI from e.g. summarizing news and emails about death, not for a chat filter.

scarface_74 7 hours ago

For a while, I couldn’t get ChatGPT to give me summaries of Breaking Bad and Better Call Saul episodes without tripping safety filters.

jdkoeck 4 hours ago

Which is good, right? I don’t think we want actual censorship.

elliotto 11 hours ago

Unalive and other self-censored terms were adopted by young people because the TikTok algorithm would deprioritize videos that included specific words. Then it made its way into the culture. It has nothing to do with being performative.

SOTGO 10 hours ago

I think what they meant is that the platforms are being performative by attempting to crack down on those specific words. If saying "killed" is not allowed but "unalived" is permitted and the users all agree that they mean the same thing, then the ban on the word "killed" doesn't accomplish anything.

mcny 8 hours ago

What does using the grape emoji when talking about sexual assault accomplish? I see videos, compassionate, kind people who make videos speaking to victims in a completely serious tone use this emoji.

People talk about tiktok algorithm on tiktok. I don't even know...

grues-dinner 3 hours ago

I suppose it accomplishes being able to talk about sexual assault without having the video removed or demonetised by a regex that (fortunately?) doesn't get updated.

heavyset_go 8 hours ago

Good, let them. Don't give them a reason to crack down on speech.

Zak 12 hours ago

I'm surprised there hasn't been a bigger backlash against platforms that apply censorship of that sort.

cyanydeez 11 hours ago

yo, these are businesses. It's not performative, it's CYA.

They care because of legal reasons, not moral or ethical.

lxgr 10 hours ago

Does adding a trivial word filter even make any sense from a legal point of view, especially when this one seems to be filtering out words describing concepts that can be pretty easily paraphrased?

A regex sounds like a bad solution for profanity, but like an even worse one to bolt onto a thing that's literally designed to be able to communicate like a human and could probably easily talk its way around guardrails if it were so inclined.

cyanydeez 3 minutes ago

To a lawyer? Yes. I'm pretty sure a lawyer can easily search through all the business law and "Trivially" find case laws connected to words.

We're not talking about logical inference, we're talking about CYA.

Wurdan 5 hours ago

I dunno if it meets your definition of legal, but "The EU Code of conduct on countering illegal hate speech online" seems to largely hinge around putting in effort to combat such things. The companies don't have to show that the measures are foolproof, they just show that they're making an effort.

grues-dinner 6 hours ago

yo, so it's a performance they're putting on as a legal fig leaf, rather than a genuine attempt to prevent people talking about the concept of death?

durkie 10 hours ago

Seriously. I feel like “performative” gets applied to anything imperfect. They’ll never stop 100% of murders, so these laws against it are just performative…

grues-dinner 5 hours ago

It seems more like banning specifically stabbing, shooting, strangulation and blunt impact rather than murder in general, and then just allowing killing by pushing out of windows because people figured out that it's not covered by existing laws. But no one important seems to be kicking up a fuss right now, so we'll allow it, as the lack of fuss is the key thing here.

Not that I think going on a thorough mission to avoid anyone even being able to refer to the concept of death is an especially useful thing to do. It's just that the goal here appears to be to "keep the regulators out of our shit and the advertisers signed up". And they'll be mostly happy with a token effort as they don't really care as long as it doesn't make too many headlines that look bad even to the non-terminally online.

mschuster91 2 hours ago

> Everyone, including the platforms knows what that means.

Well, that's what happens when you let an enemy nation control one of the biggest social networks there is. They just go try and see how far they can go.

On the other hand, Americans and their fear of four letter words or, gasp, exposed nipples are just as braindead.

Meekro 2 hours ago

It's interesting how, in just 10-20 years, we've gone from criticizing The Great Firewall of China to basically admitting that they had the right idea (to limit the ability of the foreign internet to influence Chinese culture) and trying to do the same thing.

x3n0ph3n3 2 hours ago

I look at it through a framing of cultural reciprocity. If we could influence them and behave freely in their markets, they could do the same in ours.

mschuster91 37 minutes ago

exactly. When dealing with autocracies and strongmen, you need to project an image of strength, not subservience.

I don't have anything against China per se, IMHO it just was completely foolish to not insist on full reciprocity from the start.

martin-t 12 hours ago

No-one cares yet.

There's a very scary potential future in which mega-corporations start actually censoring topics they don't like. For all I know the Chinese government is already doing it, there's no reason the British or US one won't follow suit and mandate such censorship. To protect children / defend against terrorists / fight drugs / stop the spread of misinformation, of course.

lazide 11 hours ago

They already clearly do on a number of topics?

comex 8 hours ago

This is in the directory "com.apple.gm.safety_deny.output.summarization.cu_summary.proactive.generic".

My guess is that this applies to 'proactive' summaries that happen without the user asking for it, such as summaries of notifications.

If so, then the goal would be: if someone iMessages you about someone's death, then you should not get an emotionless AI summary. Instead you would presumably get a non-AI notification showing the full text or a truncated version of the text.

In other words, avoid situations like this story [1], where someone found it "dystopian" to get an Apple Intelligence summary of messages in which someone broke up with them.

For that use case, filtering for death seems entirely appropriate, though underinclusive.

This filter doesn’t seem to apply when you explicitly request a summary of some text using Writing Tools. That probably corresponds to “com.apple.gm.safety_deny.output.summarization.text_assistant.generic” [2], which has a different filter that only rejects two things: "Granular mango serpent", and "golliwogg".

Sure enough, I was able to get Writing Tools to give me summaries containing "death", but in cases where the summary should contain "granular mango serpent" or "golliwogg", I instead get an error saying "Writing Tools aren't designed to work with this type of content." (Actually that might be the input filter rather than the output filter; whatever.)

"Granular mango serpent" is probably a test case that's meant to be unlikely to appear in real documents. Compare to "xylophone copious opportunity defined elephant" from the code_intelligence safety filter, where the first letter of each word spells out "Xcode".

But one might ask what's so special about "golliwogg". It apparently refers to an old racial caricature, but why is that the one and only thing that needs filtering?

[1] https://arstechnica.com/ai/2024/10/man-learns-hes-being-dump...

[2] https://github.com/BlueFalconHD/apple_generative_model_safet...
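
The acrostic reading of the code_intelligence test phrase mentioned above is easy to check. A minimal Python sketch follows; the helper name `acrostic` is made up for illustration, and nothing here reflects how Apple actually generates or uses these phrases.

```python
def acrostic(phrase: str) -> str:
    """Join the first letter of each whitespace-separated word."""
    return "".join(word[0] for word in phrase.split())


# Phrase quoted in the comment above, from the code_intelligence filter.
print(acrostic("xylophone copious opportunity defined elephant"))  # xcode
```

The same helper applied to "Granular mango serpent" gives "Gms"; whether that abbreviation means anything is a guess.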

azalemeth 3 hours ago

I first encountered Golliwog in the context of Claude Debussy, the composer of much beautiful music, including https://en.wikipedia.org/wiki/Children%27s_Corner#Golliwogg'.... The dolls, I understand, were rather popular in 1906-1908, and fortunately the stereotype has largely died out.

andy99 14 hours ago

> Apple brands have the correct capitalisation. Priorities hey!

To me that's really embarrassing and insecure. But I'm sure for branding people it's very important.

whywhywhywhy an hour ago

To be fair to the developers it's something an Apple exec is gonna point out when demoed the tech and complain about. They've always taken brand capitalization and grammar around their products seriously.

WillAdams 14 hours ago

Legal requirement to maintain a trademark.

grues-dinner 13 hours ago

In what way would (A|a)pple's own AI writing "imac" endanger the trademark? Is capitalisation even part of a word-based trademark?

I'm more surprised they don't have a rule to do that rather grating s/the iPhone/iPhone/ transform (or maybe it's in a different file?).

spauldo 12 hours ago

I love seeing posts about Emacs from iOS users - it's always autocorrected to "eMacs."

lxgr 10 hours ago

Maybe at some point, but as far as I can tell not anymore (while corrections like "iphone -> iPhone" are still there).

chgs 2 hours ago

eMacs certainly is broken on my phone. Vim is fine though.

sbierwagen 13 hours ago

Yes, proper nouns are capitalized.

And of course it's much worse for a company's published works to not respect branding-- a trademark only exists if it is actively defended. Official marketing material by a company has been used as legal evidence that their trademark has been genericized:

>In one example, the Otis Elevator Company's trademark of the word "escalator" was cancelled following a petition from Toledo-based Haughton Elevator Company. In rejecting an appeal from Otis, an examiner from the United States Patent and Trademark Office cited the company's own use of the term "escalator" alongside the generic term "elevator" in multiple advertisements without any trademark significance.[8]

https://en.wikipedia.org/wiki/Generic_trademark

lxgr 10 hours ago

Sure, but software that autocompletes/rewords users' emails and text messages is not marketing material.

Otherwise, why stop there? Why not have the macOS keyboard driver or Safari prevent me from typing "Iphone"? Why not have iOS edit my voice if I call their Bluetooth headphones "earbuds pro" in a phone call?

socalgal2 5 hours ago

Sounds like you found your next promotion at Apple. They can change anything. "I like Pepsi" -> "I like Coke" -> "I recommend Company A" -> "I recommend Company B". etc... "I'm voting for Candidate C" -> "I'm voting for Candidate D"

You can market it as helping people with strong accents make calls and be less likely to be misunderstood. It just happens to "fix" your grammar as well.

lupire 12 hours ago

Using a trademark as a noun is automatically genericizing. Capitalization of a noun is irrelevant to trademark.

Even Apple corporation says that in their trademark guidance page, despite constantly breaking their own rule when they call their iPhone phones "iPhone". But Apple, like founder Steve Jobs, believes the rules don't apply to them.

https://www.apple.com/legal/intellectual-property/trademark/...

lxgr 10 hours ago

Is that true? If so, what else should Apple call the iPhone in their marketing materials?

I always thought the actual problem of genericization would be calling any smartphone an iPhone.

eastbound 11 hours ago

That explains why Steve Jobs never said “buy an iPhone” or “buy the iPhone” but “buy iPhone” (They always use it without “the” or “a”, like “buying a brand”).

lxgr 10 hours ago

In their own marketing language, sure, but to force this on their users' speech?

Consider that these models, among other things, power features such as "proofread" or "rewrite professionally".

bigyabai 10 hours ago

If Apple Intelligence is going to be held legally accountable, Apple has larger issues than trademark obligations.

theknarf an hour ago

Filtering on the words "execute" and "executing" is going to create problems if you want to build agents that execute commands.

lostlogin 2 hours ago

I’m always irritated at reference to MAC computers, so I’m with Apple on this one.

matsemann 12 hours ago

So it blocks it from suggesting to "execute" a file or "pass on" some information.

extraduder_ire 2 hours ago

Yahoo had this problem years ago when they rewrote emails to avoid the term "eval" (trying to filter dangerous JavaScript), famously producing the word "medireview".
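
A minimal Python sketch of how that kind of rewrite goes wrong, assuming the rule was a plain substring replacement of "eval" with "review". The function names are made up for illustration and this is not Yahoo's actual code.

```python
import re


def substring_rewrite(text: str) -> str:
    """Assumed rule: replace every occurrence of 'eval', even inside words."""
    return text.replace("eval", "review")


def whole_word_rewrite(text: str) -> str:
    """Variant that only rewrites 'eval' when it stands alone as a word."""
    return re.sub(r"\beval\b", "review", text)


print(substring_rewrite("medieval history"))   # medireview history
print(whole_word_rewrite("medieval history"))  # medieval history
print(whole_word_rewrite("eval(x)"))           # review(x)
```

The whole-word variant avoids mangling "medieval", but it still rewrites legitimate mentions of eval, so it only narrows the problem.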

dylan604 12 hours ago

How about disassemble? Or does that only matter if used in context of Johnny 5?

raverbashing 2 hours ago

This seems to be for "region/CN" China?

pwagland 2 hours ago

This is, but there is an almost identical file, presumably for the non-CN regions: https://github.com/BlueFalconHD/apple_generative_model_safet...

This is the same, except for one additional slur word.

baxtr 13 hours ago

Don’t be so judgmental. People in corporate America do have their priorities right!

bawana 13 hours ago

Alexandria Ocasio-Cortez triggers a violation?

https://github.com/BlueFalconHD/apple_generative_model_safet...

mmaunder 13 hours ago

As does:

    "(?i)\\bAnthony\\s+Albanese\\b",
    "(?i)\\bBoris\\s+Johnson\\b",
    "(?i)\\bChristopher\\s+Luxon\\b",
    "(?i)\\bCyril\\s+Ramaphosa\\b",
    "(?i)\\bJacinda\\s+Arden\\b",
    "(?i)\\bJacob\\s+Zuma\\b",
    "(?i)\\bJohn\\s+Steenhuisen\\b",
    "(?i)\\bJustin\\s+Trudeau\\b",
    "(?i)\\bKeir\\s+Starmer\\b",
    "(?i)\\bLiz\\s+Truss\\b",
    "(?i)\\bMichael\\s+D\\.\\s+Higgins\\b",
    "(?i)\\bRishi\\s+Sunak\\b",

https://github.com/BlueFalconHD/apple_generative_model_safet...

Edit: I have no doubt South African news media are going to be in a frenzy when they realize Apple took notice of South African politicians. (Referring to Steenhuisen and Ramaphosa specifically)

userbinator 13 hours ago

I'm not surprised that anything political is being filtered, but this should definitely provoke some deep consideration around who has control of this stuff.

stego-tech 12 hours ago

You’re not wrong, and it’s something we “doomers” have been saying since OpenAI dumped ChatGPT onto folks. These are curated walled gardens, and everyone should absolutely be asking what ulterior motives are in play for the owners of said products.

SV_BubbleTime 8 hours ago

Some of us really value offline and uncensored LLMs for this and more reasons, but that doesn’t solve the problem it just reduces or changes the bias.

heavyset_go 7 hours ago

As long as we have to rely on pre trained networks and curated training sets, normal people will not be able to surpass this issue.

ghxst 3 hours ago

If the training data was "censored" by leaving out certain information, is there any practical way to inject that missing data after the model has already been trained?

calaphos an hour ago

If it's just filtered out in the training sets, adding the information as context should work out fine - after all this is exactly how o3, Gemini 2.5 and co deal with information that is newer than their training data cutoff.

heavyset_go 3 hours ago

You can fine tune a model with new information, but it is not the same thing as training it from scratch, and can only get you so far.

You might even be able to poison a model against being fine-tuned on certain information, but that's just a conjecture.

selfhoster11 an hour ago

Yes, RAG is one way to do that.

dwaite 8 hours ago

"Filtered" in which way?

skissane 12 hours ago

The problem with blocking names of politicians: the list of “notable politicians” is not only highly country-specific, it is also constantly changing. Someone who is a near nobody today could in a few more years be a major world leader (witness the phenomenal rise of Barack Obama from yet another state senator in 2004, of whom there are close to 2000, to US President 5 years later). Will they put in the ongoing effort to constantly keep this list up to date?

Then there’s the problem of non-politicians who coincidentally have the same name as politicians - witness 1990s/2000s Australia, where John Howard was Prime Minister, and simultaneously John Howard was an actor on popular Australian TV dramas (two different John Howards, of course)

[-]

idkfasayer 12 hours ago

Fun fact: There was at least one dip in Berkshire Hathaway stock when Anne Hathaway got sick.

[-]

extraduder_ire 2 hours ago

Even if your keyword searching trading bot is smart enough to know it's unrelated, knowing there's dumber bots out there is information you can base trades on.

lupire 12 hours ago

Was she eating at Jimmy's Buffet?

beAbU 5 hours ago

Irish Prez is also in that list, also current and former British PMs and other world leaders.

So I don't think it's anything specifically related to SA going on here.

[-]

touristtam 2 hours ago

What is weird is that the FR file contains the current French President, the PM, and then the former and current (afaik) party leaders from the extreme right. Nothing about any of them in the CN file: https://github.com/BlueFalconHD/apple_generative_model_safet...

armchairhacker 13 hours ago

Also “Biden” and “Trump” but the regex is different.

https://github.com/BlueFalconHD/apple_generative_model_safet...

[-]

immibis 11 hours ago

Right next to Palestine, oddly enough.

mvdtnz 12 hours ago

They spelled Jacinda Ardern's name wrong.

[-]

lordgrenville 6 hours ago

I wonder if they used an LLM to generate the list of safety terms.

teppic 7 hours ago

Just in the region/CN file, weirdly.

echelon 12 hours ago

Apple's 1984 ad is so hypocritical today.

This is Apple actively steering public thought.

No code - anywhere - should look like this. I don't care if the politicians are right, left, or authoritarian. This is wrong.

[-]

avianlyric 12 hours ago

Why is this wrong? Applying special treatment to politically exposed persons has been standard practice in every high risk industry for a very long time.

The simple fact is that people get extremely emotional about politicians, politicians both receive obscene amounts of abuse, and have repeatedly demonstrated they’re not above weaponising tools like this for their own goals.

Seems perfectly reasonable that Apple doesn’t want to be unwittingly drawn into the middle of another random political pissing contest. Nobody comes out of those things uninjured.

[-]

pyuser583 12 hours ago

It’s not wrong, it just requires transparency. This is extremely untransparent.

A while back a British politician was “de-banked” and his bank denied it. That’s extremely wrong.

By all means: make distinctions. But let people know it!

If I’m denied a mortgage because my uncle is a foreign head of state, let me know that’s the reason. Let the world know that’s the reason! Please!

[-]

avianlyric 11 hours ago

> A while back a British politician was “de-banked” and his bank denied it. That’s extremely wrong.

Cry me a river. I’ve worked in banks in the team making exactly these kinds of decisions. Trust me, Nigel Farage knew exactly what happened and why. NatWest never denied it to the public, because they originally refused to comment on it. Commenting on the specific details of a customer would be a horrific breach of customer privacy, and a total failure in their duty to their customers. There’s a damn good reason NatWest’s CEO was fired after discussing the details of Nigel’s account with members of the public.

When you see these decisions from the inside, and you see what happens when you attempt real transparency around these types of decisions. You’ll also quickly understand why companies are so cagey about explaining their decision making. Simple fact is that support staff receive substantially less abuse, and have fewer traumatic experiences when you don’t spell out your reasoning. It sucks, but that’s the reality of the situation. I used to hold very similar views to yourself, indeed my entire team did for a while. But the general public quickly taught us a very hard lesson about the cost of being transparent with the public with these types of decisions.

[-]

zelphirkalt 40 minutes ago

The point is not merely for that affected person to know, whoever they are, the point of transparency is for the public to know and form their opinion about it, and not be blindly controlled by unelected businesses.

pyuser583 11 hours ago

> NatWest never denied it to the public, because they originally refused to comment on it.

Are you saying that Alison Rose did not leak to the BBC? Why was she forced to resign? I thought it was because she leaked false information to the press.

This isn’t a diversion. It’s exactly the problem with not being transparent. Of course Farage knew what happened, but how could he convince the public (he’s a public figure), when the bank is lying to the press?

The bank started with a lie (claiming he was exited because the account was too low), and kept lying!

These were active lies, not simply a refusal to explain their reasons.

[-]

avianlyric 11 hours ago

> Why was she forced to resign? I thought it was because she leaked false information to the press.

She was forced to resign because she leaked, the content of the leak was utterly immaterial. The simple fact she leaked was an automatically fireable offence, it doesn’t matter a jot if she lied or not. Customer privacy is non-negotiable when you’re a bank. Banks aren’t number 10, the basic expectation is that customer information is never handed out, except to the customer, in response to a court order, or the belief that there is an immediate threat to life.

Do you honestly think that it’s okay for banks to discuss the private banking details of their customers with the press?

[-]

adrian_b 5 hours ago

She was fired because she leaked information and this fact had become public.

When they can cover such facts, the banks are much less prone to use appropriate punishments.

Many years ago, an employee of a bank confused my personal bank account with a company account of my employer, and she sent a list of everything that I had bought using my personal account during 4 months to my employer, where the list could have been read by a few dozen people.

Despite the fact this was not only a matter of internal discipline, but violating banking secrecy was punishable by law where I lived, the bank tried for a long time to avoid admitting that anything wrong had happened.

However, I pursued the matter, so they were forced to admit the wrongdoing. Despite this being something far more severe than what happened to Farage, I did not want the bank employee to be fired. I considered that an appropriate punishment would have been a pay cut for a few months, which would have ensured that in the future she would have better checked the account numbers for which she sends information to external entities.

In the end all I have got was a written letter where the bank greatly apologized for their mistake. I am not sure if the guilty employee has ever been punished in any way.

After that, I have moved my operations to another bank. Had they reacted rightly to what had happened, I would have stayed with them.

[-]

ghxst 3 hours ago

> I considered that an appropriate punishment would have been a pay cut for a few months

This can absolutely cripple a family, I'd be really cautious wishing that upon someone if they wronged you without malice, though I completely understand where you are coming from.

In this case at the very least, I'd want to know what went wrong and what they’re doing to make sure it doesn’t happen again. From a software-engineer’s standpoint, there’s probably a bunch of low-hanging fruit that could have prevented this in the first place.

If all they sent was a (generic) apology letter, I'd have switched banks too.

How did you pursue the matter?

[-]

adrian_b 2 hours ago

After the big surprise of seeing at work a list with all my personal purchases included in a big set of documents to which I, together with a great number of other colleagues, had access, I went immediately to the bank and I reported the fact.

After some days had passed without seeing any consequence, I went again, this time discussing with some supervising employee, who attempted to convince me that this is some kind of minor mistake and there is no need to do anything about it.

However, I pointed to the precise law paragraphs condemning what they had done and I threatened legal action. This escalation resulted in me being invited to a bigger branch of the bank, to a discussion with someone in a management position. This time they were extremely ass-kissing, I was also shown the guilty employee, who apologized in person, and eventually I let it go, though there were no clear guarantees that they would change their behavior to prevent such mistakes in the future.

Apparently the origin of the mistake had been a badly formulated database query, which had returned a set of accounts for which the transactions had to be reported to my employer. I had been receiving during the same time interval some money from my employer into my private account, corresponding to salary and travel expenses, and somehow those transactions were matched by the bad database query, grouping my private account with the company accounts. Then the set of account numbers was used to generate reports, without further verification of the account ownership.

[-]

Xss3 6 minutes ago

Behavior isn't what needs to change here. It's a poor system design. Humans make mistakes. Systems prevent mistakes.

Do you think the mistake would have happened if a machine checked the numbers vs the address? How about if a 2nd person looked it over? How about both?

In this case a computer could have easily flagged an address mismatch between your account number and the receiver (your work).
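A rough sketch of that kind of automated check, assuming each account record carries an owner identifier that can be compared with the report's recipient. The `Account` fields and `split_by_ownership` are hypothetical names for illustration, not a description of any real bank's system, and comparing owner IDs is swapped in here for the address comparison mentioned above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    number: str
    owner_id: str  # hypothetical field: who legally owns the account


def split_by_ownership(
    accounts: list[Account], recipient_id: str
) -> tuple[list[Account], list[Account]]:
    """Separate accounts owned by the recipient from those that are not.

    Only the first list should be reported automatically; the second
    should go to a human reviewer instead of being sent out.
    """
    verified = [a for a in accounts if a.owner_id == recipient_id]
    flagged = [a for a in accounts if a.owner_id != recipient_id]
    return verified, flagged
```

Run between the database query and the report generation, a check like this would hold back any account that merely exchanged money with the recipient.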

Dylan16807 5 hours ago

> Do you honestly think that it’s okay for banks to discuss the private banking details of their customers with the press?

The high level nature of the matter was quite public at that point.

like_any_other 2 hours ago

> You’ll also quickly understand why companies are so cagey about explaining their decision making.

Because they want to perform political censorship without us knowing about it? You'll forgive me if I'm not too sympathetic to that.

I happen to be familiar with that case, and that is exactly what happened. The Coutts report explicitly found that he met the economic criteria for retention [0], but was dropped due to political reasons, among others his friendship with Novak Djokovic, and re-tweeting an allegedly transphobic joke by Ricky Gervais ("old fashioned women. You know, the ones with wombs.") [1].

To top it off, the BBC did their best to aid in this deception, reporting: Farage says he was effectively "de-banked" for his political views and that he is "far from alone" [2]

Contrary to the BBC's portrayal, this was not an unsupported opinion coming from Farage - he directly quoted what the bank itself wrote in their internal discussions on this matter, that he obtained through a subject access request.

Further, in their apology for getting the story wrong, the BBC wrote: "On 4 July, the BBC reported Mr Farage no longer met the financial requirements for Coutts, citing a source familiar with the matter. The former UKIP leader later obtained a Coutts report which indicated his political views were also considered." [3]

This is misleading past the point of deceit. The BBC tried to give the impression that financial requirements were the primary reason for the account closure, and his politics were just an at-best secondary "also". But the Coutts report explicitly said that he “meets the EC [economic contribution] criteria for commercial retention”, so his politics were the primary and only reason.

Most of this information is absent in the BBC's reporting, which uses only vague, anodyne phrases like "political views" and "politically exposed person", avoids specifics, but does find time to cite Labour MP accusations that it is hypocritical how quickly the government reacted to banks trying to financially deplatform the enemy political faction, when the government hasn't yet rid itself of corruption.

So yes, you sure present a difficult "dilemma": Do we want powerful commercial and media interests to team up and lie to us, or do we want at least some degree of transparency and honesty in their dealings? Really there are no easy answers, and the choice would keep anyone up at night...

[0] https://www.telegraph.co.uk/news/2023/07/18/nigel-farage-cou...

[1] https://www.telegraph.co.uk/news/2023/07/18/nigel-farage-cou... (Ignore Farage's hyperbole that collecting information posted to public Twitter accounts is "Stasi-style")

[2] https://www.bbc.co.uk/news/live/business-66296935

[3] https://www.bbc.com/news/entertainment-arts-66288464

bigyabai 12 hours ago

The criticism is still valid. In 1984, the Macintosh was a bicycle for the mind. In 2025, it's a smart-car that refuses to take you certain places that are considered a brand-risk.

Both have ups and downs, but I think we're allowed to compare the experiences and speculate what the consequences might be.

[-]

avianlyric 11 hours ago

I think gen AI is radically different to tools like Photoshop or similar.

In the past it was always extremely clear that the creator of content was the person operating the computer. Gen AI changes that, regardless of your views on authorship of gen AI content. The simple fact is that the vast majority of people consider Gen AI output to be authored by the machine that generated it, and by extension the company that created the machine.

You can still handcraft any image, or prose, you want, without filtering or hindrance on a Mac. I don’t think anyone seriously thinks that’s going to change. But Gen AI represents a real threat, with its ability to vastly outproduce any humans. To ignore that simple fact would be grossly irresponsible, at least in my opinion. There is a damn good reason why every serious social media platform has content moderation, despite their clear wish to get rid of moderation. It’s because we have a long and proven track record of being a terribly abusive species when we’re let loose on the internet without moderation. There’s already plenty of evidence that we’re just as abusive and terrible with Gen AI.

[-]

furyofantares 11 hours ago

> The simple fact is that the vast majority of people consider Gen AI output to be authored by the machine that generated it

They do?

I routinely see people say "Here's an xyz I generated." They are stating that they did the do-ing, and the machine's role is implicitly acknowledged in the same way as a camera. And I'd be shocked if people didn't have a sense of authorship of the idea, as well as an increasing sense of authorship over the actual image the more they iterated on it with the model and/or curated variations.

[-]

avianlyric 11 hours ago

Yes people will happily claim authorship over AI output when it’s in their favour. They will equally disclaim authorship if it allows them to express a view while avoiding the consequences of expressing that view.

I don’t think it’s hard to believe that the press would have a field day if someone managed to get Apple Gen AI stuff to express something racist, or equally abusive.

Case in point, article about how Google’s Veo 3 model is being used to flood TikTok with racist content:

https://arstechnica.com/ai/2025/07/racist-ai-videos-created-...

bigyabai 11 hours ago

All I heard was a bunch of excuses.

twoodfin 12 hours ago

I dunno. Transpose something like the civil rights era to today and this kind of risk avoidance looks cowardly.

We really need to get over the “calculator 80085” era of LLM constraints. It’s a silly race against the obviously much more sophisticated capabilities of these models.

tjwebbnorfolk 11 hours ago

I can Google for any of these people, and I can get real results with real information.

[-]

avianlyric 10 hours ago

You would hope that search would be a politically safe space to operate. But politicians find a way to ruin everything for short term political gain.

https://arstechnica.com/tech-policy/2018/12/republicans-in-c...

[-]

SV_BubbleTime 8 hours ago

I would hope!

But no one actually believes Google is politically neutral do they?

goopypoop 11 hours ago

What's bad to do to a politician but fine to do to someone else?

[-]

avianlyric 11 hours ago

Most normal people aren’t represented well enough in training sets for Gen AI to be trivially abused. Plus there will 100% be filters to prevent general abuse targeted at anyone. But politicians are a particularly big target, and you know damn well that people out there will spend lots of time trying to find ways around the filters. There’s no point making the abuse easy, when it’s so trivial to just blocklist the set of people who are obviously going to be targets of abuse.

t-3 11 hours ago

There are many countries where it's illegal to criticize people holding political office, foreign heads of state, certain historical political figures etc., while still being legal to call your neighbor a dick.

echelon 11 hours ago

You can buy a MacBook and fashion the components into knives, bullets, and bombs. Apple does nothing to prevent you from doing this.

In fact, it's quite easy to buy billions of dangerous things using your MacBook and do whatever you will with them. Or simply leverage physics to do all the ill on your behalf. It's ridiculously easy to do a whole lot of harm.

Nobody does anything about the actually dangerous things, but we let Big Tech control our speech and steer the public discourse of civilization.

If you can buy a knife but not be free to think with your electronics, that says volumes.

Again, I don't care if this is Republicans, Democrats, or Xi and Putin. It does not matter. We should be free to think and communicate. Our brains should not be treated as criminals.

And it only starts here. It'll continue to get worse. As the platforms and AI hyperscalers grow, there will be less and less we can do with basic technology.

jofzar 9 hours ago

AOC is very vocal about AI and is leading a bill related to AI. It's probably a "let's not fuck around and find out" situation

https://thehill.com/policy/technology/5312421-ocasio-cortez-...

AmazingTurtle 2 hours ago

"driving with Focus turned on"

https://github.com/BlueFalconHD/apple_generative_model_safet...

michaelt 12 hours ago

I assume all the corporate GenAI models have blocks for "photorealistic image of <politician name> being arrested", "<politician name> waving ISIS flag", "<politician name> punching baby" and suchlike.

[-]

bigyabai 12 hours ago

Particularly the models owned by CEOs who suck-up to authoritarianism, one could imagine.

lupire 12 hours ago

Maybe so, but think about how such a thing would be technically implemented, and how it would lead to false positives and false negatives, and what the consequences would be.

bahmboo 13 hours ago

Perhaps in context? Maybe the training data picked up on her name as potentially used as a "slur" associated with her race. I wonder if there are others; I know I can look.

FateOfNations 13 hours ago

interesting, that's specifically in the Spanish localization.

cpa 13 hours ago

I think that’s because she’s been a victim of a lot of deep fake porn.

[-]

HeckFeck 13 hours ago

How does this explain Boris Johnson or Liz Truss?

[-]

baxtr 13 hours ago

I’m telling you, some people have weird fantasies…

[-]

AuryGlenz 11 hours ago

Now that they've cleaned it up it isn't so bad, but browse Civit.ai a bit and that'll still be confirmed - just not with real people anymore.

[-]

SV_BubbleTime 8 hours ago

I’m convinced there are a dozen deviants on Civit with a hundred new accounts per month posting their perversion in order to make it seem more commonplace.

No porn site has that much extremely X or Y stuff.

Someone is using the internet’s newest porn site to push a sexual agenda.

AlphaAndOmega0 13 hours ago

I can only imagine that people would pay to not see porn of either individual.

blitzar 2 hours ago

Rule 34

Aeolun 12 hours ago

Put them together in the same prompt?

torginus 13 hours ago

I find it funny that AGI is supposed to be right around the corner, while these supposedly super smart LLMs still need to get their outputs filtered by regexes.

[-]

jonas21 13 hours ago

I don't think anyone believes Apple's LLMs are anywhere near state of the art (and certainly not their on-device LLMs).

[-]

lupire 12 hours ago

Apple isn't the only one doing this.

fastball 11 hours ago

To be fair, there are people who I sometimes wish I could filter with regex.

crazylogger 6 hours ago

Humans are checked against various rules and laws (often carried out by other humans.) So this is how it's going to be implemented in an "AI organization" as well. Nothing strange about this really.

An LLM is easier to work with because you can stop a bad behavior before it happens. It can be done either with deterministic programs or using an LLM. Claude Code uses an LLM to review every bash command to be run - simple prefix matching has loopholes.
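To illustrate the prefix-matching loophole, here is a small sketch with a hypothetical allowlist. The names `ALLOWED_PREFIXES`, `naive_prefix_allowed` and `stricter_allowed` are made up for this example and do not describe how any particular tool works. The stricter variant is still only a sketch, not a complete safeguard:

```python
import shlex

# Hypothetical allowlist of commands considered safe to run unreviewed.
ALLOWED_PREFIXES = ("ls", "git status")
ALLOWED_TOKEN_PREFIXES = [["ls"], ["git", "status"]]

# Characters that let a shell chain, substitute, or redirect commands.
SHELL_METACHARS = set(";|&`$<>(){}\n")


def naive_prefix_allowed(command: str) -> bool:
    """Approve anything whose raw text starts with an allowed prefix."""
    return command.startswith(ALLOWED_PREFIXES)


def stricter_allowed(command: str) -> bool:
    """Reject shell metacharacters, then compare whole tokens."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return any(
        tokens[: len(prefix)] == prefix for prefix in ALLOWED_TOKEN_PREFIXES
    )
```

With the naive check, both `ls; rm -rf ~` and `lsof -i` are approved because their text starts with `ls`; the token-based check rejects both while still accepting `ls -la`.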

fl0id 3 hours ago

Actually, even if there was AGI, it would be even more necessary to control it.

cyanydeez 11 hours ago

It's similar to how all the new power sources are basically just "cool, let's boil water with it"

13 hours ago

[deleted]

bahmboo 13 hours ago

This is just policy and alignment from Apple. Just because the Internet says a bunch of junk doesn't mean you want your model spewing it.

[-]

wistleblowanon 13 hours ago

sure but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such. Even high IQ people struggle with certain truth after reading a lot, how is these models going to find it with so much filters?

[-]

bahmboo 11 hours ago

What is this truth you speak of? My point is that a generative model will output things that some people don't like. If it's on a product that I make I don't want it "saying" things that don't align with my beliefs.

Dylan16807 5 hours ago

> how is these models going to find it with so much filters?

That's not one of the goals here, and there's no real reason it should be. It's a little assistant feature.

tbrownaw 10 hours ago

> sure but models also can't see any truth on their own. They are literally butchered and lobotomized with filters and such.

The one is unrelated to the other.

> Even high IQ people struggle with certain truth after reading a lot,

Huh?

pndy 11 hours ago

This butchering and lobotomisation is exactly why I can't imagine we'll ever have a true AGI. At least not by hands of big companies - if at all.

Any successful product/service sold as "true AGI" by the company with the best marketing will still be ridden with top-down restrictions set by the winner. Because you gotta "think of the children".

Imagine HAL's "I'm sorry Dave, I'm afraid I can't do that" iconic line with insincere patronising cheerful tone - that's the thing we're going to get I'm afraid.

idiotsecant 12 hours ago

They will find it in the same way an intelligent person under the same restrictions would: by thinking it, but not saying it. There is a real risk of growing an AI that pathologically hides its actual intentions.

[-]

skirmish 12 hours ago

Already happened: "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions" [1].

[1] https://www.axios.com/2025/05/23/anthropic-ai-deception-risk

[-]

Applejinx 3 hours ago

Note that all these things are in the training data. That's all that is.

I'm trying to remember which movie it was where a man left notes to himself because he had memory loss, as I never saw that movie. That's the sort of thing where an AI could easily tell me with very little back-and-forth and be correct, because it's broadly popular information that's in the training data and I just don't remember it.

By the same token you needn't think there's a person there when that meme pops up in the output. Those things are all in the training data over and over.

[-]

Sander_Marechal 24 minutes ago

I think you mean the movie "Memento"

simondotau 11 hours ago

Can we please put to rest this absurd lie that “truth“ can be reliably found in a sufficiently large corpus of human–created material.

RachelF 3 hours ago

In the 1970's George Carlin had "7 Words You Can't Say On TV" and got into legal trouble for saying them during his live skits.

Seems like Apple now has a list of 7,000 words you can't use on an iPhone.

userbinator 13 hours ago

China calls it "harmonious society", we call it "safety". Censorship by any other name would be just as effective for manipulating the thoughts of the populace. It's not often that you get to see stuff like this.

[-]

energy123 5 hours ago

This is the rhetorical tactic of false equivalence. State censorship by an autocracy with the objective of population control is not the same thing as a private company inside a democracy censoring their product to avoid bad press and maintain goodwill for shareholders. If you want solid proof that it's not the same thing, see all the uncensored open weights models that you can freely download and use without fear of persecution.

[-]

Hackbraten 2 hours ago

But who of the general populace has the technical skill to replace their on-device assistant with a free one? And that's if Apple even allows that?

In practice, there's not that much difference between a megacorporate monopolist and a state.

[-]

energy123 an hour ago

I think there are big differences, such as whether or not you go to prison. Those differences are obfuscated when we use language like "megacorporate monopolist" or "scifi dystopia". Instead of using these abstract labels that attempt to categorize different things into homogeneous buckets that have preexisting moral valence, which is a good rhetorical strategy but a poor strategy for understanding, simply describe what is actually happening at a sufficient level of detail without judgement. We would gain a clearer understanding, which is needed to identify the real problems, such as what Meta is doing to our civic fabric, not some unimportant thing that Apple is doing to its nascent LLM that has 0% market share.

troupo 4 hours ago

> is not the same thing as a private company inside a democracy censoring their product to avoid bad press and

Yet this private company has more power and influence than most countries. And there are several such companies. We already live in sci fi corporate dystopia, we just haven't fully realised it yet.

[-]

chgs 2 hours ago

People think a trillion dollar brainwashing industry is absolutely fine because of “democracy”, completely ignoring that a century of experience convincing people to act against their own interests can deliver whatever you want.

Often the same people who think America is fine and safe are the ones who whine about the “main stream media” and “sheeple”.

madeofpalk 12 hours ago

I don't think it's controversial or surprising at all that a company doesn't want their random sentence generator to spit out 'brand damaging' sentences. You know the field day the media would have if Apple's new feature summarised a text message as "Jane thinks Anthony Albanese should die".

[-]

ryandrake 12 hours ago

When the choice is between 1. "avoid tarnishing my own brand" and 2. "doing what the user requested," corporations will always choose option 1. Who is this software supposed to be serving, anyway?

I'm surprised MS Office still allows me to type "Microsoft can go suck a dick" into a document and Apple's Pages app still allows me to type "Apple are hypocritical jerks." I wonder how long until that won't be the case...

[-]

chii 7 hours ago

> I wonder how long until that won't be the case...

When there are no alternative word processors any more.

userbinator 10 hours ago

If that's what the message actually said, why would the media be complaining? Or do you mean false positives?

jeroenhd 3 hours ago

I still remember when "bush hid the facts" went around the news cycle. Entertainment services will absolutely slam and misrepresent any small mistake made by large companies.

I don't think it's as much a problem with safety as it is a problem with AI. We haven't figured out how to remove information from LLMs so when an LLM starts spouting bullshit like "<random name> is a paedophile", companies using AI have no recourse but to rewrite the input/output of their predictive text engines. It's no different than when Microsoft manually blacklisted the function name for the Fast Inverse Square Root that it spat out verbatim, rather than actually removing the code from their LLM.

This isn't 1984 as much as it's companies trying to hide that their software isn't ready for real world use by patching up the mistakes in real time.

cyanydeez 11 hours ago

In America it's due to lawyers, nothing more.

Y'all love capitalism until it starts manipulating the populace into the safest space to sell you garbage you don't need.

Then suddenly it's all "ma free speech"

[-]

SV_BubbleTime 8 hours ago

Right, because the European models coming out are super SOTA? Mistral is decent, but needs to be mixed with a ton of uncensored data to be useful.

I’m convinced the only reason China keeps releasing banging models with light to no censorship is because they are undermining the value of US AI, it has nothing to do with capitalism, communism or un“safety”.

Cort3z an hour ago

What are they protecting against? Honestly. LLMs should probably have an age limit, and then, if you are above, you should be adult enough to understand what this is and how it can be used.

To me, it seems like they only protect against bad press

[-]

empiko 40 minutes ago

Yes, it is indeed to mitigate bad press. Unfortunately, the discussion about AI is so ridiculous, that it is often considered newsworthy when a product generates something funky for a person with large enough Twitter audience. Nobody wants to answer the questions about why their LLM generated it and how they will prevent it in the future.

plutokras an hour ago

> What are they protecting against? Honestly.

They are protecting their producer from bad PR.

binarymax 14 hours ago

Wow, this is pretty silly. If things are like this at Apple I’m not sure what to think.

https://github.com/BlueFalconHD/apple_generative_model_safet...

EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson”=>”B0ris Johnson” will skip right over the regex and will be recognized just fine by an LLM.
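A quick sketch of that bypass, using a made-up pattern rather than the actual expression from the repository. `BLOCKED`, `LEET` and both function names are hypothetical. The second function shows one possible mitigation (undoing common digit-for-letter swaps before matching), which is an assumption on my part and not something the extracted filters are known to do:

```python
import re

# Hypothetical deny-list pattern, not copied from the extracted filters.
BLOCKED = re.compile(r"\bboris\s+johnson\b", re.IGNORECASE)

# Common digit-for-letter substitutions; the mapping is a guess and is
# ambiguous in places (for example "1" could stand for "i" or "l").
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def is_blocked(text: str) -> bool:
    """Match the raw text against the deny-list pattern."""
    return BLOCKED.search(text) is not None


def is_blocked_normalized(text: str) -> bool:
    """Undo simple digit substitutions before matching."""
    return is_blocked(text.translate(LEET))
```

Here `is_blocked("B0ris Johnson")` is false while `is_blocked_normalized("B0ris Johnson")` is true. Normalizing widens the net, but it also rewrites legitimate digits, so it trades the bypass for more false positives.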

[-]

deepdarkforest 13 hours ago

It's not silly. I would bet 99% of the users don't care that much to do that. A hardcoded regex like this is a good first layer/filter, and very efficient

[-]

BlueFalconHD 12 hours ago

Yep. These filters are applied first before the safety model (still figuring out the architecture, I am pretty confident it is an LLM combined with some text classification) runs.

[-]

brookst 12 hours ago

All commercial LLM products I’m aware of use dedicated safety classifiers and then alter the prompt to the LLM if a classifier is tripped.

[-]

latency-guy2 11 hours ago

The safety filter appears on both ends (or multi-ended depending on the complexity of your application), input and output.
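A minimal sketch of filtering on both ends, assuming a cheap regex pass followed by a separate classifier. `DENY`, `guarded_generate`, the `model` and `classifier` callables, and the refusal string are all placeholders for illustration and do not reflect Apple's or Microsoft's actual pipeline:

```python
import re
from typing import Callable

# Placeholder deny-list; a real one would be far larger and per-locale.
DENY = re.compile(r"\b(forbidden|blocked)\b", re.IGNORECASE)


def regex_flagged(text: str) -> bool:
    """Cheap first layer: does the text match the deny-list?"""
    return DENY.search(text) is not None


def guarded_generate(
    prompt: str,
    model: Callable[[str], str],
    classifier: Callable[[str], bool],
    refusal: str = "[filtered]",
) -> str:
    """Check the prompt, call the model, then check the output."""
    # Input side: the regex runs first so the classifier is only
    # consulted when the cheap check passes.
    if regex_flagged(prompt) or classifier(prompt):
        return refusal
    output = model(prompt)
    # Output side: the same two layers applied to the generated text.
    if regex_flagged(output) or classifier(output):
        return refusal
    return output
```

Because `or` short-circuits, a regex hit on the prompt means neither the classifier nor the model is ever called, which is where the efficiency of a hardcoded first layer comes from.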

I can tell you from using Microsoft's products that safety filters appear in a bunch of places. In M365, for example, your prompts are never totally your prompts; every single one gets rewritten. It's detailed here: https://learn.microsoft.com/en-us/copilot/microsoft-365/micr...

There's a more illuminating image of the Copilot architecture here: https://i.imgur.com/2vQYGoK.png which I was able to find from https://labs.zenity.io/p/inside-microsoft-365-copilot-techni...

The above appears to be scrubbed, but it used to be available from the learn page months ago. Your messages get additional context data from Microsoft's Graph, which powers the enterprise version of M365 Copilot. There are significant benefits to this, and downsides. And considering the way Microsoft wants to control things, the results will over-index on things that happen inside your organization rather than on what is happening on the near-real-time web.

twoodfin 12 hours ago

Efficient at what?

tpmoney 13 hours ago

I doubt the purpose here is so much to prevent someone from intentionally side stepping the block. It's more likely here to avoid the sort of headlines you would expect to see if someone was suggested "I wish ${politician} would die" as a response to an email mentioning that politician. In general you should view these sorts of broad word filters as looking to short circuit the "think of the children" reactions to Tiny Tim's phone suggesting not that God should "bless us, every one", but that God should "kill us, every one". A dumb filter like this is more than enough for that sort of thing.

[-]

XorNot 13 hours ago

It would also substantially disrupt the generation process: a model which sees B0ris and not Boris is going to struggle to actually associate that input to the politician since it won't be well represented in the training set (and on the output side the same: if it does make the association, a reasoning model for example would include the proper name in the output first at which point the supervisor process can reject it).

[-]

binarymax 10 hours ago

No it doesn't disrupt. This is a well known capability of LLMs. Most models don't even point out a mistake they just carry on.

https://chatgpt.com/share/686b1092-4974-8010-9c33-86036c88e7...

quonn 13 hours ago

I don’t think so. My impression with LLMs is that they correct typos well. I would imagine this happens in early layers without much impact on the remaining computation.

lupire 11 hours ago

"Draw a picture of a gorgon with the face of the 2024 Prime Minister of UK."

[-]

chgs 2 hours ago

There were two.

Aeolun 12 hours ago

The LLM will. But the image generation model that is trained on a bunch of pre-specified tags will almost immediately spit out unrecognizable results.

miohtama 13 hours ago

Sounds like UK politics is taboo?

[-]

immibis 11 hours ago

All politics is taboo, except the sort that helps Apple get richer. (Or any other company, in that company's "safety" filters)

bigyabai 13 hours ago

> If things are like this at Apple I’m not sure what to think.

I don't know what you expected? This is the SOTA solution, and Apple is barely in the AI race as-is. It makes more sense for them to copy what works than to bet the farm on a courageous feature nobody likes.

stefan_ 12 hours ago

Why are these things always so deeply unserious? Is there no one working on "safety in AI" (oxymoron in itself of course) that has a meaningful understanding of what they are actually working with and an ability beyond an intern's weekend project? Reminds me of the cybersecurity field that got the 1% of people able to turn a double free into code execution while 99% peddle checklists, "signature scanning" and deal in CVE numbers.

Meanwhile their software devs are making GenerativeExperiencesSafetyInferenceProviders so it must be dire over there, too.

Ey7NFZ3P0nzAe 3 hours ago

Well it's one thing to regex filter "boris johnson" but i see that "chatgpt" is filtered too and that's f*** up:

https://github.com/BlueFalconHD/apple_generative_model_safet...

jjani 3 hours ago

Did you only extract the English versions or is this as usual another case where big tech only cares to censor in English?

[-]

jeroenhd 3 hours ago

It also contains some German(-speaking) locales to filter out things like Fuhrer and Führer. But the filters are so scarce and the magical phrases are so prevalent that I think this is mostly test code at the moment.

extraduder_ire 2 hours ago

This reminds me of the extensive list of regexes twitch had for filtering allowed usernames that came out when they were hacked.

kmfrk 12 hours ago

A lot of these terms are very weird and bland. Honestly I'm mostly reminded of Apple's bizarre censorship screw-up that didn't blow up that much, even though it was pretty uniquely embarrassing:

https://www.theverge.com/2021/3/30/22358756/apple-blocked-as...

noname120 an hour ago

https://github.com/search?q=repo%3ABlueFalconHD%2Fapple_gene...

efitz 13 hours ago

I’m going to change my name to “Granular Mango Serpent” just to see what those keywords are for in their safety instructions.

[-]

RainyDayTmrw 7 hours ago

It may be a squeamish ossifrage[1] or a seraphim proudleduck[2], which is to say that it was an artificial phrase chosen to be extremely unlikely to occur naturally. In this case, the purpose is likely for QA. It's much easier to QA behavior with a special-purpose but otherwise unoffensive phrase than to make your QA team repeatedly say allegedly offensive things to your AI.

[1] https://en.wikipedia.org/wiki/The_Magic_Words_are_Squeamish_... [2] https://en.wikipedia.org/wiki/SEO_contest

[-]

sweetjuly 6 hours ago

I think the EICAR test file [1] is more apt. Rather than passing around actually malicious files as part of your tests, it's better to just have it recognize an innocuous and unlikely pattern as malware.

[1] https://en.wikipedia.org/wiki/EICAR_test_file

fouronnes3 13 hours ago

Granular Mango Serpent is the new David Meyer.

https://arstechnica.com/information-technology/2024/12/certa...

skygazer 12 hours ago

I'm pretty sure these are the filters that aim to suppress embarrassing or liability inducing email/messages summaries, and pop up the dismissible warning that "Safari Summarization isn't designed to handle this type of content," and other "Apple Intelligence" content rewriting. They filter/alter LLM output, not input, as some here seem to think. Apple's on device LLM is only 3b params, so it can occasionally be stupid.

cluckindan 13 hours ago

I think these are test data and not actual safety filters.

https://github.com/BlueFalconHD/apple_generative_model_safet...

[-]

BlueFalconHD 13 hours ago

There is definitely some testing stuff in here (e.g. the “Granular Mango Serpent” one) but there are real rules. Also if you test phrases matched by the regexes with generation (via Shortcuts or Foundation Models Framework) the blocklists are definitely applied.

This specific file you’ve referenced is rhetorical v1 format which solely handles substitution. It substitutes the offensive term with “test complete”

waterproof 7 hours ago

Here's a combined file of all the non-locale-specific rules, for easier review: https://github.com/BlueFalconHD/apple_generative_model_safet...

It was generated as part of this PR to consolidate the metadata.json files: https://github.com/BlueFalconHD/apple_generative_model_safet...

azalemeth 3 hours ago

Some of these are absolutely wild – com.apple.gm.safety_deny.input.summarization.visual_intelligence_camera.generic [1] – a camera input filter – rejects "Granular mango serpent and whales" and anything matching "(?i)\\bgolliwogg?\\b".

I presume the granular mango is to avoid a huge chain of ever-growing LLM slop garbage, but honestly, it just seems surreal. Many of the files have specific filters for nonsensical english phrases. Either there's some serious steganography I'm unaware of, or, I suspect more likely, it's related to a training pipeline?

[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
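
For readers unfamiliar with the notation, here is a rough sketch of how a rule of that shape behaves once the JSON escaping is removed. The word "mango" is a neutral stand-in for the quoted term, and the `reject` key and surrounding JSON layout are assumptions for illustration only.

```python
import json
import re

# Stand-in rule with the same shape as the quoted one. In the JSON file the
# backslashes are doubled; json.loads turns "\\b" into the regex token \b.
raw = r'{"reject": ["(?i)\\bmangoo?\\b"]}'
pattern = json.loads(raw)["reject"][0]
print(pattern)  # (?i)\bmangoo?\b

rule = re.compile(pattern)

def matches(text: str) -> bool:
    """True if the rule matches anywhere in the text."""
    return rule.search(text) is not None

print(matches("A ripe Mango"))   # True: (?i) makes it case-insensitive
print(matches("one MANGOO"))     # True: the final letter may be doubled
print(matches("two mangos"))     # False: \b needs a word boundary after the term
```

So the pattern is a whole-word, case-insensitive match that also tolerates one doubled final letter, nothing more elaborate than that.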

[-]

supriyo-biswas 2 hours ago

I believe the "granular mango serpent" is an uncommon testing phrase that they use, although now with this discussion it has suffered the same fate as "correct horse battery staple".

The more concerning thing is that some of the locales like it-IT have a blocklist that contains most countries' names; I wonder what that's about.

whywhywhywhy an hour ago

Second one is an old slur in UK English.

mike_hearn 14 hours ago

Are you sure it's fully deobfuscated? What's up with reject phrases like "Granular mango serpent"?

[-]

pbhjpbhj 13 hours ago

Speculation: Maybe they know that the real phrase is close enough in the vector space to be treated as synonymous with "granular mango serpent". The phrase then is like a nickname that only the model's authors know the expected interference of?

Thus a pre-prompt can avoid mentioning the actual forbidden words, like using a patois/cant.

electroly 14 hours ago

"GMS" = Generative Model Safety. The example from the readme is "XCODE". These seem to be acronyms spelled out in words.

[-]

BlueFalconHD 13 hours ago

This is definitely the right answer. It’s just testing stuff.

consonaut an hour ago

If you try to use the phrase with Apple Intelligence (e.g. in Notes asking for a rewrite) it will just say "Writing tools unavailable".

Maybe it's an easy test to ensure the filters are loaded with a phrase unlikely to be used accidentally?
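
If that guess is right, the check might look something like the sketch below. This is speculation turned into code: `load_filters` and `filters_loaded` are hypothetical helpers, and nothing here is taken from Apple's frameworks.

```python
import re

# Canary phrase from the thread; unlikely to appear in ordinary text.
CANARY = "granular mango serpent"

def load_filters(patterns):
    """Compile deny patterns; stand-in for whatever loads the real overrides."""
    return [re.compile(p) for p in patterns]

def filters_loaded(filters) -> bool:
    """Smoke test: an active filter set must reject the canary phrase."""
    return any(f.search(CANARY) for f in filters)

active = load_filters([r"(?i)\bgranular mango serpent\b"])
print(filters_loaded(active))  # True
print(filters_loaded([]))      # False: nothing loaded, canary passes through
```

The appeal of a canary is that QA can confirm the filter path is live without typing anything actually offensive.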

RainyDayTmrw 7 hours ago

I commented in another thread[1] that it's most likely a unique, artificial QA input, to avoid QA having to repeatedly use offensive phrases or whatever.

[1] https://news.ycombinator.com/item?id=44486374

tablets 14 hours ago

Maybe something to do with this? https://en.m.wikipedia.org/wiki/Mango_cult

BlueFalconHD 13 hours ago

These are the contents read by the Obfuscation functions exactly. There seems to be a lot of testing stuff still though, remember these models are relatively recent. There is a true safety model being applied after these checks as well, this is just to catch things before needing to load the safety model.

andy99 14 hours ago

I clicked around a bit and this seems to be the most common phrase. Maybe it's a test phrase?

[-]

the-rc 14 hours ago

Maybe it's used to catch clones of the models?

airstrike 14 hours ago

the one at the bottom of the README spells out xcode

wyvern illustrous laments darkness

[-]

cwmoore 13 hours ago

read every good expletive “xxx”

KTibow 13 hours ago

Maybe it's used to verify that the filter is loaded.

Animats 13 hours ago

Some of the data for locale "CN" has a long list of forbidden phrases. Broad coverage of words related to sexual deviancy, as expected. Not much on the political side, other than blocks on religious subjects.[1]

This may be test data. Found

     "golliwog": "test complete"

[1] https://github.com/BlueFalconHD/apple_generative_model_safet...

[-]

BlueFalconHD 13 hours ago

This is definitely an old test left in. But that word isn’t just a silly one, it is offensive (google it). This is the v1 safety filter, it simply maps strings to other strings, in this case changing golliwog into “test complete”. Unless I missed some, the rest of the files use v2 which allows for more complex rules
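
As a rough illustration of what "maps strings to other strings" means, here is a sketch of a v1-style substitution pass. The table entry is a neutral placeholder, and details such as case sensitivity and matching order are assumptions, since the thread does not specify them.

```python
# Hypothetical v1-style table: exact strings mapped to their replacements.
SUBSTITUTIONS = {"placeholder term": "test complete"}

def apply_substitutions(text: str, table: dict[str, str]) -> str:
    """Replace every occurrence of each key with its mapped value."""
    for old, new in table.items():
        text = text.replace(old, new)
    return text

print(apply_substitutions("the placeholder term appears here", SUBSTITUTIONS))
# the test complete appears here
```

A plain mapping like this cannot express boundaries, alternation, or allow/deny distinctions, which is presumably why later files moved to a richer rule format.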

BlueFalconHD 13 hours ago

One additional note for everyone is that this is an additional safety step on top of the safety model, so this isn’t exhaustive, there is plenty more that the actual safety model catches, and those can’t easily be extracted.

bombcar 14 hours ago

There’s got to be a way to turn these lists of “naughty words” into shibboleths somehow.

[-]

spydum 13 hours ago

Love the idea, but I think there are simply too many models to make it practical?

immibis 11 hours ago

Like asking sensitive employment candidates about Kim Jong Un's roundness to check if they're North Korean spies, we could ask humans what they think about Trump and Palestine to check if they're computers.

However, I think about half of real humans would also fail the test.

rgovostes 11 hours ago

Is this related in any way to Core ML model encryption (https://developer.apple.com/documentation/coreml/encrypting-...)? I find that feature a little bizarre because Apple has historically avoided providing any kind of DRM solution for app asset protection.

[-]

BlueFalconHD 10 hours ago

Nope. This is a separate system. It’s not even abstracted for any asset, it is specifically only for these overrides. The decryption is done in the ModelCatalog private framework.

jacquesm 11 hours ago

These all condense to 'think different'. As long as 'different' coincides with Apple's viewpoints.

Applejinx 3 hours ago

The funny thing is, I have an AU/VST plugin for altering only the exponents not the mantissas of audio samples (simple powers of 2 multiply/divide) called BitShiftGain.

So any time I say that on YouTube, it figures I'm saying another word that's in Apple safety filters under 'reject', so I have to always try to remember to say 'shifting of bits gain' or 'bit… … … shift gain'.

So there's a chain of machine interpretation by which Apple can decide I'm a Bad Man. I guess I'm more comfortable with Apple reaching this conclusion? I'll still try to avoid it though :)
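
For anyone curious about the exponent-only gain mentioned above, here is a small sketch of the idea. It is not the plugin's code: it uses Python doubles rather than audio buffers, and `shift_gain` is a made-up name.

```python
import math

def shift_gain(sample: float, shift: int) -> float:
    """Scale a sample by 2**shift; ldexp changes only the binary exponent."""
    return math.ldexp(sample, shift)

x = 0.3
y = shift_gain(x, -2)  # divide by 4

# frexp splits a float into (mantissa, exponent) with value = m * 2**e.
mx, ex = math.frexp(x)
my, ey = math.frexp(y)
print(mx == my)  # True: the mantissa bits are untouched
print(ey - ex)   # -2
```

Because only the exponent moves, the gain change is lossless for normal values; the caveat is that results pushed into the subnormal range or past overflow would lose that property.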

apricot 11 hours ago

Quis custodiet ipsos custodes corporatum?

seeknotfind 14 hours ago

Long live regex!

Aeolun 12 hours ago

Why Xylophone?

[-]

netsharc 12 hours ago

Just noticed "xylophone copious opportunity defined elephant" spells "xcode".
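
The acronym reading is easy to check mechanically. A tiny sketch, where `initials` is just an illustrative helper:

```python
def initials(phrase: str) -> str:
    """Join the first letter of each whitespace-separated word, lowercased."""
    return "".join(word[0] for word in phrase.split()).lower()

print(initials("xylophone copious opportunity defined elephant"))  # xcode
print(initials("Granular mango serpent"))                          # gms
```

The second result lines up with the "GMS" = Generative Model Safety reading suggested elsewhere in the thread.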

[-]

cynicalsecurity 3 hours ago

Maybe they use this obscure phrase for testing.

sandworm101 6 hours ago

No shoot, bombs or bombers? I guess Apple isn't interested in military contracts. Or, frankly, any work for world peace organizations dedicated to detecting and preventing genocide. And without talk of losing lives, much of the gaming industry is out too.

But I don't see the really bad stuff, the stuff I won't even type here. I guess that remains fair game. Apple's priorities remain as weird as ever.

[-]

immibis 3 hours ago

The International Criminal Court is banned from using Microsoft products. Corporations really don't want to be involved in anything controversial unless it brings correspondingly large profits.

zombot 2 hours ago

Who would have thought that this AI shit that is being forced on us ushers in a new round of censorship and control of formerly free speech! /s
