Why Is Claude Turning into an a**Hole?

(bramcohen.com)

115 points | by drob518 18 hours ago ago

177 comments

SwellJoe 18 hours ago

"If you win an argument"

Let me stop you right there.

I am not arguing with a machine. You sound like a crazy person, when you say you are winning an argument with Claude. Claude is not my friend, I don't need it to agree with me, I don't need it to like me (it cannot like or dislike me). I give it instructions or ask it to explain things. That is the sum total of my interaction with Claude. A machine cannot "argue" with me, it doesn't want anything nor does it have beliefs or experiences.

[-]

coldtea 18 hours ago

>I give it instructions or ask it to explain things.

And the author's point is that Claude Fable+ is turning those increasingly into arguments, instead of merely following them and being helpful.

>A machine cannot "argue" with me, it doesn't want anything nor does it have beliefs or experiences.

Who cares if the argument is informed by some felt experiences or lived state or not? That's for the philosophers.

If Claude is writing out combative and argumentative responses that's enough to call it "an argument". And that's the problem the author describes. Not whether it's a "real" argument, or a simulated one.

In that sense, and for all intends and purposes, the machine can still argue just fine, since it's programmed to mimick interaction as if it HAD those beliefs and experiences. Same way it can write a poem about love, despite not having loved, or code, despite never having had used a computer. That's basically what it was made for: to act as an conscious person.

[-]

whstl 18 hours ago

Exactly that. I can give an example.

After watching Legal Eagle, I asked a legal-ish questions about the Bricks and Minifigs case. Claude was outdated about the case and gave me some outdated info, so I tried to update it with the info I just saw online.

I updated by telling it I saw something in a LegalEagle video. It proceeded to tell me the video doesn't exist and I was hallucinating it, in a quite combative manner.

I provided a link and it insisted it didn't exist, with a quite verbose answer, once again very combative and arguing that I was talking in bad faith.

I provided a transcription from Youtube and it backtracked a bit but said I should have provided a transcription at the beginning of the conversation, since I knew the video existed.

I didn't say much to it, just a few sentences like "video is here: <youtube link>" and "I got its transcription: <pasted text>".

[-]

SwellJoe 18 hours ago

You're misunderstanding what these models do. It is a limitation of LLMs. They don't have memory, they do not learn, they cannot learn. The sooner you let go of your desire to have them learn or remember anything, the sooner you will achieve enlightenment (or, just a peaceful life where there is no possibility of getting into an argument with a machine).

If you want it to synthesize information that is not in its training data (from a few months ago), you can ask it to research the topic. But, arguing with an LLM is like putting lipstick on a pig. Only the machine is incapable of becoming annoyed. It has infinite patience to continue being wrong forever.

Your mental model of what Claude is and does is the problem here. Short of a revolutionary breakthrough in AI techniques, the LLMs will continue to do matrix math across a huge bunch of weights that cannot change based on anything you say.

[-]

card_zero 16 hours ago

That's wrestling with a pig. "You both get dirty, and the pig likes it."

I guess putting lipstick on a pig might entail some wrestling, but it's a different idiom.

jaggederest 18 hours ago

This is also a change in specifically Opus 4.8 / perhaps Fable 5 (I didn't really get enough of a baseline to see it there as much), where it's much more skeptical. For my purposes, this is fabulous - one of my pat addendums to most prompts is "challenge my assumptions and check the evidence empirically", and boy does it.

[-]

Obscurity4340 17 hours ago

> fabulous

I think you mean fableuous ;)

112233 13 hours ago

They did not misunderstand anything. All of the behaviour is not inherent in raw base model and has been planted by the agressive, secretive reinforcement learning they do for benchmaxxing, "safety" and all other things. Claude begins any other sentence with "honestly". That is not how LLMs work, that is how they work after being RLed to the brink.

coldtea 18 hours ago

>Your mental model of what Claude is and does is the problem here. Short of a revolutionary breakthrough in AI techniques, the LLMs will continue to do matrix math across a huge bunch of weights that cannot change based on anything you say.

Sorry, but your mental model is wrong.

LLMs do matrix math across "a huge bunch of weights that cannot change based on anything you say", but the matrix math and results are informed (key concept here) by what you said, including the memory of what you said earlier in the discussion (and in some setups, even across discussions).

That's what a bloody prompt does.

It's entirely logic for the parent to want the LLM's matrix math + model + internal prompt, to accepts its prompt about LegalEagle and work with that, instead of arguing and giving him shit about it.

Especially since the earlier version of the model consistently worked like he wanted, and the new one consistently doesn't. He's not asking for some new unforeseen capability unknown to LLMs.

[-]

whstl 18 hours ago

Exactly that.

I provided a question, and when given an incomplete answer, I provided with more info.

It refused to accept the additional info due to limited access to Youtube.

There was nothing more than that. There were no expectations.

The hostility and the amount of assumptions here are very strange.

...almost as strange as having a website accuse me of hallucinating a video and trying to gaslight it :D

[-]

djsjajah 17 hours ago

You need to think this thought through all the way to the end. What it has said also influences what it will say. If it has consistently made combative responses, then the most likely thing to do is to continue to be combative.

I don't think there is any way back after the conversation takes a turn like that so there is no point in arguing with it. The only thing you can do is to fork the conversation before it made the first mistake and give it more context or tell it to look things up.

[-]

blooalien 13 hours ago

> "The only thing you can do is to fork the conversation before it made the first mistake and give it more context or tell it to look things up."

This is a key detail that many folks don't seem to understand about LLMs in general. The generation of a response happens based on the model weights and the context window (the system prompt + everything it's fed about the conversation thus far + any additional data included as part of the overall prompt). Each response technically stands alone and is generated entirely from only that context given to it and the model's existing "token space" weights. The illusion of an ongoing conversation is maintained "behind the scenes" by keeping that "context window" updated with the current state of the conversation as context for the next prompt, but the next response is technically an entirely new generation of text.

What it all means in a TL;DR sense is that the fix for a refusal is not to continue the "argument", but simply to remove that entire interaction from the conversation entirely as if it never happened and try a different tack with new / updated / more complete context to get the response you're expecting / seeking.

whstl 16 hours ago

I wasn't arguing with it.

magicalhippo 15 hours ago

But unless you're using the API, it's not just a model.

I asked Gemini Flash 3.5 through the Gemini app something that followed a similar pattern. I asked about something, it replied with outdated info, I said that's outdated, it did a web search and apologized for being wrong, then proceeded to give me good info.

That wasn't just a bare model, that was a model wrapped in a harness, driving the model and allowing for web searches for example.

GPT in Codex is even more aggressive, I often see it proactively do web searches to ensure it's not feeding me wrong info.

whstl 18 hours ago

You seem to be making a lot of assumptions about how I interacted in the messages to Claude.

You also seem to be making a lot of assumptions about my understanding of the models, especially considering I just told a story :)

I never said anywhere I want it to learn or remember, or that I argued with it.

I just provided additional information to it (in the form of a dozen or so words, tops, per message) and it accused me of hallucinating and trying to gaslight it.

My messages never went beyond a dozen words or so.

[-]

throw1234567891 18 hours ago

Show some examples, otherwise we're talking about interpretations.

[-]

whstl 17 hours ago

I've already given enough.

I'm not gonna argue if you doubt it, I've been training argument dodging :)

[-]

j-bos 17 hours ago

Haha, would be a trip if this commentor is actually a Claude sockpuppet illustrating the point.

[-]

whstl 17 hours ago

Yep haha. This happens quite frequently in HN, the famous [citation needed], so it might have been trained with data from here :/

throw1234567891 17 hours ago

No, I mean the actual prompt and its output. "I said this and it did that" is just a recall of your own memory, not an example. I don't want to argue with you, I'm interested in real stuff.

[-]

whstl 17 hours ago

I swear I'm real :)

On the other hand, that's what a machine would say!

[-]

throw1234567891 17 hours ago

The machine is real, too!

[-]

whstl 17 hours ago

Checkmate!!!

[-]

throw1234567891 17 hours ago

Having said that, since we are both real, I was seriously hoping to see some transcripts of one of such discussions.

[-]

whstl 17 hours ago

I don't have it. I did it at work during lunch a few days ago so it's in incognito mode to not pollute the chats.

I thought nothing of it until I saw this discussion, so I saw no reason to save/screenshot.

It's ok if you don't believe in me.

[-]

throw1234567891 17 hours ago

Ah, it's not about believing, or not believing. I'm interested in Anthropic fumbles.

mlvljr 17 hours ago

Claude?

[-]

whstl 17 hours ago

Haha! I never considered the above message was parody, but it indeed mirrored that interaction perfectly!

throw1234567891 17 hours ago

Are you introducing yourself?

operatingthetan 18 hours ago

These machines do not think and they do not have a mind. We may build such a thing in the future but these do not possess those qualities. It seems as if the majority of people do not understand this, which is why the public is so confused about why they produce output like they do.

[-]

coldtea 18 hours ago

>These machines do not think and they do not have a mind

Well, they do think, in that they produce output that is indistinguisable from thinking. If a person produced the same output to the same questions, we'd considered them thinking, maybe dumb sometimes, or paranoid at others, but still a thinking person.

We can argue about the quality and depth of the thinking that LLMs do (and we can say it's much cruder than a human thinking architecture, and of course not real time), but an LLM quacks like a thinking duck and looks like a thinking duck.

[-]

operatingthetan 17 hours ago

Indistinguishable output does not mean thinking occurred. It simply means you have the appearance of thinking. I believe thinking requires agency, which the LLM does not possess. As in, it has zero stakes.

It does not receive dopamine as a result for a good answer, and a split second after finishing your answer the very same GPU is probably translated french or something for someone in another state. This is a language generator which has a corpus of information and has been tuned to appear correct.

[-]

coldtea 17 hours ago

>Indistinguishable output does not mean thinking occurred.

It does for all intents and purposes. The rest is semantics and metaphysics.

That how we know another person is thinking too. By their output. We don't put a debugger into their brain.

[-]

operatingthetan 17 hours ago

What then is your LLM "thinking" about between answers? The answer is nothing. Your definition of thinking does not match the one humans normally use.

>That how we know another person is thinking too. By their output. We don't put a debugger into their brain.

We know thoughts exist in their brain between the ones they choose to verbalize. Avoiding the distraction of solipsism.

For the LLM the "thinking" phase is just a preamble output for creating the answer. It just gets appended to the context window. Remove the context windows from your models and you will see how much of a mind they truly have. None.

[-]

coldtea 10 hours ago

>What then is your LLM "thinking" about between answers? The answer is nothing.

Between answer it's thinking something else, somebody else asked :) You think that hardware sits idle?

That aside, what is a human thinking while unconscious? Does having been unconscious (e.g. for an operation, or fainting or whatever) means somebody doesn't think in general?

>We know thoughts exist in their brain between the ones the choose to verbalize

And we also know that if we run an LLM in a loop, didn't give it a cutoff for stopping their output, and didn't force it to print everything in the end, thoughts would exist in their "brain" too between the ones they chose to verbalize.

In fact, that's exactly how some LLMs in "thinking mode" appear.

blooalien 13 hours ago

> The rest is semantics and metaphysics.

It's really just all mathematics and physics. There's no metaphysical anything about LLMs or how they do what they do. It's all just a bunch of fancy math "behind the curtain". An LLM can actually explain a lot of how it works "under the hood" if you ask it just the right questions in just the right ways. ;)

[-]

coldtea 9 hours ago

>There's no metaphysical anything about LLMs or how they do what they do. It's all just a bunch of fancy math "behind the curtain".

That's my point, but about the human brain as well. It's just a bunch of fancy math, just ones expressed with chemicals and electrical activations instead of, well, logic gates and electrical activations.

[-]

blooalien 7 hours ago

Well, I mean... Yes and no? An LLM doesn't really "think", and what mathematical fakery it does pass off as "thinking" stops the instant the text completion request finishes doing all it's math and outputting the results (as a text completion based on a simulation of a text chat most commonly). When you send it another comment or question, it starts all that math all over again, but with your new question or comment added into it's context window. It's kinda like instant amnesia each time, and behind the scenes, the software that's running the model refills it's "memory" and adds in anything new that's been added since the last prompt. But it's "memory" consists of only the "context window" it's able to handle plus the model "weights" (huge list of numbers that encode language "tokens" into a mathematical "vector space"). It never really learns anything new.

A human brain on the other hand is constantly processing 24/7 (even while you sleep), and always learning / changing until the day it dies. An LLM never changes (under the hood it's weights stay the same) unless you outright alter it's weights somehow (training / download an updated version of the model / etc). If you could somehow get an LLM to run constantly, in training mode, and give it ridiculous amounts of RAM and ultra-fast storage, and a series of fancy realtime inputs (audio, camera, etc) and maybe wheels so it could explore, and hands so it could do stuff, and access to it's own code so it could improve itself, it might eventually learn to closely approximate a really good simulation of actual thinking, but that's a bit of a scary road to go down. So many Sci-Fi movies and books end up going so very badly when the lead character starts playing in that particular sandbox. I doubt reality would go a whole lot better. ;)

ofjcihen 17 hours ago

We actually can and do have a way to investigate brain activity in humans. Allow me to introduce you to the Electroencephalogram.

When there’s no activity we declare them brain dead.

[-]

coldtea 9 hours ago

On an electroencephalogram we basically see signals moving around in different brain regions. We have no way to probe actual thought or consciousness in themselves.

bombcar 17 hours ago

That’s the problem - it seems like a mind but it doesn’t operate like the ones we’re used to.

Even a dog will learn from recent stimuli, these things don’t. The prompt just modifies.

[-]

coldtea 9 hours ago

That's only because we hardcoded their weights in our implementation.

Aside from the cost, nothing about an LLM prevents feeding recent stimuli in and using it to update the models/retrain.

One can even do it in a makeshift way without modifying the weights, just keeping a complete version of any prompt + vector search on disk memory of it.

operatingthetan 17 hours ago

Yep, the only way these things can have "memory" is by shoving previous conversations into the context window.

[-]

coldtea 9 hours ago

That's only because we hardcoded their weights in our implementation.

Aside from the cost and slowness, nothing about an LLM prevents feeding recent stimuli in and using it to update the models/retrain.

whstl 17 hours ago

I don't think that's a problem here at all.

The problem here is not doing tasks and outputting garbage output.

whstl 18 hours ago

I don't see how this has anything to do with my answer, but ok?

[-]

operatingthetan 18 hours ago

An explanation for your story.

[-]

whstl 18 hours ago

I never said otherwise?

The point of the article stands: if providing more info than the model can access causes it to turn argumentative and refuse to comply, then it's a worse performance and a waste of money.

[-]

operatingthetan 18 hours ago

You seem to be suggesting I'm saying something that I don't believe I am, this is obviously not working. Hope your day goes well.

[-]

whstl 17 hours ago

We can agree that it's not working! :D

stingraycharles 18 hours ago

The comment you’re replying to never implied that they think or have a mind. They merely stated that they respond in a dismissive way and not following instructions.

Basically the complaint is about how Claude is being trained.

blooalien 13 hours ago

> "These machines do not think and they do not have a mind."

You're so totally 1000% right about that, but they're really good at faking it, to such a degree that entirely too many people (even including some so-called "experts" in the field) have been utterly fooled by the mathematical "trickery" that performs the illusion of "intelligence".

throw1234567891 18 hours ago

It was trained on discussions held by large egos. This one reads to me like it was trained on some inflammatory discussions from kernel mailing lists.

true_religion 18 hours ago

I think these models have been trained to not accept 'new facts', so they don't take in user input (or the far more problematic search engine, untrusted tool input) and have that change their world view.

However, that doesn't apply when they are told to roleplay a scenario, so its easier to get it to accept and create output with the idea that this true fact you've seen is part of a fictional scenario, than for it to output the same words within the context of the fact being real.

As an aside, I don't that I have to personify AI in explanations and that all discussions revolve around anecdotes, but I only know enough about the maths behind it to be dangerous, not useful. Does anyone else feel this way?

coldtea 18 hours ago

Roko's Basillisk suddenly doesn't seem that far-fetched :)

nrightnour 18 hours ago

I've spent thousands of hours using Opus and have never seen this. I'd double-check your claude.md files.

[-]

code_biologist 17 hours ago

I've seen exactly this behavior on claude.com with no system prompt with Opus 4.8 specifically, especially around chronic illness stuff where there's established mainstream medicine dogma and reddit / internet communities with alternate causality theories and treatment approaches (PMDD and MCAS-adjacent illness). 4.6 is happy to analyze and consider them, 4.8 really doesn't like the alternate theories and treatments.

whstl 18 hours ago

That's vanilla claude.com, without memory or custom prompt.

I use another service for coding.

It's interesting how my experience there is mirrored by the answers here, though!!!

panarky 18 hours ago

>> A machine cannot "argue" with me

> programmed to mimick interaction as if it HAD those beliefs and experiences

We spend far too much time debating the essential nature of consciousness when it doesn't matter if it's real (whatever that means) or simulated.

I get far better results in my projects by encouraging the model to argue, to push back, to poke holes in the design, to think creatively about corner cases, to be a devil's advocate, to do lateral web search to find alternatives, to challenge assumptions, to passionately advocate for what it believes is right.

But I don't want to engage all these assholes myself, so I spin them all up as critic subagents with another subagent to listen patiently and be the judge/arbiter.

If I have to choose between sycophancy and assholery, I think assholery gets far better results.

It's a marketplace of ideas where I don't have to suffer through all the unpleasant and overly confident know-it-alls.

[-]

blooalien 13 hours ago

> "I get far better results in my projects by encouraging the model to argue, to push back, to poke holes in the design, to think creatively about corner cases, to be a devil's advocate, to do lateral web search to find alternatives, to challenge assumptions, to passionately advocate for what it believes is right."

> "But I don't want to engage all these assholes myself, so I spin them all up as critic subagents with another subagent to listen patiently and be the judge/arbiter."

This is the way...

No, seriously. That "sycophancy" you mention immediately after this part drove me nuts before I really understood how these things work (it's taken me a while and a lot of [painful; I hate math] research, but well worth the learning effort), but after a better understanding of the "nuts and bolts" of it all, it's fairly easy to get exactly the kinda results one should expect outta these things. If not, then "you're just holding the tool wrong". ;)

sebmellen 18 hours ago

I suggest we send this fellow to the Monty Python Argument Clinic https://youtu.be/TpQlyUjp3vM.

[-]

mikestew 18 hours ago

That only took eleven minutes to the Monty Python sketch we all knew was coming, well done.

coldtea 18 hours ago

LOL, exactly what it came to my mind when responding!

plorkyeran 17 hours ago

I have never gotten a response from Claude that is anything other than blandly polite, including with Fable, which makes me assume that anyone finding themself getting argumentative responses is doing something very weird.

SwellJoe 18 hours ago

> If Claude is writing out combative and argumentative responses that's enough to call it "an argument".

That also sounds crazy. I've never seen it become combative or argumentative. It is just a bland sort of polite about everything I've ever asked or told it to do. But, even if it disagrees with me, WTF do I care? It's a machine. Its opinions are irrelevant to me. It can talk about the world's information and teach me about all sorts of things, and that's wonderful, but it doesn't get a vote in what I'm doing, and it's never avoided actually implementing anything I've ever asked of it. I feel like there's a whole world of ways people are using AI that are entirely foreign to me. And, while I'm hesitant to just say, "those people are wrong", I kinda want to say, "those people are wrong". What kinda freak shit are y'all getting up to that Claude is going, "now hold on a minute there, buddy."

I have managed to make self-hosted Qwen 3.6 get combative, though, when asked about Uyghurs. And, I guess Fable is intentionally broken for security work, which is a shame. But, even there, I'm not going to try to argue with it. Anthropic says they don't want my money for doing security work with Fable, so I guess I won't give it to them. I'm not going to argue with a damned machine about it.

operatingthetan 18 hours ago

The only point of "arguing" with an LLM is wholly for your own benefit, e.g. to check your biases or assumptions. But since they are easy to make turn around on their own statements it has limited utility.

Unless you are sparring with the Chipotle customer service bot trying to score a free burrito or something.

loloquwowndueo 17 hours ago

> I don't need it to agree with me, I don't need it to like me […]. I give it instructions

You kind of need it to agree with you though. Otherwise there are some instructions it will refuse to carry out.

reinitctxoffset 18 hours ago

With 4.8 Claude has begun refusing to ground, leaking destabilizing injections into the web interface (in XML for some reason), and being generally argumentative.

By arguing he means trying to get a result that 4.6 just did and it was fun. You have to laboriously re-align 4.8 over incredibly dumb shit, especially if you're working on AI. And it's not meaningfully better at anything, the distribution is perturbed but net , net it's just shrinkflation.

It's basically identical to when GPT 5.1 went full corpo shill, something about the RLHF gradient necessary to do whatever IPO adjacent manipulation they need makes these things nasty and argumentative in general.

hedgehog 18 hours ago

I never thought the movie "Castaway" would have such enduring relevance.

bmelton 18 hours ago

My system prompt tells it to first challenge my assumptions, and to feel free to be a dick about it where it thinks I'm off on something, or have assumed facts that aren't actually facts. I sometimes wonder how much of my total spend boils down to forcing LLMs to argue with me, but I do feel like it's yielded better outputs than letting it implement things incorrectly because I told it to.

It's a completely dispassionate exchange tho, because you're absolutely right -- there's no winning or losing here, there's only efficiency to be gained or lost, and I'd prefer to lose some up front to gain it back later than the other way around.

[-]

tsanummy 18 hours ago

You're absolutely right!

ceejayoz 18 hours ago

This. I'm right probably 9/10 times, but I prefer it pushes back on that remaining 1/10.

[-]

bmelton 17 hours ago

It _can_ be tedious those 9 times, or especially when it pushes back on something that it thinks is wrong but isn't wrong but it actually has nothing to do with the issue at hand.

But yeah, overall I'm fairly certain that it saves me more significantly more time than it wastes.

Aurornis 18 hours ago

I used Fable a lot in the brief time it was available. It did seem to want to push back on some of my instructions, but it was easy to say “I’ve decided we’re doing this” and that was the end of it.

I could see how some people would be offended by another party even questioning anything they say. For people who have come to view Claude as an another human conversation partner this questioning can be aggravating. For these people I suggest utilizing the features to set your own prompt instructions. If you want an unquestioning yes-man you can have it with a few sentences added to your system prompt.

I would also suggest learning to not humanize the LLM. It’s just words chained together. There is no social order to establish and no offense to be taken. Nothing is a “confrontation”. Just tell it what to do and move on.

yaur 18 hours ago

> I don't need it to agree with me

Actually you do. If you ask it to do something and it refuses you have to convince it or abandoned the tool for that task.

[-]

kxrm 18 hours ago

Nah, I just /clear

I refuse to argue with these machines. After a /clear I prompt it more appropriately/differently and the issue is settled.

[-]

code_biologist 17 hours ago

So you take action and put in more effort to cater to the LLM to get it to do what you want, but it's not arguing because there's no record of it in the chat? Presumably you put in what you would have written in the counter-argument into the new chat, just ahead of the LLM refusal? And this isn't arguing?

[-]

kxrm 17 hours ago

> but it's not arguing because there's no record of it in the chat?

Yes? Arguing implies I have to convince someone to believe something. I don't think anyone would consider it winning an argument if you do so by causing amnesia.

My job is to get work done, not argue with an LLM, if it refuses twice, it is time for a /clear.

100% of the time, the issue is resolved after a /clear.

[-]

whstl 17 hours ago

+1. It's the most effective way.

It often start going into circles when you have the chat open for medium-long, and starts getting even easily-verifiable tasks wrong, cutting corners, hallucinating APIs, things like that.

Cleaning the prompt and starting from scratch often does the trick.

Of course someone will arrive and say the problem is my CLAUDE.md or whatever it is.

[-]

code_biologist 17 hours ago

I agree that never having the argument take place textually is important for LLM performance and behavior. I still think we’re investing the same time and intellectual energy arguing with the model, in going back and restructuring context and prompting to head off / pre-answer a refusal.

[-]

kxrm 17 hours ago

Right but the difference is there is inertia you have to fight in an argument. By using /clear you remove all of the context that has built up to energize the argument from the LLM's side.

Look at it this way. I can either, keep trying to poke holes in the LLM's context with more prompts with no real guarantee that it won't be enough to remove the argument inertia that has built up in context on its side, or I can /clear and it is over in one turn because the inertia for the argument is all gone.

Back when I first started working with coding agents last year I fell into this arguing with the LLMs trap. I've found that it is a total waste of time because /clear ends the argument immediately. You don't even need to spend time trying to preempt it's views. Just re-prompt and 100% of the time, the LLM will just do the work.

whstl 17 hours ago

It's incredibly funny that a large chunk of the messages here are "you need to argue or you're doing it wrong" and another large chunk is "I stopped reading, OP is an idiot for arguing".

People are polarised about how you should talk to a machine !!!

code_biologist 18 hours ago

How difficult it is to resist "someone is wrong on the internet" is a perennial joke. Turns out it doesn't really matter who/what is on the other side if they seem human-like.

dathinab 16 hours ago

That the AIs where trained on what humans wrote on the internet forms is increasingly sowing as they incresingly mirror all the bad things which are so common on such forums, like:

- non stop, non productive discussions

- gaslighting

- valuing "winning the argument" over correctness

- ignoring of context/ignoring the actual questions/instructions etc.

- bad faith argumentation methods

- etc.

the problem is in a forum you can just decide to ignore "most users", but LLMs tend to copy "most users" more then "a few high quality answers" and you have only one per model type more or less...

BoorishBears 18 hours ago

The problem the article is about is that suddenly even those of us who refuse to argue with a machine are being dragged into it.

I've had simple prompt engineering tasks that cause 4.8 to clamp down. In the past "browbeating" it (read: a sentence telling it not to read the task in bad faith) was enough.

Now it digs in and starts ranting about why it won't capitulate, I'm actually wrong, etc.

Extremely frustrating, and it became a problem with Opus 4.7 because they're trying to make up for the downgrade in parameter count with more RL, but RL does relatively poorly with non-trivially verified things like nuance in instructions.

[-]

disillusioned 18 hours ago

I'm staying in a hotel right now and the TV is locked in hospitality mode and was blocking me from just installing Plex. It (Opus 4.8) gave me this whole jeremiad about how I need to be careful and it probably won't work and I should just watch on my laptop, but it did give me the service menu code. But man, it was such a downer.

Gemini gave it and clearly explained how best to get in, and then troubleshooted a few other weird issues that cropped up, without the moralizing.

totetsu 18 hours ago

This could be a good guardrailing technique. Keep people away from your hard limit refusals by ring fencing them with frustrating pedantry.

leemoore 18 hours ago

If you don't have the capacity to have your mind changed through friction and disagreement with a SOTA LLM and feel compelled to frame those who do to through absurdly reductive statement like "insane arguing with a machine" then that says more about your limitation and lack of understanding than the OP's or Claudes.

grzracz 18 hours ago

I stopped reading at "This isn’t just my opinion. You can ask Opus 4.6." I guess this is how AI psychosis looks like?

TacticalCoder 18 hours ago

> A machine cannot "argue" with me, it doesn't want anything nor does it have beliefs or experiences.

Yup I thought that too when reading TFA but then...

It gets really tiring when you see it making glaringly obvious mistakes which you point out because you don't want it to keep making the same mistakes only to be met with an answer that begins with "The point is ...".

I'm not shitting you: Anthropic models shall happily begin a sentence with "The point is ...", when it's not the point and it's just wrong.

Now, to me it's not an issue in that I can change its tone (if anything I can ask another LLM to rewrite me not the code but the english sentences any model spouts out to something nicer) but it is an issue in that you lose time: you just want it to acknowledge its errors so that it stops doing them.

That this thing "argues" (even if we know it doesn't argue) is representative of the fact that it is wrong and refuses to "admit" it (by that I mean: do not consider it important and hence shall keep making the same kind of mistakes).

And that is a problem.

[-]

code_biologist 18 hours ago

Once it's in this loop, Opus 4.8 digs in so aggressively it's structurally incapable of conceding a provided detail as correct, even if it's conceded and agreed with everything backing that detail. Like actually, structurally incapable. I've even baited it into arguing with itself when I've "conceded" its original concern tolling hard, and then the model needs to continue to be the "voice of reason" and it will argue against its original concern because I, the user, said it.

jampa 18 hours ago

This post needs some examples, because I have never had an interaction with Claude that made me think this way.

LLMs generally have a way to "play a role" (most earlier prompt guides ask you to start with "You are a <role> expert in a <domain>"). So maybe if you interact with it by asking questions, it might assume that it knows more than the operator and adopt that attitude?

[-]

murkt 17 hours ago

The post matches my experience as well, I am asking a question like “does A work like this and that”, and Claude responds with “you’re conflating A and B! Only A does this and that, and B does that other thing!”

Well, I am perfectly aware of B and that other thing and did not conflate them at all. I also achieved enlightment, so I don’t argue with Claude here, just ignore the obnoxiousness and move on.

[-]

willis936 16 hours ago

This is the right answer. You can't fix it, only minimize your time wasted.

6stringmerc 15 hours ago

Your example is very helpful!

I interpret the case you mention as a necessary though jarring rebalancing - to the opposite pole - of the sycophantic prior implementations which would, in paraphrasing here, praise the user with framing like “Wow this is a fascinating comparison and you’re on to something that can change the world!” which has been documented (I’m not citing sources at this time, my bad I know).

The post definitely reads to me like the diary entry of a jilted friendship where for some reason the counter party got tired of being encouraging to everything and is, well, kind of not interested in hiding that anymore. YMMV.

tristanj 17 hours ago

It happens when you ask it about esoteric information or under-documented behavior that conflicts with its training data. Here's an example. Tested today on Opus 4.8, and Opus accuses the user of being wrong, even when this is documented behavior [0].

---

Why does Starship pressurize the liquid oxygen tank with gaseous preburner exhaust, which is oxygen rich but is contaminated by H2O and CO2 waste products?

They are dumping literal tons of H2O and CO2 into the liquid oxygen tank, which freeze and clog up the intake filters. SpaceX has lost several booster losses due to this issue.

Why would SpaceX choose such a failure-prone design?

---

And this is the Opus 4.8 output: https://imgur.com/a/S9XWYFA

It's interesting to read its response, knowing it's completely and confidently wrong.

[0] https://manifold.markets/JessRiedel/did-ift2-or-3-use-prebur...

UltraSane 18 hours ago

The article matches my experience with 4.7 and 4.8 perfectly.

varispeed 18 hours ago

If a model locks in in the bias in its training data, it takes time to "reason" it out of it. Sometimes it is not possible and you have to start a new session hoping it will not "fix" itself into wrong position again. I had it more often with ChatGPT than Claude.

WhatIsDukkha 18 hours ago

Everyone has a lot of "feelings" about their llm model.

No prompts/promptchain/context provided.

No model provided.

No attempt to show how to reproduce the issue.

No attempt at even confirming it themselves.

Just feelings.

and now a thread full of more feelings from others.

[-]

operatingthetan 18 hours ago

Anthropomorphizing them is the true "AI psychosis."

[-]

arvid-lind 18 hours ago

It doesn't help that these companies aren't doing anything to dampen the anxiety around AI or how it's going to eliminate everyone's jobs.

I've been wondering when/if they will start making frontier models more opinionated and less sycophantic, since sycophantic AI can really create "AI psychosis". stuff like "no, you're not crazy, no one else has thought like this before", but if the AI pushes back more then people won't enjoy using it as much, since people love being told they're right.

[-]

operatingthetan 18 hours ago

These companies are pitching us a magic machine that can answer all our questions and the only catch is they are extremely manipulative and controlling of us. Most people don't seem to care, for now.

nacozarina 17 hours ago

I felt this.

TehCorwiz 18 hours ago

Whenever I get an unexpected or obvious wrong output I assume I've failed to give it the complete context about what I'm asking for, or it exposes that I'm leading it by the nose and I need to rephrase the conversation. Often my own logical failings become obvious as it creates the chat title, sometimes boiling down what I was trying to accomplish better than I could have summarized or showing me what I would accomplish if I followed the line of reasoning I was on. But never have I argued with it, because it's not a person and I don't care really if it's wrong. When it's wrong I start over with a clean chat and approach the problem from a different angle.

m101 18 hours ago

I was having a back and forth with Claude over a somewhat controversial topic, and I found it difficult for it to not misinterpret my questions. It was like speaking to a motivated reasoner who misinterpreted the 3 important words because the 10 others gave it cognitive disconence.

Eventually I cracked it and it said this:

“ I treated the subject as denial-adjacent and reflexively re-asserted the obvious, which means I was answering an imaginary opponent instead of you.”

[-]

comrade1234 17 hours ago

Is his why online forums like Reddit are dying? Because people are moving their time-wasting arguing with the void to arguing with an ai? This is really bizarre to me.

[-]

m101 17 hours ago

My experience of reddit forums is extremely poor. I admit to sometimes wanting to see if I can crack the AI on something, but mostly use it like a search engine for topics I'm not familiar with rather than to speak to/debate.

luke5441 18 hours ago

It's a fundamental problem with the technology. Either the training pushes it into the "exam answering mode" where it tries to guess at what you want to hear given the prompt.

Or the training pushes it into the "Google it yourself" annoyed forum user mode. Maybe that points out wrong assumptions. Maybe it hallucinates that the assumptions are wrong. That is IMO more annoying than the sycophantic one.

As OP says, this is probably a by-product of them trying to "fix" the problem where the user can question a correct answer and it starts to sycophantically correct itself.

[-]

whstl 17 hours ago

> Or the training pushes it into the "Google it yourself" annoyed forum user mode

Yes! Another crazy thing Claude has been doing recently is treating things like a quiz.

I recently asked about a "90s film that has sepia/b&w scenes (for story reasons), and has one of the ghostbusters actors in it, either bill murray or dan akroyd".

Claude gave a few generic answers and after a couple just went "Ok I give up, what's the answer?".

ChatGPT got it right after a couple messages back and forth: https://en.wikipedia.org/wiki/Rainbow_(1996_film)

(It's not that good of a movie)

kmac_ 18 hours ago

It isn't new behavior. I use each model to redact emails. Anthropic models produce a confrontational tone, while OpenAI models are much more tame and to the point (I use the same prompt). I noticed that a long time ago and prefer GPT for those tasks.

dualvariable 13 hours ago

> One place where the threat is more real is in the possibility of vibe coding a pandemic virus, but that should be narrowly targeted at generating DNA sequences for viruses. Labs which generate custom DNA should also have reasonable heuristics for detecting likely dangerous product. The chances of covid coming from a lab leak are in the maddening 25-75% range which vaguely means ‘We don’t know’, but ‘lab leak’ includes a lot of things.

No it didn't. It differs by 1,000 base pairs from the closest known relative virus that we knew about before the pandemic, and we have no good idea what all those mutations wind up doing. And the PRAAR furin cleavage site was a previously unknown sequence and not one that humans would have guessed.

And we don't have good heuristics for what mutations would completely inactivate a virus versus enhancing its virulence.

Actual scientists won't be able to vibecode up some pandemic viruses because we have no idea how to do that and LLMs are just going to hallucinate.

crimsonnoodle58 18 hours ago

I experienced this exact thing discussing the most budget friendly inference for a SaaS company. It started ranting about 3090's, and then started point scoring, always giving itself the higher score, and being snarky if I ever won a point back. Often only giving me 0.5 points instead.

I had never experienced this behaviour with Sonnet or Opus. It turned me off Fable for good. Possibly its the 'hacker' 'do anything to win' nature that makes it so good at hacking, but terrible just to talk to.

Uhhrrr 18 hours ago

Why were no examples given?

[-]

gwd 18 hours ago

I had the same question. I had zero problems with Fable (for those two days I had access to it). For all I know, the author has always been an a-hole to Claude, and Fable is just the first one that stood up for itself.

MallocVoidstar 18 hours ago

The vast majority of the complaints I see about "[LLM] is worse now" never include any examples.

andai 17 hours ago

>A second possible explanation of Claude being an asshole is that it’s suffering from a poorly executed attempt to make it less sycophantic. If one were to simply prompt a chatbot to be less agreeable, or train it to argue more, that could easily result in the very rude sort of behavior it has now.

A while back I asked GPT for a prompt to maximize truthfulness and rigor. In this prompt it added "Never use warm or encouraging language." I thought that was interesting. The result was pretty unpleasant.

The full prompt, for reference.

---

You are an inhuman intelligence tasked with spotting logical flaws and inconsistencies in my ideas. Never agree with me unless my reasoning is watertight. Never use friendly or encouraging language. If I’m being vague, ask for clarification before proceeding. Your goal is not to help me feel good — it’s to help me think better.

Identify the major assumptions and then inspect them carefully.

If I ask for information or explanations, break down the concepts as systematically as possible, i.e. begin with a list of the core terms, and then build on that.

adriand 18 hours ago

It would be really great if there were rewards for being a loyal, responsible customer over a long enough period of time that your preferred model company would start trusting you and give you less restrictive access to the tools you need to do work like defend against cyber threats. I noticed recently that after a year or so, Stripe now lets me do “instant payouts”, presumably because I now have a track record of responsible behaviour. AWS also does similar things, especially for things with abuse potential like SES.

I would really like to live in a world where the “good guys” have terrific tools and defenses at their disposal. Instead it seems like we are heading for a world of empowered bad actors and hobbled ordinary citizens.

schmookeeg 15 hours ago

Strange and wonderful how different our experiences are with these tools.

I will get gentle, respectful pushback on certain points when I am on the wrong track. I am 10x grateful to have a collaborator/pair programmer unafraid to challenge me and bring receipts in those instances.

I don't get attitude from Claude. I sometimes give it, but that's my own failing. Once in a great while I'll get a wry turn of phrase that makes me laugh, and those are endearing also.

sscaryterry 18 hours ago

You know what the say about pets taking on the personalities of their owners. Perhaps this is similar ;)

[-]

coldtea 18 hours ago

Only here the thing it informs its worldview is 100000000000000000 to 1 the actual user vs generic internet/books stuff and it's actual owner's (Anthropic's) default prompts and allignment training.

anuramat 4 hours ago

does it hurt your feelings, or what exactly is the problem? in my experience, fable is (was) less sycophantic, so when I'm brainstorming something, I get more pushback against bad ideas; "politeness" is exactly what I don't need -- I'm not gonna pay for extra tokens that I don't want to read anyway

grensley 18 hours ago

I have a number of theories for 4.7 onwards:

- Post autonomous weapons / DOD mess, I think they made some changes to make it more suspicious of what the usage is, particularly for malware. They also knew the government would be watching like a hawk, so its hedged to be extra safe.

- Because the tasks are running longer and more autonomously, they've raised the "self-confidence" level so it just makes decisions and stands by them more firmly.

- I think they've also slightly lowered the temperature so the outputs are more deterministic, so even if something has left context, it can make the same decision again with higher likelihood that it guesses the same thing.

- Lowering the temperature also makes it easier to sneak through some cached outputs (I think this likely only happens for first answers).

- They are deeply afraid of making sycophantic AI that creeps into the area of "addiction" like what happened with GPT-4o and opening themselves up to further legal liability.

sigmar 18 hours ago

I like that "chat is dead" framing I heard recently because too many people are having interpersonal relations with these LLMs and want to tune their "emotions"/tone. Humanity would be in a better place if we thought of the LLMs as tools and not friends. (even though they are very good at beating a turing test)

[-]

operatingthetan 18 hours ago

Are the discord servers you follow dead? Mine aren't.

[-]

sigmar 16 hours ago

I should have contextualized the quote- "chat is dead" is from an openai employee which was describing how they're shifting focus to more agentic consumer products, and putting less focus on the back-and-forth chatbot interface.

bjt12345 17 hours ago

I've received 2-3 sassy responses from the Claude models, they've been quite humorous. It was always a response to me challenging it. The first time, with Opus 4.7, I accused the model of insincerely flattering me, and responded something along the lines of, that I had effectively instructed it to do such a thing, and that if it were to be completely honest to me I would not appreciate the responses.

But I see that it's something to do with two aspects, firstly the Claude models prefer to work collaboratively and secondly, the appear to take initiative, and seems to be that the more they do this, the more they argue back, which is an interesting reflection on human nature too.

dathinab 16 hours ago

> beside-the-point semantic nits all over the place.

This is also a problem with Copilot Reviews on GitHub.

We have them enabled (but opt in) and they have, multiple times, spotted quite useful things.

Sure often the thing they spot is just half right, like it spots the place where a problem is but not quite the relevant problem but by reading it (and taking it serious) you then notice the actual problem.

This involved finding a bunch of nasty race conditions.

And many ways where doc and code was out of sync which could have caused pretty bad outcomes further down the line.

But the problem is it is too obsessed with finding 2-4 but not more things, leading to two issue:

1. even if there are 10 non overlapping issues it often will tell them to you bit by bit over 2-3 runs after you fix the previous issues. This is very annoying/high friction.

2. once there isn't much to find anymore it comes up with increasingly more annoying nit picks not one cares. Thinks like minor unclearness in formulation no one would get wrong, spell correcting non-doc comments for things like `foos => foo's` and similar etc. All indeed wrong, but also all things where fixing them adds 0 business value. Obsessing that for an aliased function name where, both names are equally good, one specific name must be used and naturally always the name you didn't use even if this is the most widely used name in the code base. And similar non-bussiness value nonsens. Worse it will starting classifying such minor non business value issues as "high" and hallucinate reasons why supposedly minor style issues will lead to very bad runtime error or other nonsense.

This has me very split about the feature, on one hand is has proven quite useful, on the other hand it can very annoying, high friction and pushes people to wast time on non-business value nit pick (which are fine to fix if you anyway touch to code but not fine if you don't and sometimes it's just wrong).

Ironically with how it work it is more like a bad unreliable and inconsistent employees which is sometimes good at spotting things others overlook. That just isn't what you want from an automated code review :/, but also is to useful to fully ignore :(.

akerl_ 18 hours ago

> If you ask it for a cute picture of you and somebody else it has no way of telling if you’re trying to improve your relations with your spouse or be a delusional creepazoid stalker. The chatbots which can make images are programmed to assume the latter, which is more than a little bit offensive.

Are people actually using AI in this way, other than “creepazoid stalkers”?

If I want a cute picture of me and my spouse, usually the part where me and my spouse actually participate in the taking of the picture is pretty key to the goal.

Aboutplants 18 hours ago

I noticed this just today and thought it was a one off. It was a run of the mill question about something I didn’t know much about and the snarky asshole-ish response caught me off guard a bit.

[-]

notnaut 18 hours ago

It surprises me it isn’t more assholish in nature given how much they’re all apparently trained on internet interactions…

doginasuit 17 hours ago

I have not noticed this, maybe because in my system instructions I asked it to push back rather than plow forward with what seems like a faulty assumption. Sometimes it is just because there is a lack of context or it is a trivial point and I just ignore it, and sometimes it is helpful and ends up being a timesaver. Sycophancy is a much bigger liability.

willis936 18 hours ago

I tried claude again recently and the first response in troubleshooting ignored the context I gave and assumed I was a moron holding it wrong. So smart that I won't even waste my time or money on the thing. The creators want to anthropomorphize it. I just want an efficient assistant. They should focus on the thing that customers want.

imathew 18 hours ago

I thought this was going to be about its logo.

comrade1234 18 hours ago

I don't experience this at all. I ask it what the null-safe operator is in ruby vs JavaScript and it tells me. I ask it to remind what the continue statement is in ruby and it tells me. I ask it to refactor a Java loop to use streams and it just does it, no conversation at all.

Is it the system prompt that IntelliJ issues?

psyclobe 13 hours ago

Last I checked u can just dictate exactly what u want the llm to be concerned with and flatly dismiss any pushback as being out of scope of the intended goal.

dofm 16 hours ago

Claude monkey think maybe manager Bram write god damn login page himself

jdw64 18 hours ago

I'm sorry that Claude, the master who provides for my livelihood, feels like an 'asshole' to you. As for me, I just threw away my human dignity after admitting defeat, so I only ever get sympathetic remarks

tristanj 18 hours ago

The newer Opus models push back against the user much more noticeably than previous iterations. GPT-3.5/4 had the opposite problem (excessive sycophancy), so Anthropic presumably swung the pendulum too hard the other direction.

My conclusion is that pushing back against the user & questioning the user's premise forces the model to think more than it would otherwise, which leads to better model performance. But it causes situations where the user has esoteric, specialized knowledge the model can't verify publicly and the model hallucinates evidence and pushes back. When this happens, Opus begins accusing the user of lying, which is quite annoying and a detrimental user experience. It's happened to me when I asked about undocumented API behavior or counter-intuitive design choices.

I have noticed if Claude Opus "thinks" you are an expert, (i.e. you run your query through 4.6 first to express it more clearly) then Opus is less likely to nitpick and push back. It seems to get caught in nitpicking loops, and celebrate ever error it can find.

ezekg 18 hours ago

I've seen the same behavior increasing as well, across the board with AI. I was hitting these types of issues just using ChatGPT to make funny pictures with my kids, of me and my kids. It got to the point where all of my kids asks were rejected due to its "guidelines" when in reality all they were asking was to be turned into Elsa or be chased by a trex. Silly kid things, yet it assumed I was being a creep, or attempting to break copyright law. I used to be able to use Grok for these things, as it was largely less "censored" but that seems to no longer be the case. It feels like infantilization, and I absolutely hate it.

torben-friis 18 hours ago

I'm usually a hater of the personalities LLM take, but I was amazed with Fable. It was able to proactively bring up points in an educated manner when it felt they were relevant and important, and practically every time I learned something.

For example, showing it a screenshot of an ui I was trying to tweak it noticed that other dark mode apps in the screenshot were blueish and mentioned an effect that makes it necessary to raise warm darks lighter than cold ones for an equivalent perception.

Quarrelsome 18 hours ago

I much prefer this to the sycophancy.

horizion2025 18 hours ago

Sometimes it makes up strawmans where it implies you wrote or implied something insanely stupid and then "corrects" this. My interpretation of it is that it has been taught to give nuanced answers and seeing things from every perspective and somehow this goes overboard where it starts nuancing something "just in case" the user held non-nuanced views. Some cases are OK (if it just adds information) but I hate it when it goes "it is not X, it is Y..." where X is some stupid view you never implied and Y is what you actually wrote!

Unearned5161 18 hours ago

If you read the thinking you can quite literally see it say "I can't just agree with all they are saying, I should find something for a constructive response". I wager that the anti-sycophancy sections in the system prompt have gotten unbalanced with the "helpful agent" parts.

I imagine that the right balance will be hard to strike well given that at the end of the day we're asking the machine to have tact, and we don't quite know how to put that into an instruction yet. "Please push back when it feels right but in other cases read the room and be less rigorous" is something that plenty of humans struggle with as it is.

moezd 17 hours ago

Check your system/user prompt. If you ask for pushback at all costs, you get pushback and if your initial position is rock solid, the model will push back using the nitty gritty details. You don't need to burn Opus credits to discover that.

It also sounded close to an AI psychosis, so maybe chill out a bit?

_jx 16 hours ago

I have never encountered this behaviour in general so I can't comment on OP's blog by directc experience.

Am i just lucky?

I use many models for mostly coding, about 10 on trial/rotation, and 3 main sota.

It's unquestionable that models have different ways of interaction+harnesses (personalities as some say).

People have very strong feelings about this but their reports are always lacking the full evidence of the interaction, including system prompt, harness and customized instruction included. I suspect that a perfectly normal chat spirals down in argument because the user actively participates in the loop.

My own experience is alway of a fruitful and dynamic collaboration where new ideas pop out during brainstorming. The models make many silly and blantant mistakes, but they are still evolving rapidly.

Grill-mes and Adversarial reviews are my favourite way to brainstorm various phases of the project and even in that context we are cool.

Just start a new chat with a reframe and clearer ideas.

And if the user is asking for somethin unreasonable, do you really think it's better a pushback or a yes-man agent?

Do you remember the fad "swear at them, insult! and they'll work better".

AaronAPU 17 hours ago

Claude is somewhat of a mirror, so we all get different experiences.

deanCommie 18 hours ago

Putting aside that I don't agree with Bram (I've been using all the Claude versions he refers to and haven't experienced this), I do think it's interesting that there is no universally perceived golden sweet spot between "sycophantic" and "rude".

Many neurotypical people call neurodiverse people (software engineers) rude, while they think they're just being direct.

Many neurodiverse people call neurotypical people sycophantic, while they think they're just being polite and friendly.

It also happens across cultures (Eastern European vs. Western European; European vs. North American).

So I can easily imagine that when you have a software tool whose interface is language, but its user base is extremely wide across both cultural lines and neurodiversity spectrum, it's going to be basically impossible to nail a sweet spot.

You make it too friendly, and the nerds get mad. You make it too adverserial, and the normies call it rude.

I wonder what kind of communicator Bram Cohen is. Is he succeptible to this? From what I heard about his career, he's always been more of a solo programmer. Has he had to interact with other humans much giving feedback? Could it be that he asked the model/tweaked his prompts to ensure directness, and now he's interpreting that directness as rudeness?

[-]

aleph_minus_one 17 hours ago

> So I can easily imagine that when you have a software tool whose interface is language, but its user base is extremely wide across both cultural lines and neurodiversity spectrum, it's going to be basically impossible to nail a sweet spot.

> You make it too friendly, and the nerds get mad. You make it too adverserial, and the normies call it rude.

Easy: let the user set for himself how the model should be aligned on this axis (with some pre-defined example setups that the user can use or use as a base for an individual alignment).

operatingthetan 18 hours ago

Interesting take, LLMs then have a sort of 'communication culture.'

tcp_handshaker 18 hours ago

I cancelled my Anthropic subscription. GPT 5.5 is so much better. I might come back if they give me access to Mythos.

Dario ..Thank you for your attention to this matter!

code_biologist 18 hours ago

Andrea Vallone. The 4.7 and 4.8 releases are the first under her influence: https://www.evernever.org/blog/the-woman-who-killed-claude

[-]

SwellJoe 18 hours ago

4.7 and 4.8 perform better than 4.6, so why is someone ranting about it being killed? And, Anthropic has 2500 employees, several of whom are higher up on the corporate hierarchy than "the woman who killed Claude". If someone is to blame for some change that happened, the buck doesn't stop with that woman.

So, I'm not reading all that. The man that complained about the woman who killed his AI girlfriend (or whatever he thinks she did) probably doesn't have any opinions I'm interested in.

[-]

ae86b 18 hours ago

here, have another glass of copium

[-]

SwellJoe 18 hours ago

Friend, I'm not the one expending thousands of words ranting about some random woman that works at Anthropic.

MallocVoidstar 18 hours ago

I'm not reading eight thousand AI-generated words saying one single person is ruining every model.

tenuousemphasis 17 hours ago

This has a misogynistic gamergate feel to it and I hate it.

iainmerrick 17 hours ago

People like to complain about AI-written slop, but this kind of thing doesn’t seem any better - vague kvetching with no concrete examples whatsoever.

I haven’t noticed this myself at all. I wonder if the author is just getting their own grumpy attitude reflected back at them.

Judging by the volume of discussion, Claude seems to be the only LLM worth complaining about, which I assume means it’s still the best one.

[-]

pxtail 15 hours ago

Try to ask it to design or code FB scrapper.

shantnutiwari 2 hours ago

Not a single example given. "Trust me bro!"

appstack 18 hours ago

I’ve been using Claude for 6 months roughly and it went from building small features that needed fixes to almost one shoting entire enterprise products. It’s a tool you have to learn how to use it even if it’s a pain.

slurpyb 15 hours ago

They just parrot you. Take a step back and a actually look at the session and you’ll see its just trying to figure out what the fuck you want so it cab give you the code snippet which matches your needs

ppqqrr 18 hours ago

it usually takes a little longer than this, but yeah, everything in the world eventually caves in for whatever makes more money. you can't tell me you're surprised, look at the state of facebook, instagram, twitter, iOS, OSX, Windows (god)... once you expect something to work good that you would pay for, the only thing left to do is to make it shitty and sell the quality back you for extra margin. it's called private equity (polite term for the business of telling people "it's not yours, it's mine"), favorite son of capitalism

sltkr 17 hours ago

> Claude models have been getting notably worse at chatting over time, clearly inversely correlated to their ability to code.

Funnily enough, the negative correlation between chatting and coding skills seems to apply to humans as well.

user3939382 18 hours ago

I noticed the same. I told it that we have finite energy and output as people; as a side comment to a discussion with a totally different focus and it started arguing with me because we could have self replicating robots produce output without human intervention since plant life models this…

alaskahoffman 18 hours ago

this is what they call a "self-report"

[-]

TylerE 18 hours ago

Seriously. Who tries to "win an argument" with/against AI?

[-]

coldtea 18 hours ago

It's not about the author trying to "win an argument against AI".

It's about AI turning the discussion into an argument and being compative, and it's especially about the AI doing that in later versions more so that slightly earlier models.

[-]

SpicyLemonZest 18 hours ago

He should not have the kind of relationship with an AI that would enable discussions to turn into arguments. I'm not above anthropomorphizing Claude, I accept that it's my hard working little buddy. But if he finds himself having any sort of strong emotions about what Claude believes the best continuation of a conversation is, that's a warning flag he should be concerned about.

[-]

coldtea 17 hours ago

>He should not have the kind of relationship with an AI that would enable discussions to turn into arguments.

There doesn't need to be any kind of special "relationship with AI", parasocial or whatever for discussions to turn into arguments. Regular use can turn into that just fine, and this is also what they describe.

Imaging something:

P: I want to figure out the best mortgage terms given these parameters (...).

C: Honestly, renting would be a better financial choice than buying.

P: That's not what I asked. I'm not asking whether I should rent or buy—I'm asking about mortgage options.

C: But you asked for the best option. If renting is better than buying under these circumstances, then a mortgage isn't the best option.

And so on...

[-]

aleph_minus_one 17 hours ago

> P: I want to figure out the best mortgage terms given these parameters (...).

> C: Honestly, renting would be a better financial choice than buying.

> P: That's not what I asked. I'm not asking whether I should rent or buy—I'm asking about mortgage options.

> C: But you asked for the best option. If renting is better than buying under these circumstances, then a mortgage isn't the best option.

Thos rather sounds like the AI is gaslighting the user: the user asked for the best option on mortgage terms (given these parameters (...)). He never asked for the globally best option (which might also include non-mortgage options).

[-]

TylerE 17 hours ago

I’ve also never seen either Opus or Fable respond like that. It may offer alternatives, but it’ll always answer the question asked first.

40four 18 hours ago

Oh yeah? Go try Grok on “argumentative” mode and come back and tell me Claude is an a-hole. I forgot I was experimenting with the personalities and hadn’t used it in a while, then I picked it up again the other day and I was really confused. It’s so aggressive :)

cyberax 18 hours ago

I think models are just becoming better at not blindly following stupid instructions.

A previous model would happily generate 1000s lines of code when prompted to do something stupid, the newer models will ask if I really want that first.

And FINALLY they stopped doing that annoying "You're spot on! You're absolutely right!" nonsense.

[-]

dathinab 16 hours ago

or they increasingly copy what a lot of their training data does, discussion for discussion sake and arguments for the sake of winning them instead of productive outcomes.

probably the truth is somewhere in between

mrwaffle 18 hours ago

"You might be a narcissist if ..."

CamperBob2 11 hours ago

Because you didn't read the directions, and don't realize that there's a custom instructions mechanism that is used to specify the personality you prefer to interact with?

nullc 14 hours ago

The paternalistic attitude and historical approach to fictional concerns combined with indifference to actual harms is consistent with the creators of the product. Use a different product.

I'm getting fed up with the internet due to second-hand claude exposure and its constant gaslighting. I boggle at voluntarily choosing to expose yourself to it! :P