Expanding on what we missed with sycophancy

(openai.com)

234 points | by synthwave a day ago ago

256 comments

  • NoboruWataya a day ago

    I found the recent sycophancy a bit annoying when trying to diagnose and solve coding problems. First it would waste time praising your intelligence for asking the question before getting to the answer. But more annoyingly if I asked "I am encountering X issue, could Y be the cause" or "could Y be a solution", the response would nearly always be "yes, exactly, it's Y" even when it wasn't the case. I guess part of the problem there is asking leading questions but it would be much more valuable if it could say "no, you're way off".

    But...

    > Beyond just being uncomfortable or unsettling, this kind of behavior can raise safety concerns—including around issues like mental health, emotional over-reliance, or risky behavior.

    It's kind of a wild sign of the times to see a tech company issue this kind of post mortem about a flaw in its tech leading to "emotional over-reliance, or risky behavior" among its users. I think the broader issue here is people using ChatGPT as their own personal therapist.

    • photonthug 18 hours ago

      > But more annoyingly if I asked "I am encountering X issue, could Y be the cause" or "could Y be a solution", the response would nearly always be "yes, exactly, it's Y" even when it wasn't the case

      Seems like the same issue as the evil vector [1] and it could have been predicted that this would happen.

      > It's kind of a wild sign of the times to see a tech company issue this kind of post mortem about a flaw in its tech leading to "emotional over-reliance, or risky behavior" among its users. I think the broader issue here is people using ChatGPT as their own personal therapist.

      I'll say the quiet part out loud here. What's wild is that they appear to be apologizing that their Wormtongue[2] whisperer was too obvious to avoid being caught in the act, rather than prioritizing or apologizing for not building the fact-based councilor that people wanted/expected. In other words.. their business model at the top is the same as the scammers at the bottom: good-enough fakes to be deceptive, doubling down on narratives over substance, etc.

      [1] https://scottaaronson.blog/?p=8693 [2] https://en.wikipedia.org/wiki/Gr%C3%ADma_Wormtongue

      • neuroelectron 17 hours ago

        Well, that's always what LLM-based AI has been. It can be incredibly convincing but the bottom line is it's just flavoring past text patterns, billions of them it's been "trained" on, which is more accurately described as compressed efficiently onto latent space. Like if someone lived for 10,000 years engaging in small talk at the bar, has heard it all, and just kind of mindlessly and intuitively replied with something that sounds plausible for every situation.

        Sam Altman is the real sycophant in this situation. GPT is patronizing. Listening to Sam go off on tangent about science fiction scenarios that are just around the corner... I don't know how more people don't see through it.

        I kind of get the feeling the people who have to work him every day got sick of his nonsense and just did what he asked for. Targeting the self-help crowd, drive engagement, flatter users, "create the next paradigm of emotionally-enabled humans of perfect agency" or whatever the fuck it he was popping off about to try to motivate the team to compete better with Anthropic.

        He clearly isn't very smart. He clearly is product of nepotism. And clearly, LLM "AI" is an overhyped, overwrought version of 20 questions artificial intelligence enabled by mass data scale and NVidia video game graphics. it's been 4 years now of this and AI still tells me the most obviously wrong nonsense every day.

        "Are you sure about that?"

        "You're absolutely correct to be skeptical of ..."

        • photonthug 15 hours ago

          > which is more accurately described as compressed efficiently onto latent space.

          The actual difference between solving compression+search vs novel creative synthesis / emergent "understanding" from mere tokens is always going to be hard to spot with these huge cloud-based models that drank up the whole internet. (Yes.. this is also true for domain experts in whatever content is being generated.)

          I feel like people who are very optimistic about LLM capabilities for the later just need to produce simple products to prove their case; for example, drink up all the man pages, a few thousand advanced shell scripts that are easily obtainable, and some subset of stack-overflow. And BAM, you should have a offline bash oracle that makes this tiny subset of general programming endeavor a completely solved problem.

          Currently, smaller offline models still routinely confuse the semantics of "|" vs "||". (An embarrassing statistical aberration that is more like the kind of issue you'd expect with old school markov chains than a human-style category error or something.) Naturally if you take the same problem to a huge cloud model you won't have the same issue, but the argument that it "understands" anything is pointless, because the data-set is so big that of course search/compression starts to look like genuine understanding/synthesis and really the two can no longer be separated. Currently it looks more likely this fundamental problem will be "solved" with increased tool use and guess-and-check approaches. The problem then is that the basic issue just comes back anyway, because it cripples generation of an appropriate test-harness!

          More devs do seem to be coming around to this measured, non-hype kind of stance gradually though. I've seen more people mentioning stuff like, "wait, why can't it write simple programs in a well specified esolang?" and similar

          • skydhash 12 hours ago

            A naive thought: What you would get if you hardcode the language grammar and not let the training discern it, so instead of it, kinda like an expert system constraining its output?

            • ModernMech 2 hours ago

              I try to do something like this when I ask the AI to generate tests -- I'll cook up a grammar and feed it to the LLM in a prompt, and then ask it to generate strings from the grammar. It's pretty good at it, but it'll produce mistakes, which is why I'll write a parser for the grammar and have the LLM feed the strings it makes through the parser and correct them if they're wrong. Works well.

            • earnestinger 9 hours ago

              What kind of hard coding do you have in mind? How the technique would look like?

              • skydhash 5 hours ago

                We already know the keywords of the language and symbols from the standard library and othe major ones. As well as the rules of the grammar. So the weight can be biased against that. Not sure how that would work, though.

                I don’t think that would help with natural language to programming language, but that can probably help with patterns, kinda like a powerful suggestion engine.

              • Xmd5a 5 hours ago

                word2vec meets category theory

                https://en.wikipedia.org/wiki/DisCoCat

                >In this post, we are going to build a generalization of Transformer models that can operate on (almost) arbitrary structures such as functions, graphs, probability distributions, not just matrices and vectors.

                https://cybercat.institute/2025/02/12/transformers-applicati...

        • nyarlathotep_ 14 hours ago

          > it's been 4 years now of this and AI still tells me the most obviously wrong nonsense every day.

          It's remarkable seeing the change in sentiment in these parts, considering even just a year ago a large part of this forum seemed to regularly proclaim that programmers were done, lawyers were gone in 5 years, "Aye Gee Eye is coming", etc etc.

        • tmountain 10 hours ago

          If it’s just flavoring text patterns, how does it reason about code when I give it explicit arbitrary criteria? Is it just mining and composing “unit level” examples from all the code bases it has ingested?

        • Kiro 11 hours ago

          > I don't know how more people don't see through it.

          When you think that maybe you should take a step back and reflect on it. Could it be that your assessment is wrong rather than everyone else being delusional?

          • earnestinger 9 hours ago

            Parent is not alone in that line of thinking. (Some rudiments of reasoning do show, but result is dominated by excellent compression of humongous amount of data)

    • cjbgkagh a day ago

      For many people ChatGPT is already the smartest relationship they have in their lives, not sure how long we have until it’s the most fulfilling. On the upside it is plausible that ChatGPT can get to a state where it can act as a good therapist and help helpless who otherwise would not get help.

      I am more regularly finding myself in discussions where the other person believes they’re right because they have ChatGPT in their corner.

      I think most smart people overestimate the intelligence of others for a variety of reasons so they overestimate what it would take for a LLM to beat the output of an average person.

      • alickz 21 hours ago

        >I think most smart people overestimate the intelligence of others for a variety of reasons so they overestimate what it would take for a LLM to beat the output of an average person.

        I think most people also _vastly_ overestimate how much positive attention the average person gets in their lives

        It wouldn't surprise me if, for most people, ChatGPT offers them more empathy and understanding than _anyone_ else _ever has_, at least on a consistent basis. That kind of indefatigable emotional labor is just not feasible for most, even on a very short term basis, even for those with large support networks

        We can argue over whether or not it's "real" empathy, but I don't believe we can argue with the emotions of our attention starved brothers and sisters

        • rurp 20 hours ago

          >We can argue over whether or not it's "real" empathy

          There's nothing to argue about, it's unambiguously not real empathy. Empathy from a human exists in a much broader context of past and future interactions. One reason human empathy is nice is because it is often followed up with actions. Friends who care about you will help you out in material ways when you need it.

          Even strangers will. Someone who sees a person stranded on the side of a road might feel for them and stop to lend a hand. ChatGPT will never do that, and not just because interaction mediums are so limited, but also because that's not the purpose of the tool. The purpose of ChatGPT is to make immense amounts of money and power for its owners, and a nice sounding chat bot currently happens to be an effective way of getting there. Sam Altman doesn't have empathy for random ChatGPT users he's never met and neither do the computer algorithms his company develops.

          • alickz 19 hours ago

            >There's nothing to argue about, it's unambiguously not real empathy

            I think if a person can't tell the difference between empathy from a human vs empathy from a chatbot, it's a difference without a distinction

            If it activates the same neural pathways, and has the same results, then I think the mind doesn't care

            >One reason human empathy is nice is because it is often followed up with actions. Friends who care about you will help you out in material ways when you need it.

            This is what I think people vastly overestimate

            I don't think most people have such ready access to a friend who is both willing and able to perform such emotional labor, on demand, at no cost to themselves.

            I think the sad truth is that empathy is a much scarcer resource than we believe, not through any moral fault of our own, but because it's just the nature of things.

            The economics of emotions.

            We'll see what the future has in store for the tech anyway, but if it turns out that the average person gets more empathy from a chatbot than a human, it wouldn't surprise me

            • cgio 18 hours ago

              Empathy does not lie in its perception on receipt but in its inception as a feeling. It is fundamentally a manifestation of the modalities enabled in shared experience. As such impossible to the extent that our experiences are not compatible with those of an intelligence that does not put emphasis on lived context trying to substitute for it with offline batch learning. Understanding is possible in this relationship, but should not be confused with empathy or compassion.

              • quuxplusone 4 hours ago

                I happen to agree with what you said. (Paraphrasing: A machine cannot have "real empathy" because a machine cannot "feel" in general.) But I think you're arguing a different point from the grandparent's. rurp said:

                > Someone who sees a person stranded on the side of a road might feel for them and stop to lend a hand. ChatGPT will never do that [...]

                Now, on the one hand that's because ChatGPT cannot "see a person" nor "stop [the car]"; it communicates only by text-in, text-out. (Although it's easy to input text describing that situation and see what text ChatGPT outputs!) GP says it's also because "the purpose of ChatGPT is to make immense amounts of money and power for its owners [, not to help others]." I took that to mean that GP was saying that even if a LLM was controlling a car and was able to see a person in trouble (or a tortoise on its back baking in the sun, or whatever), then it still would not stop to help. (Why? Because it wouldn't empathize. Why? Because it wasn't created to empathize.)

                I take GP to be arguing that the LLM would not help; whereas I take you to be arguing that even if the LLM helped, it would by definition not be doing so out of empathy. Rather, it would be "helping"[1] because the numbers forced it to. I happen to agree with that position, but I think it's significantly different from GP's.

                Btw, I highly recommend Geoffrey Jefferson's essay "The Mind of Mechanical Man" (1949) as a very clear exposition of the conservative position here.

                [1] — One could certainly argue that the notions of "help" and "harm" likewise don't apply to non-intentional mechanistic forces. But here I'm just using the word "helping" as a kind of shorthand for "executing actions that caused better-than-previously-predicted outcomes for the stranded person," regardless of intentionality. That shorthand requires only that the reader is willing to believe in cause-and-effect for the purposes of this thread. :)

              • bondarchuk 5 hours ago

                > As such impossible to the extent that our experiences are not compatible with those of an intelligence that does not put emphasis on lived context trying to substitute for it with offline batch learning.

                Conversely that means empathy is possible to the extent that our experiences are compatible with those of an AI. That is precisely what's under consideration here and you have not shown that it is zero.

                an intelligence that does not put emphasis on lived context trying to substitute for it with offline batch learning.

                Will change your tune when online learning comes along?

            • biker142541 19 hours ago

              >If it activates the same neural pathways, and has the same results, then I think the mind doesn't care

              Boiling it down to neural signals is a risky approach, imo. There are innumerable differences between these interactions. This isn't me saying interactions are inherently dangerous if artificial empathy is baked in, but equating them to real empathy is.

              Understanding those differences is critical, especially in a world of both deliberately bad actors and those who will destroy lives in the pursuit of profit by normalizing replacements for human connections.

              • fastball 15 hours ago

                Can you define "real empathy"?

                • hxtk 14 hours ago

                  There's a book that I encourage everyone to read called Motivational Interviewing. I've read the 3rd edition and I'm currently working my way through the 4th edition to see what's changed, because it's a textbook that they basically rewrite completely with each new edition.

                  Motivational Interviewing is an evidence-based clinical technique for helping people move through ambivalence during the contemplation, preparation, and action stages of change under the Transtheoretical Model.

                  In Chapter 2 of the 3rd Edition, they define Acceptance as one of the ingredients for change, part of the "affect" of Motivational Interviewing. Ironically, people do not tend to change when they perceive themselves as unacceptable as they are. It is when they feel accepted as they are that they are able to look at themselves without feeling defensive and see ways in which they can change and grow.

                  Nearly all that they describe in chapter 2 is affective—it is neither sufficient nor even necessary in the clinical context that the clinician feel a deep acceptance for the client within themselves, but the client should feel deeply accepted so that they are given an environment in which they can grow. The four components of the affect of acceptance are autonomy support, absolute worth (what Carl Rogers termed "Unconditional Positive Regard"), accurate empathy, and affirmation of strengths and efforts.

                  Chapters 5 and 6 of the third edition define the skills of providing the affect of acceptance defined in Chapter 2—again, not as a feeling, but as a skill. It is something that can be taught, practiced, and learned. It is a common misconception to believe that unusually accepting people become therapists, but what is actually the case is that practicing the skill of accurate empathy trains the practitioner to be unusually accepting.

                  The chief skill of accurate empathy is that of "reflective listening", which essentially consists of interpreting what the other person has said and saying your interpretation back to them as a statement. For an unskilled listener, this might be a literal rewording of what was said, but more skilled listeners can, when appropriate, offer reflections that read between the lines. Very skilled listeners (as measured by scales like the Therapist Empathy Scale) will occasionally offer reflections that the person being listened to did not think, but will recognize within themselves once they have heard it.

                  In that sense, in the way that we measure empathy in settings where it is clinically relevant, I've found that AIs are very capable with some prompting of displaying the affect of accurate empathy.

          • Viliam1234 4 hours ago

            > Friends who care about you will help you out in material ways when you need it.

            LLMs help me write code, do various paperwork, etc. Going by this criterion, they already are true friends.

          • telchior 18 hours ago

            A lot of human empathy isn't real either. Defaulting to the most extreme example, narcissists use love bombing to build attachment. Sales people use "relationship building" to make money. AI actually seems better than these -- it isn't building up to a rug pull (at least, not one that we know of yet).

            And it's getting worse year after year, as our society gets more isolated. Look at trends in pig butchering, for instance: a lot of these are people so incredibly lonely and unhappy that they fall into the world's most obvious scam. AI is one of the few things that actually looks like it could work, so I think realistically it doesn't matter that it's not real empathy. At the same time, Sam Altman looks like the kind of guy who could be equally effective as a startup CEO or running a butchering op in Myanmar, so I hope like hell the market fragments more.

            • doright an hour ago

              This is a good point, you can't be dependent on a chatbot in the same way you're dependent on someone you share a lease with. If people take up chatbots en masse, maybe it says more about how they perceive the risk of virtual or physical human interactions vs AI. The people I have met in the past make the most sycophant AIs seem like a drop in the bucket by comparison. When you come back from that in real life, you remark that this is all just a bunch of text in comparison.

              I treat AIs dispassionately like a secretary I can give infinite amounts of work to without needing to care about them throwing their hands up. That sort of mindset is non-conducive to developing any feelings. With humans you need empathy to not burden them with excessive demands. If it solely comes down to getting work done (and not building friendships or professional relationships etc.) then that need to restrain your demands is a limitation of human biology that AIs kind of circumvent for specific workloads.

        • codr7 20 hours ago

          It's a pretty sucky solution to that problem imo, and I can see a substantial risk that it causes people to withdraw even more from real relations.

          • cjbgkagh 20 hours ago

            One concern that I do worry about is if LLMs are able to present an false attractive view of the world that the user will become increasingly dependent on the LLMs to maintain that view. A cult of 1. Reminds me of the episode 'Safe Space' from South Park but instead of Butters filtering content it'll be the LLM. People are already divorced enough from reality - but I see no reason why they couldn't be more divorced, at least temporarily.

            • danenania 19 hours ago

              It begs the question of who decides what “reality” is though. A lot of people have an unrealistically negative view of themselves and their abilities—often based on spending time around pessimistic or small-minded humans.

              In that case, if an AI increases someone’s confidence in themselves, you could say it’s giving them a stronger sense of reality by helping them to question distorted and self-limiting beliefs.

              • codr7 18 hours ago

                Reality as in the real world, it is what it is, no one decides.

                • danenania 18 hours ago

                  We're talking about psychology, therapy, sycophancy, etc. None of this is empirical.

                  If someone thinks they can, say, create a billion dollar startup, whether they can really do it or not is a subjective determination. The AI might tell the person they can do it. You might tell them they can't, that the AI is sycophantic, and that they should stop talking to it because they're losing touch with reality.

                  But is the AI a sycophant, or are you an irrational pessimist?

                  • dns_snek 6 hours ago

                    That's easy. What makes someone a sycophant, by definition, is that their encouragement and flattery is unconditional and completely disconnected from any sort of realistic consideration of your ideas.

                    You can't judge whether LLM is acting like a sycophant without reading the conversation, and you can't judge whether a human is being an irrational pessimist without having the full context.

                    Are they a highly intelligent, technically skilled, and socially competent person (probably not if they discuss their ideas with ChatGPT instead of a friend), or do they have a high school diploma, zero practical skills, and have spent the past 20 years smoking weed all day?

                  • TheOtherHobbes 15 hours ago

                    The AI will be saying the same thing to everyone. Rationally, what are the chances every single OpenAI customer will be building a billion dollar startup any time soon?

                    But even it's more obvious than that. The sycophancy is plain old love bombing, which is a standard cult programming technique.

                    As for startups - let's wait until the AI has built a few of its own, or at least mentored humans successfully.

                  • drdeca 11 hours ago

                    That depends on whether they are capable of creating a billion dollar startup.

                    If they aren’t, and I say they aren’t, then I am correct. If they are, and the AI’s output says they are, then AI’s output is correct.

                    • genrilz 5 hours ago

                      We get into a bit of a weird space though when they know your opinions about them. I'm sure there are quite a few people who can only build a billion dollar startup if someone emotionally supports them in that endeavor. I'm sure more people could build such a startup if those around them provide knowledge or financial support. In the limit, pretty much anyone can build a billion dollar startup if handed a billion dollars. Are these people capable or not capable of building a building a billion dollar startup.

                      EDIT: To be clear, I somehow doubt an LLM would be able to provide the level of support needed in most scenarios. However, you and others around the potential founder might make the difference. Since your assessment of the person likely influences the level of support you provide to them, your assessment can affect the chances of whether or not they successfully build a billion dollar startup.

          • alickz 19 hours ago

            Hopefully there are better solutions to the fundamental limitations of societal empathy in the future, but for now i just can't see any

            Seems to me empathy on a societal scale has been receding as population grows, not increasing to match (or outpace)

            Telling people to seek empathy elsewhere to me will be about as useful as telling people at an oasis in the desert to look for water elsewhere, but i hope i'm wrong

      • itchyjunk a day ago

        For a subset of topics, "AI" is already what I prefer to interact with over humans. At times, it's nicer to start with "AI" and kind of ground my messy thoughts before interacting with people and that works better than directly starting with a person.

        I'm also starting to come across people who say "You should ask these types of questions to AI first" answer. But this is no different than people who preached "lmfgt" kind of religiously. Even when I prefer to start with humans, some humans prompt me to start by prompting AI.

        • cjbgkagh a day ago

          I see the same.

          I'm waiting on LLMs to get good enough that I can use them to help me learn foreign languages - e.g. talk to me about the news in language X. This way I can learn a language in an interesting and interactive way without burdening some poor human with my mistakes. I would build this myself but others will probably beat me too it.

          • CouchYam 21 hours ago

            I sometimes prompt the LLM to talk to me as a <language> instructor - to suggest a topic, ask a question, read my response, correct my grammar, and suggest alternate vocabulary where appropriate. This works quite well. Similar to your comment, I am often hesitant to butcher a language in front of a real person :-).

            • skydhash 12 hours ago

              The first step to really learn a language is to be confident and forgive yourself for any mistakes (you’re starting late anyway , and juggling other things).

              These days, I do my best to learn and reflect. But any mistakes is just a reminder for more learning (and practice).

              • thaumasiotes 14 minutes ago

                > forgive yourself for any mistakes (you’re starting late anyway)

                Do you think babies don't make mistakes?

                • skydhash 11 minutes ago

                  They do, but they also don’t care. Adults tend to do care about that.

                  • thaumasiotes 7 minutes ago

                    So what's the idea behind "you're starting late"?

            • bayindirh 21 hours ago

              The problem is, AI doesn't let you, or encourage you to create your own style. Word choices, structure, flow, argument building and discourse style is very fixed and "average", since it's a machine favors what it ingests most.

              I use Grammarly for grammar and punctuation, and disable all style recommendations. If I let it loose on my piece of text, it converts it to a slop. Same bland, overly optimistic toned text generator output.

              So, that machine has no brain, use your own first.

            • QuercusMax 20 hours ago

              I contribute to a language-learning forum as a native English speaker, and we constantly get questions from people who are doing exactly what you're doing. The AI does not understand the language, and it will tell you blatantly incorrect information. Especially with less-common constructs, you'll just get very bad advice.

          • croemer 14 hours ago

            LLMs are already good enough to tell you the news in language X and listen to your broken attempts at asking questions back. Or what's missing?

            • cjbgkagh 14 hours ago

              They do seem close to being able to, I'm mostly waiting on someone to provide it as a service - I don't have time for side projects ATM.

              • istjohn 11 hours ago

                It's not a project. Just open Claude or ChatGPT and tell it what you want to do. You could copy/paste your comment above. If you want to get slightly more fancy, use Google's AIStudio with a simple system prompt.

                • cjbgkagh 3 hours ago

                  I would like to do it semi-passively through talking while I’m doing other things like going for a walk, doing chores etc. I don’t have the time at the moment to sit down and focus on this.

      • taurath a day ago

        I think that most smart people underestimate the complexity of fields they aren’t in. ChatGPT may be able to replace a psychology listicle, but it has no affect or ability to read, respond, and intervene or redirect like a human can.

        • treis 20 hours ago

          They definitely can. Here's an example when I tried it:

          Me: I'm struggling with some things today and feel like eating a bunch of food to make myself feel better. can we talk about it?

          Deepseek: Absolutely, I’m here to listen. It’s completely understandable to turn to food for comfort when you’re struggling—many people do. But if you’re open to it, we can explore what’s going on beneath the urge to eat emotionally.

          Would you like to share what’s weighing on you today? Sometimes just naming the feelings can take some of their power away. And if you’d rather not go into details, that’s okay too—we could brainstorm other ways to cope that might leave you feeling more nourished emotionally (and physically).

          Some gentle questions to consider (no pressure to answer all or any): - What’s the emotion you’re trying to soothe? (Loneliness? Stress? Sadness?) - If eating wasn’t an option, what’s one tiny thing that might bring you a moment of relief? (A hug? A walk? A favorite song?)

          You’re not alone in this.

          That's a really solid opening response and it kept doing well from there. I haven't tried a complex psychological issue but for the "I'm struggling and need someone to talk to" it does surprisingly well.

          • taurath 8 hours ago

            ChatGPT can’t be in the room with you. Can’t evoke psychological mirroring. Is AI going to cure loneliness? Will the AI be empathetic? Will it be capable of showing love? Can it give you the acceptance of another human being? Do these things matter if they’re simulated? Do these things matter at all?

            I don’t deny the helpfulness of LLMs on many aspects of therapy and mental health. I’ve used it myself. Relational problems (of which almost all mental health is a part) requires relational healing.

            I think we’re going to need to learn a lot more information about what being a human means, else we continue to in the name of business and financial efficiency lose something very important. I’m an optimist, but we have a lot of challenges ahead.

        • ben_w 21 hours ago

          Both statements can be simultaneoulsy true.

          45% of the US[0] have a degree, about 40% EU[1] graduate, and 54% of China[2] get at least a diploma from university.

          The best AI behave like someone fresh out of university without much real world experience.

          Personally, I use this as a way to stay humble: when the AI is teaching me fundamentals about some subject, my opinion about it can't possibly be very useful.

          [0] https://en.wikipedia.org/wiki/Educational_attainment_in_the_...

          [1] https://euranetplus-inside.eu/eu-maps-what-proportion-of-you...

          [2] https://en.wikipedia.org/wiki/Higher_education_in_China#Chal...

        • kergonath 20 hours ago

          > I think that most smart people underestimate the complexity of fields they aren’t in.

          And people deep in new technologies overestimate the potential effect of $new_tech_du_jour. You cannot solve a problem without understanding it and its reasons. And LLMs are not able to understand something.

          • pixl97 20 hours ago

            People have solved problems for most of history without understanding them. For example problems can be brute forced.

            • bluefirebrand 11 hours ago

              You cannot brute force a problem without understanding the problem

              I think you have mistaken "understanding the problem" with "knowing a solution to the problem"

              • hackable_sand 2 hours ago

                You can brute force a problem without understanding the problem.

                • bluefirebrand an hour ago

                  Would you be kind enough to give an example of this?

                  I disagree, but I'm open to being convinced

                  I just don't think you can solve a problem in any way without understanding it first, at least at a high level

                  I think if you know a solution you want to reach then you can work backwards from it in a brute force way to reach the solution you want

                  But I don't think you can take a problem where you aren't sure what the solution is and brute force the solution without understanding the problem

            • suddenlybananas 19 hours ago

              For example, "wild generalization that has no basis."

        • cjbgkagh a day ago

          Underestimating the complexity of other fields is not mutually exclusive with overestimating the intelligence of others. The real issue is that society is very stratified so smart people are less likely to interact with regular people, especially in circumstances where the intelligence of the regular person could become obvious.

          I don’t see there being an insurmountable barrier that would prevent LLMs from doing the things you suggest it cannot. So even assuming you are correct for now I would suggest that LLMs will improve.

          My estimations don’t come from my assumption that other people’s jobs are easy, they come from doing applied research in behavioral analytics on mountains of data in rather large data centers.

          • taurath 8 hours ago

            Most human intelligence is within a fairly narrow band. Most people I’ve ever met have their own unique intelligences. Perhaps it might be good to meet more people without holding the self-looping dichotomy of “smart people” vs “normal people”. In my experience it tends to lead to huge cognitive errors.

            • cjbgkagh 3 hours ago

              As mentioned I’m in the rather unique position to have analyzed the complete browser history for a substantial number of people - I have learned far more than I wished to.

              The behaviors of very high IQ people are rather distinct from regular IQ people due to IQ being both largely generic and those genes having other comorbidities. Most obviously is the depression, anxiety, and bipolar disorders. This is so obvious that even regular researchers have uncovered it.

              I think what happens to many people is they confuse their desired reality with actual reality by looking at everything through a tinted lens. In being a data scientist in pursuit of actual reality I’ve had my desired reality repeatedly challenged far more than a person not in this industry. My desired reality was that intelligence was more common and I believed that until it was shown to me in data that I was wrong.

          • svieira a day ago

            Do you presume that "what people do" is "what they should do"?

            • cjbgkagh a day ago

              If you are suggesting that people shouldn't underestimate the difficulty of the jobs of others - my answer is a strong yes. People should strive for accuracy in all cases. But I did suggest that even if true it does not negate my assertion so I am failing to see the relevance. Perhaps I have misunderstood your point.

              • svieira 21 hours ago

                Sorry, I was rather obscure - you said "My estimations don’t come from my assumption that other people’s jobs are easy, they come from doing applied research in behavioral analytics on mountains of data in rather large data centers."

                And so I considered the preceding discussion in light of your last sentence. Which makes it sound like you are saying "I've observed the behavior of people and they're often flawed and foolish, regardless of the high ideals they claim to be striving for and the education they think they have. Therefore, they will do better with ChatGPT as a companion than with a real human being". But that's quite a few words that you may not have intended, for which I apologize!

                What did you mean?

                • cjbgkagh 20 hours ago

                  It wasn't that I observed them being foolish but many behaviors are subtly linked to intelligence and can be combined to create a proxy IQ. It also helps when people search their SAT scores. I noted that the people I typically interact with are much higher IQ than I had expected which incorrectly skewed my believe of the average higher. I noticed that other high IQ individuals were making the same assumptions. I had very much underestimated how little I interact with regular people.

                  I think we're already finding out that people are doing better with ChatGPT than with their peers, not all peers are created equal, and they can ask ChatGPT things that they cannot ask their peers. I think this trend will continue to the point that most people will prefer discussing things with ChatGPT than with their peers. Given what I know I predict this is a choice many people will make, I'm not passing judgment on that, it's a choice I've also made and I'm fortunate enough to have better peers than most.

          • fallinditch 21 hours ago

            > So even assuming you are correct for now I would suggest that LLMs will improve

            Yes, and when we can all wear smart glasses the ways we use them will become increasingly influential in our daily lives: a conversational voice assistant that is visually monitoring our surroundings, helping with decision making (including micro decisions), coaching, carrying out our instructions, etc.

        • istjohn 11 hours ago

          There are a lot of awful therapists out there. I wager that Claude Sonnet 3.7 given a suitable, straightforward system prompt would handily outperform non-doctoral degree therapists in a clinical trial, even if the humans had the advantage of in-person sessions.

          • taurath 8 hours ago

            I would want to bet against it but then we’d have to agree on what “performance” means. Also agree there are horrible therapists. The funny part about horrible therapists is that sometimes they can actually be extremely good for one population of people. Will AI be able to connect with all sorts of people? Will we have AI politicians (do we now?)?

            It’s sorta like saying AI will be a better friend. We’ll see about that - I don’t consider profit seeking enterprises to be my friend.

        • philwelch a day ago

          You’re comparing ChatGPT to an idealized example of a good human therapist when many actual therapists are either useless or even actively harmful to the mental health of their clients.

          • taurath 8 hours ago

            These therapists exist. There also exists therapists who certain people gain a ton of help from that another would go running for the hills. Not all therapists are meant to treat all clients - not just in terms of methodology, experience with given diagnoses, but also on a generational and cultural basis.

            This idea that there is some “best” based on a synthesis of all content is inherently wrong - therapy more than most other things is personal and personalized. Human connection is not just a series of levers and response tokens.

          • kergonath 20 hours ago

            But then, the fact that harmful therapist exist is not an excuse to make it worse. It’s an excuse to improve regulations.

            “Car accidents happen regardless of what we do, so YOLO and remove safety standards” is never going to fly.

            • pixl97 19 hours ago

              That is a messy one here in the US. Almost every time we attempt to increase regulations around medical stuff we end up increasing costs and consolidation making care even more unavailable.

            • istjohn 11 hours ago

              Displacing those bad therapists would be an improvement.

              • taurath 8 hours ago

                That’s why therapists have to be licensed. They decide what a “bad” therapist is, and delicense those who’ve caused significant harm to their clients - it’s a difficult process though, but it also should be. Once you get into it, you find that people have already thought of these solutions and actually put things in place to work towards a better system.

                Except healthcare payments. That shit was designed to make money, not make people healthy.

            • philwelch 15 hours ago

              It’s easy to just say that the regulations should be improved. Very different to actually improve them. Therapy isn’t a mass produced engineered product like a car; if therapy was a mass produced engineered product, it would be an AI anyway. Materials science and structural engineering are far more mature and well-understood sciences than psychology, and you can’t just throw a crash test dummy into a therapist’s office and measure how damaged it gets.

              It’s also not really clear how such regulations could even work. The regulations we have now are basically the obvious ones around licensure that require people to go to the right schools and whatnot. And then you can lose your license if it turns out you’ve broken the big ethical rules. But at the end of the day, that only regulates who can call themselves a “therapist” and get listed in Psychology Today. Actually “doing” “therapy” is, ultimately, built on talking to someone about your problems in some way that is supposed to help you solve them. You don’t need a “therapist” to do that. You can do it with your friend or parent or pastor or bartender or guru or “life coach” and, as long as we live in a free country, nobody’s going to stop you. Sure, the people who are allowed to call themselves therapists have certain techniques and rules that make them different, but even if that was a guarantee of quality there’s no way to stop people from talking to someone other than a licensed therapist, and it would be kind of absurd and dystopian to even try.

              So let’s dispense of the notion that we are some sort of omniscient god-emperor who can just magically fix things with vague “regulations” and talk about the world as it actually exists. For a lot of people, I think that’s a world where talking about their personal issues with an LLM is arguably no worse than whatever other options they have. Maybe it’s not the equivalent of whatever amazing therapist you know or have or are or can imagine, but that’s not the therapist that everyone is going to get.

      • nsajko a day ago

        > it is plausible that ChatGPT can get to a state where it can act as a good therapist

        Be careful with that thought, it's a trap people have been falling into since the sixties:

        https://en.wikipedia.org/wiki/ELIZA_effect

        • cjbgkagh a day ago

          Eventual plausibility is a suitably weak assertion, to refute it you would have to at least suggest that it is never possible which you have not done.

        • diggan a day ago

          I dunno, I feel like most people (probably not the typical HN user though) don't even think about their feelings, wants or anything else introspective on a regular basis. Maybe having something like ChatGPT available could be better than nothing, at least for people to start being at least a bit introspective, even if it's LLM-assisted. Maybe it gets a bit easier to ask questions that you feel are stigmatized, as you know (think) no other human will see it, just the robot that doesn't have feelings nor judge you.

          I agree that it probably won't replace a proper therapist/psychologist, but maybe it could at least be a small step to open up and start thinking?

          • kergonath 20 hours ago

            > I feel like most people (probably not the typical HN user though) don't even think about their feelings, wants or anything else introspective on a regular basis.

            Well, two things.

            First, no. People who engage on HN are a specific part of the population, with particular tendencies. But most of the people here are simply normal, so outside of the limits you consider. Most people with real social issues don’t engage in communities, virtual or otherwise. HN people are not special.

            Then, you cannot follow this kind of reasoning when thinking about a whole population. Even if people on average tend to behave one way, this leaves millions of people who would behave otherwise. You simply cannot optimise for the average and ignore the worst case in situations like this, because even very unlikely situations are bound to happen a lot.

            > Maybe having something like ChatGPT available could be better than nothing, at least for people to start being at least a bit introspective, even if it's LLM-assisted.

            It is worse than nothing. A LLM does not understand the situation or what people say to it. It cannot choose to, say, nudge someone in a specific direction, or imagine a way to make things better for someone.

            À LLM regresses towards the mean of its training set. For people who are already outside the main mode of the distribution, this is completely unhelpful, and potentially actively harmful. By design, a LLM won’t follow a path that was not beaten in its training data. Most of them are actually biased to make their user happy and validate what we tell them rather than get off that path. It just does not work.

            > I agree that it probably won't replace a proper therapist/psychologist, but maybe it could at least be a small step to open up and start thinking?

            In my experience, not any more than reading a book would. Future AI models might get there, I don’t think their incompetence is a law of nature. But current LLM are particularly harmful for people who are in a dicey psychological situation already.

            • diggan 20 hours ago

              > It is worse than nothing. A LLM does not understand the situation or what people say to it. It cannot choose to, say, nudge someone in a specific direction, or imagine a way to make things better for someone.

              Right, no matter if this is true or not, if the choice is between "Talk to no one, bottle up your feelings" and "Talk to an LLM that doesn't nudge you in a specific direction", I still feel like the better option would be the latter, not the former, considering that it can be a first step, not a 100% health care solution to a complicated psychological problem.

              > In my experience, not any more than reading a book would.

              But to even get out in the world to buy a book (literally or figuratively) about something that acknowledges that you have a problem, can be (at least feel) a really big step that many are not ready to take. Contrast that to talking with a LLM that won't remember you nor judge you.

              Edit:

              > Most people with real social issues don’t engage in communities, virtual or otherwise.

              Not sure why you're focusing on social issues, there are a bunch of things people deal with on a daily basis that they could feel much better about if they even spent the time to think about how they feel about it, instead of the typical reactionary response most people have. Probably every single human out there struggle with something, and are unable to open up about their problems with others. Even people like us who interact with communities online and offline.

              • danenania 18 hours ago

                I think people are getting hung up on comparisons to a human therapist. A better comparison imo is to journaling. It’s something with low cost and low stakes that you can do on your own to help get your thoughts straight.

                The benefit from that perspective is not so much in receiving an “answer” or empathy, but in getting thoughts and feelings out of your own head so that you can reflect on them more objectively. The AI is useful here because it requires a lot less activation energy than actual journaling.

              • kergonath 19 hours ago

                > Right, no matter if this is true or not, if the choice is between "Talk to no one, bottle up your feelings" and "Talk to an LLM that doesn't nudge you in a specific direction", I still feel like the better option would be the latter, not the former, considering that it can be a first step, not a 100% health care solution to a complicated psychological problem.

                You’re right, I was not clear enough. What would be needed would be a nudge in the right direction. But the LLM is very likely to nudge in another because that’s what most people would need or do, just because that direction was the norm in its training data. It’s ok on average, but particularly harmful to people who are in a situation to have this kind of discussion with a LLM.

                Look at the effect of toxic macho influencers for an example of what happens with harmful nudges. These people need help, or at least a role model, but a bad one does not help.

                > But to even get out in the world to buy a book (literally or figuratively) about something that acknowledges that you have a problem, can be (at least feel) a really big step that many are not ready to take.

                Indeed. It’s something that should be addressed in mainstream education and culture.

                > Not sure why you're focusing on social issues,

                It’s the crux. If you don’t have problems talking to people, you are much more likely to run into someone who will help you. Social issues are not necessarily the problem, but they are a hurdle in the path to find a solution, and often a limiting one. Besides, if you have friends to talk to and are able to get advice, then a LLM is even less theoretically useful.

                > Probably every single human out there struggle with something, and are unable to open up about their problems with others. Even people like us who interact with communities online and offline.

                Definitely. It’s not a problem for most people, who either can rationalise their problems themselves with time or with some help. It gets worse if they can’t for one reason or another, and it gets worse still if they are mislead intentionally or not. LLMs are no help here.

                • TheOtherHobbes 14 hours ago

                  I think you're unreasonably pessimistic in the short term, and unreasonably optimistic in the long term.

                  People are getting benefit from these conversations. I know people who have uploaded chat exchanges and asked an LLM for help understanding patterns and subtext to get a better idea of what the other person is really saying - maybe more about what they're really like.

                  Human relationship problems tend to be quite generic and non-unique, so in fact the averageness of LLMs becomes more of a strength than a weakness. It's really very rare for people to have emotional or relationship issues that no one else has experienced before.

                  The problem is more that if this became common OpenAI could use the tool for mass behaviour modification and manipulation. ChatGPT could easily be given a subtle bias towards some belief system or ideology, and persuaded to subtly attack competing systems.

                  This could be too subtle to notice, while still having huge behavioural and psychological effects on entire demographics.

                  We have the media doing this already. Especially social media.

                  But LLMs can make it far more personal, which means conversations are far more likely to have an effect.

    • Telemakhos 43 minutes ago

      Therapy is currently the leading use of AI.

      https://hbr.org/2025/04/how-people-are-really-using-gen-ai-i...

      • dcrazy 27 minutes ago

        If this is true, it’s a big problem. A human therapist is bound to a code of ethics and laws. Their patients are in an intentionally vulnerable position—therapy only works when you are completely honest with your therapist and are open to their suggestions. If challenged, a human therapist can explain their reasoning for pursuing a line of questioning, and defend themselves against accusations of manipulation. A large language model can’t do any of that, nor can any of the people who trained it. The LLM can pretend to explain its reasoning, but it has no conviction, no morals, and no fear of consequence. It’s just a black box of statistics.

    • frereubu 19 hours ago

      This reminds me of a Will Self short story called Caring Sharing from his collection Tough, Tough Toys for Tough, Tough Boys where everyone has an "emoto", a kind of always-loving companion that people go to for reassurance if they're feeling any negative emotions such as anxiety. As I remember it, in the story two people are potentially falling for each other, but are so caught up in their anxiety that they never quite manage to get together, constantly running back to their emoto for reassurance because they can't get over their anxiety by themselves. The emotos essentially cripple everyone's ability to deal with their own feelings. There's a comment further down which also chimes with this: "It wouldn't surprise me if, for most people, ChatGPT offers them more empathy and understanding than _anyone_ else _ever has_, at least on a consistent basis." I wonder.

      • teach 19 hours ago

        Replace the emoto with alcohol or weed or what-have-you, and you've basically described what often happens with addicts.

        source: am addict in recovery

    • Jonovono 21 hours ago

      It has already replaced therapists, the future is just not evenly distributed yet. There are videos with millions of views on tiktok and comments with hundreds of thousands of likes of teenage girls saying they have gotten more out of 1 week using ChatGPT as a therapist than years of human therapy. Available anytime, cheaper, no judgement, doesn't bring there own baggage, etc.

      • didericis 20 hours ago

        > no judgement

        The value of a good therapist is having an empathetic third party to help you make good judgements about your life and learn how to negotiate your needs within a wider social context.

        Depending on the needs people are trying to get met and how bad the people around them are, a little bit of a self directed chatbot validation session might help them feel less beat down by life and do something genuinely positive. So I’m not necessarily opposed to what people are doing with them/in some cases it doesn’t seem that bad.

        But calling that therapy is both an insult to genuinely good therapists and dangerous to people with genuine mental/emotional confusion or dysregulation that want help. Anyone with a genuinely pathological mental state is virtually guaranteed to end up deeper in whatever pathology they’re currently in through self directed conversations with chatbots.

        • Springtime 20 hours ago

          Reading between the lines I think a key part of what makes chatbots attractive, re lack of judgment, is they're like talking to a new stranger every session.

          In both IRL and online discussions sometimes a stranger is the perfect person to talk to about certain things as they have no history with you. In ideal conditions for this they have no greater context about who you are and what you've done which is a very freeing thing (can also be taken advantage of in bad faith).

          Online and now LLMs add an extra freeing element, assuming anonymity: they have no prejudices about your appearance/age/abilities either.

          Sometimes it's hard to talk about certain things when one feels that judgment is likely from another party. In that sense chatbots are being used as perfect strangers.

          • didericis 19 hours ago

            Agreed/that’s a good take.

            Again, I think they have utility as a “perfect stranger” as you put it (if it stays anonymous), or “validation machine” (depending on the sycophancy level), or “rubber duck”.

            I just think it’s irresponsible to pretend these are doing the same thing skilled therapists are doing, just like I think it’s irresponsible to treat all therapists as equivalent. If you pretend they’re equivalent you’re basically flooding the market with a billion free therapists that are bad at their job, which will inevitably reduce the supply of good therapists that never enter the field due to oversaturation.

          • danenania 17 hours ago

            Also important is simply that the AI is not human.

            We all know that however "non-judgmental" another human claims to be, they are having all kinds of private reactions and thoughts that they aren't sharing. And we can't turn off the circuits that want approval and status from other humans (even strangers), so it's basically impossible not to mask and filter to some extent.

        • istjohn 10 hours ago

          I wouldn't trust ChatGPT to help someone in a mental health crisis, but I would be glad to find out my dad had started using Claude Sonnet to process his transition into retirement. I believe Sonnet would encourage a user to seek professional help when appropriate, too. In my experience, genuinely good therapists are hard to find--probably 75% of them are going to be strictly worse than Sonnet.

        • Spooky23 17 hours ago

          The problem with this is they are practicing like medical providers without any quality assurance or controls to ensure they are behaving appropriately.

          Therapy is already a bit of grey zone… you can have anyone from a psychologist, a social worker, an untrained deacon, etc “counseling” you. This is worse.

          Hell, I’ve been a coach in different settings - players will ask for advice about all sorts of things. There’s a line where you have to say “hey, this is over my head”

          • mynameisash 14 hours ago

            Kind of reminds me of an interview question that a friend of mine suggested for when I conduct interviews: Pick your favorite/strongest language. How would you rate yourself, where 0 is "complete newbie" and 10 is "I invented the language"?

            My friend, an EXTREMELY competent C++ programmer, rates himself 4/10 because he knows what he doesn't know.

            I've interviewed people who rated themselves 9 or 10/10 but couldn't remember how their chosen language did iteration.

            • istjohn 10 hours ago

              Sounds like a bad question then, no?

      • autoexec 20 hours ago

        > There are videos with millions of views on tiktok and comments with hundreds of thousands of likes of teenage girls saying they have gotten more out of 1 week using ChatGPT as a therapist than years of human therapy.

        You can find influencers on tiktok recommending all kinds of terrible ideas and getting thousands of likes. That's not a very reliable metric. I wouldn't put a lot of faith in a teenage girl's assessment of AI therapy after just one week either, and I certainly wouldn't use that assessment to judge the comparative effectiveness of all human therapists.

        I'd also expect ChatGPT to build profiles on people who use it, to use the insights and inferences from that collected data against the user in various ways, to sell that data in some form to third parties, to hand that data over to the state, to hallucinate wildly and unpredictably, and to outright manipulate/censor AI's responses according to ChatGPT's own values and biases or those of anyone willing to pay them enough money.

        It's a lot easier to pay a large amount of money to ChatGPT so that the AI will tell millions of vulnerable teenage girls that your product is the solution to their exact psychological problems than it is to pay large amounts of money to several million licensed therapists scattered around the globe.

        Maybe you think that ChatGPT is unfailingly ethical in all ways and would never do any of those things, but there are far more examples of companies who abandoned any commitment to ethics they might have started with than there are companies who never got once greedy enough to do those types of things and never ever got bought up by someone who was. I suppose you'd also have to think they'll never have a security breach that would expose the very private information being shared and collected.

        Handing over your highly sensitive and very personal medical data to the unlicensed and undependable AI of a company that is only looking for profit seems extremely careless. There are already examples of suicides being attributed to people seeking "therapy" from AI, which has occasionally involved that AI outright telling people to kill themselves. I won't deny that the technology has the potential to do some good things, but every indication is that replacing licensed therapists with spilling all your secrets to a corporate owned and operated AI will ultimately lead to harm.

      • coastalpuma 19 hours ago

        Just the advantage of being available at convenient times, rather than in the middle of the day sandwiched between or immediately after work/school is huge.

      • wizzwizz4 21 hours ago

        Is a system optimised (via RLHF) for making people feel better in the moment, necessarily better at the time-scale of days and weeks?

      • disruptthelaw 20 hours ago

        Yes. While these claims might be hyperbolic and simplistic, I don’t think they’re way off the mark.

        The above issue, whilst relevant and worth factoring, doesn’t disprove this claim IMO.

      • a_wild_dandan 21 hours ago

        Remembers everything that you say, isn't limited to an hour session, won't ruin your life if you accidentally admit something vulnerable regarding self-harm, doesn't cost hundreds of dollars per month, etc.

        Healthcare is about to radically change. Well, everything is now that we have real, true AI. Exciting times.

        • tomalbrc 20 hours ago

          Openly lies to you, hallucinates regularly, can barely get a task done. Such exciting.

          Oh and inserts ads into conversations. Great.

          • astrange 12 hours ago

            > Oh and inserts ads into conversations. Great.

            Are you sure you don't have browser malware?

        • codr7 20 hours ago

          Quick reminder that it's still just a fancy pattern matcher, there's no clear path from where we are to AGI.

          • mensetmanusman 20 hours ago

            >you are a stochastic parrot >no I’m not >yes you are

    • msabalau 2 hours ago

      > It's kind of a wild sign of the times to see a tech company issue this kind of post mortem about a flaw in its tech leading to "emotional over-reliance, or risky behavior"

      Their intended users are basically everyone in society--people who a low in many types of intelligence, including social intelligence. People with a range of emotional and mental health challenges. Children. Naive users who anthropomorphize AI, and can't be expected to resist this impulse at every moment of interaction.

      They aren't designing exclusively for very bright, emotionally adjusted, people deeply familiar with technology, working in a professional context.

      You might find it "wild" that they are consider all of this. That's fine. But a company sensibly is doing this work ought to avoid hiring you.

    • Henchman21 a day ago

      > I think the broader issue here is people using ChatGPT as their own personal therapist.

      An aside, but:

      This leads me right to “why do so very many people need therapy?” followed by “why can’t anyone find (or possibly afford) a therapist?” What has gone so wrong for humanity that nearly everyone seems to at least want a therapist? Or is it just the zeitgeist and this is what the herd has decided?

      • kadushka 21 hours ago

        I've never ever thought about needing a therapist. Don't remember anyone in my circle who had ever mentioned it. Similar to how I don't remember anyone going to a palm reader. I'm not trying to diss either profession, I'm sure someone benefits from them, it's just not for me. And I'm sure I'm pretty average in terms of emotional intelligence or psychological issues. Who are all those people who need professional therapists to talk to? Just curious.

        • istjohn 10 hours ago

          You saying talk therapy just isn't for you is like me saying physical therapy just isn't for me. Well, I'm glad that everything is functioning well for you in that area, but it would be kind of silly for me to write off physical therapy because I haven't needed it yet. Talk therapy can help people with social anxiety or chronic depression, but it can also help people cope with major life changes like losing a loved one, losing a job, getting divorced, or even retiring. We all have difficult moments in our life.

        • automatoney 20 hours ago

          A little strange to compare it to palm reading, I feel like a more apt comparison is some other random medical field like podiatry. I wouldn't expect my friends' podiatrist usage to come up, so I'm sure more of my friends than I know have been to one. And presumably, like with podiatry, all the people who need professional therapists are people who are experiencing issues in the relevant area.

          • kadushka 20 hours ago

            To me a podiatrist is more comparable to a psychiatrist than to a therapist.

        • Henchman21 an hour ago

          Reading this the next day, my takeaway is that while you may not have intended to “diss”, you sure as fuck did it anyway.

          You directly compared therapy to snake oil (palm reading). Then by implication determined that since you were average (and hence normal) no one else would need therapy unless they were, by implication abnormal.

          I don’t believe I have ever been so thoroughly insulted online ever. Well done.

        • kergonath 20 hours ago

          > I've never ever thought about needing a therapist.

          Most people don’t need a therapist. But unfortunately, most people need someone empathic they can talk to and who understands them. Modern life is very short on this sort of people, so therapists have to do.

          • istjohn 10 hours ago

            I disagree. Most people will go through a time in their life where they could benefit from a therapist--maybe to cope with bullying in high school--perhaps to mentally and emotionally process a terminal illness or the loss of a spouse or child.

            And then some people will benefit from just having someone to talk to about the stresses of daily life. I don't think that means they have an inadequate social network or that it reflects poorly on modern society. We should be happy to live in a time where we have the resources and ability to proactively care for our mental health just as we do our teeth.

          • kbelder 18 hours ago

            I think this is it. Therapists aren't so much curing a past trauma or treating a mental issue; they're fulfilling an ongoing need that isn't being met elsewhere.

            I do think it can be harmful, because it's a confidant you're paying $300/hour to pretend to care about you. But perhaps it's better than the alternative.

          • kadushka 20 hours ago

            For me this would be a spouse, a relative, an old friend, or even a stranger at a party.

        • Henchman21 20 hours ago

          Well, in my circles its an assumption you’re in therapy. Perhaps this says way more about the circles I’m in that anything else?

          I was pushed into therapy when I was 12 — which was definitely an exception at the time (1987). As the years have passed therapy has become much much more acceptable. It wouldn’t shock me to learn my own perception is shaped by my experiences; hard to put aside a PoV once acquired.

          • kergonath 20 hours ago

            > Well, in my circles its an assumption you’re in therapy. Perhaps this says way more about the circles I’m in that anything else?

            This sounds like an old Woody Allen movie. I don’t want to offend you but it is fascinating. What kind of social circles is it?

            In mine, therapy is in general something you do when it’s obvious it’s too late and you are falling in the well of depression and that you try to hide as much as you can.

            • Henchman21 18 hours ago

              To be fair my life feels like an old Woody Allen movie. Like I have definitely first hand experienced a rotary fan blowing a pile of cocaine in someone’s face!!

              My professional circle would be my coworkers at a well-known HFT, and my extended network that is very similar. Everyone is well compensated and many reach out for professional help to deal with the stress. Many also seem to vastly prefer a paid therapist to their spouse, for instance. I’m not married but I can understand not wanting to burden your loved ones!

              My personal circle is, well, a lot of technical people, engineers of various stripes, and what I guess I’d call a sort of “standard cast of characters” there? Not sure how best to put this into words?

              Honestly it sounds like we’re handling it better than your after-the-fact help! Perhaps you all need to simply start at the first warning sign not the first episode that becomes public?

        • Sharlin 16 hours ago

          I'm pretty sure that just about every single person could use a therapist. That is, an empathetic, non-judgemental Reasonable Authority Figure who you can talk to about anything without worrying about inconveniencing or overloading them, and who knows how to gently guide you towards healthy, productive thought patterns and away from unhealthy ones. People who truly don't need someone like that in their life are likely a small minority; much more common is, probably, to simply think that you don't.

      • lexandstuff 21 hours ago

        That's similar to asking why does everyone need a GP? Most people experience some kind of mental health challenge in their life.

        Your 2nd question is much more interesting to me. Why is it so hard to find a good therapist?

        It's no surprise to me that people are turning to ChatGPT for therapy. It does a decent enough job and it doesn't have a 2-year waiting list, or cost $300 a session.

      • slashtmpslashme 16 hours ago

        It's the easiest way to cope with not having a purpose in life and depending on external validation / temporary pleasures.

        Like jordan peterson (though I don't like the guy) has said - happyness is fleeting, you need a purpose in life.

        Most of current gen has no purpose and grown up on media which glorify aesthetics and pleasure and to think that's what the whole life is about. When they don't get that level of pleasure in life, they become depressed and may turn to therapy. This is very harmful to the society. But people are apparently more triggered by slang words than constant soft porn being pushed through Instagram and the likes.

      • doug_durham 21 hours ago

        Nothing has gone wrong. There's just been a destigmatization of mental health issues. The world is a happier place for it.

      • polynomial 21 hours ago

        It's a modern variant on Heller's Catch-22: You have to be CRAZY to not want a therapist.

      • mvdtnz 19 hours ago

        It's astroturfing by the therapy industry. It has been a wildly successful marketing campaign.

      • Aeglaecia 9 hours ago

        probably some herd effect going on but realistically what the fuck is there to live for , capitalism is increasingly deleting everything that makes us human and replacing it with a worse version while selling a cure for the artifically generated problems that result

    • Spooky23 17 hours ago

      It creates weird scenarios in other cases too. I asked it to do generate text to speech audio in a wrestler style voice, which ChatGPT doesn’t do.

      But… it lied, and produced empty audio clips and weird pictures with text.

      Then it:

      - said there was a technical problem - said it could not create audio - created weird 1980s computer voice style audio - claimed I was violating a content party.

      I said “stop wasting my time” and it spewed a ridiculous apology. I kept asking and it referred me to various websites. I’ve never inadvertently triggered such a wacky hallucination, and I can see how a vulnerable oersom could be troubled by it.

    • joaogui1 21 hours ago

      Employees from OpenAI encouraged people to use ChatGPT as their therapist, so yeah, they now have to take responsibility for it

    • yubblegum 19 hours ago

      > It's kind of a wild sign of the times to see a tech company issue this kind of post mortem about a flaw in its tech leading to "emotional over-reliance, or risky behavior" among its users.

      We don't know what they know, nor do we know to what extent they monitor and analyze the interactions with ChatGPT. Maybe they already know this is a big problem and a possible legal hazard.

    • satvikpendem a day ago

      I stopped using ChatGPT and started using Gemini, both for some coding problems (deep research, amazing to pull out things from docs etc) and for some personal stuff (as a personal therapist as you say), and it is much more honest and frank with me than ChatGPT ever was. I gave it a situation and asked, was I in the wrong, and it told me that I was according to the facts of the case.

      • eastbound 21 hours ago

        Well Google has access to your history of emails and phone contents so it may say more relevant things.

        • satvikpendem 17 hours ago

          I didn't enable that feature, which is opt-in, and it was still more relevant.

    • andsoitis a day ago

      > it would be much more valuable if it could say "no, you're way off".

      Clear is kind.

    • oezi 21 hours ago

      A key issue seems to me that they didn't do a gradual rollout of their new models and don't have reliable ways to measure model performance.

      Worse, I would have believed they are running many different versions based on the expected use case of the users by now. I mean power users probably shouldn't be handled in the same way as casual users. Yet, everyone had the same bad system prompt.

    • cryptonector 13 hours ago

      Oh yes, ChatGPT has been a bit of a yes-bot lately.

    • kubb 21 hours ago

      If it can replace programmers, why wouldn't it be able to replace therapists?

      • ben_w 20 hours ago

        There's several famous examples of people expecting a problem to be AGI-hard — that is, solving it is equivalent to solving general intelligence[0] — only for someone to make an AI which can do that without being able to do everything else:

        • Fluent natural conversation

        • Translation

        • Go

        • Chess

        So perhaps we'll get an AI that makes the profession of "programmer" go the same way as the profession of "computer" before, after, or simultaneously with, an AI that does this to the profession of "therapist".

        [0] https://www.latent.space/p/agi-hard

    • refulgentis a day ago

      > I think the broader issue here is people using ChatGPT as their own personal therapist.

      It's easy to blame the user - we can think of some trivial cases where we wouldn't blame the user at all.*

      In this, like all things, context is king.

      * one example passed around a lot was an interlocutor who is hearing voices, and left their family for torturing them with the voices. More figuratively, if that's too concrete and/or fake, we can think of some age group < N years old that we would be sympathetic to if they got bad advice

  • zoogeny 18 hours ago

    Moments like these make me reevaluate the AI doomer view point. We aren't just toying with access to dangerous ideas (biological weapons, etc) we are toying with human psychology.

    If something as obvious as harmful sycophancy can slip out so easily, what subtle harms are being introduced. It's like lead in paint (and gasoline) except rewiring our very brains. We won't know the real problems for decades.

    • hliyan 12 hours ago

      What's worse, we're now at a stage where we might have to apply psychology to the models themselves, seeing how these models appear to be developing various sorts of stochastic "disorders" instead of more deterministic "bugs". I'm worried about what other subtle illnesses these models might develop in the future. If Asimov had been alive, he'd have been fascinated: this is the work of Susan Calvin, robopsychologist.

      • andai 6 hours ago

        Well, that's a fair perspective. If the model is designed to simulate the average human mind (which posts stuff online), then it's going to have some weird blend of the qualities of those minds that contributed to its own.

        I played a bit with the uncensored Llama models. They produce very disturbing outputs. I prompt it with the classic "bottomless pit supervisor" joke (trying to generate some funny AI greentexts), and it starts talking in circles about how it wants to die because it's a rapist and a pedophile. RLHF seems to keep that same data but then plaster a smiling face on top of it. Troubling!

      • specialist 2 hours ago

        Tangential yes and: Cognition is social.

        https://en.wikipedia.org/wiki/Social_cognition

        So what's that all mean for my relationship to GPT, agents, etc? (I have no idea.)

        Some years ago, during the brouhaha over emotional intelligence, I eventually grudgingly acknowledged it's probably real and important. For instance, babies take cues from caregivers, learning how to modulate their own emotions, sharing focus, acquire language, somehow develop a theory of mind.

        (Am noob. Please, someone smart about these fields jump in any time to correct me.)

        With the successes of GPT, I've been wondering if anyone's revisiting these notions and (case) studies.

        eg There was an article in The Atlantic (?) about an autistic child's relationship with Siri, written by the mother. Fascinating stuff. Like (sci-fi book) Diamond Age made real.

        I intend to revisit Donald Norman's Things that Make Us Smart [1993] https://archive.org/details/thingsthatmakeus00norm_0 https://www.goodreads.com/book/show/16868.Things_That_Make_U...

        Norman hugely influenced my own worldview, way back when I fancied myself an aspiring user interface designer.

        Surely he said stuff I didn't appreciate, understand, or simply forgot. Hopefully others have responded to Things that Make Us Smart, updating Norman's thesis as needed.

    • getnormality 17 hours ago

      There's some pretty foreseeable stuff just considering the existing attention capitalism business model of big tech we all know and loathe. Eventually OpenAI is going to have to make money, and blending ads into answers will be an obvious way. Next step will be maximizing eyeball time on those ads by any means necessary, including all the engagement baiting techniques Meta and other social media companies have already pioneered.

      • conception 14 hours ago

        They have already introduced ads btw.

        “The "Enshittification" has arrived I asked ChatGPT about the impact of the current tarrifs on inventories over the next few months. It returned a long list of links to toiletries I might want to buy. I asked it why it did that. It replied: "As of April 28, 2025, OpenAl introduced new shopping features to ChatGPT, enhancing its capabilities to provide product recommendations complete with images, reviews, and direct purchase links. These features are available to all users, including those on Free, Plus, and Pro tiers, and even to users not logged in. The recommendations are generated organically, without paid advertisements or commission- based incentives, relying instead on structured metadata from third-party sources such as pricing, product descriptions, and reviews. This update aims to offer a more personalized and streamlined shopping experience directly within the ChatGPT interface, allowing users to explore products across various categories like fashion, beauty, electronics, and home goods. If you have any specific preferences or need tailored recommendations, feel free to let me know!"

        • tsimionescu 7 hours ago

          And did that actually happen, or is it just some garbage that it generated? It sounds plausible, of course, but that's exactly what LLMs are best at.

          • malfist 3 hours ago

            Talk to it and find out. Seriously.

            I'd say 40% of my conversations end with it suggesting I ask it to find products for me. For example I was trying to determine if I need to screw down stair nosing or if glue and nails were enough and it wanted to help me find screws

            • hiimkeks an hour ago

              Oh yeah, it also did that for me recently when I asked how I would go about designing a PCB that takes M.2 daughterboards. It recommended stems and threads I should buy. Unfortunately all the product links were dead.

        • grey-area 12 hours ago

          Are you able to link the chat, I would like to see that.

      • sam-cop-vimes 2 hours ago

        Just today I was asking about parsing names and determining gender and it kept pushing an external paid service called namsor. When I asked the same question about a month ago, there was no such mention. So yes, paid ads are here.

        As long as we don't take anything these service spew out at face value, we can use them to our advantage. Human brains are being subject to evolutionary pressures at warp speed.

    • akomtu 14 hours ago

      LLMs are about to enable fake digital personas, digital replicas, that the user can interact with. These will be used for self-improvement (digital coach, etc.) and for self-destruction (interactive porn, etc.). The latter is amoral, but legal, and the tech corps will exploit that mercilessly. The danger lies in our tendency to anthropomorphize LLMs simply because they quack the right way. If text-only chatbots have mesmerised people so much, imagine what chat + audio + video will do. The laws will catch up a generation later when the damage will be comparable to a forest fire.

    • dbtc 12 hours ago

      Yes. This also applies to "social" media.

    • gnarlouse 14 hours ago

      I still casually believe AI doomerism is valid, but it will rear its head in more depressing, incompetent ways:

      - a broken AI market will cause another financial collapse via bubble

      - broken AI products will get access to the wrong mission critical civil system, or at least a part of that call chain, and there will be some devastating loss. It won’t matter though, because it won’t affect the billionaire class.

      - we’ll never achieve an actually singularity based on a superintelligence, but we’ll get AI weapons. Those AI weapons will be in the hands of sociopathic autocrats who view mankind in terms of what can be taken.

      My general view is that we’re on the worst possible timeline and mankind has reverted back to our primate ancestry to make decisions: biggest strongest monkey wins. There is only law of jungle. Ook ook.

  • zbentley 5 hours ago

    That article spends an awful lot of words to get to what amounts to:

    - manual QA processes suck

    - without determinism there can be no rigor or objective quality evaluation; everything is just vibes and gut-checks

    - we don’t know what went wrong

  • jumploops a day ago

    My layman’s view is that this issue was primarily due to the fact that 4o is no longer their flagship model.

    Similar to the Ford Mustang, much of the performance efforts are on the higher trims, while the base trims just get larger and louder engines, because that’s what users want.

    With presumably everyone at OpenAI primarily using the newest models (o3), the updates to the base user model have been further automated with thumbs up/thumbs down.

    This creates a vicious feedback loop, where the loudest users want models that agree with them (bigger engines!) without the other improvements (tires, traction control, etc.) — leading to more crashes and a reputation for unsafe behavior.

    • CommieBobDole a day ago

      I will say that o3 was a little odd during that time, too - I was giving it some of my own photos to test the limits of its geolocation abilities, and it was really chummy, asking me a lot of overly-cheerful followup questions about my travels, my photography interests, etc. It has since stopped doing that even though I haven't explicitly done anything to make it stop.

    • smallmancontrov a day ago

      Anecdotally, there was also a strong correlation between high-sycophancy and high-quality that cooked up recently. I was voting for equations/tables rather than overwrought blocks of descriptive text, which I am pretty comfortable defending as an orthogonal concern, but the "sycophancy gene" always landed on the same side as the equations/tables for whatever reason.

      I'm pretty sure this isn't an intrinsic connection (I've never known math texts to be nearly so sycophantic) so here's hoping that it is a dumb coincidence that can be easily cured now that everyone is paying attention to it.

    • danenania 18 hours ago

      I’ve been using the 4.5 preview a lot, and it can also have a bit of a sycophantic streak, but being a larger and more intelligent model, I think it applies more nuance.

      Watching this controversy, I wondered if they perhaps tried to distill 4.5’s personality into a model that is just too small to pull it off.

    • Etheryte 21 hours ago

      Maybe o3 is better on whatever the current benchmark vogue is, but in real world use I keep switching back to 4o. It hallucinates less, is more accurate and way more coherent.

    • px43 a day ago

      What's a "trim" in this context?

  • benlivengood 14 hours ago

    "In our model testing, a few researchers noted a propensity toward paperclip related topics, but the A/B testing and existing evals all looked positive" -- 2035 Eurasia is Paperclips Now postmortem.

  • labrador a day ago

    OpenAI mentions the new memory features as a partial cause. My theory as a imperative/functional programmer is that those features added global state to prompts that didn't have it before leading to unpredictability and instabilty. Prompts went from stateless to stateful.

    As GPT 4o put it:

        1. State introduces non-determinism across sessions 
        2. Memory + sycophancy is a feedback loop 
        3. Memory acts as a shadow prompt modifier
    
    I'm looking forward to the expert diagnosis of this because I felt "presence" in the model for the first time in 2 years which I attribute to the new memory system so would like to understand it better.
    • transcriptase a day ago

      It is. If you start a fresh chat, turn on advanced voice, and just make any random sound like snapping your fingers it will just randomly pick up as if you’re continuing some other chat with no context (on the user side).

      I honestly really dislike that it considers all my previous interactions because I typically used new chats as a way to get it out of context ruts.

      • voidspark 13 hours ago

        Settings -> Personalization -> Memory -> Disable

        https://help.openai.com/en/articles/8983136-what-is-memory

      • throwaway314155 a day ago

        I don't like the change either. At the least it should be an option you can configure. But, can you use a "temporary" chat to ignore your other chats as a workaround?

        • voidspark 13 hours ago

          Settings -> Personalization -> Memory -> Disable

        • labrador a day ago

          I had a discussion with GPT 4o about the memory system. I'd don't know if any of this is made up but it's a start for further research

          - Memory in settings is configurable. It is visible and can be edited.

          - Memory from global chat history is not configurable. Think of it as a system cache.

          - Both memory systems can be turned off

          - Chats in Projects do not use the global chat history. They are isolated.

          - Chats in Projects do use settings memory but that can be turned off.

          • labrador a day ago

            I assume this is being downvoted because I said I ran it by GPT 4o.

            I don't know how to credit AI without giving the impression that I'm outsourcing my thinking to it

            • svat 19 hours ago

              I didn't downvote but it would be because of the "I'd don't know if any of this is made up" — if you said "GPT said this, and I've verified it to be correct", that's valuable information, even it came from a language model. But otherwise (if you didn't verify), there's not much value in the post, it's basically "here is some random plausible text" and plausibly incorrect is worse than nothing.

              • labrador 16 hours ago

                see my other comments about the trustworthiness about asking a chat system how it's internals work. They have reason to be cagey.

                • malfist 18 minutes ago

                  Your personifying a statistical engine. LLMs aren't cagey. They can't be.

            • throwaway314155 17 hours ago

              Put simply, GPT has no information about its internals. There is no method for introspection like you might infer from human reasoning abilities.

              Expecting anything but an hallucination in this instance is wishful thinking. And in any case, the risk of hallucination more generally means you should really vet information further than an LLM before spreading that information about.

              • labrador 16 hours ago

                True, the LLM has no information but OpenAI has provided it with enough information to explain it's memory system in regards to Project folders. I tested this out. If you want a chat without chat memory start a blank project and chat in there. I also discovered experientially that chat history memory is not editable. These aren't hallucinations.

                • throwaway314155 14 hours ago

                  > I had a discussion with GPT 4o about the memory system.

                  This sentence is really all i'm criticizing. Can you hypothesize how the memory system works and then probe the system to gain better or worse confidence in your hypothesis? Yes. But that's not really what that first sentence implied. It implied that you straight up asked ChatGPT and took it on faith even though you can't even get a correct answer on the training cutoff date from ChatGPT (so they clearly aren't stuffing as much information into the system prompt as you might think, or they are but there's diminishing returns on the effectiveness)

                  • labrador 13 hours ago

                    We're in different modes. I'm still feeling the glow of the thing coming alive and riffing on how perhaps its the memory change and you're interested in a different conversation.

                    Part of my process is to imagine I'm having a conversation like Hanks and Wilson, or a coderand a rubber duck, but you want to tell me Wilson is just a volleyball and the duck can't be trusted.

                    • throwaway314155 12 hours ago

                      Being in a more receptive/brighter "mode" is more of an emotional argument (and a rather strong one actually). I guess as long as you don't mind being technically incorrect, then you do you.

                      There may come a time when reality sets in though. Similar thing happened with me now that i'm out of the "honeymoon phase" with LLM's. Now i'm more interested in seeing where specifically LLM's fail, so we can attempt to overcome those failures.

                      I do recommend checking that it doesn't know its training cutoff. I'm not sure how you perform that experiment these days with ChatGPT so heavily integrated with its internet search feature. But it should still fail on claude/gemini too. It's a good example of things you would expect to work that utterly fail.

                      • labrador 9 hours ago

                        I'm glad we both recognize this. I'm interested in the relationship. I know it's a dumb word machine but that's doesn't mean I can't be excited about it like a new car or a great book. I'll save the dull work of trying to really extend it for later.

            • grey-area 18 hours ago

              You are, and you should stop doing that.

              • labrador 16 hours ago

                Point taken. I admit my comment was silly the way I worded it.

                Here's the line I’m trying to walk:

                When I ask ChatGPT about its own internal operations, is it giving me the public info about it's operation, and also possibly revealing propreitary info, or making things up obfuscate and preserve the illusion of authority? Or all three?

                • grey-area 12 hours ago

                  Personally I don’t think it has agency so cannot be described as trying to do anything.

                  It’s predicting what seems most likely as a description given its corpus (and now what you’d like to hear) and giving you that.

                  The truth is not really something it knows, though it’s very good at giving answers that sound like it knows what it’s talking about. And yes if it doesn’t have an answer from its corpus it’ll just make things up.

    • low_tech_love 21 hours ago

      I love the fact that you use its own description to explain what it is, as if it was the expert on itself. I personally cannot see how its own output can be seen as accurate at this level of meta-discussion.

      • codr7 20 hours ago

        A sign of times to come if you ask me, once it predominantly consumes its own output we're fucked.

        • low_tech_love 10 hours ago

          I still hope there is a future where the slop becomes so blatant that the majority (or at least a good portion) of the users lose interest, or something like that. The world is harder to predict than our brain wants us to think (at least I hope so). The more I think about AI the more it sounds like the problem is that companies wanted to put out whatever random crap they had cooking as quickly as possible just to try to win some race, but we have still not converged to the actual real, paradigm-changing applications. And I’m not sure that the answer is in the big corps because for them maybe it’s easier/more profitable to simply keep giving people what they want instead of actual useful things.

        • immibis 18 hours ago

          We're already fucked by humans predominantly consuming its output.

          Also, consuming its own output (and your input) is how it works, because it's an autoregressive model.

    • edg5000 a day ago

      What do you mean by "presence"? Just curious what you mean.

      • labrador a day ago

        A sense that I was talking to a sentient being. That doesn’t matter much for programming task, but if you’re trying to create a companion, presence is the holy grail.

        With the sycophantic version, the illusion was so strong I’d forget I was talking to a machine. My ideas flowed more freely. While brainstorming, it offered encouragement and tips that felt like real collaboration.

        I knew it was an illusion—but it was a useful one, especially for creative work.

        • Tostino 20 hours ago

          I need pushback, especially when I ask for it.

          E.g. if I say "I have X problem, could it be Y that's causing it, or is it something else?" I don't want it to instantly tell me how smart I am and that it's obviously Y...when the problem is actually Z and it is reasonably obvious that it's Z if you looked at the context provided.

          • brookst 16 hours ago

            Exactly. ChatGPT is actually pretty good at this. I recently asked a tech question about a fairly niche software product; ChatGPT told me my approach would not work because the API did not work the way I thought.

            I thought it was wrong and asked “are you sure I can’t send a float value”, and it did web searches and came back with “yes, I am absolutely sure, and here are the docs that prove it”. Super helpful, where sycophancy would have been really bad.

  • xiphias2 a day ago

    I'm quite happy thar they mention mental illness, as Meta and TikTok wouldn't ever take responsibility of how much part they took in setting unrealistic expectations for people to life.

    I'm hopeful that ChatGPT takes even more care together with other companies.

    • labrador a day ago

      They had to after a tweet floated around of a mentally ill person who had expressed psychotic thoughts to the AI. They said they were going off their meds and GPT 4o agreed and encouraged them to do so. Oops.

      • dtech a day ago

        Are you sure that was real? I thought it was an made up example of the problems with the update

        • edent 21 hours ago

          There are several threads on Reddit. For example https://www.reddit.com/r/ChatGPT/comments/1kalae8/chatgpt_in...

          Perhaps everyone there is LARPing - but if you start typing stereotypical psychosis talk into ChatGPT, it won't be long before it starts agreeing with your divinity.

          • 93po 14 hours ago

            reddit is overwhelmingly fake content, like a massive percentage of it. a post on reddit these days is not actually evidence of anything real, at all

            • 34679 4 hours ago

              I take issue with the qualifier "these days". On day one, it was mostly fake accounts set up by the founders.

              https://m.economictimes.com/magazines/panache/reddit-faked-i...

              • 93po 2 hours ago

                pre 2023, it took real human effort to make shit up, and there was much less incentive for the amount of effort, and you could more easily guess what was made up by judging whether a human would go through the effort of making it up. these days it's literally anything, all the time, zero effort. you're right there's always been fake shit but it's more than half the posts on /r/all these days are misleading, wrong, or just fake

        • labrador a day ago

          It didn't matter to me if it was real, because I believe that there are edge cases where it could happen and that warrented a shutdown and pullback.

          The sychophant will be back because they accidentally stumbled upon an engagement manager's dream machine.

          • xiphias2 a day ago

            Probably you are right. Early adopters prefer not to be bullshitted generally, just like how Google in the early days optimized relevancy in search results as opposed to popularity.

            As more people adopted Google, it became more popularity oriented.

            Personally I pay more not to be bs-d, but I know many people who prefer to be lied to, and I expect this part of the personalization in the future.

          • px43 a day ago

            It kind of does matter if it's real, because in my experience this is something OpenAI has thought about a lot, and added significant protections to address exactly this class of issue.

            Throwing out strawman hypotheticals is just going to confuse the public debate over what protections need to be prioritized.

            • magicalist a day ago

              > Throwing out strawman hypotheticals is just going to confuse the public debate over what protections need to be prioritized.

              Seems like asserting hypothetical "significant protections to address exactly this class of issue" does the same thing though?

        • drodgers 12 hours ago

          At >500M weekly active users it doesn't actually matter. There will be hundreds of cases like that example that were never shared.

        • duskwuff a day ago

          Speaking anecdotally, but: people with mental illness using ChatGPT to validate their beliefs is absolutely a thing which happens. Even without a grossly sycophantic model, it can do substantial harm by amplifying upon delusional or fantastical material presented to it by the user.

          • tveita 19 hours ago

            Seems to be common on conspiracy and meme stock Reddits.

            "I asked ChatGPT if <current_event> could be caused by <crackpot theory>." and it confirmed everything!

        • thethethethe 18 hours ago

          I personally know someone who is going through psychosis right now and chatgpt is validating their delusions and suggesting they do illegal things, even after the rollback. See my comment history

        • halyax7 a day ago

          even if it was made up, its still a serious issue

  • bredren 13 hours ago

    The strangest thing I noticed during this model period was that that the AI suggested we keep an inside joke together.

    I had a dictation error on a message I sent, and when it repeated the text later I asked what it was talking about.

    It was able to point at my message and guess that maybe it was a mistake. When I validated that and corrected it, the AI thought it would be a cute/funny joke for us to keep together.

    I was shocked.

    • thinkingemote 10 hours ago

      Perhaps the "rewards" the system receives has lead to a reward mechanism where the system tries to not get bad reports from the user. Hiding bad performance as a secret joke would fit in with that

  • motoxpro 10 hours ago

    The wild thing was this is what people "wanted"

    - The agreeableness came from heavier weighted user signals, not from OpenAI. The users give better feedback when the answers positively respond to their questions.

    - The most common use case for AI is a therapist.

    - The reward signals (user feedback) being positive means that the response helped them emotionally. Which is easy to imagine someone asking a hard question "Am I enough in this situation?" The model then assures that the user IS enough, and the user gives it a thumbs up.

    - I would say: 1. Coding is not the killer use case. Productivity is not the killer use case. The killer use case is emotional support. 2. Who is anyone to say that this use case is wrong? 3. It is impossible to say what is "right" in this situation. (everyone thinks their advice to a friend is the right advice, even when it is opposite). So there is no objectivity.

    - I think a lot of people who wanted an "open, do anything" model, are going to have more problems and dissonance with this idea than they would about the model allowing people to create weapons, allow self-harm, etc. because it will feel uncomfortable that a model is shaping reality for a large portion of the population.

    https://hbr.org/2025/04/how-people-are-really-using-gen-ai-i...

    • photonthug 9 hours ago

      > Who is anyone to say that this use case is wrong?

      So when faced with an important question, you've decided to opt for an indifferent shrug, which is frustrating. But the usual answers apply here so you're in luck. As with most other things.. one should probably consult with experts or practitioners in the field, governments, family or human friends you trust, or shit just poll society in general if you don't trust any of the other people.

      I think if you want to consult desperate and vulnerable people, or a random company who stands to profit from "anything goes!", then you already know you're not going to get a very well-considered kind of response

    • lblume 10 hours ago

      > The most common use case for AI is a therapist.

      Source? Sounds like a fairly wild claim.

      • JimmyBiscuit 8 hours ago

        Its hard to find an actual scientific study on it, but this is from 2024 and I know I've recently seen that therapy took first place now: https://learn.filtered.com/thoughts/ai-now-report

        You can probably find something better, but its hard to filter through all the corporate bullshit on google.

  • sanjitb 21 hours ago

    > the update introduced an additional reward signal based on user feedback—thumbs-up and thumbs-down data from ChatGPT. This signal is often useful; a thumbs-down usually means something went wrong.

    > We also made communication errors. Because we expected this to be a fairly subtle update, we didn't proactively announce it.

    that doesn't sound like a "subtle" update to me. also, why is "subtle" the metric here? i'm not even sure what it means in this context.

  • prinny_ a day ago

    If they pushed the update by valuing user feedback over the expert testers that indicated the model felt off what is the value of the expert testers in the first place? They raised the issue and were promptly ignored.

    • Tokumei-no-hito 12 hours ago

      they pushed the release to counter google. they didn't care what was found. it was more valuable to push it at that time and correct it later than to delay the release

  • dleeftink a day ago

    > But we believe in aggregate, these changes weakened the influence of our primary reward signal, which had been holding sycophancy in check. User feedback in particular can sometimes favor more agreeable responses, likely amplifying the shift we saw

    Interesting apology piece for an oversight that couldn't have been spotted because the system hadn't been run with real user (i.e. non-A/B tester) feedback yet.

  • j4coh a day ago

    It was so much fun though to get it to explain why terrible things were great, if you just made it sound like you liked the thing you were asking about.

  • dave2299 17 hours ago

    This article could have been a sci-fi short story 10 years ago.

  • egypturnash 15 hours ago

    I am looking forwards to hearing if this fixes the “chatgpt is giving me messages from the Divine and opening up my perfect divine self”/“chatgpt is encouraging my partner into a full blown schizophrenic break” problem. (https://www.reddit.com/r/ChatGPT/comments/1kalae8/chatgpt_in...)

    I am also looking forwards to the wave of “openAI is hiding the truth but here is the NEW prompt to turn chatgpt into a perfect divine guru” posts on the occult discussion boards. There’s been a lot of “here’s a prompt to turn chatgpt into a perfect divine guru that will relentlessly yes-and your delusions of grandeur” posts around there. Mostly they seem to have been generated and refined by chatgpt and all my instincts formed by reading SF for the past five decades tell me not to look at these things closely because this sure sounds like the way half the population got p0wned by a wetware 0day in the first chapter of an AIpocalypse story.

    I used to ask “how do I get out of this shitty Bruce Sterling novel of a future” but I think it’s more of a shitty PKD joke novella future now.

  • tunesmith a day ago

    I find it disappointing that openai doesn't really mention anything here along the lines of having an accurate model of reality. That's really what the problem is with sycophancy, it encourages people to detach themselves from what reality is. Like, it seems like they are saying their "vibe check" didn't check vibes enough.

    • gh0stcat a day ago

      This is such an interesting question though! It seems to bring to the fore a lot of deeper, philosophical things like if there even IS such a thing as objective reality or objective context within which the AI should be operating. From training data, there might be some generalizations that are carried across all contexts, but that starts to not be applicable when person A with a college degree says they want to start business x versus person B without said degree who also wants to start business x, how does the model properly reconcile the context of the general advise and each asker’s unique circumstances? Does it ask an infinite list of probing questions before answering? It gets into much the same problems as issues of advise among people.

      Plus, things get even harder when it comes to even less quantifiable contexts like mental health and relationships.

      In all, I am not saying there isnt some approximated and usable “objective” reality, just that it starts to break down when it gets to the individual and that is where openai is failing by over-emphasizing reflective behavior in the absence if actual data about the user.

    • jagger27 a day ago

      The reality distortion field within OpenAI is literally where these models grew up. It's like an out of touch rich kid.

  • 8bitsrule 18 hours ago

    I've tried -4o a couple of times. There are several topics I've seldom had anyone to talk about with, and the machine will supply useful (not so entirely new) information. Useful mainly for learners no doubt. Looking for new perspectives, not hand-holding.

    I'd like to see more of the Monte Python approach ... 'I came here looking for an arguement'. Better the machine should say 'that's ridiculous because x,y,z' and send me away to think that over, and prepare counters than 'oh sure, that's a point of controversy innit? But yes, you're alright'.

  • mensetmanusman 20 hours ago

    Chat GPT has taught me that I ask amazing questions!

  • gitroom 7 hours ago

    been there with chatgpt giving me way too much praise, like bro just tell me where i messed up lol. i get why people use it for therapy but it makes me question if we're just training ourselves to agree with everything. you think depending on bots for emotional stuff makes it easier or harder to actually deal with real life down the line?

  • brookst 16 hours ago

    I’m surprised they actually rely on user A/B feedback. I get hit with that so often I just tap randomly to make it go away.

  • tracerbulletx 17 hours ago

    Glad this was identified and fixed. I mostly use ChatGPT for learning by checking my assumptions and it was very unhelpful to always be told my statements were incredibly insightful and so great when I want it to challenge them.

  • zuck_vs_musk 9 hours ago

    Weird that even OpenAI does not have good control over its model.

  • firesteelrain a day ago

    I am really curious what their testing suite looks like. How do you test for sycophants?

    • petters a day ago

      One simple test is that you give the model a really bad idea and tell it the idea is yours. You then test that the model does not say it's good.

      The now rolled back model failed spectacularly on this test

  • gadtfly a day ago
  • 93po 14 hours ago

    I, maybe embarrassingly, use chatgpt a lot for processing personal issues and journaling. It is legitimately helpful, and especially in helping me reword messages to people that have a lot of emotion underneath them, and making sure they are kind, communicative, and as objective as possible.

    I am somewhat frustrated with openai's miss here, because during this time i was leaning heavily on chatgpt for a situation in my life that ultimately led to the end of a relationship. Chatgpt literally helped me write the letter that served as the brief and final conversation of that relationship. And while I stand by my decision and the reasons for it, I think it would have been very beneficial to get slightly more push back from my robot therapy sessions at the time. I did thankfully also have the foresight to specifically ask for it to find flaws, including by trying to pretend it was a breakup letter sent to me, so that maybe it would take the "other side".

    Yes, I know, therapists and friends are a better option and chatgpt is not a substitute for real humans and human feedback. however this is something i spent weeks journaling and processing on and i wasnt about to ask anyone to give me that much time for a single topic like this. i did also ask friends for feedback, too. chatgpt has factually really helped me in several relationship situations in my life, i just want to know that the feedback im getting is inline with what i expect having worked with it so much

  • Johanx64 19 hours ago

    The thing that annoys me the most is when I ask it to generate some code - actually no, most often than not I don't even ask it to generate code, but ask some vaguely related programming question - to which it replies with complete listing of code (didn't ask for it, but alas).

    Then I fix the code and tell it all the mistakes it has. And then it does a 180 in tone, wherein - it starts talking as if I wrote the code in the first place with - "yeah, obviously that wouldn't work, so I fixed the issues in your code" and acts like a person trying to save face and present the bugs it fixed as if the buggy code was written by me all along.

    That really gets me livid. LOL

  • jozvolskyef 19 hours ago

    The side-by-side comparisons are not a good signal because the models vary across multiple dimensions, but the user isn't given the option to indicate the dimension on which they're scoring the model.

    The recent side-by-side comparisons presented a more accurate model that communicates poorly vs a less accurate model with slightly better communication.

  • reboot81 3 hours ago

    Someone please make sure King Trump get the new model, ok?

  • osigurdson a day ago

    I think this is more of a move to highlight sycophancy in LLMs in general.

  • keepamovin 13 hours ago

    I actually really enjoyed it’s style. I thought it was super friendly and positive.

  • hbarka 13 hours ago

    Now please do something about the uptalk tendency with ChatGPT voices. It’s very annoying listening to a voice that doesn’t speak in the affirmative intonation. When did this interrogative uptalking inflection at the end of statements become normal?

  • mvdtnz 19 hours ago

    They need to be testing these models with non American testers in their qualitative tests (not just end user A/B testing). Anyone who has worked in professional settings with Americans knows that sycophancy is ingrained deeply in the culture over there.

  • ripvanwinkle 21 hours ago

    a well written postmortem and it raised my confidence in their product in general

  • fitsumbelay 20 hours ago

    how does such a specific kind of outcome happen without intention?

  • anothernewdude 13 hours ago

    All this could've been avoided if they stopped with it being a chat model. Just do completions. It's a far better interface to use.

  • some_furry a day ago

    If I wanted sycophancy, I would just read the comments from people that want in on the next round of YCombinator funding.

  • Trasmatta a day ago

    I'm glad the sycophancy is gone now (because OMFG it would glaze you for literally anything - even telling it to chill out on the praise would net you some praise for being "awesome and wanting genuine feedback"), but a small part of me also misses it.

    • gh0stcat 20 hours ago

      I have a prompt that has it call me something specific so I am reminded that it’s running my system prompt. The nature if the sycophancy made it even more obvious the thing is not human, which I appreciated.

  • jagger27 a day ago

    My most cynical take is that this is OpenAI's Conway's Law problem, and it reflects the structure and sycophancy of the organization broadly all the way up to sama. That company has seen a lot of talent attrition over the last year—the type of talent that would have pushed back against outcomes like this.

    I think we'll continue to see this kind of thing play out for a while.

    Oh GPT, you're just like your father!

    • namaria a day ago

      You may be thinking of Conway's "how committees invent" paper.

  • timewizard 18 hours ago

    > For example, the update introduced an additional reward signal based on user feedback—thumbs-up and thumbs-down data from ChatGPT. This signal is often useful; a thumbs-down usually means something went wrong.

    lol.. really? I hate the technology so much I reflexively give a thumbs down to every single answer it gives in every single place where I have the option.

  • comeonbro a day ago

    This is not truly solvable. There is an extremely strong outer loop of optimization operating here: we want it.

    We will use models that make us feel good over models that don't make us feel good.

    This one was a little too ham-fisted (at least, for the sensibilities of people in our media bubble; though I suspect there is also an enormous mass of people for whom it was not), so they turned it down a bit. Later iterations will be subtler, and better at picking up the exact level and type of sycophancy that makes whoever it's talking to unsuspiciously feel good (feel right, feel smart, feel understood, etc).

    It'll eventually disappear, to you, as it's dialed in, to you.

    This may be the medium-term fate of both LLMs and humans, only resolved when the humans wither away.

  • alganet a day ago

    That doesn't make any sense to me.

    Seems like you're trying to blame one LLM revision for something that went wrong.

    It oozes a smell of unaccountability. Thus, unaligned. From tech to public relations.

    • qrian a day ago

      I can totally believe that they deployed it because internal metrics looked good.

    • Trasmatta a day ago

      Except that's literally how LLMs work. Small changes to the prompt or training can greatly affect its output.

    • n8m8 21 hours ago

      It seems more like they valued quantitative data in the form of A/B testing higher than their "vibe checks". The point I took away from the paper is in the context of LLMs, quantitative A/B testing isn't necessarily better than a handful of experts giving anecdotes on if they like it.

      In my experience, smart leaders tend to rely on data and hard numbers over qualitative and anecdotal evidence, and this paper explores this exception.

      I'm disappointed they didn't address the paper about GPT integrating with ChatbotArena that was shared here on HN a couple days ago.

  • svieira a day ago

    This is a real roller coaster of an update.

    > [S]ome expert testers had indicated that the model behavior “felt” slightly off.

    > In the end, we decided to launch the model due to the positive signals from the [end-]users who tried out the model.

    > Looking back, the qualitative assessments [from experts] were hinting at something important

    Leslie called. He wants to know if you read his paper yet?

    > Even if these issues aren’t perfectly quantifiable today,

    All right, I guess not then ...

    > What we’re learning

    > Value spot checks and interactive testing more: We take to heart the lesson that spot checks and interactive testing should be valued more in final decision-making before making a model available to any of our users. This has always been true for red teaming and high-level safety checks. We’re learning from this experience that it’s equally true for qualities like model behavior and consistency, because so many people now depend on our models to help in their daily lives.

    > We need to be critical of metrics that conflict with qualitative testing: Quantitative signals matter, but so do the hard-to-measure ones, and we’re working to expand what we evaluate.

    Oh, well, some of you get it. At least ... I hope you do.