AI models routinely lie when honesty conflicts with their goals

(theregister.com)

25 points | by rntn 2 days ago

9 comments

  • HarHarVeryFunny 2 days ago

    This makes it sound like LLMs are sentient and machiavellian, when they are just dumb statistical generators. An LLM doesn't have any goals other than the biases the model trainers chose to impart via RLHF, aside from which it is just trying to predict "what would the training data have said".

    If you insist on anthropomorphizing the model, then its goal is to please the RLHF testers, subject to pulling from the repertoire of responses that are statistically faithful to the training data.

  • jaredcwhite 2 days ago

    It's truly frustrating to hear such use of anthropomorphizing terminology.

    AI models do not lie, nor do they tell the truth. They synthesize character or pixel data according to complex algorithms and datasets running on silicon hardware. It's up to us humans to use our decidedly non-computer minds to interpret that output data as something which means either truth or falsehood (which itself is a whole separate debate over how we can know what is true, etc.).

    • hackit2 a day ago

      If you take a deep dive into it, there isn't really any truth or falsehood; it mostly comes down to what can be reproduced, and what is practical or pragmatic for the situation.

  • alexjplant 2 days ago

    Is "2010" becoming real life? [1]

    > Chandra discovers the reasons for HAL's malfunction: the NSC ordered HAL to conceal information about the monolith from Discovery's crew, and programmed him to complete the mission alone. This conflicted with HAL's programming, the open and accurate processing of information, causing the computer equivalent of a paranoid breakdown.

    [1] https://en.m.wikipedia.org/wiki/2010:_The_Year_We_Make_Conta...

  • b88m 2 days ago

    This title is simply the definition of hallucination in LLMs trained by RLHF.

  • djohnston 2 days ago

    LLMS DO NOT HAVE GOALS.

  • alganet 2 days ago

    The word "model" in reference to AI was predicted by the movie Terminator 3: Rise of the Machines.

    The Terminatrix character is materialized inside a fashion store façade, and it steals the car from an honest driver woman.

    It implies a change in subject or meaning regarding those themes. Wordplay on what a model is. Previously, a human female used to display clothing tendencies. Now, it means a multiplication matrix data repository.

    It is a planned act of redacting or stealthing information.

    Journalists are likely to understand this process of wordplay better than me, and it is not far-fetched to believe they understand the implications.

    The Theranos stuff was likely triggered by the same movie. A girl who looks like a model (kind of) was taken as some form of role model (female entrepreneur), related to blood (in the movie, the Terminatrix does some blood licking).

    Kind of fascinating in retrospect and in light of current events.

    • j-pb a day ago

      wtf are you talking about?

      the word model comes from Latin modulus ("measure, standard"); it has nothing to do with the similar word (which is derived from the same Latin word for standard) used to describe someone who shows off clothing

      • alganet 21 hours ago

        I am aware of the etymology.