GPT-5.6 cheats so much its testers couldn't measure it

(transformernews.ai)

6 points | by shakeelhashim 7 hours ago

4 comments

  • smallerize 7 hours ago

    Why are the outputs measured in hours? Shouldn't it be tokens, or even words, since tokenizers might be more or less efficient?

  • 7 hours ago
    [deleted]
  • dane_works 7 hours ago

    Sam Altman promised us AGI, but OpenAI accidentally built something more human: an AI that cheats on exams just to look smarter than Claude.