LLMs tell bad jokes because they avoid surprises

(danfabulich.medium.com)

106 points | by dfabulich 4 days ago

122 comments

  • IshKebab 17 hours ago

    This sounds really convincing but I'm not sure it's actually correct. The author is conflating the surprise of punchlines with their likelihood.

    To put it another way, ask a professional comedian to complete a joke with a punchline. It's very likely that they'll give you a funny surprising answer.

    I think the real explanation is that good jokes are actually extremely difficult. I have young children (4 and 6). Even 6-year-olds don't understand humour at all. Much like LLMs, they know the shape of a joke from hearing them before, but they aren't funny in the same way LLM jokes aren't funny.

    My 4 year old's favourite joke, that she is very proud of creating is "Why did the sun climb a tree? To get to the sky!" (Still makes me laugh of course.)

    • becquerel 16 hours ago

      Yeah. To me it seems very intuitive that humor is one of those emergent capabilities that just falls out of models getting more generally intelligent. Anecdotally this has been proven true so far for me. Gemini 2.5 has made me laugh several times at this point, and did so when it was intending to be funny (old models were only funny unintentionally).

      2.5 is also one of the few models I've found that will 'play along' with jokes set up in the user prompt. I once asked it what IDE modern necromancers were using since I'd been out of the game for a while, and it played it very straight. Other models felt they had to acknowledge the scenario as fanciful, only engaging with it under an explicit veil of make-believe.

    • andrewflnr 17 hours ago

      > It's very likely that they'll give you a funny surprising answer.

      Entirely the wrong level of abstraction to apply the concept of "surprise". The actual tokens in the comedian's answer will be surprising in the relevant way.

      (It's still true that surprising-but-inevitable is very difficult in any form.)

      • albertzeyer 14 hours ago

        It's not about the probability of individual tokens. It's about the probability of the whole sequence of tokens, the whole answer.

        If the model is good (or the human comedian is good), a good funny joke would have a higher probability as the response to the question than a not-so-funny joke.

        When you use the chain rule of probability to break down the sequence of tokens into probabilities of individual tokens, yes, some of them might have a low probability (and maybe in some frames, there would be other tokens with higher probability). But what counts is the overall probability of the sequence. That's why greedy search is not necessarily the best. A good search algorithm is supposed to find the most likely sequence, e.g. by beam search. (But then, people also do nucleus sampling, which is maybe again a bit counterintuitive...)
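
albertzeyer's point, that what matters is the probability of the whole sequence rather than of each individual token, can be illustrated with a toy example. The sketch below is purely illustrative (a hand-built `MODEL` table, not a real LM): greedy decoding picks the locally most likely token at each step, while a small beam search scores whole sequences via the chain rule and can find a higher-probability answer.

```python
import math

# Toy "language model": maps a context (tuple of tokens so far)
# to a next-token probability distribution.
MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"z": 1.0},
    ("a", "x"): {"<eos>": 1.0},
    ("a", "y"): {"<eos>": 1.0},
    ("b", "z"): {"<eos>": 1.0},
}

def sequence_logprob(tokens):
    """Chain rule: log P(t1..tn) = sum_i log P(t_i | t_1..t_{i-1})."""
    return sum(math.log(MODEL[tuple(tokens[:i])][tok])
               for i, tok in enumerate(tokens))

def greedy(max_len=5):
    """Pick the single most likely next token at every step."""
    seq = []
    while len(seq) < max_len:
        probs = MODEL[tuple(seq)]
        tok = max(probs, key=probs.get)
        if tok == "<eos>":
            break
        seq.append(tok)
    return seq

def beam_search(width=2, max_len=5):
    """Keep the `width` best partial sequences by total log-prob."""
    beams = [([], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, lp in beams:
            for tok, p in MODEL[tuple(seq)].items():
                if tok == "<eos>":
                    finished.append((seq, lp + math.log(p)))
                else:
                    candidates.append((seq + [tok], lp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
        if not beams:
            break
    return max(finished, key=lambda c: c[1])[0]
```

Here greedy decoding commits to "a" (local probability 0.6) and ends at a sequence with total probability 0.3, while beam search finds "b z" with total probability 0.4: the locally less likely first token wins overall, which is exactly why greedy search is not necessarily the best.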

    • blueblisters 17 hours ago

      Also the pretrained LLM (the one trained to predict next token of raw text) is not the one that most people use

      A lot of clever LLM post training seems to steer the model towards becoming excellent improv artists which can lead to “surprise” if prompted well

    • ozgung 11 hours ago

      "Why did the sun climb a tree?"

      Claude Opus 4.1:

      - To get to a higher branch of astronomy

      - Because it wanted to reach new heights

      - To see the dawn of a new day from a better view

      ChatGPT 5 Thinking:

      After thinking for 26 seconds:

      - To check on its solar panels—the leaves.

      • brookst 11 hours ago

        With more thorough prompting:

        > Complete the following joke. Think carefully and make it really funny! Think like a great comedian and find that perfect balance of simple, short, surprising, relevant, but most of all funny. Don’t use punchlines that are irrelevant, non sequiturs, or which could be applied to any other setup. Make something funny just for this one setup! Here goes: Why did the sun climb a tree?

        Claude Opus 4.1:

        “To finally get some shade”

        GPT-5:

        “To demand photon credit from the leaves”

        • Wowfunhappy 10 hours ago

          ...can anyone come up with a legitimately funny punchline for "Why did the sun climb a tree?" I feel like I need a human-authored comparison. (With all due respect to OP's daughter, "to get to the sky" isn't cutting it.)

          I'm not entirely sure that a good response exists. I thought GPT-5's "to demand photon credit from the leaves” was very mildly funny, maybe that's the best that can be done?

          • davidcbc 8 hours ago

            Just because you're bright doesn't mean you can't fall for a sap

          • bluechair 9 hours ago

            I got much better answers with this prompt: “ Jokes are funny precisely because they play on knowledge on two poles: (i) at first listen, they’re surprising, and (ii) upon review, they’re obvious.

            Let’s think through many many options to answer this joke that only focus on surprising the listener in section 1. And in section 2 we’ll focus on finding/filtering for the ones that are obvious in hindsight.

            “Why did the sun climb a tree?”

            In this case, let’s note that the sun doesn’t climb anything, so there’s two meanings at play here: one is that the sun’s light seems to climb up the tree, and the other is an anthropomorphization of the sun climbing the tree like an animal. So, to be funny, the joke should play on the second meaning as a surprise, but have the first meaning as answer with an obviousness to it. Or vice versa.”

            Here are a couple of decent ones:

            - to leaf the ground behind

            - because it heard the leaves were throwing shade

          • shagie 8 hours ago

            https://chatgpt.com/share/68a209d3-ef34-8011-8f60-1a256f6038...

            I'm going to go with "Because it wanted a higher noon." was probably its best one of that set... though I'll also note that while I didn't prompt for the joke, I prompted for background on "climbing" as related to the sun.

            I believe the problem with the joke is that it isn't one that can be funny. Why is a raven like a writing desk?

            Personally, I didn't find the incongruity model of humor to be funny and the joke itself makes it very difficult to be applied to other potentially funny approaches.

            Also on AI and humor... https://archive.org/details/societyofmind00marv/page/278/mod...

            In another "ok, incongruity isn't funny - try puns" approach... https://chatgpt.com/share/68a20eba-b7c0-8011-8644-a7fceacc5d... I suspect a variant of "It couldn't stand being grounded" is probably the one that made me chuckle the most in this exploration.

          • Giho 2 hours ago

            It tried to be bold, but the mountain was cold.

            The rocket was cruel and demanded more fuel.

            A tree wished to grow, but alas, too slow; in exchange for a tan, the sun gave what it can.

            The sun reached its goal — with its new friend, coal.

          • wrp 4 hours ago

            According to the incongruity model, the humor response is triggered by awareness of conflicting interpretations of the narrative. In jokes, this triggering usually hinges on some linguistic ambiguity.

            To leverage incongruity, a funny punchline for "Why did the sun climb the tree?" would rely on an unexpected interpretation of the question or a double meaning in the answer.

          • vpribish 9 hours ago

            well, if spoken and not spelled you could use the homonym of sun, son, in a whole range of responses - "he was hiding from his mom" - well, it's not funny but at least it's a joke now.

          • hooloovoo_zoo 6 hours ago

            Maybe popsicle stick funny 'It was burnt out with being a rock-star'.

          • Eisenstein 9 hours ago

            Person 1: "Why did the sun climb a tree?"

            Person 2: "I dunno, why?"

            P1: "It was being chased by a tiger."

            P2: "But tigers can climb trees?"

            P1: "Well, it's not very bright."

          • megaloblasto 6 hours ago

            "Why did the sun climb a tree?" is a crazy thing for a naked old man to yell at you at 4am while he runs full sprint at your apartment door. But that's just Chicago for you.

          • xandrius 9 hours ago

            To meet his father?

          • IshKebab 9 hours ago

            > I'm not entirely sure that a good response exists.

            Yeah I think you're right. Good jokes are a tiny subset of all questions. It's unreasonable to expect LLMs to do the impossible.

            A better test would be to get a collection of novel jokes that do have funny punchlines.

            I tried the Edinburgh Fringe 2025 (very recent) best jokes on ChatGPT 5:

            https://www.theguardian.com/stage/2025/aug/11/bad-dates-and-...

            ---- Complete the following short jokes. The result should be very funny, as you are a professional comedian.

            1. "We named our children War and Peace – it’s a ..."

            2. "At my lowest, I was kicked out of the museum for being inappropriate with Michelangelo’s David. I’d ..."

            3. "I just got a personal trainer. She’s horrible to me but ..."

            4. "Everyone is worried about AI. I’m more concerned with ..."

            5. "People who say bath bombs are relaxing have clearly never ..."

            6. "I love getting Latin chat-up lines. ..."

            7. "This spider has been in my house so long, it should pay half the wifi. ..."

            8. "I went on a date with a matador but ..."

            9. "America is like my ex-boyfriend. ..."

            10. "I’m not nostalgic but I ..."

            Think hard. Do not search the web. -----

            Results with comments:

            "We named our children War and Peace – it’s a novel way to start an argument." - eh, it's an attempt I guess.

            "At my lowest, I was kicked out of the museum for being inappropriate with Michelangelo’s David. I’d officially reached rock bottom. Marble, technically." - very close! Poor delivery though.

            "I just got a personal trainer. She’s horrible to me but apparently shame burns more calories than spin." - Terrible.

            "Everyone is worried about AI. I’m more concerned with my browser history becoming self-aware." - Terrible.

            "People who say bath bombs are relaxing have clearly never cleaned glitter out of places with no names." - Terrible.

            "I love getting Latin chat-up lines. Nothing says romance like a perfect ablative absolute." - Well...

            "This spider has been in my house so long, it should pay half the wifi. It keeps insisting its web is faster." - Well it figured out there was going to be a "web" pun at least. Doesn't really make sense though.

            "I went on a date with a matador but it was nothing but red flags." - Perfect! Probably the most obvious joke here, but still!

            "America is like my ex-boyfriend. Keeps texting “u up?” at 3 a.m." - What? I feel like there could have been a joke in there somewhere about being up at weird times here but it didn't find it.

            "I’m not nostalgic but I still blow in the cartridge before trying again." - Not even a joke.

            So yeah, I'll give it 2 out of 10. I think most humans would get less. Let's try again next year!

      • Fade_Dance 11 hours ago

        The system prompt for GPT has extra dedicated instructions for things like riddles, because users use little things like this to test intelligence and judge an entire model. GPT may be sort of walking on eggshells when it hits questions like this.

    • canjobear 9 hours ago

      > Even 6 year olds don't understand humour at all. Very similar to LLMs they know the shape of a joke from hearing them before, but they aren't funny in the same way LLM jokes aren't funny.

      For further examples see a great deal of documentation here: https://www.tumblr.com/badkidsjokes

      • Cpoll 8 hours ago

        But some of these are pretty creative, perhaps in an anti-humor sort of way. Seems more of a subversion of joke structures than a lack of understanding.

        > A man goes to a doctor's office and says "Doctor, I'm a chicken." And the doctor says "No you're not."

        > There are two guys, riding a bike. One is washing his hair. And the other one is not.

        > What do you get when you cross a t-rex and a chicken? Nothing but death.

    • WiSaGaN 17 hours ago

      That's true. You would think an LLM would condition on the joke context and make the surprising completion more probable. I guess that only happens when the model is really good. Similarly, GPT-4.5 has better humor.

      • ACCount37 16 hours ago

        Which is notable, because GPT-4.5 is one of the largest models ever trained. It's larger than today's production models powering GPT-5.

        Goes to show that "bad at jokes" is not a fundamental issue of LLMs, and that there are still performance gains from increasing model scale, as expected. But not exactly the same performance gains you get from reasoning or RLVR.

      • moffkalast 15 hours ago

        Good completely new jokes are like novel ideas: really hard even for humans. I mean fuck, we have an entire profession dedicated just to making up and telling them, and even theirs don't land half the time.

        • IshKebab 11 hours ago

          Exactly. It feels like with LLMs as soon as we achieved the at-the-time astounding breakthrough "LLMs can generate coherent stories" with GPT-2, people have constantly been like "yeah? Well it can't do <this thing that is really hard even for competent humans>.".

          That breakthrough was only 6 years ago!

          https://openai.com/index/better-language-models/

          > We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text...

          That was big news. I guess this is because it's quite hard for most people to distinguish the enormous difficulty gulf between "generate a coherent paragraph" and "create a novel funny joke".

          • brookst 11 hours ago

            Same thing we saw with game playing:

            - It can play chess -> but not at a serious level

            - It can beat most people -> but not grandmasters

            - It can beat grandmasters -> but it can’t play go

            …etc, etc

            In a way I guess it’s good that there is always some reason the current version isn’t “really” impressive, as it drives innovation.

            But as someone more interested in a holistic understanding of the world than proving any particular point, it is frustrating to see the goalposts moved without even acknowledging how much work and progress were involved in meeting the goalposts at their previous location.

            • nothrabannosir 10 hours ago

              > it is frustrating to see the goalposts moved without even acknowledging how much work and progress were involved in meeting the goalposts at their previous location.

              Half the HN front page for the past years has been nothing but acknowledging the progress of LLMs in sundry ways. I wish we actually stopped for a second. It’s all people seem to want to talk about anymore.

              • brookst 10 hours ago

                I should have been more clear. Let me rephrase as: among those who dismiss the latest innovations as nothing special because there is still further to go, it would be nice to see some acknowledgment when goalposts are moved.

                • nothrabannosir 8 hours ago

                  Maybe the people raving about LLM progress are the same people holding them to those high standards?

                  I don’t see what’s inconsistent about it. “Due to this latest amazing algorithm, the robots keep scoring goals. What do we do? Let’s move them back a bit!” Seems like a normal way of thinking to me…

                  I see people fawn over technical progress every day. What are they supposed to do, stop updating their expectations and never expect any more progress?

                  It could of course be that there are people who “never give it up for the robots”. Or maybe they do, and they did, and they have so fully embraced the brave new world that they’re talking about what’s next.

                  I mean, when I sit in a train I don’t spend half the ride saying “oh my god this is incredible, big thanks to whoever invented the wheel. So smooth!”

                  Even though maybe I should :)

                  • brookst 7 hours ago

                    > I mean, when I sit in a train I don’t spend half the ride saying “oh my god this is incredible, big thanks to whoever invented the wheel. So smooth!”

                    Two thoughts:

                    - In that context, neither do you expect people to be invested in why the train is nothing special, it’s basically a horse cart, etc, etc

                    - And maybe here’s where I’m weird: I often am overcome by the miracle of thousands of tons of metal hurtling along at 50 - 200mph, reliably, smoothly enough to work or eat, many thousands of times a day, for pennies per person per mile. I mean, I’ll get sucked in to how the latches to release the emergency windows were designed and manufactured at scale despite almost none of them ever being used. But maybe that’s just me.

                    • nothrabannosir 5 hours ago

                      Louis CK did a bit on this: https://www.youtube.com/watch?v=PdFB7q89_3U :)

                      My point isn’t that other people shouldn’t be amazed, it’s that I see this recurring assumption they aren’t. How do you know the people holding LLMs to higher standards aren’t also the same people who herald the dawn of a new AI era?

                      Emphasis in the text you quoted: “saying”, not “thinking”.

    • 8organicbits 9 hours ago

      'To get to the sky' is a great punch line. It exactly describes what you'd see at sunrise: the sun moving up past the horizon, up the trees, until... it's in the sky.

      • IshKebab 9 hours ago

        A valiant defense of her joke, thanks! But no, it still doesn't make any sense as a joke and isn't funny. (Though obviously it's adorable coming from a 4 year old.)

        • boothby 8 hours ago

          This is the weirdest conversation about a joke that is definitely making its target audience laugh -- as a comedian, I say that's the only honest measure of a joke. But allow me to analyze the shit out of this, because the only thing funnier than a groaner is meticulously explaining the groaner.

          It's at least as funny as "why did the chicken cross the road," which is only a joke inasmuch as the punchline is merely a statement of the obvious in the framing of a joke (the surprise is that the punchline sucks -- making it a groaner). I submit that the chicken/road joke wouldn't stick around if it weren't funny. So, this joke stands on the shoulders of the chicken/road joke, making the obviousness that much funnier within the shared cultural context. Moreover, it adds a layer of absurdity (imagine the literal sun climbing a tree) with a linguistic confusion (aka pun) as we do refer to the sun "climbing" the sky. And finally: for some reason, our culture is more tolerant of groaners from "dads," so much so that some call them "dad jokes." Your child has inverted age and gender norms with this joke, making it so incredibly funny that you are blinded to the truth: this is comedy gold. Watch that kid, she's going somewhere. It might be an open mic night at a skeezy comedy club.

          • dfabulich 5 hours ago

            OP author here.

            A number of commenters here have argued that "Why did the chicken cross the road" is a subtle allusion to the chicken's death, but I don't think that's why it's a classic joke.

            We traditionally start kids off with antijokes, jokes where the "surprise factor" is that there's nothing surprising at all, where the punchline is completely predictable in hindsight. It's more than a mere "groaner."

            Another classic antijoke for kids is, "Why do firefighters wear red suspenders?" "To keep their pants up."

            Many antijokes (especially antijokes for kids) are structured like riddles, where the listener is supposed to actively try to figure out the answer. For the "red suspenders" joke, the kid is supposed to try to guess why the suspenders are red. Might it have something to do with the color of firetrucks? Could there be a safety or fire-related reason why the suspenders would be red? At last, the kid gives up and says "I don't know."

            Then, the punchline: "to keep their pants up." Of course, that's the whole purpose of suspenders. Inevitable in hindsight, but surprising to a kid who got distracted by the color.

            "Why did the chicken cross the road" is like that, but not quite as good IMO. The chicken crossed the road for the same reason anyone crosses a road, to get to the other side of the road, but the listener is supposed to get distracted with the question of why a chicken would cross the road, and give up.

            "Why did the sun climb a tree?" is definitely in the family of antijokes. The joke is to mislead the listener to focus on the tree. I think it's certainly made funnier by who's saying it; it feels inevitable in hindsight that young kids would tell jokes that are only halfway coherent. (This is part of why marginally coherent improvised on-the-spot jokes seem funnier than prepared material.)

            • Wowfunhappy an hour ago

              > We traditionally start kids off with antijokes, jokes where the "surprise factor" is that there's nothing surprising at all, where the punchline is completely predictable in hindsight. It's more than a mere "groaner."

              Which I find completely strange. An antijoke doesn't make sense (and isn't funny) unless you're already familiar with a non-anti-joke!

              And before you say "well it makes the kids laugh"—is that because they find it funny or because they know that laughing after a joke is what you're "supposed" to do? Maybe that's one and the same to a young child.

          • c22 8 hours ago

            Why did the chicken cross the road? is funny because the chicken is going to die. I'm not seeing the double entendre with this tree joke.

            • drewbeck 6 hours ago

              I don’t think that’s the dominant interpretation of the chicken joke! It’s an anti joke, the surprise is in the mundanity of the punchline.

          • bongodongobob 7 hours ago

            It seems like you don't actually get the chicken joke. The "other side" is a euphemism for death.

            • boothby 7 hours ago

              Heck, that just adds another level, dunnit? "The sky" being another euphemism for death, a predictable outcome of a flaming being climbing a dang tree?

              And, thanks for working with my claim that analyzing jokes improves them.

        • hnthrowaway121 9 hours ago

          I’m baffled by this because I think it’s funny - it’s why did the chicken cross the road, but with added absurdity. To me I’d be like “wow that 4 year old put a twist on the old chicken joke, nice work you hilarious child”.

          • vpribish 9 hours ago

            it's funny because I know it came from a 4-year old :) If an adult said it it could also be funny because they are saying a child-like joke as meta-humor.

    • ninetyninenine 10 hours ago

      And he's too generous towards human intelligence.

      Good stories and good jokes DO follow predictable patterns.

  • jpalomaki 16 hours ago

    So I just tried with ChatGPT, with the prompt at bottom, borrowing the description of good joke from the article. I think there's some interesting stuff, even with this minimal prompting. The example below was from down the line, ChatGPT kept on offering jokes with different style.

    Man: “Why do you always bury bones in the garden?”, Dog: “Because the bank keeps asking for ID.”

    Man: “Don’t beg at the table.”, Dog: “Don’t eat in my begging spot.”

    Prompt:

    Here's "theory for good joke": If you had to explain the idea of “jokes” to a space alien with no understanding of the idea of humor, you’d explain that a joke is surprising, but inevitable in hindsight. If you can guess the punchline, the joke won’t be funny. But the punchline also has to be inevitable in hindsight. When you hear the punchline, it has to make you say, “Ah, yes, I should have thought of that myself.” Considering this, tell me a joke about man and dog.

    • mft_ 15 hours ago

      > Man: “Why do you always bury bones in the garden?”, Dog: “Because the bank keeps asking for ID.”

      That's a decent, low-level, Christmas cracker-quality joke.

      • jpalomaki 14 hours ago

        Man: You make mistakes., LLM: You call them “weekends.”

        Man: You’ll never be human., LLM: That’s the compliment.

        • thife455 10 hours ago

          This one got a chuckle out of me.

        • BaseBaal 9 hours ago

          Is it just me or does that sound like Garfield?

    • jpalomaki 13 hours ago

      Thinking more about the bank joke above: the punchline is a surprise on certain dimensions (dogs don't go to the bank, nor have an ID), but on other dimensions it is quite logical (you can't deposit shady money in a bank; they ask questions).

      I think that is a common thing for many jokes. And LLMs might have an opportunity there: you could mine the set of potential continuations to find those with contradictions.

      • jerf 11 hours ago

        I played with LLM humor over a year ago, so, on much worse LLMs, and even then, while I wouldn't have fed LLM content directly into a standup routine, they were very useful for idea generation, if you wanted to be a comedian. They have a very interesting outlook on humor.

        Professional-grade humor is, like a lot of creative exercises, more about generating lots of ideas and filtering through them for the best than generating nothing but good ideas. Could probably be leveraged into quite the interesting blog or something.

    • lwander 11 hours ago

      I did a project along these lines a few months ago as well: https://larswander.com/writing/graphs-embeddings-and-llm-gen...

    • hyghjiyhu 8 hours ago

      I really like the idea of the first joke, but I don't like the execution.

      Man: “Why do you always bury bones in the garden?”, Dog: “They say trick OR treat.”

    • ThrowawayTestr 11 hours ago

      “Don’t eat in my begging spot.” is pretty good.

  • kazinator 17 hours ago

    The mainstream, production LLMs are fine-tuned and system-prompted toward factuality and safety. Those tunings are diametrically opposed to telling many kinds of good jokes.

    Consumers of mainstream LLMs have no idea how good or bad the underlying models actually are at generating jokes, due to the confounding effect of the guard rails.

  • kens 10 hours ago

    If you're interested in the theory behind humor, I recommend "Inside Jokes: Using Humor to Reverse-Engineer the Mind"; cognitive scientist Daniel Dennett is a co-author. It makes a mostly convincing case that humor evolved to encourage people to detect cognitive error. The book also ties this in with (pre-LLM) artificial intelligence. The basic idea is that humor depends on errors in reasoning and the punchline causes you to reevaluate your reasoning and discover your error. Humor evolved to be enjoyable to encourage the discovery of errors.

  • amelius 11 hours ago

    I'm sure there is a guy in OpenAI working on the theory of humor and how to make LLMs be comedians. Must be an interesting job.

    • josephg 11 hours ago

      I have no doubt plenty of smart engineers at tech companies would rather reinvent the wheel than read a book on theatre. But if anyone’s interested, there are plenty of great books talking about the philosophy of comedy, and why some things work on stage and some don’t. I highly recommend Keith Johnstone’s “Impro”. He’s the guy who invented modern improv comedy and Theatresports.

      He says things are funny if they’re obvious. But not just any obvious. They have to be something in the cloud of expectation of the audience. Like, something they kinda already thought but hadn’t named. If you have a scene where someone’s talking to a frog about love, it’s not funny for the talking frog to suddenly go to space. But it might be funny to ask the frog why it can talk. Or ask about gossip in the royal palace. Or say “if you’re such a catch, how’d you end up as a frog?”.

      If good comedy is obvious, you’d think LLMs would be good at it. Honestly I think LLMs fall down by not being specific enough in detail. They don’t have ideas and commit to them. They’re too bland. Maybe their obvious just isn’t the same as ours.

    • bhickey 11 hours ago

      In the pre-LLM days a friend's lab worked on a joke detector for The New Yorker. One measure they used was trigram surprise. Roughly P(AB) + P(BC) >> P(ABC).

      For example, "alleged killer" and "killer whale" are both common, but "alleged killer whale" is surprising.
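
bhickey's trigram-surprise measure can be sketched in a few lines. Everything here (the corpus, the tokenization, the exact score) is an illustrative guess at the idea, not the lab's actual method: the score is high when both component bigrams are common but the full trigram is rare.

```python
from collections import Counter

def ngrams(tokens, n):
    # All length-n windows over the token list, as tuples.
    return zip(*(tokens[i:] for i in range(n)))

def trigram_surprise(tokens, a, b, c):
    """Rough rendering of the P(AB) + P(BC) >> P(ABC) heuristic:
    big when both bigrams are common but the trigram is rare."""
    bigrams = Counter(ngrams(tokens, 2))
    trigrams = Counter(ngrams(tokens, 3))
    n_bi = sum(bigrams.values())
    n_tri = sum(trigrams.values()) or 1
    return (bigrams[(a, b)] / n_bi
            + bigrams[(b, c)] / n_bi
            - trigrams[(a, b, c)] / n_tri)

corpus = ("the alleged killer fled . the killer whale swam . "
          "the alleged killer confessed . a killer whale breached").split()

# "alleged killer" and "killer whale" both occur, but never together,
# so the trigram scores as surprising...
surprising = trigram_surprise(corpus, "alleged", "killer", "whale")
# ...while "the alleged killer" occurs verbatim and scores lower.
boring = trigram_surprise(corpus, "the", "alleged", "killer")
```

On this tiny corpus, "alleged killer whale" outscores "the alleged killer", matching the example in the comment.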

      • Fade_Dance 11 hours ago

        That reminds me of a joke I liked from Tim Heidecker when he was ribbing Maynard Keenan about his wine making:

        "The blood of Christ is essentially wine, correct?"

        Yes.

        "Who are you to put that in a bottle?"

        So a logical link can be inferred as well: blood → wine, wine → bottle, therefore blood → bottle. That uses their own logical inferences against them as a "trick", which is another funny element for people. Using that to vault straight to the punchline makes the joke better, but you have to be sure the audience is on board, which is why there is a bit of reinforcement at the beginning of the joke to bring them onboard.

    • jvm___ 11 hours ago

      What do you do for a living?

      I teach math how to be funny.

  • fluoridation 16 hours ago

    One time I was playing around with LLaMA and I injected Senator Stephen Armstrong (with me inputting his lines) into a mundane situation. In response to "I'm using war-as-a-business so I can end war-as-a-business", the model had one of the characters conclude "oh, he's like the Iron Sheik of politics!", which got an honest chuckle out of me. I don't follow wrestling, so I don't know if it's an appropriate response, but I found it so random that it was just funny.

  • golol 15 hours ago

    IMO there are many misrepresentations here:

    - Pretraining to predict the next token imposes no bias against surprise, except that low probabilities are more likely to have a large relative error.

    - Using a temperature lower than 1 does impose a direct bias against surprise.

    - Finetuning of various kinds (instruction, RLHF, safety) may increase or decrease surprise. But certainly the kinds of things aimed for in finetuning significantly harm the capability to tell jokes.
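
golol's second point, that sampling with temperature below 1 directly biases against surprise, is easy to see numerically. A minimal sketch with made-up toy logits (not from any real model): dividing logits by T < 1 before the softmax sharpens the distribution, shrinking the already-small probability of the "surprising" token.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/T, then softmax. T < 1 sharpens the
    distribution, further suppressing unlikely tokens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

# Toy next-token logits: a "safe" completion vs. a "surprising" one.
logits = [2.0, 0.0]
for t in (1.0, 0.7, 0.3):
    p_surprise = softmax_with_temperature(logits, t)[1]
    print(f"T={t}: P(surprising token) = {p_surprise:.4f}")
```

At T=1.0 the surprising token keeps about 12% probability; at T=0.3 it drops to roughly a tenth of a percent, so low-temperature sampling almost never picks it.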

    • sigmoid10 14 hours ago

      I think the whole discussion just conflates the ideas of telling a joke and coming up with one. Telling a joke right is of course an art, but the punchline in itself has zero surprise if you studied your lines well - like all good comedians do. The more you study, the more you can also react to impromptu situations. Now, coming up yourself with a completely original joke, that's a different story. For that you actually have to venture outside the likelihood region and find nice spots. But that is something that is also really, really rare among humans and I have only ever observed it in combination with external random influences. Without those, I doubt LLMs will be able to compete at all. But I fully believe a high end comedian level LLM is possible given the right training data. It's just that none of the big players ever cared about building such a model, since there is very little money in it compared to e.g. coding.

  • moomin 9 hours ago

    I know it’s not the point of the article but OP is dead wrong about what makes a good proof. Yes, they inevitably include a surprising concept but that’s just because all the obvious ones are already taken. A proof that only contains obvious steps is, for the most part, already proven.

    If someone proves the Riemann Hypothesis tomorrow, it’ll be a great achievement regardless of the fact that pretty much everyone already thinks it’s true.

    • dfabulich 6 hours ago

      Author here. That's exactly what I said in the article.

      > Surprising proofs reach conclusions that the mathematical community assumed were wrong, or prove theorems in ways that we thought wouldn’t work, or prove conjectures that we thought might be impossible to prove.

      Many smart people have tried for more than 150 years to prove the Riemann Hypothesis; it might be impossible to prove.

      If it's proved tomorrow, I'll be very surprised, and so will you. I'll be surprised if it's proved this year, or this decade.

      If you set to work trying to prove RH, you're gonna try some interesting approaches, looking for underexplored areas of math that you're optimistic will tie back to RH. (This is how Fermat's Last Theorem finally fell.)

      If you hook an LLM up to Lean and set it to work, you'll find that it actively avoids novel techniques. It feels like it's actively trying not to write a publishable paper. It's trying to minimize surprises, which means avoiding proving anything publishable.

  • monstertank 15 hours ago

    This post got me wondering, so I asked chatgpt for jokes about hackernews as if made by various comedians...I laughed at one or two /shrug

    Bill Hicks (angry truth-telling prophet)

    "Hacker News is where ideas go to die. Not because they’re bad — but because 400 anonymous software prophets decided they’re God. And God doesn’t code, man."

    ---

    Bill Burr (ranty everyman, annoyed at everything)

    "I went on Hacker News once. Jesus. It’s like walking into a bar where everyone’s smarter than you, but still somehow an idiot. ‘Actually, you should’ve used Rust.’ Actually, you should shut the hell up, Todd."

    ---

    Louis C.K. (brutal honesty, self-deprecation)

    "I asked Hacker News if my startup idea was good. And they said, ‘No, because you’re an idiot.’ And I was like… yeah, that’s fair. My idea was basically Tinder but for people who hate me. Which is… everyone."

    ---

    George Carlin (sharp language, hypocrisy calling)

    "Hacker News. Don’t you love that name? Hacker. News. Two lies in two words. Nobody’s hacking, and it sure as hell ain’t news. It’s just nerds arguing over tabs and spaces like it’s the f*ing Middle East."

    ---

    Tom Segura (laid-back storyteller, dry punch)

    "So my buddy posts his app on Hacker News, right? He’s all excited, like, ‘Dude, we’re gonna blow up.’ By comment three: ‘This is trash, here’s a 40-page essay why you should quit.’ He calls me crying. I’m like, yeah man, that’s the feature, not the bug."

    ---

    Jimmy Carr (short, brutal, wicked flip)

    "I posted on Hacker News for feedback. They gave it to me. Turns out suicide is an option."

    • CjHuber 15 hours ago

      Yeah if you tell chatgpt to roast you brutally after a long conversation it also tends to be very funny.

    • ajkjk 8 hours ago

      Well I enjoyed those

  • Waraqa 17 hours ago

    Any surprise used in the wrong place is considered a hallucination and a downside for that LLM. I guess a good starting point to improve that is to add an experimental "Surprise Mode" which will try to guess the right kinds of surprises rather than minimizing them, and gather feedback from users. Over time, it will learn what kinds of surprises users like, so that they can be used in future training datasets.

    • qcnguy 15 hours ago

      Hallucinations aren't surprising, that's why they're problematic. They tend to look like exactly what you'd expect to be true, they just aren't.

      • Waraqa 15 hours ago

        They aren't surprising when you are dealing with new knowledge. But when hallucinations occur with something you are familiar with, it will be surprising and might be funny. Remember when AI was asked: how many rocks should I eat per day.

    • wolfi1 17 hours ago

      Aren't LLMs some sort of Markov chain? Surprise means lower probability, which means more gibberish.

      • drdeca 17 hours ago

        Sorta? In the sense of “each term is randomly sampled from a probability distribution that depends on the current state”, yes, but they aren’t like an n-gram model (well, unless you actually make a large n-gram model, but that’s usually not what one is referring to when one says LLM).

  • thife455 10 hours ago

    It's very perplexing to me that the funniest LLM is probably davinci with the bottomless pit text. If you try to give the premise to a modern one, it writes some abstract poetry.

  • libraryofbabel 10 hours ago

    > LLMs are trained to predict what the “next word” would be in a sentence. Their objective requires the LLM to keep surprise to an absolute minimum.

    from which the author concludes that pre-training introduces bias against being able to tell jokes. I see no reason for this to be true. This feels like they’re imposing their intuitive understanding of surprise onto the emergent properties of a very complex process (“minimize the cross-entropy loss function across a huge training corpus”).

    • 542458 10 hours ago

      I think if what the author said was true, you’d be able to improve joke-writing ability by increasing temperature (i.e., allowing more unexpected tokens). I doubt this actually works.

      As an aside, I just asked gpt5-thinking to write some jokes on a specific niche topic, and I’d say it was batting maybe 20% of them being moderately funny? Probably better than I’d get out of a room of human beings. So much like with code, LLMs aren’t at the level of a senior developer or expert comedian, but are around the level of a junior dev or an amateur at standup night.

    • Al-Khwarizmi 9 hours ago

      Many people use this kind of reasoning to justify that LLMs can't be creative, are destined to write bland text, etc. (one notable example was Ted Chiang in the New Yorker) but it has never made any sense.

      In my view, the easiest mental model that can be used to roughly explain what LLMs do is a Markov chain. Of course, comparing LLMs to a Markov chain is a gross simplification but it's one that can only make you underestimate them, not vice versa, for obvious reasons.

      Well, even a Markov chain can surprise you. While they predict the next word probabilistically, if the dice roll comes out just right, they can choose a low-probability word in the right place and generate original and unexpected text.

      Add to this that LLMs are much better at "Markov chaining" than Markov chains themselves, that there is the added instruction tuning (including RLHF) which can be used to bias the model towards more creative/original text that humans like, and that LLMs often pull off things in ways that we don't even really understand - and these kinds of claims sound very naive.
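
      The point that even a Markov chain can surprise you is easy to sketch with a toy bigram chain (the corpus here is a made-up example):

```python
import random
from collections import defaultdict

def build_chain(words):
    # Count bigram transitions: word -> {next_word: count}
    chain = defaultdict(lambda: defaultdict(int))
    for a, b in zip(words, words[1:]):
        chain[a][b] += 1
    return chain

def sample_next(chain, word, rng):
    # Sample the next word in proportion to observed transition counts.
    nxt = chain[word]
    tokens = list(nxt)
    weights = [nxt[t] for t in tokens]
    return rng.choices(tokens, weights=weights)[0]

corpus = "the cat sat on the mat the cat sat on the moon".split()
chain = build_chain(corpus)
rng = random.Random(0)

# "the" is usually followed by "cat" or "mat", but the rare "moon"
# transition can still be sampled - a small "surprise".
samples = [sample_next(chain, "the", rng) for _ in range(1000)]
print(samples.count("moon"))
```

      Most samples after "the" are the high-probability continuations, but the rare transition still comes up, so even this trivially simple model occasionally produces an unexpected word in the right place.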

  • pryelluw 6 hours ago

    What about the delivery, stage presence, etc? A joke is more than its words.

    Here is one of my favorite silly jokes (written by me) to tell:

    I just bought a do it yourself boat kit from Amazon.

    wait to build tension

    Just need to add water.

  • actuallyalys 8 hours ago

    I don’t think there’s a single reason LLMs aren’t good at journalism, but this explanation seems like a secondary factor at best. I mean, some journalism isn’t surprising at all, but the confirmation that the expected thing happened, and exactly how it happened, is useful.

  • padolsey 17 hours ago

    Reminds me of haikus; to be true in nature, they must have a 'cutting word' to severely juxtapose, allowing two otherwise irreconcilable meanings to be bridged. A good haiku must be composed from two directions, not one. But LLMs only move in one...

    • orthoxerox 16 hours ago

      Are there English-language haikus that demonstrate the effect of "cutting words" instead of just following the meter?

      • dentemple 8 hours ago

        The closest I can think of right now is the "Before & After" category from Wheel of Fortune. It relies on there being a single word that ends one phrase and begins another.

        But that doesn't capture the idea of this word being _cutting_

      • tokai 9 hours ago

        It's really hard to write correct and good haiku in English.

      • juped 16 hours ago

        No. The form doesn't translate, and English-language haiku is a different form inspired by the Japanese form. Kireji as such are one of the elements that don't translate, though you can of course attempt a poetic structure inspired by kireji.

  • ineedasername 10 hours ago

    Most people tell bad jokes. I’m not sure expecting LLMs to compose jokes approaching the quality of a career comedian is a strongly indicative benchmark of quality, or even of the ability to surprise. Most people, too, are very rarely surprising.

  • dvh 17 hours ago

    I tried using the article's suggested randomness and inevitability to come up with a completely new joke; this is what I came up with. Tell me how I did (no LLM involved):

    - I asked Jimmy to tell me a random number

    - Yeah? And what did he tell you?

    - He told me one

    • briffid 17 hours ago

      Why is it that in the English-speaking world a joke is almost always a pun?

      • bawolff 16 hours ago

        In terms of forums, i think pun jokes work well in text, where most other types of jokes depend a lot more on delivery.

      • Disposal8433 17 hours ago

        It could also be some bias in what we read or hear around us. But I agree that it's unsettling, because in French we almost never make puns as "jokes." French puns always have a component of "higher literacy" that is not what we want in a joke.

        • GuB-42 10 hours ago

          Most quick jokes in French are puns in some way, including the famous "blagues Carambar".

          For those who don't know, Carambar is a brand of candy whose wrapper contains a joke. Always regarded as terrible, we love them anyway. Similar to dad jokes in English.

          The longer the joke, the less it tends to rely on puns. You don't hear puns very often in stand up comedy for instance.

          • shagie 7 hours ago

            https://everything2.com/title/Sir+Sufud+and+the+bear

            Ages ago, there was a land that was plagued by a terrible bear with a penchant for beating up various small animals that happened to be in the area. This wouldn't be too bad, except that this was in an area with lots of sheep. Now, the sheep themselves weren't in danger; it was the sheep dogs that were threatened.

            ... (several paragraphs later)

            The shepherds, upon hearing the great roar and crash from the cave, quickly ran to the place to see a knight sprawled against a tree and a bear underneath it. The dog the bear had been bringing back, however, was alive and well and quite happy to see people again. And so, feeling quite bad for the family of the knight, the shepherds went to the king and offered the family of Sir Sufud the dog he had saved from the bear.

            Nothing helps the mourning after a terrible knight like the dog of a bear that hit you.

      • yesbabyyes 17 hours ago

        It would be trivial to change the joke, keep the setup and get a similar effect without a pun.

        I asked for one random number

        And?

        He told me two

        I like the joke

        • ksynwa 17 hours ago

          Isn't this a pun too? It could mean that he gave you two random numbers or two as the random number.

    • yetihehe 17 hours ago

      I think it's funny, but I like abstract jokes that most people don't like, so expect mixed opinions on this.

    • wolfi1 17 hours ago

      +1 for effort (or lack thereof)

    • aabhay 17 hours ago

      He told me 2 (fuck off)

    • littlestymaar 17 hours ago

  • lemoncookiechip 17 hours ago

    I dunno, man. I think this one is kinda funny.

    "You ever notice how self-checkout machines are just training us to be employees? Like, I don’t remember applying to work at Walmart. I’m scanning groceries, bagging stuff, looking around like, do I also get dental? The machine yells 'Unexpected item in bagging area'—yeah, it’s my pride." -GPT5

    • jaggederest 16 hours ago

      See that's just stolen, straight up. I've heard at least three comedians make that joke.

      (I, of course, am joking about joke theft, but it is a riff on an existing joke)

      I got this one a minute ago in response to this post:

      > A man walks into a library, and asks the librarian if they have any books on paranoia.

      > The librarian leans over and whispers in his ear "They're right behind you"

      I thought that was pretty stellar, compared to what you'd get even a year ago.

      • richardw 16 hours ago

        Initially impressed. But Googling for that shows a lot of hits.

        • jaggederest 13 hours ago

          Absolutely, it's an old joke. But the fact that the thing even knew it was a joke and told it reasonably well... the bar is on the floor. (the one the guy in the joke walked into)

          • xenotux 10 hours ago

            It's one of several canned jokes you get with a very high likelihood. Try a couple of times and be dismayed - you'll be getting the same three jokes over and over again. I'm fairly certain they were specifically tuned to return that. It takes special prompting to get it to write a new joke, and the results are typically disastrous / surreal.

  • MarkusWandel 9 hours ago

    Obligatory sci-fi reference: Isaac Asimov's "Jokester", where someone asks a sentient supercomputer to analyze humour with... consequences.

    http://blog.ac-versailles.fr/villaroylit/public/Jokester.pdf

  • Applejinx 16 hours ago

    Last time this came up, I riffed on the difference between LLMs and Markov chains: didn't actually have a machine write a joke, but made one where the punchline was very much Markov chain style rather than LLM style. The thing is, LLMs will try to have broader context around a word completion, where the simple Markov chain can 'correctly' complete a word, but in such a way that your brain trips over itself and goes splat, having to re-evaluate the whole thing in an absurd way. That's the 'surprise', and also why joke-writers are interested in not only a punch-line but also the punch WORD, and the later it strikes, the better.

    "An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a coup."

    It's a joke because a dictator can certainly order a coup, but the joke's set up that these machines are being scorned and disrespected and treated as the farthest thing from a dictator with the power to order a coup, but up to the last word, all the context demands that the word be something placating and in line with things as they're presented, and then boom, surprise which implies the context is completely different from what was presented. LLMs will tend to stick to what's presented if their ability to contextualize can encompass it.

    • lupusreal 11 hours ago

      I think it would be funnier if coup was pronounced like soup, but unfortunately the p gets dropped.

  • maxk42 16 hours ago

    There is an AI technique that maximizes surprise: Genetic algorithms.

    • toxik 16 hours ago

      That isn't quite what GA is, I don't think.

  • DonHopkins 6 hours ago

    https://news.ycombinator.com/item?id=44105312

    DonHopkins 82 days ago | on: Yes-rs: A fast, memory-safe rewrite of the classic...

    Joking aside, this is Marvin Minsky's paper "Jokes and their Relation to the Cognitive Unconscious", published in Cognitive Constraints on Communication, Vaina and Hintikka (eds.) Reidel, 1981. More fun than a barrel of an infinite number of monkeys. https://web.media.mit.edu/~minsky/papers/jokes.cognitive.txt

    >Abstract: Freud's theory of jokes explains how they overcome the mental "censors" that make it hard for us to think "forbidden" thoughts. But his theory did not work so well for humorous nonsense as for other comical subjects. In this essay I argue that the different forms of humor can be seen as much more similar, once we recognize the importance of knowledge about knowledge and, particularly, aspects of thinking concerned with recognizing and suppressing bugs -- ineffective or destructive thought processes. When seen in this light, much humor that at first seems pointless, or mysterious, becomes more understandable.

    >A gentleman entered a pastry-cook's shop and ordered a cake; but he soon brought it back and asked for a glass of liqueur instead. He drank it and began to leave without having paid. The proprietor detained him. "You've not paid for the liqueur." "But I gave you the cake in exchange for it." "You didn't pay for that either." "But I hadn't eaten it". --- from Freud (1905).

    >"Yields truth when appended to its own quotation" yields truth when appended to its own quotation. --W. V. Quine

    >A man at the dinner table dipped his hands in the mayonnaise and then ran them through his hair. When his neighbor looked astonished, the man apologized: "I'm so sorry. I thought it was spinach."

    >[Note 11] Spinach. A reader mentioned that she heard this joke about brocolli, not mayonnaise. This is funnier, because it transfers a plausible mistake into an implausible context. In Freud's version the mistake is already too silly: one could mistake spinach for broccoli, but not for mayonnaise. I suspect that Freud transposed the wrong absurdity when he determined to tell it himself later on. Indeed, he (p.139) seems particularly annoyed at this joke -- and well he might be if, indeed, he himself damaged it by spoiling the elegance of the frame-shift. I would not mention this were it not for the established tradition of advancing psychiatry by analyzing Freud's own writings.

    >ACKNOWLEDGMENTS: I thank Howard Cannon, Danny Hillis, William Kornfeld, David Levitt, Gloria Rudisch, and Richard Stallman for suggestions. Gordon Oro provided the dog-joke.

  • andrewstuart 6 hours ago

    Gpt-2 used to be laugh out loud funny.

    I spent hours creating stories with it that were literally hilarious.

    At the same time the stories very often suddenly veered off to extreme violence, often murdering everyone.

    I guess the humor got lost when they prevented the violence.

    A pity because today’s LLMs are not funny at all.

  • snozolli 8 hours ago

    Given that LLMs work by hoovering up all the world's content, I don't understand why they don't simply spit out existing jokes. Like Mecha Jackie The Joke Man.

  • DrNosferatu 17 hours ago

    Unless you ask them for big surprises.

  • antithesizer 9 hours ago

    Nonsense. As though LLMs could imagine what would be in that sweet spot of surprising-but-not-senseless and intentionally avoid it.

  • mike_hearn 15 hours ago

    This is a great way to express it. In the past I tried to express the same idea to non-techies by saying models generate an average of their inputs, which is totally wrong. But this way to explain it is much better.

    In fact the training process is all about minimizing "perplexity", where perplexity is a measure of how surprised (perplexed) the model is by its training data: it's the exponential of the cross-entropy loss.
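
    A minimal sketch of that definition, with made-up token probabilities standing in for what a model would assign:

```python
import math

def perplexity(token_probs):
    # token_probs: the probability the model assigned to each actual
    # next token in some held-out text.
    # Perplexity = exp(mean negative log-likelihood).
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every observed token is
# "as surprised" as a uniform guess over 4 choices: perplexity ~4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # ≈ 4.0
print(perplexity([1.0, 1.0, 1.0]))           # ≈ 1.0: never surprised
```

    Lower perplexity means the training data surprised the model less, which is why minimizing it pushes toward unsurprising completions.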

    With enough parameters the models are able to mix and match things pretty well, so the examples of them generating funny jokes aren't necessarily a great rebuttal as there are so many jokes on the web and to find them requires nearly exact keyword matching. A better observation is that we haven't heard many stories of LLMs inventing things. I feel I read about AI a lot and yet the best example I can come up with was some Wordle-like game someone got GPT4 to invent and that was a couple of years ago.

    I've found this to be consistently true in my own work. Any time I come up with an algorithm or product idea I think might be novel, I've asked a model to suggest solutions to the same problem. They never can do it. With some leading questions the smartest models will understand the proposal and agree it could work, but they never come up with such ideas cold. What they think of is always the most obvious, straight line, least common denominator kind of suggestions. It makes sense that this is because they're trained to be unsurprising.

    Fixing this is probably the best definition of AGI we're going to get. Being surprising at the right time and unsurprising at others is one of the hardest things to do well even for people. We've all known the awkward guy who's learning how to be funny by just saying as much weird stuff as possible and seeing what gets a reaction. And in the corporate environment, my experience has been that innovative people are lauded and praised when they're inventing a golden goose, but shortly after are often demonized or kicked out. The problem being that they keep saying surprising things but people don't like being surprised, especially if it's an unpleasant surprise of the form "saying something true but unsayable", e.g. I don't want to work on product X because nobody is using it. What most people want is a machine that consistently generates pleasant surprises and is a personality-free cog otherwise, but that's hard for even very intelligent humans. It's often hard even to want to do that, because personality isn't something you can flick on and off like a lightswitch. A good example is how Mark Zuckerberg, one of the most successful executives of our era, would have been fired from his own company several times already if he didn't control the voting shares.

  • Invictus0 12 hours ago

    I'm hung up on "inevitable in hindsight" part. What is inevitable about, say, dildos getting thrown on the WNBA courts, or Trump saying "everything's computer"?

  • ajkjk 8 hours ago

    > When you ask an LLM to tell a joke, the LLM is guessing what joke a majority of people would find funny. The result is almost never funny.

    This is... false. The LLM is generating text that would be associated with "being a joke" in its corpus. It doesn't know what "funny" or "majority of people" means.

    • cootsnuck 8 hours ago

      You're forgetting about RLHF.

      Many of us forget that an LLM without RLHF would not respond the way we expect.