Coarse Is Better

(borretti.me)

128 points | by _dain_ 6 hours ago

56 comments

  • raincole 5 hours ago

    It's ridiculous lol.

    Midjourney is optimized for beautiful images, while Nano Banana is optimized for better prompt adherence and (more importantly) image editing. It should be obvious to anyone who has spent 20 minutes trying out these models.

    If your goal is to replace human designers with cheaper options[0], Nano Banana / ChatGPT is infinitely more useful than Midjourney. I'd argue Midjourney is completely useless except for social media clout or making concept art for experienced designers.

    [0]: A hideous goal, I know. But we shouldn't sugarcoat it: this is what underpins the whole AI scheme now.

    • jamblewamble 34 minutes ago

      It is what has underpinned all of human progress towards automation. It isn't a bad thing. Every time we automate something the luddites cry out about the coming mass unemployment. It has never happened.

      • pchangr 18 minutes ago

        It has happened. The term itself refers to a historical event: see https://en.wikipedia.org/wiki/Luddite

      • malnourish 17 minutes ago

        What other automations have been hyped to automate and replace so many different types of jobs at once?

        Whether or not it comes to fruition, it's making large portions of society feel uneasy, and not just programmers, or artists, or teachers.

      • vlovich123 19 minutes ago

        Except all the manufacturing jobs got shipped overseas and now those people are Walmart greeters or similar unskilled labor. Having a shit job isn’t unemployment but it’s not a huge step up

      • JeremyNT 17 minutes ago

        The promise is to automate the drudge work, freeing people to pursue their passions.

        Like, you know... creating art.

  • airstrike 3 hours ago

    I'm no image gen expert but these prompts are downright terrible even by my standards.

    Are you really complaining that ", from the British Museum." leads to a painting in the actual British Museum? Just remove the sentence, and you'll be fine. Now good luck trying to make Midjourney place the image at the museum!

    I'm a paying MJ user and am impressed by Nano Banana. They're different models. They each serve their purpose.

    This analysis is just noise. Yawn.

    Ironically, even an LLM with its fake reasoning capabilities can point out the issue with the prompts if you ask it to critique this article.

    • wrsh07 an hour ago

      It is interesting what the nbp model takes away from the prompt, though

      Eg instead of focusing on the artist, it focuses on the location

      This makes sense! I imagine it was trained in some sort of rlvr like way where you give it a prompt and then interrogate "does this image ..." (where each question examines a different aspect of the prompt)

      It's obviously an incredible model. I think there's a limit to how useful another article praising it is in contrast with one expressing frustration

      I would also welcome someone writing a short takedown where they fix the prompts and get better-than-2022 results from nbp

  • pornel 4 hours ago

    The author is using special prompts exploiting flaws of the old models, and doesn't like that new models interpret the hacks literally instead.

    The new models have prompt adherence precise enough to distinguish what "British Museum" or "auction at Christie's" is from the art itself, instead of blending a bag of words together into a single vector and implicitly copying all of the features of all works containing "museum" or "ArtStation" in their description.

    • RHSeeger 4 hours ago

      The prompts bothered me a lot, too. I don't do a lot of work with AI, but

      > A painting sold at Sotheby's

      and

      > A painting in the style of something that would be sold at Sotheby's

      convey very different meaning (to me).

  • dleeftink 3 hours ago

    Eno applies:

    > It's the sound of failure: so much modern art is the sound of things going out of control, of a medium pushing to its limits and breaking apart. The distorted guitar sound is the sound of something too loud for the medium supposed to carry it. The blues singer with the cracked voice is the sound of an emotional cry too powerful for the throat that releases it. The excitement of grainy film, of bleached-out black and white, is the excitement of witnessing events too momentous for the medium assigned to record them.

    • 2b3a51 2 hours ago

      And

      > "By the time a whole technology exists for something it probably isn't the most interesting thing to be doing."

      • stephantul 32 minutes ago

        Where did you get this from? Searching for it, in a weird irony I guess, just leads me back to this post.

  • airza 5 hours ago

    Years of refinement on the taste of people with no taste have produced a model with no taste. Crazy.

    • Undertow_ 3 hours ago

      it's not shocking that this is the result of "art" from people that think complexity and accuracy are the only qualifying factors.

    • drob518 5 hours ago

      I tasted the model, but then I spit it right back out.

      • mcpeepants 4 hours ago

        they put a special coating on the model to discourage this behavior

  • andy99 4 hours ago

    You’re definitely on to something, people wouldn’t criticize as much as they are otherwise, they’d ignore it.

    I think the whole point is that in optimizing for instruction following and boring realism we’ve lost what could have been some unique artistic elements of a new medium, but anyway.

  • spaceman_2020 5 hours ago

    While I don’t disagree with the author, these are simply two completely different tools with different use cases. Nano Banana Pro throws out fantastic images you can actually use in your marketing right away. It’s not an art tool - it’s a business tool

    As long as the older tools still exist to make art, I don’t see what the problem is. Use NBP to make your marketing pics, MJv2 for your art

  • yoan9224 an hour ago

    The author's prompts are fighting against what Nano Banana was optimized for. Saying "British Museum" to MJv2 worked because it blurred all images tagged with museums into the aesthetic. NBP interprets it literally: show me something IN a museum.

    This isn't worse - it's different. MJv2 was a happy accident machine. NBP is a precision tool.

    If you want the coarse aesthetic, prompt for it: "rough brushstrokes, visible canvas texture, unfinished edges, painterly, loose composition". NBP will give you exactly that because it actually understands what you're asking for.

    The real lesson: we're in a transition period where prompting strategies that exploited old model quirks no longer work. That's fine - we just need to adapt our prompting to match what the model was designed to do.

  • recursivecaveat an hour ago

    Maybe it's better that this author is using LLMs because they would be an immensely frustrating client for an artist. Asks for futurism: complains about getting it. Wants bright colors: refuses to ask. Parts of the request are supposed to be evocative and parts are supposed to be literal, who knows which.

  • amram_art 3 hours ago

    The problem is not in the image models but rather in the training data and its context. For MJ, "British Museum" is the image source; for Nano Banana, "British Museum" is the setting.

  • TrueDuality 3 hours ago

    I love the inherent wonder and joy in this post around the original images.

  • Demiurge 5 hours ago

    I don’t see splashes of primary color as more artistic. Anyway, what if you just ask it “more coarse”? I see impressive depth in the latest outputs, but as with all technically proficient performers, you might just have to consciously scale it back.

  • only-one1701 4 hours ago

    AI doesn’t make art. The OP is trying to fit the square peg of their intuitive understanding about the art creation process into the round hole of generating it via AI

    • jellyroll42 3 hours ago

      Correct! The process and struggle of creation is a large part of what makes art art. Removing friction from the process makes something artless.

      • card_zero 3 hours ago

        Yes, but: when I was young I used to love photorealism and hyperrealism, which is super-smooth-and-shiny art that conceals its process in order to awe simpletons. Then I bought an airbrush, and then true color computer graphics happened, and soon after that I began to appreciate brush strokes and the texture of pen marks and the idea of the personality of the artist's hand. But that doesn't mean the process-hiding stuff is non-art, or even bad art. What's wrong with creating an amazingly convincing illusion, wasn't that always the goal, historically? Also there are no prizes for effort, and if your artwork is only struggle, I don't want to see it. Unless you're really badass about it.

        • nehal3m 2 hours ago

          I really like Cory Doctorow’s description of why it feels empty, quote:

          “Herein lies the problem with AI art. Just like with a law school letter of reference generated from three bullet points, the prompt given to an AI to produce creative writing or an image is the sum total of the communicative intent infused into the work. The prompter has a big, numinous, irreducible feeling and they want to infuse it into a work in order to materialize versions of that feeling in your mind and mine. When they deliver a single line's worth of description into the prompt box, then – by definition – that's the only part that carries any communicative freight.”

          • card_zero 2 hours ago

            OK, but then there's the possibility of reestablishing the bandwidth by selecting the output. If the artist selects one AI image from hundreds, that's like photography, or collage, or "found sculpture" if you can dig it. Then we can do away with the need for hundreds of versions by saying that the artist selected this image from among all the assorted sights seen during the day to frame as art and present to the viewer, and that's just like picking a preferred version from among hundreds, and thus is just like crafting an image. Tenuously. (This falls apart because the selectivity of the selection isn't good enough, I guess. But the process - throwing away bad ideas as you go along - is just like drawing.)

            • nehal3m 2 hours ago

              Sort of. It’s like selecting from hundreds of versions of a letter of reference that word the same three bullet points slightly differently. It still feels empty to me, but I guess that’s personal.

              • card_zero 2 hours ago

                I reckon it's not personal, and you and Doctorow are objectively correct, but the explanation isn't great.

        • greekrich92 40 minutes ago

          Art that takes tremendous effort but looks effortless isn't negated by my comment. The process and struggle is still there.

  • chrismsimpson 4 hours ago

    Is some kind of MoE or routing (but for image models obviously), depending on the prompt ask, a possible solve?

  • BoredPositron 3 hours ago

    The OP would likely prefer Disco Diffusion if they want their art to remain coarse. Modern models possess advanced spatial understanding and adhere strictly to prompts, whereas the OP is using unstructured inputs better suited for older models with CLIP or T5 encoders that lack that spatial awareness. These legacy prompting styles are incompatible with Gen3 models that utilize VLMs as text encoders. If the OP wants to explore modern architecture, they should use Flux.2 with a LoRA or perhaps a coarser model like Zit if they prefer to rely solely on text conditioning. Nano Banana Pro requires extremely long and distinctive prompting to achieve specific aesthetics. His blog post shows a lack of understanding and a lack of adaptation to modern architectures, which would be fine if it weren't so dismissive.

    Here is an image from NBP with an adapted prompt for Italian futurism: https://imgur.com/a/4pN0I0R

    and for Kowloon:

    https://imgur.com/a/rDT8dfP

  • smurda 3 hours ago

    Another word for coarse is impasto technique, where the paint is so thick the painting-knife or brush strokes are visible and leave a pronounced texture (e.g. Van Gogh, Rembrandt).

    Another cool prompt could be specific painting techniques (e.g. pencil shading, glaze) as if you were training an actual artist in a specific technique.

    • flir an hour ago

      Just asked sora for an impasto image of a coca cola bottle. But it still came out looking like a coca cola ad/AI art. Super glossy, slick, meaningless. It didn't look like paint. (And the logo wasn't impasto, which I thought was interesting - I guess that logo's utterly ingrained in the model, it's seen it so many times).

  • Zak 5 hours ago

    The author claims the old models are better at creating art than the new ones. I disagree; art requires consciousness and intent while this type of model is capable of neither.

    • LatencyKills 5 hours ago

      I define art as something that evokes an emotion or feeling. I’ve seen people wax poetic about the “meaning” of an image only to find out that the image was created synthetically.

      Were those “feelings” not authentic?

      • neonnoodle 5 hours ago

        If I see a cloud in the shape of my childhood dog and start to cry, is the cloud art?

        • rtldg 4 hours ago

          Yes. The Earth and its formations are art. I disagree that art requires consciousness and intent, but those admittedly do improve its value [to me]. (For reference, I value AI content/art poorly and avoid it)

          • only-one1701 4 hours ago

            Everything is art, fantastic. I see nothing wrong with this definition.

            • card_zero 4 hours ago

              We have at least established that very boring pieces, such as Andy Warhol's Empire, Kazimir Malevich's White on White, and John Cage's As Slow As Possible, are not art.

              • only-one1701 3 hours ago

                Bad code is still code. A painting of code is not code.

                • card_zero 3 hours ago

                  I think you're saying bad art is still art, but I'm unsure what to do with the second sentence. I'm toying with "an encoding of art is not art", which might mean that art has to be available to an audience.

      • zelphirkalt 3 hours ago

        I don't think it is about the feelings or emotions evoked in the observer. At least not in that generality. It only is if there is an intention, in the process of creating the art, that aims at evoking those emotions or feelings. Otherwise, going by the more general definition, many everyday objects become art. Home becomes art. The way to the office becomes art, even if it completely sucks.

      • greekrich92 35 minutes ago

        If someone lies and convinces you that a loved one has died and you cry, were those feelings authentic?

        Art that provokes emotion in a cheap or manipulative way is often, if not always, bad art.

      • only-one1701 4 hours ago

        Is a car crash art?

        • RHSeeger 4 hours ago

          A drawing/painting of a car crash certainly can be

          https://www.etsy.com/listing/4329570102/crash-impact-car-can...

          As can a photo of one (sorry, I don't have a good example of that).

          And, both a camera and AI are an example of "using a tool to create an image of something". Both involve a creator to determine what picture is created; but the tool is central/crucial to the creation.

          • only-one1701 3 hours ago

            I would never argue that a painting of a car crash couldn’t be art. It’s funny you’re bringing up that a camera is a tool for creating art; I also hold photographic art in lower esteem than other kinds of visual art (though I still think some kinds of photography can be art).

            At a certain point, we need to be realistic about the amount of effort involved in artistic creation. Here’s a thought experiment: someone puts two paintings in a photocopier and makes a single sheet of paper with both paintings. Did that person create art? They certainly had the vision to put those two specific paintings together, and they used a tool to create that vision in reality!

            • card_zero 3 hours ago

              It's going to be "creativity" (another hazy definition!) rather than effort, though. Photography, often said to be all about framing, seems very low effort. You might take one lucky snap. Then the effort can be claimed to be in years of getting ready to be lucky, which is a fair point, but that displaced effort isn't really in the specific photo. Besides, maybe you're a very happy photographer, loved every minute of learning your craft, and found it no effort at all, just really interesting.

              • tormeh an hour ago

                Yeah, photography (editing aside) is about having taste and getting lucky. A good photographer can of course raise their odds of getting lucky, but still. There's some technique in there too, but that's really not all that complicated. That said, I think few things match a good photo. There's something about a photo subject being real that I find fascinating. A photo exhibition does not display the imagination of the photographers, but rather the incredible in the real world.

          • card_zero 3 hours ago

            When I was about 12 a car crashed in my quiet street (somebody tried to drive it through a concrete fence), so the next day I sat in the street and did an ink drawing of the wreckage with a mapping pen nib. That was excellent art. Then I stole one of the gigantic suspension springs and took it home to use as a stool, which by some silly definitions was also an act of art. But this all evades the original question about whether the actual car crash is art for evoking feelings, or whether art in fact must involve pictures, or human communication, or what. It's one of the impossible definitions, along with "intelligence" and "freedom". I'm a fan of "I know it when I see it".

        • card_zero 4 hours ago

          Perhaps it has to be a more sophisticated emotion, such as feeling tired of a hackneyed definition.

    • CuriouslyC 3 hours ago

      I'm pretty sure people have created images via random physical processes, then selected the best ones, and people have called it "art." That's no different than cherry picking AI generated images that resonate. The only difference is the anti-generative AI crusade being spearheaded by gatekeepers who want to keep their technical skills scarce in their own interests.

      • zelphirkalt 3 hours ago

        I think one could still point out a little difference: Random physical processes usually do not involve mixing and matching millions of other people's works. Instead, something new in every aspect and its origin can emerge.

        It feels like AI art is often just a version of: "I take all the things and mix them! You can't tell which original work that tree is taken from! Tiihiiihi!"

        Where "tree" stands for any aspect of arbitrary size. The relationship is not that direct, of course, because all the works gen AI learns from kind of get mixed in the weights of edges in the ANN. Nevertheless, the output is still some kind of mix of the stuff it learned from, even if it is not necessarily recognizable as such any longer. It is in the nature of how these things work.

  • delis-thumbs-7e 3 hours ago

    Just fucking buy canvas, brushes and good quality oil paint. You need only five colours[1]. Cost you maybe 50-80 euros. And any mess you produce will give you more joy than any shot produced by any clanker brain. Keep at it for a few years, take evening classes, look at tutorials and you have learned yourself a skill. You can now travel to any major art museum across the world and have a discussion with masters through their works hanging on the wall.

    And you will also see how fucking sad and inferior all these ai images are. Really, trust me, please. There is more to art than this. There is more to life.

    [1] https://www.youtube.com/watch?v=f7F67FsLaaY