> On the flip side, there are hundreds of ways that these tools cause genuine harm, not just to individuals but to entire systems.
Yeah, agree. I think it's the first time I'm asking myself: OK, so this new cool tech, what is it good for? Like, in terms of art, it's discarded (art is about humans); in terms of assets: sure, but people are getting tired of AI-generated images (and even if we can't tell whether a given image is AI-generated, we can tell when companies are using AI to generate images in general, so the appeal is decreasing). Ads? C'mon, that's depressing.
What else? In general, I think people are starting to realize that things generated without effort are not worth spending time on (e.g., no one is going to read your 30-page draft generated by AI; no one is going to review your 500-file PR generated by AI; no one is going to be impressed by the images you generate with AI; same goes for music and everything else). I think we're going to see a Renaissance of the "human-generated" sooner rather than later. I already see it at work (colleagues writing in Slack "I swear the next message is not AI generated" and the like).
> I think it's the first time I'm asking myself: Ok, so this new cool tech, what is it good for?
I feel like this is something people in the industry should be thinking about a lot, all the time. Too many social ills today are downstream of the 2000s culture of mainstream absolute technoöptimism.
Vide Kranzberg's first law: "Technology is neither good nor bad; nor is it neutral."
Completely unrelated, but I'm curious about your keyboard layout, since you typed ö instead of -. Those two symbols are side by side in the Icelandic layout, and ö sits where - is in the English (US) layout, so this is a common typo for people who regularly switch between the Icelandic and English (US) layouts (source: I am that person). I'm curious whether there are more layouts where that mix-up would be common.
This is also a stylistic choice that the New Yorker magazine uses for words with double vowels where you pronounce each one separately, like coöperate, reëlect, preëminent, and naïve. So possibly intentional.
Yes, this is exactly correct, and I will die on this hill. Additionally, I don't like the way a hyphenated "techno-optimism" looks and "technOOPtimism" is a bit too on-the-nose.
That makes sense[1] but it prompts the obvious question: does this style write it as typeö then?
1: Though personally I hate it, I just cannot not read those as completely different vowels (in particular ï → [i:] or the ee in need; ë → [je:] or the first e here; and ö → [ø] or the e in her)
I can’t design wallpapers/stickers/icons/…, but I can describe what I want to an image generation model verbally or with a source photo, and the new ones yield pretty good results.
For icons in particular, this opens up a completely new way of customizing my home screen and shortcuts.
Not necessary for the survival of society, maybe, but I enjoy this new capability.
So we get a fresh, cheap new way to spread propaganda and lies and erode trust all across society while cementing power and control for a few at the top, and in return we get a few measly icons (as if there weren't literally thousands of them freely available already) and silly images for momentary amusement?
Are you asking if the 10 seconds it takes AI to generate an image is more costly to the environment than a commissioned graphics artist using a laptop for 5-6 hours, or a painter who uses physical media sourced from all over the world?
A modern laptop runs almost fanless, like a 486 from the days of yore.
A single H200 pumps out 700W continuously in a data center, and you run thousands of them.
Also, don't forget the training and fine tuning runs required for the models.
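For scale, the comparison above can be put in rough numbers. Every figure below is an assumed round number for illustration (nominal H200 board power, a 10-second generation, a 30 W average laptop draw over a 5.5-hour commission), and this deliberately counts inference only, ignoring training runs and data-center overhead:

```python
# Back-of-envelope energy comparison; all figures are assumptions,
# not measurements, and training is deliberately excluded.
GPU_POWER_W = 700        # assumed H200 board power draw
GEN_SECONDS = 10         # assumed time to generate one image
LAPTOP_POWER_W = 30      # assumed average laptop draw
COMMISSION_HOURS = 5.5   # assumed length of a human commission

gpu_wh = GPU_POWER_W * GEN_SECONDS / 3600      # watt-hours per generated image
laptop_wh = LAPTOP_POWER_W * COMMISSION_HOURS  # watt-hours per commission

print(f"inference: {gpu_wh:.2f} Wh vs. commission: {laptop_wh:.0f} Wh")
```

On those assumptions a single inference looks cheap next to a laptop commission, which is exactly why the training runs and the sheer volume of generations, rather than any one image, dominate the accounting.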
Mass transportation / global logistics can be very efficient and cheap.
Before the pandemic, it was in some cases cheaper to import fresh tomatoes from halfway around the world than to grow them locally. A single container of painting supplies is nothing in the grand scheme of things, especially compared with what data centers are consuming and emitting.
Cheaper/faster tech increases overall consumption though. Without the friction of commissioning a graphics artist to design something, a user can generate thousands of images (and iterate on those images multiple times to achieve what they want), resulting in way more images overall.
I'm not really well versed on the environmental cost, more just (neutrally) pointing out that comparing a single 10s image to a 5-6 hour commission ignores the fact that the majority of these images probably would never have existed in the first place without AI.
Also, ignoring training when talking about the environmental costs is arguing in bad faith. Without training, this image would not exist, and if nobody were generating images like these, the training would not happen. So we should really count the 10 seconds of inference plus the weeks or months of high-intensity compute it took to train the model.
I work with direct liquid cooled (DLC) systems. If the datacenter is running open DLC systems (most AI datacenters in the US in fact do), a lot of water is being wasted, 24/7/365.
A mid-tier TOP500 system (think #250-#325) consumes about 0.75 MW. AI data centers consume orders of magnitude more. To cool that behemoth you need to pump tons of water per minute through the inner loop.
The outer loop might be slower, but it's still a lot of heated water at the end of the day.
To prevent water waste, you can go closed loop (for both the inner and outer loops), but you can't escape the heat you generate and pump into the atmosphere.
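To put "tons of water per minute" in perspective, here is a rough sizing from the heat-balance relation Q = ṁ·c_p·ΔT. The 10 K loop temperature rise is an assumed placeholder, not a measured figure from any real facility:

```python
# Rough cooling-water flow needed to remove ~0.75 MW of heat,
# from Q = m_dot * c_p * delta_T. delta_T is an assumption.
HEAT_LOAD_W = 750_000   # ~0.75 MW of heat to remove
CP_WATER = 4186         # J/(kg*K), specific heat of water
DELTA_T_K = 10          # assumed inlet/outlet temperature rise

m_dot = HEAT_LOAD_W / (CP_WATER * DELTA_T_K)  # kg/s of water
litres_per_min = m_dot * 60                   # ~1 kg of water per litre

print(f"{m_dot:.1f} kg/s ≈ {litres_per_min:.0f} L/min")
```

Under those assumptions that is roughly a tonne of water circulating per minute for 0.75 MW alone; a facility drawing orders of magnitude more power scales accordingly.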
So, the environmental cost is overblown in the way Chernobyl or the fallout from a nuclear bomb is overblown.
The problem is you don't just use that water and give it back.
The water gets contaminated and heated, making it unsuitable for organisms to live in, or to be processed and used again.
In short, when you pump that water back into the river, you're poisoning and cooking the river at the same time, destroying the ecosystem.
To reiterate, I work in a closed loop DLC datacenter.
Pipes rust; you can't stop that, and the rust seeps into the water. That's inevitable. Moreover, if moss or other growth starts to take over your pipes, you may need to inject chemicals into your outer loop to clean them.
Inner loops already use biocides and other chemicals to keep them clean.
Look at how nuclear power plants fight organism contamination in their outer cooling loops, where they circulate lake or river water.
Depends on whether you believe it will ever become cheaper: either the hardware, smaller and more efficient models, or energy itself. The techno-optimist believes that is the inevitable and investable future. But on what horizon, and will it get "Zip-drived" before then?
The issue is that the signalling makes sense only while human-generated work is better than AI-generated work. Soon AI-generated work will be better across the board, with the rare exception of pieces the top X% of humans put a lot of bespoke, highly personalized effort into. Preferring human work will then be luxury status signalling, just like it is for clothing, food, etc.
I think "better" is doing a lot of heavy lifting in this argument. Better how?
Is an AI generated photo of your app/site going to be more accurate than a screenshot? Or is an AI generated image of your product going to convey the quality of it more than a photo would?
I think Sora also showed that the novelty of generating just "content" is pretty fleeting.
I would be interested to see if any of the next round of ChatGPT advertisements use AI generated images. Because if not, they don’t even believe in their own product.
I'm probably in a weird subgroup that isn't representative of the general public, but I've found myself preferring "rough" art/logos/images/etc, basically because it signals a human put time into it. Or maybe not preferring, but at least noticing it more than the generally highly refined/polished AI artwork that I've been seeing.
There’s no reason to think people broadly want “better” writing, images, whatever. Look at the indie game scene, it’s been booming for years despite simpler graphics, lower fidelity assets, etc. Same for retro music, slam poetry, local coffee shops, ugly farmers market produce, etc.
There is a mass, bland appeal to “better” things but it’s not ubiquitously desired and there will always be people looking outside of that purely because “better” is entirely subjective and means nothing at all.
Only novel art is interesting. AI can't really do novel. It's a prediction algorithm; it imitates. You can add noise, but that mostly just makes it worse. It can be used to facilitate original stuff though.
But so many people want to make art, and it's so cheap to distribute it, that art is already commoditized. If people prefer human-created art, satisfying that preference is practically free.
AI can be novel, there is nothing in the transformer architecture which prohibits novelty, it's just that structurally it much prefers pattern-matching.
But I think "novelty" is the wrong yardstick here. Any random number generator can arbitrarily produce a "novel" output that no human has ever seen before. The issue is whether something is both novel and useful, which is hard even for humans to do consistently.
Anthropic recently changed their take-home test specifically to be more “out-of-distribution” and therefore more resistant to AI so they can assess humans.
I'm so tired of "there's nothing preventing it" and "humans do that too". Modern AI is just not there. It's not like humans, and it has difficulty adapting to novelty.
Whether transformers can overcome that remains to be seen, but it is not a guarantee. We’ve been dealing with these same issues for decades and AI still struggles with them.
The issue being, it's not an expression of anything: merely a random sensation, maybe with some readable intent, but generic in execution. It isn't about anything, which even corporate art should be. Are we going to give up on art altogether?
Edit: One of the possible outcomes may be living in a world like in "Them" with glasses on. Since no expression has any meaning anymore, the message is just there being a signal of some kind. (Generic "BUY" + associated brand name in small print, etc.)
Because I'm not an artist and can't afford to pay one for whatever business I have? This idea that only experts are allowed to do things is just crazy to me. A band poster doesn't have to be a labor-of-love artisanal thing. Were you mad when people made band posters with MS Word instead of hiring a fucking typesetter? I just don't get it.
I dunno, I have some band posters that are pretty cool pieces of art that obviously had a lot of thought put into them (pre-AI era stuff). I don't think I'd hang up an AI generated band poster, even if it was cool; I'd feel weird and tacky about it.
I was hosting a karaoke event in my town and really went out of my way to ensure my promotional poster looked nothing like AI. I really really really did not want my townsfolk thinking I would use AI to design a poster.
My design rules were: no gradients; no purple; prefer muted colors; plenty of sharp corners and overlapping shapes; use the Boba Milky typeface.
> can't afford to pay one for whatever business I have
At small scales what "art" does your business need? If you can't afford to hire an artist (which is completely fine, I couldn't for my business!) do you really need the art or are you trying to make your "brand" look more polished than it actually is? Leverage your small scale while you can because there isn't as much of an expectation for polish.
And no, a band poster doesn't have to be a labor of love. But it also doesn't have to be some big showy art either. If I saw a small band with a clearly AI generated poster it would make me question the sources for their music as well.
I think you're misunderstanding - most people's beef with AI art isn't that it "isn't made by experts", it's that
1) it's made from copyrighted works, and the original authors receive no credit;
2) it is (typically) low-effort;
3) there are numerous negative environmental effects of the AI industry in general;
4) there are numerous negative social effects of AI in general, and more specifically AI generated imagery is used a lot for spreading misinformation;
5) there are numerous negative economic effects of AI, and specifically with art, it means real human artists are being replaced by AI slop, which is of significantly lower quality than the equivalent human output. Also, instead of supporting multiple different artists, you're siphoning your money to a few billion dollar companies (this is terrible for the economy)
As a side note, if you have a business which truly cannot afford to pay any artists, there are a lot of cheaper, (sometimes free!) pre-paid art bundles that are much less morally dubious than AI. Plus, then you're not siphoning all of your cash to tech oligarchs.
I agree, and who's to say your life experience isn't as valid as that of someone with fewer years but more time at the traditional tools? I'd think either extreme could produce real art if the tool moat were reduced with AI.
I actually love MS word posters. It's a million times more authentic and enjoyable than a slop generation. If a band put up an AI poster I'd assume they lack any kind of taste which is the whole reason I'd want to listen to a band anyway.
I know this is controversial in tech spaces. But most people, particularly those in art spaces like music actually appreciate creativity, taste, effort, and personal connection. Not just ruthless efficiency creating a poster for the lowest cost and fastest time possible.
How about going without? I can’t afford an artist, either, so I don’t have art. Don’t foist slop on people because you are trying to be something that you aren’t.
I'm not saying it's worthless for yourself, it's worthless to me as a viewer. AI content is great for your own usage, but there is no point posting and distributing AI generation.
I could have generated my own content, so just send the prompt rather than the output to save everyone time.
And when the distilled knowledge/product is the result of multiple prompts, revisions, and reiterations? Shall we send all 30+ of those as well so as to reproduce each step along the way?
Exactly how I feel. There is already more art, movies, music, books, video games and more made by human beings than I can experience in my lifetime. Why should I waste any time on content generated by the word guessing machine?
I just recently used image generation to design my balcony.
It was a great way to see design ideas imagined in place and decide what to do.
There are many cases where people would hire an artist to illustrate an idea or early prototype. AI-generated images make that something you can do yourself, 10x faster than a few years ago.
Notwithstanding a few code violations, it generated some good ideas we were then able to tweak. The main thing was that we had no idea what we wanted to do, but seeing a lot of possibilities overlaid on the existing non-garden got us going. We were then able to extend the theme to other parts of the yard.
100%. A picture is worth a thousand words only when it conveys something. I love to see the pictures from my family even when they are taken with no care to quality or composition but I would look at someone else’s (as in gallery/exhibitions) only when they are stunning and captured beautifully. The medium is only a channel to communicate.
Also, this can't be real. How many publications did they train this stuff on, and why is there no acknowledgment, even if only to say "we partnered with xyz manga house to make our model smarter at manga"? Like, what's wrong with this company?
I used to have an assistant make little index-card-sized agendas for get-togethers when folks were in town, or when I was organising a holiday or offsite. They used to be physical; now it's a cute thing I can text around so everyone knows when they should be up by (and by when, if they've slept in, they can go back to bed). AI has been good at making these. They don't need to be works of art, just cute and silly and maybe embedded with an inside joke.
I don't care how many times you write "cute," having my vacation time programmed with that level of granularity and imposed obligation sounds like the definition of "dystopian."
If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself.
Edit: I'm not an outlier here. There have even been sitcom episodes about overbearing hosts over-programming their guests' visits, going back at least to the Brady Bunch.
> If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself
Okay. I'd be confused why you didn't voice up while we were planning everything as a group, but those people absolutely exist. (Unless it's someone's, read: a best friend or my partner's, birthday. Then I'm a dictator and nobody gets a choice over or preview of anything.)
I like to have a group activity planned on most days. If we're going to drive to get an afternoon hike in before a dinner reservation (and if I have 6+ people in town, I need a dinner reservation, because no, I'm not cooking every single evening), or if I've paid for a snowmobile tour or a friend is bringing out their telescope for stargazing, there are hard no-later-than departure times, either to not miss the activity or to be respectful of others' time.
My family used to resolve that by constantly reminding everyone the day before and morning of, followed by constantly shouting at each other in the hours and minutes preceding and, inevitably, through that deadline. I prefer the way I've found. If someone wants to fuck off from an activity, myself included, that's also perfectly fine.
(I also grew up in a family that overplanned vacations. And I've since recovered from the rebound instinct, which involved not planning anything and leaving everything to serendipity. It works gorgeously, sometimes. But a lot of other times I wonder why I didn't bother googling the cool festival one town over beforehand, or regret sleeping in through a parade.)
> There have even been sitcom episodes about overbearing hosts over-programming their guests' visits
Sure. And different groups have different strokes. When it comes to my friends and I, generally speaking, a scheduled activity every other day with dinners planned in advance (they all get hangry, every single fucking one of them) works best.
We need to flip the script. AI is doing its own marketing: adding "illegal usage will lead to X" is a gateway that sparks curiosity. There's a saying that censoring games for young adults ensures they'll buy them like crazy by circumventing the restrictions, because danger is cool.
There is nothing that cannot do harm: knives, cars, alcohol, drugs. A society needs to balance risks and benefits. Words can be used to do harm, email, anything; it depends on intention and type.
Provenance is part of the work. If a roomful of monkeys banged out something that looked like anything, I'd absolutely hang it on my wall. I would not say the same for 99% of AI generated art.
The technically (in both senses) astonishing and amazing output is not far off from real advertising in some of its qualities: staged, attention-grabbing, artificially created, superficially demanded, commercially attractive. These align, and lots of similarities in the functions and outcomes of the two spheres come to mind.
I see your point, but reconsider: we will see, and we need to see. Time will tell, and this is simply economics: useful? Yes or no.
I became totally indifferent after auditing my spending habits for unnecessary stuff, and after watching world championships for niche sports. For some, this is a calling; for others, a waste. It's a numbers game, then.
I tend to share your same view. But is there really a line like you describe? Maybe AI just needs to get a few iterations better and we'll all love what it generates. And how's it really any different than any Photoshop computer output from the past?
I think there's real value to be had in using this for diagrams.
Visual explanations are useful, but most people don't have the talent and/or the time to produce them.
This new model (and Nano Banana Pro before it) has tipped across the quality boundary where it actually can produce a visual explanation that moves beyond space-filling slop and helps people understand a concept.
I've never used an AI-generated image in a presentation or document before, but I'm teetering on the edge of considering it now provided it genuinely elevates the material and helps explain a concept that otherwise wouldn't be clear.
Are there any models that are specifically trained to produce diagrams as SVG? I'd much prefer that to diffusion-based raster image generation models for a few reasons:
- The usual advantages of vector graphics: resolution-independence, zoom without jagged edges, etc.
- As a consequence of the above, vector graphics (particularly SVG) can more easily be converted to useful tactile graphics for blind people.
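I'm not aware of a diagram-specific SVG model either, but a general text LLM can be prompted to emit SVG markup directly, and that output is easy to sanity-check. A minimal well-formedness check using only the standard library (the sample markup below is hypothetical):

```python
# Check that a string parses as XML whose root is an <svg> element.
# This only verifies well-formedness, not that the drawing is sensible.
import xml.etree.ElementTree as ET

def looks_like_svg(text: str) -> bool:
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    # Namespaced tags parse as '{http://www.w3.org/2000/svg}svg'
    return root.tag.split('}')[-1] == 'svg'

sample = ('<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">'
          '<rect x="10" y="10" width="80" height="30" fill="steelblue"/>'
          '<text x="50" y="30" text-anchor="middle">node</text></svg>')

print(looks_like_svg(sample))  # → True
```

A check like this can gate a retry loop: if the model's output fails to parse, re-prompt with the parser error appended.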
This is the key point. In my view it's just like anything else, if AI can help humans create better work, it's a good thing.
I think what we'll find is that visual design is no longer as much of a moat for expressing concepts, branding, etc. In a way, AI-generated design opens the door for more competition on merits, not just those who can afford the top tier design firm.
While I agree with you, the Hacker News audience is not in the middle of the bell curve.
I get this sounds elitist, but a tremendous percentage of the population is happily and eagerly engaging with fake religious images, funny AI videos, horrible AI memes, etc. Pointing out that a video of a puppy is completely AI generated results in a vicious defense and mansplaining of why the video is totally real (I love it when the video has, e.g., Sora watermarks... this does not stop the defenders).
I agree with you that human connection and artist intent is what I'm looking for in art, music, video games, etc... But gawd, lowest common denominator is and always has been SO much lower than we want to admit to ourselves.
Very few people want thoughtful analysis that contradicts their world view, very few people care about privacy or rights or future or using the right tool, very few people are interested in moral frameworks or ethical philosophy, and very few people care about real and verifiable human connection in their "content" :-/
I'm working on an edutech game. Before, I would've had much less of a product, because I don't have the budget to hire an artist, and it would've been much less interactive. Because of this, I'm able to build a much more engaging experience. So that's one thing, for what it's worth.
Seems good enough to generate 2D sprites. If that means a wave of pixel-art games I count it as a net win.
I don't think gamers hate AI; it's just a vocal minority, IMO. What most people dislike is sloppy work, as they should, but that can happen with or without AI. The industry has been using AI for textures, voices, and more for over a decade.
It's really not. That's actually a pet peeve of mine, as someone who used to spend a lot of time messing with pixel art in Aseprite.
Nobody takes the time to understand that the style of pixel art is not the same thing as actual pixel art. So you end up with these high-definition, high-resolution images that people try to pass off as pixel art, but if you zoom in even a tiny bit, you see all this terrible fringing and fraying.
That happens because the palette is way outside the bounds of what pixel art should use; proper pixel art is generally limited to maybe 8 to 32 colors.
There are plenty of ways to post-process generative images to make them look more like real pixel art (square grid alignment, palette reduction, etc.), but it does require a bit more manual finesse [1], and unfortunately most people just can’t be bothered.
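The palette-reduction step mentioned above can be sketched in a few lines: snap each pixel to the nearest color in a small fixed palette by squared RGB distance. The palette and pixel values below are made up for illustration, and a real pipeline would also downscale to the pixel grid first:

```python
# Snap RGB pixels onto a small fixed palette (nearest neighbor by
# squared RGB distance) — the core of naive palette reduction.

def nearest(color, palette):
    """Return the palette entry closest to color (squared RGB distance)."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))

PALETTE = [(0, 0, 0), (255, 255, 255), (200, 40, 40), (40, 160, 60)]  # 4 colors

# Hypothetical anti-aliased pixels from a generated "pixel art" image:
pixels = [(10, 12, 8), (250, 249, 255), (180, 60, 50)]
snapped = [nearest(px, PALETTE) for px in pixels]
print(snapped)  # each pixel collapsed onto the small palette
```

Real tools use smarter quantization (median cut, dithering control), but even this naive snap removes the fringing that gives generated pixel art away.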
Are you kidding? I think I see more vitriol for AI in gaming communities than anywhere else, to the point where Steam now requires you to disclose its usage.
The Human Renaissance is something I've been thinking of too and I hope it comes to pass. Of course, I feel like societally, things are gonna get worse for a lot of folks. You already see it in entire towns losing water or their water becoming polluted.
You'd think the kickbacks the leaders of these towns are getting for allowing data centers to be built would go towards improving infrastructure, but hah, that's unrealistic.
>Like, in terms of art, it's discarded (art is about humans)
I dunno how long this is going to hold up. In 50 years, when OpenAI has long become a memory, post-bubble burst, and a half-century of bitrot has claimed much of what was generated in this era, how valuable do you think an AI image file from 2023 - with provenance - might be, as an emblem and artifact of our current cultural moment, of those first few years when a human could tell a computer, "Hey, make this," and it did? And many of the early tools are gone; you can't use them anymore.
Consider: there will never be another DALL·E 2 image generated. Ever.
>In general, I think people are starting to realize that things generated without effort are not worth spending time with
Agreed mostly, BUT
I'm building tools for myself. The end goal isn't the intermediate tool, they're enabling other things. I have a suspicion that I could sell the tools, I don't particularly want to. There's a gap between "does everything I want it to" and "polished enough to justify sale", and that gap doesn't excite me.
They're definitely not generated without effort... but they are generated with 1% of the human effort they would require.
I feel very much empowered by AI to do the things I've always wanted to do. (when I mention this there's always someone who comes out effectively calling me delusional for being satisfied with something built with LLMs)
I completely disagree; this replaces art as a job. Why does human art need monetary feedback to be shared? If people require a paycheck to make art, then it was never anything different from AI-generated images.
As for advertising being depressing: it's a little late to get up on the anti-ads high horse after two decades of ad-based technology dominating everything. Go outside; see all those bright, shiny, glittery lights? Those aren't society-created images to embolden the spirit and dazzle the senses. Those are ads.
North Korea looks weird and depressing because they don't have ads. Welcome to the West.
OPENAI_API_KEY="$(llm keys get openai)" \
uv run https://tools.simonwillison.net/python/openai_image.py \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio"
Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
Like... this has things that AI will seemingly always be terrible at?
At some point the level of detail is utter garbo and always will be. A thoughtful artist could make some mistakes, but someone who put that much time into a drawing wouldn't have:
- Nightmarish screaming faces on most people
- A sign that seemingly points in both directions (or the incorrect one) for a lake, and a first aid tent that doesn't exist
- A dog in bottom left and near lake which looks like some sort of fuzzy monstrosity...
It looks SO impressive before you try to take in any detail. The hand selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...
We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier where's Waldos??
OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160
Fed into a clean Claude Code max-effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
- Raccoon center: roughly (460, 1680)
- Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
- Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780
It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest. "
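The slicing step it described is simple to reproduce: generate overlapping tile bounding boxes so a small figure near a tile edge isn't cut in half. The tile and overlap sizes below are arbitrary illustrative choices, not whatever Claude actually used:

```python
# Generate overlapping (x1, y1, x2, y2) tile boxes covering an image.
# tile/overlap values are arbitrary assumptions for illustration.

def tiles(width, height, tile=512, overlap=64):
    step = tile - overlap
    for y in range(0, height, step):
        for x in range(0, width, step):
            yield (x, y, min(x + tile, width), min(y + tile, height))

boxes = list(tiles(3840, 2160))  # the 3840x2160 image from the prompt above
print(len(boxes), boxes[0], boxes[-1])
```

Each tile can then be inspected at full resolution, and any hit's tile-local coordinates offset back by the tile's (x1, y1) to get image-global pixel positions like the ones reported.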
We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. Found it literally immediately, as if that's where my eyes naturally gravitated to first. Hopefully that's just luck and not an indictment of the image-creating ability, as if there is some element missing from this "Where's Waldo" image, that would normally make Waldo hard to find.
There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.
It's a difficult test for genAI to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can be only one Waldo, Highlander-style), while also holding up to scrutiny when you examine any individual, ordinary figure.
I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.
5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."
Pretty mixed feelings on this. From the page at least, the images are very good. I'd find it hard to know that they're AI. Which I think is a problem. If we had a functioning Congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI generated.
I also don't like that these things are trained on specific artist's styles without really crediting those artists (or even getting their consent). I think there's a big difference between an individual artist learning from a style or paying it homage, vs a machine just consuming it so it can create endless art in that style.
You might be onto something. I find every image unsettling. They're very good, no doubt, but maybe it disturbs me because all of it is a complete copy of what someone else created. I know, I know, there is no pure invention; that's not what I mean. Humans borrow from other humans all the time. There's a humanity in that! A machine fully repurposing a human contribution as some kind of new creation... I dunno, I'm old; it's weird and I don't like it.
Here is my regular "hard prompt" I use for testing image gen models:
"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid-splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens."
This seems like a great time to mention C2PA, a specification for positively affirming image sources. OpenAI participates in this, and if I load an image I had AI generate in a C2PA Viewer it shows ChatGPT as the source.
Bad actors can strip sources out so it's a normal image (that's why it's positive affirmation), but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.
Yeah, OpenAI has been attaching C2PA manifests to all their generated images from the very beginning. Also, based on a small evaluation that I ran, modern ML-based detectors like OmniAID[1] seem to do quite well at detecting GPT-Image-2 output. I use both in an on-device AI-generated-image detector that I built.
When NB 2 came out I actually had to increase the difficulty of the piano test - reversing the colors of all the accidentals and the naturals, and it still managed it perfectly.
You still have the Studio Ghibli look from the video. The issue with generating manga was the quality of the characters; there are multiple software tools for placing your frames.
But I am hopeful. If I put in a single frame, can it carry over that style for the next images? It would be game changing if a chat could have its own art style
The improvement in Chinese text rendering is remarkable and impressive! I still found some typos in the Chinese sample pic about Wuxi, though. For example, the 笼 in 小笼包 ("xiaolongbao", soup dumplings) was written incorrectly, and the "极小中文也清晰可读" ("even tiny Chinese text is clearly legible") section contains even more typos, although it's still legible. Still, truly amazing progress, and vastly better than any previous image generation model.
Been using the model for a few hours now. I'm actually really impressed with it. This is the first time I've found value in an image model for stuff I actually do. I've been using it to build PowerPoint slides and mockups. It's CRAZY good at that.
One interesting thing I found comparing OpenAI and Gemini image editing: Gemini rejects anything involving a well-known person. Anything. OpenAI was happy to edit and change every time I tried.
I have a side project where I want to display stand-up comedy shows. I thought I could edit stand-up comedy posters with some AI to fit my design. Gemini straight up refuses to change any image of any stand-up comedy poster involving a well-known person. OpenAI does not care and is happy to edit away.
I don't know, tbh. I've tried it on 10-20 comedians at various levels of fame, and Gemini refuses every time.
Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (API as well as UI); OpenAI does it.
It seems like they're trying to follow local law. What a nightmare to have to manage all jurisdictions around such a product. Surprised it didn't kill image generation entirely.
Yeah, especially when they know all that work will be completely pointless in a few years, when open source / local models will be just as good and won't have any legal limitations, so people will be generating fake images of famous people like crazy with nothing stopping them.
This is not as exciting as previous models were, but it is incredibly good. I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
And here I was proud of myself, having taught my mom and her friends how to discern real from fakes they get on WhatsApp groups. Another even more powerful tool for scammers. I'm taking a break.
IMO you're fighting the wrong battle: there'll always be a new model.
But the broader concept of fake news and the manufactured nature of media and rhetoric is much more relevant - e.g. whether or not something's AI is almost immaterial to the fact that any filmed segment does not have to be real or attributed to the correct context.
It's an old internet classic just to grab an image and put a different caption on it, relying on the fact that no one can discern context or has time to fact-check.
Image editing program -> generate different versions of the image, each with some but not all of the elements you want, one per layer -> mask out the parts you don't need (apply a mask, fill with black, then soft-brush with white the parts you want back in) -> copy the flattened/merged result, drop it back into the image model, and keep asking for changes. As long as each generation adds an element you want, you can build up a collage of your final image.
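That mask-then-merge step can be sketched in plain Python (a minimal per-pixel sketch with hypothetical pixel lists; in practice you'd do this in an image editor or with a library like Pillow):

```python
def blend_with_mask(base, variant, mask):
    """Per-pixel composite: where the mask is white (255), keep the
    variant's pixel; where black (0), keep the base. Soft-brush grays
    blend proportionally, like a feathered layer mask."""
    out = []
    for (r1, g1, b1), (r2, g2, b2), m in zip(base, variant, mask):
        a = m / 255.0  # mask value -> blend weight for the variant
        out.append((
            round(r1 * (1 - a) + r2 * a),
            round(g1 * (1 - a) + g2 * a),
            round(b1 * (1 - a) + b2 * a),
        ))
    return out

# Tiny 2-pixel example: keep the first pixel from base, take the second from variant.
base    = [(255, 0, 0), (255, 0, 0)]
variant = [(0, 0, 255), (0, 0, 255)]
mask    = [0, 255]
print(blend_with_mask(base, variant, mask))  # [(255, 0, 0), (0, 0, 255)]
```

The same idea scales to full images: each round-trip through the model produces one "layer", and the mask decides which regions survive into the merged collage.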
Works for me, but really weirdly on iOS: Copying to clipboard somehow seems to break transparency; saving to the iOS gallery does not. (And I’ve made sure to not accidentally depend on iOS’s background segmentation.)
In some cases I would agree with this, but image model releases including this one are beginning to incorporate and market the thinking step. It is not a reach at this point to expect the model to take liberties in order to deliver a faithful and accurate representation of your request. A model could still be accurate while navigating your lack of specificity.
If every single image on their blog was generated by Images 2.0 (I've no reason to believe that's not the case), then wow, I'm seriously impressed. The fidelity to text, the photorealism, the ability to show the same character in a variety of situations (e.g. the manga art) -- it's all great!
This is hilarious. Seems like kind of a random image for a model to memorize, but it could be.
There is definitely enough empirical validation that shows image models retain lots of original copies in their weights, despite how much AI boosters think otherwise. That said, it is often images that end up in the training set many times, and I would think it strange for this image to do that.
It's practically all dark except for a few spots. It's the same image, just a different size/compression. I can't find it in any stock image search, though. Surely it could not have memorized the whole image at that fidelity? Maybe I just didn't search well enough.
So during my Nano Banana Pro experiments I wrote a very fun prompt that tests the ability of these image generation models to follow heuristics while still requiring domain knowledge and/or use of the search tool:
Create a 8x8 contiguous grid of the Pokémon whose National Pokédex numbers correspond to the first 64 prime numbers. Include a black border between the subimages.
You MUST obey ALL the FOLLOWING rules for these subimages:
- Add a label anchored to the top left corner of the subimage with the Pokémon's National Pokédex number.
- NEVER include a `#` in the label
- This text is left-justified, white color, and Menlo font typeface
- The label fill color is black
- If the Pokémon's National Pokédex number is 1 digit, display the Pokémon in a 8-bit style
- If the Pokémon's National Pokédex number is 2 digits, display the Pokémon in a charcoal drawing style
- If the Pokémon's National Pokédex number is 3 digits, display the Pokémon in a Ukiyo-e style
The NBP result is here; it got the numbers, corresponding Pokémon, and styles correct, with the main points of contention being that the style application is lazy and that the images may be plagiarized: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...
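For reference, the ground truth the model had to reconstruct is easy to compute (a quick sketch: trial-division prime generation plus the prompt's digit-count style rule):

```python
def first_n_primes(n):
    """Generate the first n primes by trial division against earlier primes."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def style_for(dex_number):
    """The prompt's rule: art style is keyed to the digit count of the number."""
    digits = len(str(dex_number))
    return {1: "8-bit", 2: "charcoal drawing", 3: "Ukiyo-e"}[digits]

nums = first_n_primes(64)
print(nums[:5], nums[-1])  # [2, 3, 5, 7, 11] 311
print(style_for(2), style_for(97), style_for(311))  # 8-bit charcoal drawing Ukiyo-e
```

So the grid runs from #2 up to #311, and the model has to switch styles exactly at the 1-digit/2-digit and 2-digit/3-digit boundaries, which is what makes the prompt a good rule-following test.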
I wonder if this will be decent at creating sprite frame animations. So far I've had very poor results and I've had to do the unthinkable and toil it out manually.
I had exactly the same thought! I've got a game I've been wanting to build for over a decade that I recently started working on. The art is going to be very challenging however, because I lack a lot of those skills. I am really hoping the AI tools can help with that.
Is anyone doing this already who can share information on what the best models are?
Pretty much all of the kerfuffle over AI would go away if it were accurately priced.
After 2008 and 2020, vast amounts of money (tens of trillions) were printed (reasonably) by Western governments and not eliminated from the money supply. So there are vast sums swilling about, funding things like massively computationally intensive work to help me pick a recipe for tonight.
Google and Facebook had online advertising sewn up, but AI is way better at answering my queries. So OpenAI wants some of that; the cost per query, though, must be orders of magnitude larger.
So charge me, or my advertisers the correct amount. Charge me the right amount to design my logo or print an amusing cat photo.
Charge me the right cost for the AI slop on YouTube
Charge the right amount - and watch as people just realise it ain’t worth it 95% of the time.
Great technology - but price matters in an economy.
It's definitely not accidental but I'm not completely sure whether or not it is simply a "tell" or watermark or an attempt to foster brand association.
I wake up everyday, read the tech news, and usually see some step change in AI or whatever. It's wild to think I'm living through such a massive transformation in my lifetime. The future of tech is going to be so different from when I was born (1980), I guess this is how people born in 1900 felt when they got to see man land on the moon?
> Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
Image generation is now pretty much "solved". Video will be next. Perhaps things will turn out the same as chess: even after machines surpassed the best humans (IBM's Deep Blue beating Kasparov in 1997), we still value humans playing chess. We value "hand made" items (clothes, furniture) over the factory-made stuff. We appreciate and value human effort more than machines'. Do you prefer a hand-written birthday card or an email?
"Solved" seems a tad overstated if you scroll up to Simonw's Where's Waldo test with deformed faces plus a confabulated target when prompted for an edit to highlight the hidden character with an arrow.
It's "solved" in that we have a way forward to reduce the errors down to 0.00001% (a number I just made up). Throwing more compute/time/money at these problems seems to reduce that error number.
As someone born in 1975 I always felt until the last couple of years that I had been stuck in a long period of stagnation compared to an earlier generation. My grandmother who was born in the 1910s got to witness adoption of electricity, mass transit, radio, television, telephony, jet flights and even space exploration before I was born.
Feels like now is a bit of a catchup after pretty tepid period that was most of my life.
Chess exists solely for the sake of the humans playing it. Even if machines solved chess, people would rather play chess against a person than a machine because it is a social activity in a way. It's like playing tennis versus a person compared to tennis against a wall.
Photographs, videos, and digital media in general, in contrast, are used for much, much more than just socializing.
Each day when my AI girlfriend wakes me up and shows me the latest news, I feel: This is it! We are living in a revolution!
Never before in history did humanity have the possibility of seeing a picture of a pack of wolves! The dearth of photographs has finally been addressed!
I told my AI girlfriend that I will save money to have access to this new technology. She suggested a circular scheme where OpenAI will pay me $10,000 per year to have access to this rare resource of the 21st-century daguerreotype.
I am hopeful that OpenAI will potentially offer clarity on their loss-leading subscription model. I'd prefer to know the real cost of a token from OpenAI as opposed to praying the venture-funded tokens will always be this cheap.
Yes, this is exactly correct, and I will die on this hill. Additionally, I don't like the way a hyphenated "techno-optimism" looks and "technOOPtimism" is a bit too on-the-nose.
That makes sense[1] but it prompts the obvious question: does this style write it as typeö then?
1: Though personally I hate it, I just cannot not read those as completely different vowels (in particular ï → [i:] or the ee in need; ë → [je:] or the first e here; and ö → [ø] or the e in her)
I suspect the diaeresis was intentional, in "New Yorker" style.
https://www.arrantpedantry.com/2020/03/24/umlauts-diaereses-...
I can’t design wallpapers/stickers/icons/…, but I can describe what I want to an image generation model verbally or with a source photo, and the new ones yield pretty good results.
For icons in particular, this opens up a completely new way of customizing my home screen and shortcuts.
Not necessary for the survival of society, maybe, but I enjoy this new capability.
So we get a fresh new cheap way to spread propaganda and lies and erode trust all across society while cementing power and control for a few at the top, and in return get a few measly icons (as if there weren't literally thousands of them freely available already) and silly images for momentary amusement?
What a rotten exchange.
I wonder what will happen to the entire legal system. It used to be fairly difficult to create convincing photos and videos.
AI can probably fool most court judges now. Or the defense can refute legitimate evidence by saying “it’s AI / false”. How would that be refuted?
By having people also testify to authenticity and coming down like the hand of God on fakers, the same way we make sure evidence is real now.
Is that worth the cost of this technology? Both in terms of financial shenanigans and its environmental cost?
Are you asking if the 10 seconds it takes AI to generate an image is more costly to the environment than a commissioned graphics artist using a laptop for 5-6 hours, or a painter who uses physical media sourced from all over the world?
In short, yes.
A modern laptop is running almost fanless, like a 486 from the days of yore.
A single H200 pumps out 700W continuously in a data center, and you run thousands of them.
Also, don't forget the training and fine tuning runs required for the models.
Mass transportation / global logistics can be very efficient and cheap.
Before the pandemic, in some cases it was cheaper to import fresh tomatoes from half a world away than to grow them locally. A single container of painting supplies is nothing in the grand scheme of things, especially compared with what data centers are consuming and emitting.
Cheaper/faster tech increases overall consumption though. Without the friction of commissioning a graphics artist to design something, a user can generate thousands of images (and iterate on those images multiple times to achieve what they want), resulting in way more images overall.
I'm not really well versed on the environmental cost, more just (neutrally) pointing out that comparing a single 10s image to a 5-6 hour commission ignores the fact that the majority of these images probably would never have existed in the first place without AI.
Also, ignoring training when talking about environmental costs is bad faith. Without training, this image would not exist, and if nobody were generating images like these, the training would not happen. So we should really count the 10 seconds of inference plus the weeks or months of high-intensity compute it took to train the model.
The environmental cost is significantly overblown, especially water usage.
I work with direct liquid cooled (DLC) systems. If the datacenter is running open DLC systems (most AI datacenters in the US in fact do), a lot of water is being wasted, 24/7/365.
A mid-tier TOP500 system (think #250-#325) draws about 0.75 MW. AI datacenters consume orders of magnitude more. To cool that behemoth, you need to pump tons of water per minute through the inner loop.
Outer loop might be slower, but it's a lot of heated water at the end of the day.
To prevent water wastage, you can go closed loop (for both inner and outer loops), but you can't escape the heat you generate and pump to the atmosphere.
So, the environmental cost is overblown, as in Chernobyl or fallout from a nuclear bomb is overblown.
So, it's not.
It's not that it doesn't use water; it's that water is not scarce unless you live in a desert.
As a country, we use 322 billion gallons of water per day. A few million gallons for a datacenter is nothing.
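A back-of-the-envelope check of that comparison (a sketch; the 5 million gallons/day datacenter figure is an assumed illustrative number, not a measurement):

```python
us_daily_withdrawal = 322e9  # gallons/day, the national figure cited above
datacenter_daily    = 5e6    # gallons/day, hypothetical large facility (assumption)

# Datacenter draw as a share of total national daily water use
share = datacenter_daily / us_daily_withdrawal
print(f"{share:.6%}")  # prints "0.001553%"
```

By raw volume the share is tiny; the replies below argue the real issue is what condition the water comes back in, not how much is drawn.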
The problem is you don't just use that water and give it back.
The water gets contaminated and heated, making it unsuitable for organisms to live in or to be processed and used again.
In short, when you pump that water back into the river, you're both poisoning and cooking it, destroying the ecosystem in the process.
Talk about multi-threaded destruction.
No, you're making that up. Datacenters do not poison rivers.
To reiterate, I work in a closed loop DLC datacenter.
Pipes rust; you can't stop that. That rust seeps into the water. That's inevitable. Moreover, if moss or other growth starts to take over your pipes, you may need to inject chemicals into your outer loop to clean them.
Inner loops already use biocides and other chemicals to keep them clean.
Look at how nuclear power plants fight organism contamination in their outer cooling loops, where they circulate lake/river water.
Same thing.
Depends on whether you believe it will ever become cheaper: either the hardware, smaller and more efficient models, or energy itself. The techno-optimist believes that that is the inevitable and investable future. But on what horizon, and will it get "Zip-drived" before then?
absolutely without a doubt it is
If that energy is used for research, maybe. If used to answer customer questions or generate Studio Ghibli knock-offs, it's not worth it, even a bit.
The issue is that the signalling makes sense only while human-generated work is better than AI-generated work. Soon AI-generated work will be better across the board, with the rare exception of work the top X% of humans put a lot of bespoke, highly personalized effort into. Preferring human work will be luxury status-signalling, just like it is for clothing, food, etc.
I think "better" is doing a lot of heavy lifting in this argument. Better how?
Is an AI generated photo of your app/site going to be more accurate than a screenshot? Or is an AI generated image of your product going to convey the quality of it more than a photo would?
I think Sora also showed that the novelty of generating just "content" is pretty fleeting.
I would be interested to see if any of the next round of ChatGPT advertisements use AI generated images. Because if not, they don’t even believe in their own product.
I'm probably in a weird subgroup that isn't representative of the general public, but I've found myself preferring "rough" art/logos/images/etc, basically because it signals a human put time into it. Or maybe not preferring, but at least noticing it more than the generally highly refined/polished AI artwork that I've been seeing.
There’s no reason to think people broadly want “better” writing, images, whatever. Look at the indie game scene, it’s been booming for years despite simpler graphics, lower fidelity assets, etc. Same for retro music, slam poetry, local coffee shops, ugly farmers market produce, etc.
There is a mass, bland appeal to “better” things but it’s not ubiquitously desired and there will always be people looking outside of that purely because “better” is entirely subjective and means nothing at all.
Only novel art is interesting. AI can't really do novel. It's a prediction algorithm; it imitates. You can add noise, but that mostly just makes it worse. It can be used to facilitate original stuff though.
But so many people want to make art, and it's so cheap to distribute it, that art is already commoditized. If people prefer human-created art, satisfying that preference is practically free.
AI can be novel, there is nothing in the transformer architecture which prohibits novelty, it's just that structurally it much prefers pattern-matching.
But I think "novelty" is a red herring here. Any random number generator can arbitrarily create a "novel" output that no human has ever seen before. The issue is whether something is both novel and useful, which is hard for even humans to do consistently.
Anthropic recently changed their take-home test specifically to be more “out-of-distribution” and therefore more resistant to AI so they can assess humans.
I’m so tired of “there’s nothing preventing”, and “humans do that too”. Modern AI is just not there. It’s not like humans and has difficulties with adapting to novelty.
Whether transformers can overcome that remains to be seen, but it is not a guarantee. We’ve been dealing with these same issues for decades and AI still struggles with them.
There are lots of things that are novel to you without necessarily being novel to the universe.
The issue being, it's not an expression of anything. Merely like a random sensation, maybe some readable intent, but generic in execution, which isn't about anything even corporate art should be about. Are we going to give up on art, altogether?
Edit: One of the possible outcomes may be living in a world like in "Them" with glasses on. Since no expression has any meaning anymore, the message is just there being a signal of some kind. (Generic "BUY" + associated brand name in small print, etc.)
The goal of art isn't to be perfect or as realistic as possible. The goal of art is to express, and enjoy that unique expression.
"Artisanal art" as it were.
This is where I’m at. If you can’t be bothered to write/make it, why would I be bothered to read or review it?
Because I'm not an artist and can't afford to pay one for whatever business I have? This idea that only experts are allowed to do things is just crazy to me. A band poster doesn't have to be a labor of love artisanal thing. Were you mad when people made band posters with MS word instead of hiring a fucking typesetter? I just don't get it.
I dunno, I have some band posters that are pretty cool pieces of art that obviously had a lot of thought put into them (pre-AI era stuff). I don't think I'd hang up an AI generated band poster, even if it was cool; I'd feel weird and tacky about it.
I was hosting a Karaoke event in my town and really went out of my way to ensure my promotional poster looked nothing like AI. I really really really did not want my townfolks thinking I would use AI to design a poster.
My design rules were: no gradients; no purple; prefer muted colors; plenty of sharp corners and overlapping shapes; use the Boba Milky typeface.
Yes, but…
https://imgur.com/a/cYn68Cp
> band poster doesn't have to be a labor of love artisanal thing
Very few bands would agree with that statement.
> can't afford to pay one for whatever business I have
At small scales what "art" does your business need? If you can't afford to hire an artist (which is completely fine, I couldn't for my business!) do you really need the art or are you trying to make your "brand" look more polished than it actually is? Leverage your small scale while you can because there isn't as much of an expectation for polish.
And no, a band poster doesn't have to be a labor of love. But it also doesn't have to be some big showy art either. If I saw a small band with a clearly AI generated poster it would make me question the sources for their music as well.
I think you're misunderstanding - most people's beef with AI art isn't that it "isn't made by experts", it's that
1) It's made from copyrighted works, and the original authors receive no credit.
2) It is (typically) low-effort.
3) There are numerous negative environmental effects of the AI industry in general.
4) There are numerous negative social effects of AI in general; more specifically, AI-generated imagery is used a lot for spreading misinformation.
5) There are numerous negative economic effects of AI. Specifically with art, it means real human artists are being replaced by AI slop, which is of significantly lower quality than the equivalent human output. Also, instead of supporting multiple different artists, you're siphoning your money to a few billion-dollar companies (this is terrible for the economy).
As a side note, if you have a business which truly cannot afford to pay any artists, there are a lot of cheaper, (sometimes free!) pre-paid art bundles that are much less morally dubious than AI. Plus, then you're not siphoning all of your cash to tech oligarchs.
I agree. And who's to say your life experience isn't as valid as that of someone with fewer years but more time on the traditional tools? I'd think either extreme could produce real art if AI lowered the tooling moat.
I actually love MS word posters. It's a million times more authentic and enjoyable than a slop generation. If a band put up an AI poster I'd assume they lack any kind of taste which is the whole reason I'd want to listen to a band anyway.
I know this is controversial in tech spaces. But most people, particularly those in art spaces like music actually appreciate creativity, taste, effort, and personal connection. Not just ruthless efficiency creating a poster for the lowest cost and fastest time possible.
> Because I'm not an artist and can't afford to pay one for whatever business I have?
If your business can't afford to spend $5 on Fiverr, it's not a business. It's not even panhandling.
How about going without? I can’t afford an artist, either, so I don’t have art. Don’t foist slop on people because you are trying to be something that you aren’t.
Nobody can be bothered to make a Lego model of my cat the size of Mount Everest, but if an AI did, I'd sure love to see it.
Your quip is pithy but meaningless.
I'm not saying it's worthless for yourself, it's worthless to me as a viewer. AI content is great for your own usage, but there is no point posting and distributing AI generation.
I could have generated my own content, so just send the prompt rather than the output to save everyone time.
And when the distilled knowledge/product is the result of multiple prompts, revisions, and reiterations? Shall we send all 30+ of those as well so as to reproduce each step along the way?
Exactly how I feel. There is already more art, movies, music, books, video games and more made by human beings than I can experience in my lifetime. Why should I waste any time on content generated by the word guessing machine?
Here’s one example:
I just recently used image generation to design my balcony.
It was a great way to see design ideas imagined in place and decide what to do.
There are many cases where people would hire an artist to illustrate an idea or early prototype. AI-generated images make that something you can do by yourself, or 10x faster than a few years ago.
Did the same for my front garden.
Notwithstanding a few code violations, it generated some good ideas that we were then able to tweak. The main thing was that we had no idea what we wanted to do, but seeing a lot of possibilities overlaid on the existing non-garden got us going. We were then able to extend the theme to other parts of the yard.
100%. A picture is worth a thousand words only when it conveys something. I love to see the pictures from my family even when they are taken with no care to quality or composition but I would look at someone else’s (as in gallery/exhibitions) only when they are stunning and captured beautifully. The medium is only a channel to communicate.
Also, this can't be real. How many publications did they train this stuff on, and why is there no acknowledgment, even if just to say "we partnered with xyz manga house to make our model smarter at manga"? Like, what's wrong with this company?
> What else?
I used to have an assistant make little index-card-sized agendas for get-togethers when folks were in town or I was organising a holiday or offsite. They used to be physical; now it's a cute thing I can text around so everyone knows when they should be up by (and by when, if they've slept in, they can go back to bed). AI has been good at making these. They don't need to be works of art, just cute and silly and maybe embedded with an inside joke.
I don't care how many times you write "cute," having my vacation time programmed with that level of granularity and imposed obligation sounds like the definition of "dystopian."
If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself.
Edit: I'm not an outlier here. There have even been sitcom episodes about overbearing hosts over-programming their guests' visits, going back at least to the Brady Bunch.
> If I got one of your cute schedule cards while visiting you, I'd tear it up, check into a cheap motel, and spend the rest of my vacation actually enjoying myself
Okay. I'd be confused why you didn't speak up while we were planning everything as a group, but those people absolutely exist. (Unless it's someone's, read: a best friend or my partner's, birthday. Then I'm a dictator and nobody gets a choice over or preview of anything.)
I like to have a group activity planned on most days. If we're going to drive to get an afternoon hike in before a dinner reservation (and if I have 6+ people in town, I need a dinner reservation, because no, I'm not cooking every single evening), or if I've paid for a snowmobile tour or a friend is bringing out their telescope for stargazing, there are hard no-later-than departure times to either not miss the activity or be respectful of others' time.
My family used to resolve that by constantly reminding everyone the day before and morning of, followed by constantly shouting at each other in the hours and minutes preceding and–inevitably–through that deadline. I prefer the way I've found. If someone wants to fuck off from an activity, myself included, that's also perfectly fine.
(I also grew up in a family that overplanned vacations. And I've since recovered from the rebound instinct, which involved not planning anything and leaving everything to serendipity. It works gorgeously, sometimes. But a lot of other times I wondered why I didn't bother googling the cool festival one town over beforehand, or regretted sleeping in through a parade.)
> There have even been sitcom episodes about overbearing hosts over-programming their guests' visits
Sure. And different groups have different strokes. When it comes to my friends and me, generally speaking, a scheduled activity every other day with dinners planned in advance (they all get hangry, every single fucking one of them) works best.
We need to flip the script. AI is trying to do marketing: adding "illegal usage will lead to X" is a gateway to sparking curiosity. There's a saying that censoring games for young adults guarantees they'll buy them like crazy and circumvent the restrictions, because danger is cool.
There is nothing that cannot cause harm: knives, cars, alcohol, drugs. A society needs to balance risks and benefits. Words can be used to do harm, so can email, anything; it depends on the intention.
> Like, in terms of art, it's discarded (art is about humans)
If a work of art is good, then it's good. It doesn't matter if it came from a human, a neanderthal, AI, or monkeys randomly typing.
Provenance is part of the work. If a roomful of monkeys banged out something that looked like anything, I'd absolutely hang it on my wall. I would not say the same for 99% of AI generated art.
Whether art is considered good is in practice highly contextual. One of those contexts is who (what) made it.
The technically (in both senses) astonishing and amazing output is not far off from some of the qualities of real advertising: staged, attention-grabbing, artificially created, superficially in demand, commercially attractive. These align, and lots of similarities in the functions and outcomes of the two spheres come to mind.
I see your point, but reconsider: we will have to wait and see. Time will tell, and this is simply economics: useful or not?
I became totally indifferent after reviewing my spending habits for unnecessary stuff, prompted by watching world championships for niche sports. For some this is a calling, for others a waste. It's a numbers game, then.
I tend to share your view. But is there really a line like you describe? Maybe AI just needs to get a few iterations better and we'll all love what it generates. And how is it really any different from any Photoshop computer output from the past?
I think there's real value to be had in using this for diagrams.
Visual explanations are useful, but most people don't have the talent and/or the time to produce them.
This new model (and Nano Banana Pro before it) has tipped across the quality boundary where it actually can produce a visual explanation that moves beyond space-filling slop and helps people understand a concept.
I've never used an AI-generated image in a presentation or document before, but I'm teetering on the edge of considering it now provided it genuinely elevates the material and helps explain a concept that otherwise wouldn't be clear.
Are there any models that are specifically trained to produce diagrams as SVG? I'd much prefer that to diffusion-based raster image generation models for a few reasons:
- The usual advantages of vector graphics: resolution-independence, zoom without jagged edges, etc.
- As a consequence of the above, vector graphics (particularly SVG) can more easily be converted to useful tactile graphics for blind people.
- Vector graphics can more practically be edited.
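Whatever model ends up producing the SVG, the output is just text, so it can be sanity-checked before use. A minimal sketch (the function and pipeline are my own illustration, not an existing tool): confirm a candidate string is well-formed XML whose root is `<svg>` before rendering or editing it.

```python
# Sketch: validate model-emitted SVG before trusting it downstream.
# Only checks well-formedness and the root tag; a real pipeline would
# also sanitize scripts and restrict elements/attributes.
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def is_valid_svg(candidate: str) -> bool:
    """Return True if the string parses as XML with an <svg> root."""
    try:
        root = ET.fromstring(candidate)
    except ET.ParseError:
        return False
    # ElementTree expands namespaced tags to "{uri}localname".
    return root.tag in ("svg", f"{{{SVG_NS}}}svg")

diagram = ('<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">'
           '<rect x="10" y="10" width="80" height="30" fill="none" stroke="black"/>'
           '<text x="50" y="30" text-anchor="middle">box</text></svg>')
print(is_valid_svg(diagram))           # True
print(is_valid_svg("<p>not svg</p>"))  # False: parses, but root is not <svg>
```

Because the result is structured text, failed generations can be detected and retried automatically, something raster output doesn't allow.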
This is the key point. In my view it's just like anything else, if AI can help humans create better work, it's a good thing.
I think what we'll find is that visual design is no longer as much of a moat for expressing concepts, branding, etc. In a way, AI-generated design opens the door for more competition on merits, not just those who can afford the top tier design firm.
yeah I'm not sure I'm in agreement that we can hand-wave assets and ads as entire classes of valuable content
While I agree with you, the Hacker News audience is not in the middle of the bell curve.
I get that this sounds elitist, but a tremendous percentage of the population is happily and eagerly engaging with fake religious images, funny AI videos, horrible AI memes, etc. Pointing out that a video of a puppy is completely AI-generated results in a vicious defense and mansplaining of why the video is totally real (I love it when the video has, e.g., Sora watermarks... this does not stop the defenders).
I agree with you that human connection and artistic intent are what I'm looking for in art, music, video games, etc. But gawd, the lowest common denominator is, and always has been, SO much lower than we want to admit to ourselves.
Very few people want thoughtful analysis that contradicts their world view, very few people care about privacy or rights or future or using the right tool, very few people are interested in moral frameworks or ethical philosophy, and very few people care about real and verifiable human connection in their "content" :-/
I'm working on an edtech game. Before, I would've had much less of a product: I don't have the budget to hire an artist, and it would've been much less interactive. Because of this I'm able to build a much more engaging experience, so that's one thing. For what it's worth.
Seems good enough to generate 2D sprites. If that means a wave of pixel-art games I count it as a net win.
I don't think gamers hate AI; it's just a vocal minority, imo. What most people dislike is sloppy work, as they should, but that can happen with or without AI. The industry has been using AI for textures, voices, and more for over a decade.
> Seems good enough to generate 2D sprites.
It’s really not. That's actually a pet peeve of mine, as someone who used to spend a lot of time messing with pixel art in Aseprite.
Nobody takes the time to understand that the style of pixel art is not the same thing as actual pixel art. So you end up with these high-definition, high-resolution images that people try to pass off as pixel art, but if you zoom in even a tiny bit, you see all this terrible fringing and fraying.
That happens because the palette is way outside the bounds of what pixel art should use; proper pixel art is generally limited to somewhere around 8 to 32 colors.
There are plenty of ways to post-process generative images to make them look more like real pixel art (square grid alignment, palette reduction, etc.), but it does require a bit more manual finesse [1], and unfortunately most people just can’t be bothered.
[1] - https://github.com/jenissimo/unfake.js
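For illustration, the two fixes named above (grid alignment and palette reduction) can be sketched in pure Python on a nested list of RGB tuples. The 4x scale and 4-level palette are arbitrary assumptions, and real tools like unfake.js do considerably more (edge cleanup, dithering detection, etc.):

```python
# Snap an upscaled "pixel-art style" image back to its logical grid,
# then round each color channel to a small set of evenly spaced values.

def snap_to_grid(img, scale):
    """Collapse each scale x scale block to its top-left pixel."""
    return [row[::scale] for row in img[::scale]]

def reduce_palette(img, levels=4):
    """Round each RGB channel to one of `levels` evenly spaced values."""
    step = 255 // (levels - 1)
    return [[tuple(min(255, round(c / step) * step) for c in px) for px in row]
            for row in img]

# A fake 4x-upscaled 2x2 "sprite" with slightly noisy colors:
blocky = [[(250, 10, 12)] * 4 + [(8, 245, 6)] * 4 for _ in range(4)] + \
         [[(4, 9, 248)] * 4 + [(240, 238, 11)] * 4 for _ in range(4)]
sprite = reduce_palette(snap_to_grid(blocky, 4))
print(sprite)  # [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 0)]]
```

With Pillow the same idea is a `resize` down with `Image.NEAREST`, a `quantize(colors=N)`, and a `resize` back up.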
There are already more games being released on Steam than anyone can keep up with, I'm not sure how adding another "wave" on top of it helps.
AI for textures for over a decade? What AI?
Efros–Leung, PatchMatch? Nearest neighbours was "AI" before diffusion models.
Are you kidding? I see more vitriol for AI in gaming communities than anywhere else, to the point where Steam now requires you to disclose its usage.
Crimson Desert failed to disclose on release and (almost) nobody cared, gamers kept buying it.
The Human Renaissance is something I've been thinking of too and I hope it comes to pass. Of course, I feel like societally, things are gonna get worse for a lot of folks. You already see it in entire towns losing water or their water becoming polluted.
You'd think the kickbacks the leaders of these towns are getting for allowing data centers to be built would go towards improving infrastructure, but hah, that's unrealistic.
WTF is that unrealistic? SMH
>You already see it in entire towns losing water or their water becoming polluted
Do you have any references for such cases? I've seen talk of this sort of thing as a risk, but I'm unaware of any specific instances of it occurring.
My only actual use of image or video AI tools is self-entertainment. I like to give it prompts and see the results it gives me.
That's it. I can't think of a single actual use case outside of this that isn't deliberately manipulative and harmful.
>Like, in terms of art, it's discarded (art is about humans)
I dunno how long this is going to hold up. In 50 years, when OpenAI has long become a memory, post-bubble burst, and a half-century of bitrot has claimed much of what was generated in this era, how valuable do you think an AI image file from 2023 - with provenance - might be, as an emblem and artifact of our current cultural moment, of those first few years when a human could tell a computer, "Hey, make this," and it did? And many of the early tools are gone; you can't use them anymore.
Consider: there will never be another DallE-2 image generation. Ever.
>In general, I think people are starting to realize that things generated without effort are not worth spending time with
Agreed mostly, BUT
I'm building tools for myself. The tools aren't the end goal; they enable other things. I suspect I could sell them, but I don't particularly want to. There's a gap between "does everything I want it to" and "polished enough to justify sale," and that gap doesn't excite me.
They're definitely not generated without effort... but they are generated with 1% of the human effort they would require.
I feel very much empowered by AI to do the things I've always wanted to do. (when I mention this there's always someone who comes out effectively calling me delusional for being satisfied with something built with LLMs)
Porn and memes. Obviously. This is all that Stable Diffusion has been used for since it was released.
I completely disagree; this replaces art as a job. Why does human art need monetary feedback to be shared? If people require a paycheck to make art, then it was never anything different from what AI-generated images are.
As for advertising being depressing: it's a little late to get up on the anti-ads high horse after two decades of ad-based technology dominating everything. Go outside and see all those bright, shiny, glittery lights; those aren't society-created images to embolden the spirit and dazzle the senses. Those are ads.
North Korea looks weird and depressing because they don't have ads. Welcome to the West.
AI loopidity rearing its head. Just send the bullet points that we all want anyway, right?! Stop sending globs of text and other generated content!
I've been trying out the new model like this:
Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...
Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
Like... this has things that AI will seemingly always be terrible at?
At some point the level of detail is utter garbo, and always will be. A thoughtful artist could make some mistakes, but someone who put that much time into a drawing wouldn't have:
- Nightmarish screaming faces on most people
- A sign that seemingly points in both directions, or the incorrect one, for a lake, plus a first aid tent that doesn't exist
- A dog in the bottom left, and another near the lake, that look like some sort of fuzzy monstrosity...
It looks SO impressive before you try to take in any detail. The hand-selected images for the preview have the same shit. The view of musculature has a sternocleidomastoid with no clavicle attachment. The periodic table seems good until you take a look at the metals...
We're reconfiguring all of our RAM & GPUs and wasting so much water and electricity for crappier Where's Waldos??
I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!
I think that image cost 40 cents.
A startling number of people either have no arms, one arm, a half of an arm, or a shrunken arm; how odd!
Fed it into a clean Claude Code max-effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
Which is correct! We would need a larger sample size than just myself, but the raccoon was in the very first spot I looked. I found it literally immediately, as if that's where my eyes naturally gravitated first. Hopefully that's just luck and not an indictment of the image-creating ability, as if some element that would normally make Waldo hard to find is missing from this "Where's Waldo" image.
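The slicing approach described above is easy to reproduce: break a large image into overlapping tiles so a vision model can inspect each one at full detail, then map a hit inside a tile back to global pixel coordinates. A sketch (the tile size and overlap are my own assumptions, not what Claude Code actually does internally):

```python
# Generate overlapping tile bounding boxes covering a large image.

def tile_boxes(width, height, tile=1024, overlap=128):
    """Yield (left, top, right, bottom) boxes covering the image."""
    step = tile - overlap
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            yield (left, top, min(left + tile, width), min(top + tile, height))

boxes = list(tile_boxes(3840, 2160))
print(len(boxes))  # 15 tiles for the 3840x2160 Waldo image
# A hit at (x, y) within a tile maps back to (left + x, top + y) globally.
```

The overlap guards against the target being split across a tile boundary, which would otherwise hide it from every individual crop.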
I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.
simonw posted 2 different images: make sure to look at the second one.
Yeah, I noticed that just now, but too late to delete the comment :p
You had a meta problem, and three, in total: find the raccoon, find the umbrella, find the right link in the comments.
The faces... it's nice that it turned a kids' book into an abomination.
I tried it on the ChatGPT web UI and it also worked, although the ham radio looks like a handbag to me.
https://postimg.cc/wyxgCgNY
The people in this image remind me of early This Person Does Not Exist, in the best way.
fair point, also "this raccoon does not exist"
I found it in the 2nd image! In the 1st one, not yet...
> though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure
I see an opportunity for a new AI test!
There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.
It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo, Highlander-style), while also holding up to scrutiny when you examine any individual, ordinary figure.
I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.
5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."
(I don't think it's right).
I tried
> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist
and got this. I'm not sure I know what a ham radio looks like though.
https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...
Also, the raccoon it circled isn't in the original.
I love how perfectly this captures the difficulties of using generative AI for detection tasks.
Indeed. I suppose one way to ensure you can find Waldo in any image is to add it yourself.
That's excellent. I added it to my post: https://simonwillison.net/2026/Apr/21/gpt-image-2/#update-as...
haha took me a while to notice that one of the buildings is labelled 'Ham radio'
The second 4K image definitely has a raccoon on the left there! Nice.
Damn. There’s a fun game app to make here ^^
I see the raccoon
Pretty mixed feelings on this. From the page at least, the images are very good. I'd find it hard to tell that they're AI, which I think is a problem. If we had a functioning congress, I wonder if we might end up with legislation that these things need to be watermarked or otherwise made identifiable as AI-generated.
I also don't like that these things are trained on specific artist's styles without really crediting those artists (or even getting their consent). I think there's a big difference between an individual artist learning from a style or paying it homage, vs a machine just consuming it so it can create endless art in that style.
You might be onto something. I find every image unsettling. They're very good, no doubt, but maybe it disturbs me because all of it is a complete copy of what someone else created. I know, I know, there is no pure invention; that's not what I mean. Humans borrow from other humans all the time. There's a humanity in that! A machine fully repurposing a human contribution as some kind of new creation... iono, I'm old; it's weird and I don't like it.
Maybe i'm just bloviating also.
Here is my regular "hard prompt" I use for testing image gen models:
"A macro close-up photograph of an old watchmaker's hands carefully replacing a tiny gear inside a vintage pocket watch. The watch mechanism is partially submerged in a shallow dish of clear water, causing visible refraction and light caustics across the brass gears. A single drop of water is falling from a pair of steel tweezers, captured mid-splash on the water's surface. Reflect the watchmaker's face, slightly distorted, in the curved glass of the watch face. Sharp focus throughout, natural window lighting from the left, shot on 100mm macro lens."
Last time I ran the test with Nano Banana 2 (first run): https://s.h4x.club/eDuOzPDd
Images 2 using the method Simon mentioned (first run): https://s.h4x.club/qGuWZveR
Ran a bunch both on the .com and via the API; none of them are nearly as good as Nano Banana.
This seems like a great time to mention C2PA, a specification for positively affirming image sources. OpenAI participates in this, and if I load an image I had AI generate in a C2PA Viewer it shows ChatGPT as the source.
Bad actors can strip sources out so it's a normal image (that's why it's positive affirmation), but eventually we should start flagging images with no source attribution as dangerous the way we flag non-https.
Learn more at https://c2pa.org
Yeah, OpenAI has been attaching C2PA manifests to all their generated images from the very beginning. Also, based on a small evaluation that I ran, modern ML based AI generated image detectors like OmniAID[1] seem to do quite well at detecting GPT-Image-2 generated images. I use both in an on-device AI generated image detector that I built.
[1]: https://arxiv.org/abs/2511.08423
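As a toy illustration of positive affirmation (and only that): C2PA manifests in JPEGs live in JUMBF boxes carrying a "c2pa" label, so a crude scan can hint that a manifest is present. A real check should use proper tooling such as the c2patool CLI or a C2PA SDK, since this heuristic verifies nothing cryptographically, and a bad actor stripping the manifest defeats it by design:

```python
# Naive heuristic sketch, NOT a real C2PA verifier: just look for the
# "c2pa" JUMBF label in the raw bytes. Real manifests are signed and
# must be validated by proper C2PA tooling.

def might_have_c2pa(data: bytes) -> bool:
    """Crude positive-affirmation check for a C2PA manifest label."""
    return b"c2pa" in data

# Hypothetical JPEG-like byte string containing a c2pa label:
fake_jpeg = b"\xff\xd8\xff\xeb" + b"JP" + b"...jumb...c2pa..." + b"\xff\xd9"
print(might_have_c2pa(fake_jpeg))          # True
print(might_have_c2pa(b"\xff\xd8\xff\xd9"))  # False: no manifest label
```

The absence of a label is exactly the "no source attribution" case the comment above suggests we should eventually flag, the way browsers flag non-HTTPS.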
This time it passed the piano keyboard test:
https://chatgpt.com/s/m_69e7ffafbb048191b96f2c93758e3e40
But it screwed up when attempting to label middle C:
https://chatgpt.com/s/m_69e8008ef62c8191993932efc8979e1e
Edit: it did fix it when asked.
When NB 2 came out I actually had to increase the difficulty of the piano test - reversing the colors of all the accidentals and the naturals, and it still managed it perfectly.
https://mordenstar.com/other/nb-pro-2-tests
> you can make your own mangas
No you can’t.
You still have the Studio Ghibli look from the video. The issue with generating manga was the quality of the characters; there are multiple pieces of software for laying out your frames.
But I am hopeful. If I put in a single frame, can it carry that style over to the next images? It would be game-changing if a chat could have its own art style.
The improvement in Chinese text rendering is remarkable and impressive! I still found some typos in the Chinese sample pic about Wuxi though. For example the 笼 in 小笼包 was written incorrectly. And the "极小中文也清晰可读" section contains even more typos although it's still legible. Still, truly amazing progress. Vastly better than any previous image generation model by a large margin.
Been using the model for a few hours now. I'm actually really impressed with it. This is the first time I've found value in an image model for stuff I actually do. I've been using it to build PowerPoint slides and mockups. It's CRAZY good at that.
In the next round of ChatGPT advertisements, if they don’t use AI generated images, then that means they don’t believe in their own product right?
One interesting thing I found comparing OpenAI and Gemini image editing: Gemini rejects anything involving a well-known person. Anything. OpenAI was happy to edit and change every time I tried.
I have a side project where I want to display standup comedy shows. I thought I could edit standup comedy posters with some AI to fit my design. Gemini straight up refuses to change any image of any standup comedy poster involving a well-known human. OpenAI doesn't care and is happy to edit away.
Are you using Google Gemini directly? I've found the Vertex API seems to be significantly less strict.
How does it determine they are well known and not just similar looking?
Gemini often rejects photos of random people (even ones it generated itself) because it thinks they look too similar to some well known person.
I don't know, tbh. I've tried it on 10-20 standups of varying levels of fame, and Gemini refuses every time.
Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (api as well as UI), OpenAI does it
It's not super deterministic but it didn't fail once on my attempts. See: https://imgur.com/a/james-acaster-cold-lasagne-1R7fpzQ
Very interesting. It fails every single time for me. I'm in Germany, maybe Google is stricter here?
See https://imgur.com/a/77BRDQv
That makes sense to me. I just Googled around like a fool and got here https://en.wikipedia.org/wiki/Personality_rights#Germany
It seems like they're trying to follow local law. What a nightmare to have to manage all jurisdictions around such a product. Surprised it didn't kill image generation entirely.
Yeah, especially when they know all that work will be completely pointless in a few years, when open source / local models will be just as good and won't have any legal limitations. People will be generating fake images of famous people like crazy with nothing stopping them.
What if you change the prompt to tell it specifically that it's not a famous person? Or try it without text?
This is not as exciting as previous models were, but it is incredibly good. I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
> I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
Without question.
AI will be indistinguishable from having a team. Communicating clearly has always mattered and always will.
This, however, is even stronger. Because you can program and use logic in your communications.
We're going to collectively develop absolutely wild command over instruction as a society. That's the skill to have.
On the other hand LLMs are getting very good at understanding poorly constructed instructions as well.
So being able to express oneself clearly in a structured way may not be such an edge.
HN submission for a direct link to the product announcement which for some reason is being penalized by the HN algorithm: https://news.ycombinator.com/item?id=47853000
And here I was proud of myself, having taught my mom and her friends how to discern real from fakes they get on WhatsApp groups. Another even more powerful tool for scammers. I'm taking a break.
IMO you're fighting the wrong battle: there'll always be a new model.
But the broader concept of fake news and the manufactured nature of media and rhetoric is much more relevant - e.g. whether or not something's AI is almost immaterial to the fact that any filmed segment does not have to be real or attributed to the correct context.
It's an old internet classic just to grab an image and put a different caption on it, relying on the fact that no one can discern context or has time to fact-check.
Are camera manufacturers working on signed images? That seems like the only way our trust in any digital media doesn't collapse entirely.
The guys presenting are probably all like 25x smarter than I am, but good god, literally zero on-screen presence or personality.
I liked it that way, felt more authentic to see the noobs
That's a trained skill, and they presumably have focused on other skills.
Yeah, skills that make them a cool $10M a year.
Eh, I don't think personalities are trained. On-screen presence, for sure, but you'd see right through it IRL.
I think its endearing
I didn't think that Sam guy was that bad.
No mention of modifying existing images, which is more important than anything they mentioned.
I think we all know the feeling of getting an image that is ok, but needs a few modifications, and being absolutely unable to get the changes made.
It either keeps coming up with the same image, or gives you a completely new take on the image with fresh problems.
Anyone know if modification of existing images is any better?
Anything better than OpenAI?
Image editing program -> generate different versions of the image, each with some but not all of the elements you want, on separate layers -> mask out the parts you don't need (apply a mask, fill with black, then soft-brush with white the parts you want back in). Copy the flattened/merged result, drop it back into the image model, and keep asking for changes. As long as each generation adds an element you want, you can build a collage of your final image.
There was an Edit button in one of the images in the livestream
Oh wow, scrolling through the page on mobile makes me dizzy
I caught the last minute of this—was it just ChatGPT Images 2.0?
It appears so!
yes
Why do all of the cartoons still look like that? Genuinely asking.
Can it generate transparent PNGs yet?
Previous gpt-image models could (when generating, not editing), but gpt-image-2 can't.
I noticed it earlier while updating my playground to support it:
https://github.com/alasano/gpt-image-playground
Works for me, but really weirdly on iOS: Copying to clipboard somehow seems to break transparency; saving to the iOS gallery does not. (And I’ve made sure to not accidentally depend on iOS’s background segmentation.)
My test for image models is asking it to create an image showing chess openings. Both this model and Banana pro are so bad at it.
While the image looks nice, the actual details are always wrong, such as pawns in the wrong locations, missing pawns, etc.
Try it yourself with this prompt: Create a poster to show opening game for Queen's Gambit to teach kids to play chess.
It almost nailed it for me (two squares have both white and black color). All pieces and the position look correct.
Which move? Whose turn is it? Declined or accepted? Garbage in, garbage out.
In some cases I would agree with this, but image model releases including this one are beginning to incorporate and market the thinking step. It is not a reach at this point to expect the model to take liberties in order to deliver a faithful and accurate representation of your request. A model could still be accurate while navigating your lack of specificity.
What do you mean? The parent clearly describes the Queen's Gambit: 1.d4 d5 2.c4. There is no room for ambiguity here.
200+ points in Arena.ai , that's incredible. They are cleaning house with this model
That's the point delta (from 2nd place), not the total.
https://www.youtube.com/watch?v=Adsaiyr7Nv8
Someone remind me again why this is a good idea to be able to create perfect fake images?
If every single image on their blog was generated by Images 2.0 (I've no reason to believe that's not the case), then wow, I'm seriously impressed. The fidelity to text, the photorealism, the ability to show the same character in a variety of situations (e.g. the manga art) -- it's all great!
One of the images in the blog (https://images.ctfassets.net/kftzwdyauwt9/4d5dizAOajLfAXkGZ7...) is a carbon copy of an image from an article posted Mar 27, 2026 with credits given to an individual: https://www.cornellsun.com/article/2026/03/cornell-accepts-5...
Was this an oversight? Or did their new image generation model generate an image that was essentially a copy of an existing image?
This is hilarious. Seems like kind of a random image for a model to memorize, but it could be.
There is definitely enough empirical validation showing that image models retain lots of original copies in their weights, despite how much AI boosters think otherwise. That said, it is usually images that end up in the training set many times, and I would find it strange for this image to have done that.
Regardless, great find.
That has to be the wrong stock image included or something, bloody hell.
It's practically all dark except for a few spots. It's the same image, just with different size/compression/whatever. I can't find it in any stock image search, though. Surely it could not have memorized the whole image at that fidelity; maybe I just didn't search well enough. Or the image was AI-generated in the first place and was a test for Images 2.0.
Well, it's on the Web Archive. So unless they got their hands on it almost a month early, or it escaped their light cone, it wasn't.
Haha! That would really take the cake. If it is, congratulations to them! I could never have known.
Given the recency of that image, it is unlikely to be in the training data, so I would go with oversight.
So during my Nano Banana Pro experiments I wrote a very fun prompt that tests the ability for these image generation models to follow heuristics, but still requires domain knowledge and/or use of the search tool:
The NBP result is here, which got the numbers, corresponding Pokemon, and styles correct, with the main point of contention being that the style application is lazy and that the images may be plagiarized: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...
Running that same prompt through gpt-image-2 high gave an... interesting contrast: https://cdn.bsky.app/img/feed_fullsize/plain/did:plc:oxaerni...
It did more inventive styles for the images that appear to be original, but:
- The style logic is applied by row, not by raw numbers, and is therefore wrong
- Several of the Pokemon are flat-out wrong
- Number font is wrong
- Bottom isn't square for some reason
Odd results.
Why is it all so Asian?
I wonder if this will be decent at creating sprite frame animations. So far I've had very poor results and I've had to do the unthinkable and toil it out manually.
I had exactly the same thought! I've got a game I've been wanting to build for over a decade that I recently started working on. The art is going to be very challenging however, because I lack a lot of those skills. I am really hoping the AI tools can help with that.
Is anyone doing this already who can share information on what the best models are?
Use the imagegen skill in codex and ask it to create sprites. It works really well.
Thank you!
It's still bad.
It stands out to me that this page itself is wonderful to go through (the telling of the product story through model-generated images).
Pretty much all of the kerfuffle over AI would go away if it was accurately priced.
After 2008 and 2020, vast amounts of money (tens of trillions) were printed (reasonably) by Western governments and not eliminated from the money supply. So there are vast sums swilling about, funding things like massively computationally intensive work to help me pick a recipe for tonight.
Google and Facebook had online advertising sewn up, but AI is waaay better at answering my queries. So OpenAI wants some of that, but the cost per query must be orders of magnitude larger.
So charge me, or my advertisers the correct amount. Charge me the right amount to design my logo or print an amusing cat photo.
Charge the right price for the AI slop on YouTube.
Charge the right amount, and watch as people realise it ain't worth it 95% of the time.
Great technology, but price matters in an economy.
Yay, let's burn the planet computing more slopium..
It still seems to have this gpt-image color that you can just feel: the slight sepia and softness.
I was just wondering about that. Did they embrace it as a "signature look"? It can't be accidental, right?
It's definitely not accidental but I'm not completely sure whether or not it is simply a "tell" or watermark or an attempt to foster brand association.
I would love to see prompt examples that created the images on the announcement page.
You can see them by changing the view just before the gallery.
It definitely lost the characteristic slop look.
there's something funny going on with the live stream audio
Can it generate anything high resolution at increased cost and time? Or is it always restricted?
great obfuscation idea - hidden message on a grain of rice
for video game assets this is massive.
but in general though - will people believe in anything photographic?
imagine dating apps, photographic evidence.
I'm guessing we're gonna reach a point where you fuck things up purposely to leave a human mark.
> but in general though - will people believe in anything photographic?
Hopefully film makes a comeback.
lol at the fake handwritten homework assignment. Know your customer!
played around with it; still dumb.
https://openai.com/index/introducing-chatgpt-images-2-0/
Thanks, all displayed images look horrible and artificial. This will fail like Sora.
Hard disagree on this, I was coming here to comment that this is the first time I really can't tell that some of the photos are AI generated.
Your single other comment is simplistic hyperbole as well, so this is presumably a bot account.
I felt the same, particularly with the diagrams / magazines anyway.
I don't think it'll fail like Sora though. gpt-image-1.5 didn't fail.
Denial is real…
This is so stupid. As a free OSS tool it’s amazing. Paying money for this is fucking stupid. How blind are we all to now before this tech?
No gpt-5.5
Thursday
Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
I wake up every day, read the tech news, and usually see some step change in AI or whatever. It's wild to think I'm living through such a massive transformation in my lifetime. The future of tech is going to be so different from when I was born (1980). I guess this is how people born in 1900 felt when they got to see man land on the moon?
> Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
Image generation is now pretty much "solved". Video will be next. Perhaps things will turn out the same as with chess: even though chess was "solved" by IBM's Deep Blue, we still value humans playing chess. We value "hand made" items (clothes, furniture) over the factory-made stuff. We appreciate & value human effort more than machines. Do you prefer a hand-written birthday card or an email?
"Solved" seems a tad overstated if you scroll up to Simonw's Where's Waldo test with deformed faces plus a confabulated target when prompted for an edit to highlight the hidden character with an arrow.
It's "solved" in that we have a way forward to reduce the errors down to 0.00001% (a number I just made up). Throwing more compute/time/money at these problems seems to reduce that error number.
As someone born in 1975 I always felt until the last couple of years that I had been stuck in a long period of stagnation compared to an earlier generation. My grandmother who was born in the 1910s got to witness adoption of electricity, mass transit, radio, television, telephony, jet flights and even space exploration before I was born.
Feels like now is a bit of a catch-up after the pretty tepid period that was most of my life.
You will likely witness strongly superhuman AI, which dwarfs any changes your grandmother saw.
Chess exists solely for the sake of the humans playing it. Even if machines solved chess, people would rather play chess against a person than a machine because it is a social activity in a way. It's like playing tennis versus a person compared to tennis against a wall.
Photographs, videos, and digital media in general, in contrast, are used for much, much more than just socializing.
Well, for some of these images for the first time I can't tell that they are AI generated
Suggest renaming this to "OpenAI Livestream: ChatGPT Images 2.0"
(We've since merged the threads and moved the livestream link to the toptext)
or "How we make money with your images 2.0".
In 5 years and 3 months between DALL-E and Images 2.0 we've managed to progress from exuberant excitement to jaded indifference.
Who's 'we'? Speak for yourself!
Because we are all seeing the harm these tools are being used for.
It's just another step into hell.
Can it generate Chibi figures to mask the oligarchy's true intentions on Twitter and make them more relatable?
Image generation? Hmm, would be cool if OpenAI also made a video-generation model someday..
If only there was a social network with solely AI generated videos, I would pay literal money for it...
Oh no.
Each day when my AI girlfriend wakes me up and shows me the latest news, I feel: This is it! We are living in a revolution!
Never before in history did humanity have the possibility of seeing a picture of a pack of wolves! The dearth of photographs has finally been addressed!
I told my AI girlfriend that I will save money to have access to this new technology. She suggested a circular scheme where OpenAI will pay me $10,000 per year to have access to this rare resource of 21st-century daguerreotype.
I hope OpenAI will eventually offer clarity on its loss-leading subscription model. I'd prefer to know the real cost of a token from OpenAI rather than praying the venture-funded tokens will always be this cheap.