The way it totally disregards the many explicit instructions given in the "four panel" comic strip.
Hopefully it's better than midjourney at least. Ignoring key parts of the prompt seems to be a feature.
Right? Came to the comments specifically for this, but am confused by people's responses. With prompt adherence this bad, is it worth the 2 cents you spent on it? I don't see how it's even useful for deciding if you want to use the ultra version, or for anything else really... Maybe if you want to redo it in Photoshop? But at that point, breaking out the old Wacom tablet and making a composite image would probably be just as time-intensive, but with much higher image quality (and none of the telltale signs of AI gen).
> Imagen 4 Ultra: When your creative vision demands the highest level of detail and strict adherence to your prompts, Imagen 4 Ultra delivers highly-aligned results.
It seems that you may need the "Ultra" version if you want strict prompt adherence.
It's an interesting strategy. Personally, I notice that most of the time I don't actually need strict prompt adherence for image generation. If it looks nice, I'll accept it. If it doesn't, I'll click generate again. For creative tasks, following the prompt too strictly might not be the outcome users want.
I've found this is an interesting balance with Copilot specifically. Like, on the one hand I'm glad it aims for the bare minimum and doesn't try to refactor my whole codebase on every shot... at the same time, there's certain obvious things where I wish it was able to think a bit bigger picture, or even engage me interactively, like "hey, I can do a self-contained implementation here, but it's a bit gross; it looks like adding dependency X to the project keeps this a one liner— which way should it go?"
I’ve had good experience with iterative prompting when generating images with Gemini (idk which model — it’s whatever we get with our enterprise subscription at work, presumably the latest.) It’s noticeably better than ChatGPT at incorporating its previous image attempt into my instructions to generate the next iteration.
In the little experimentation I did with AI image generation, it seems to be more a game of trying multiple times until you get something that actually looks right, so I wonder how many attempts they made.
Same for the poster. Asks for the ship to be going towards the right, and it's clearly doing the opposite
The ship is reminiscent of Galactica's oldschool vipers. Different, but very similar overall structure.
As seen from the AI's perspective.
To the left of the "detailed spaceship" I think I see a distortion pattern reminiscent of a cloaked Klingon bird of prey moving to the right. Or I'm just hallucinating patterns in nebular noise.
Though that was only Imagen 4 Fast, not Imagen 4 or Imagen 4 Ultra.
I was going to nitpick the missing apostrophe in movie posters caption ("STARFALLS REVENGE") but its missing from the prompt, too.
> its
Muphry's Law strikes again.
I guess it's kinda nicely genuine that the "four panel comic strip" has some errors in it (misunderstanding caption + cat high-fiving itself in the bonus fifth panel)
I was just thinking that. It has many, many errors.
1. Not seen browsing ”ai.dev”.
2. The text ”Imagen 4 is now generally available!” is spoken, not a comic caption.
3. Invalid second panel.
4. Hallucinates ”Meet Imagen 4 fast!”
5. Hallucinates ”It offers low..” etc. (this is the second part of a single sentence said by the cat)
6. Hallucinates ”You can export images in 2K!” (this sentence is not asked for)
7. Doesn’t have the cat and the dog in the fourth panel.
—
Here’s the gpt-image-1 counterpart with the issues I could find:
https://chatgpt.com/share/689f7e4b-01e4-8011-8997-0f37edf8c2...
1. The text ”Imagen 4 is now generally available!” is still spoken, not a caption.
2. ”low latency” -> ”low-laten”
(3. Has that ugly gpt-image-1 trademark yellow filter requiring work in post to avoid.)
I didn’t bring up the ”retro comic look” thing. I certainly think it’s an issue with Imagen 4’s version. It doesn’t look very old school at all. But I can’t judge the OpenAI one either on that, I’m no comic book expert, so I just skipped that one.
I got this result with the basic copilot app
https://i.imgur.com/kSuqCYg.jpeg
The cat also has more fingers on one hand than the other. It's a small, inconsequential thing but it always draws my eye in generated images.
> I didn’t bring up the ”retro comic look” thing. (…) I’m no comic book expert, so I just skipped that one.
I’m no Scott McCloud, but the OpenAI version definitely does a better job with the retro style. The yellow filter you criticised actually helps to sell the illusion. The Imagen version utterly fails in the retro area, that style is very much modern.
But there are other important flaws in the OpenAI version. The fourth panel has a different cat (the head shape and stripes are wrong) and it bleeds into the previous panel. Technically that could be a stylistic choice, except that the floor/table is inconsistent, making it clear it was a mistake.
Clicking on "Read the documentation" leads to a page that documents nothing about the latest Imagen models and only provides examples using Gemini 2.0 Flash.
Classic Google
I asked the basic Copilot app the same thing and got a much better result lol
https://i.imgur.com/kSuqCYg.jpeg
Interesting how Imagen doesn't suffer this yellow tint effect.
I assume that's from the "retro" word in the prompt
Looks so much better than the yellow tinted chatgpt output in my eyes
After manually white balancing to remove the tint, I find GPT-Image-1 (the model used in ChatGPT) to be better.
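For anyone who wants to try the same comparison: a uniform yellow cast like that can often be removed with a simple gray-world white balance, which scales each channel so its mean matches the overall mean. This is just one common approach, sketched here with NumPy (the function name is mine, not from any library):

```python
import numpy as np

def gray_world_white_balance(img: np.ndarray) -> np.ndarray:
    """Remove a uniform color cast by scaling each channel so its
    mean matches the image's overall mean (gray-world assumption)."""
    img = img.astype(np.float64)
    # Per-channel means over all pixels, e.g. [R_mean, G_mean, B_mean]
    channel_means = img.reshape(-1, img.shape[-1]).mean(axis=0)
    target = channel_means.mean()
    balanced = img * (target / channel_means)
    return np.clip(balanced, 0, 255).astype(np.uint8)
```

On a yellow-tinted image (high R/G means, low B mean) this boosts blue and pulls down red/green, which is roughly what "manually white balancing" in an editor does. It only works when the cast is global; it won't fix a stylistic tint that varies across the frame.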
I am currently building an AI product which relies on Imagen 3 to generate a lot of photorealistic, cinematic or HDR images. I tried Imagen 4 during preview, but results were too "cartoonish". Did anyone else have the same experience?
Yes, it seems very reluctant to generate anything that could be mistaken for a photo.
>Image generation may not always trigger:
>The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image").
>The model may stop generating partway through. Try again or try a different prompt.
Seriously?
Does it still charge 2 cents for that? Lol
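The workaround those docs suggest (retry, and explicitly restate "generate an image") amounts to a retry loop in your own code. Here is a minimal sketch; `generate` is a hypothetical callable standing in for whatever SDK method you actually use, assumed to return a dict with an `"image"` key that may be empty when the model answers in text only:

```python
import time
from typing import Callable, Optional


def generate_with_retry(generate: Callable[[str], dict],
                        prompt: str,
                        max_attempts: int = 3,
                        delay: float = 1.0) -> Optional[bytes]:
    """Call `generate` until the response actually contains image bytes.

    `generate` is a hypothetical stand-in for a real SDK call; it is
    assumed to return {"image": bytes | None, "text": str}.
    """
    for attempt in range(max_attempts):
        response = generate(prompt)
        image = response.get("image")
        if image:  # the model produced an image, not just text
            return image
        if attempt < max_attempts - 1:
            time.sleep(delay)  # brief pause before retrying
            # Restate the request explicitly, as the docs recommend
            prompt = prompt + " Generate an image."
    return None  # gave up: every attempt came back text-only or truncated
```

Whether you should have to write this at all is a fair question, but it keeps the text-only failure mode from surfacing to end users.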
Anyone know if this can be prompted with image to image?
Wasn't Imagen 4 released months ago?
Yes, but usage was very limited/restricted. Now it's widely available.
I hate that they always announce their image models months before they make them available. They should just announce them later. OpenAI does this much better, with a few days delay at most.
They were available, just rate limited.
The comments here are priceless. In less than five years' time we have gone from "That's impossible" to "Meh, it doesn't solve P=NP if prompted."
For those commenting in the latter category, it might be worthwhile to read a bit about the underlying technology and share your insights on why it does not deliver.
You are ignoring all the hype here.
This is false, and the two things are not correlated.
If you followed the news during the GAN cycle, you could extrapolate that deep NNs could do this type of thing. It is really cool that these things happened so fast, but we are talking about companies that have the money to deploy thousands of cars around the globe to collect data, so they absolutely know how to gather data.
The webcomic is awful. It feels off; the characters look very fake, unsettling in the way they communicate. The prompt is shown below the image, but to me the result looks closer to a prompt like "Create lifeless characters reciting marketing slop. They must fake an over-exaggerated excitement, but it should be clear they don't believe in what they're saying and have no souls".
Also, the prompt specifically asks that "Panel 4 should show the cat and dog high-fiving", but the cat is high-fiving ... the cat. Personally I find this hallucinated plot twist good; it makes the ending a bit better. Although technically it demonstrates a failure of the tool to follow the instructions in the prompt. Interesting choice of example for an official announcement.
It's weird because I just asked the basic copilot app the same and got a much better result.
https://i.imgur.com/kSuqCYg.jpeg
It's definitely just a matter of personal preference. To me, your image looks much worse and has the very distinctive look of the GPT-image-1 model.
As others have said, with so many errors, it's just more AI slop.
Does the world need yet another AI slop generator?