I asked it to create a story that described the modes of the major scale with a cartoon treble clef as the main character.
It created a 10 page story that stuck to the topic and was overall coherent. The main character changed color and style on every page, so no consistency there. The overall page layouts and animation style were reasonably consistent.
The metaphor it used was the character climbing a mountain and encountering other characters that represented each mode. Each supporting character was reasonably unique, although a note motif was present on 3 or 4. The mountain also changed significantly and the character was frequently back at the bottom. However, in the end, he does reach the summit.
I can't say I am overly impressed but it does mostly do what they claim.
I tried this too and had the same experience on half the books I tried to create. A lot of products I've tried have this issue and I think it will get better. I've been using and will stick to KidsAIStory as it allows me to use the same characters across the books. Also my child would be sad that they can't read their favorite series when Google kills off another product.
> "This is my kid's drawing. He's 7 years old. Write a creative storybook that brings his drawing to life."
Is there something lost, when it's not the adult telling the child a bedtime improv story? (IME, kids love this.)
Is something else gained by the generated storybook?
I think it'd be amazing if I had the energy to make up improv bedtime stories every night. (We have a "King Dragon" improv series happening lately, which involves a lot of farts)
BUT, I don't always have that energy, and I already spend hours a day reading stories to my kids, so I am okay with them spending some fraction of time hearing stories from robots/screens/etc. (Lately, it's "Hey Google, tell a story" if mommy is too busy to read)
I hope we never stop paying amazing children's book illustrators though! I have so many books where I marvel at each page and the ingenuity of the illustrative style.
Lol, I just tried to get it to draw the story about King Dragon farting, but it could not come up with a picture of a dragon farting - it turned it into fire coming from its mouth instead! It's too far outside its training data.
Link: https://g.co/gemini/share/188609ce3e1f
How much of the story was generated by Gemini? The dragon overcomes the problem by using a fart to blow away the fog. Wouldn't that just be replacing one cloud with an equally visibility-limiting cloud... only worse smelling?
Feels like this could have been an opportunity for the LLM to take the story in a less pedestrian direction with a loose parable around a noxious equivalent to Maslow's hammer.
Funny story, nicely illustrated.
But what’s wrong with image 10/10?
Lol, yes, the dragon's torso turned into a man. That man does show up earlier in the story - I think perhaps the model so closely associates dragon stories with stories of men, it just desperately wanted to add one in? The text itself never actually mentions the man/dragon/torso.
If Gemini added a reflection step to its book drawing routine, I think the model could easily notice the errors and generate images to correct them - the errors do not seem insurmountable.
Given that, I'm assuming Amazon is or will soon be filled with decently illustrated, somewhat amusing stories.
> Is there something lost, when it's not the adult telling the child a bedtime improv story? (IME, kids love this.)
Kids use their imagination because they're encouraged to do so. It's somewhat of a challenge to find the cusp between what is plain and what is incomprehensible (think of the ZPD but for creativity).
> Is something else gained by the generated storybook?
Yeah, kids love creating stuff
When I've seen parents amuse kids with AI slop, the kids ask for more slop. When I've seen parents amuse kids with improv, the kids participate. Kids love both, and like nutrition... kids love sugar.
> Is something else gained by the generated storybook?
The opportunity for low-effort, low-talent grifters to make a buck on Amazon?
The Gemini app is really funny because they ship ridiculously complicated features like this before fixing the basic ability to have a chat history with Apps Activity turned off.
Imagine the meetings where they decide to add personal illustrated storybooks before fixing chat histories.
Nobody gets promoted for fixing bugs. That is the sad state of big tech.
My theory is this misalignment of incentives is probably at the heart of most of our quality rot in software. Product managers are incentivized to create new features that boost the daily active users, while generally blind to the death by a thousand cuts caused by all the quality issues.
Exploration over exploitation.
I don't see this as a misalignment, it is a choice by the company, to the extent companies as a whole make choices. The incentives are the manifestations of those choices.
It's the exact same issue that science has. Reproducing past work won't get you funding so no one bothers unless there's no alternative. Negative results won't get you funding so no one publishes them which means people inevitably repeat the same failed experiments.
It seems like large institutions almost inevitably accumulate misaligned incentives along with an inertia that makes them almost impossible to correct.
If it were possible to turn off reports back to the manufacturer of what features were used and how often they were used on a grand scale, we'd enter into a golden age of software. I hate using technology now, with every button click reporting that it was clicked and features being sorted by "most used" instead of into logical placements relative to one another.
I think that’s on purpose (the chat history thing), because they actually keep the data (I’m the admin in a Workspace and even though we have Apps Activity turned off, everything still gets logged for compliance and I cannot disable it)
But yeah, it’s Google after all
Few companies are more gifted than Google at having an incoherent product approach.
Just thinking that this is the company that had UIs like Google Reader and still have UIs like Gmail, yet Gemini has the most retarded UX ever.
Yeah product management at Google is a complete mess.
I asked it to create the kind of storybook my toddler would have asked for ("create a storybook about a music truck and an ice cream truck and a mailman and a carwash", inspired by his request for a story last night), and the results were certainly... interesting.
Obviously Gemini doesn't know that "music truck" is another name for "ice cream truck", but more concerningly, the illustrations it made for the trucks were this kind of eldritch amalgamation of Cars-movie style cars and people driving cars. The story was just OK, I don't think it would have kept my toddler's attention for the whole ten pages. Plus, the mailman is barely involved.
Pretty consistent, but the story quality is a bit bad, really boring.
So many startups in that space now get killed. Oscar Stories is going to have a hard time.
The quality of the images, text, and layout was very high for the simple prompt I tried it with.
The problem of character consistency still exists.
The ending tag in the demo video is "For stories only you could imagine"
That's... uh... a pretty bold description for a tool where you are in fact outsourcing the "imagination" part to the machine.
I work on a product that uses AI to write interactive stories and I think it's a perfect description.
People hear my product "writes stories" and always ask why the site doesn't have any features to share a full story: because it wouldn't make sense.
It'd be like listening to a stream of every song a person has ever played for themselves. Maybe they didn't write the songs, but they chose them based on the moment. Sometimes they start a song and skip halfway because they already got the emotion, sometimes they repeat the saddest part 10 times.
They weren't trying to build a playlist for others to consume, it was for them, and only they could have come up with it.
bookslop! get your bookslop here!
https://g.co/gemini/share/8d296b91b77b
Nice! Curious what your prompt was.
It would be nice if Google could make Gemini in Slides and Sheets more than completely useless. Then we can talk about illustrated storybooks.
That is so cool! Thanks to the Gemini team for working on that, a great and innovative feature.
Just a heads up: as I tried to print several stories to PDF, most times one of the generated images did not appear on the PDF. It’s surely a bug of some sort, because regenerating stories eventually makes it go away. Hope these kinds of issues will be fixed soon.
This is just incredible.
Finally it gives generated text and images some sort of coherence that makes everything immediately "usable".
It is easier to develop something from a lot of text and images than having to assemble everything from zero.
Hope that it's editable too?
Not to sound hyperbolic, and this is really just the beginning of significant AI, but will there be anything left for humans to do or create when all this is done?
Will there be humans left when all this is done?
Reminds me of The Primer from Diamond Age
I can't seem to get it to work. It just summarizes whatever I plug in.
Edit: Even without giving it context, at best I just get a single picture and two paragraphs. Maybe they are slowly rolling the feature out. It doesn't seem to get it.
Try this link: https://gemini.google.com/gem/storybook
Thank you!
Try starting your prompt with "Create a storybook about..." - this specific phrasing seems to be the trigger phrase that activates the full storybook generation mode.
https://gemini.google.com/gem/storybook
https://gemini.google.com/gem/storybook redirects me to https://gemini.google.com/app, and when I ask "Create me a storybook" it replies with "Creating a full storybook with multiple images in one go is a bit tricky for me right now. My current tools have a limitation where I can only generate one image at a time, and a storybook needs many pictures to go with each page."
> Try it today in the Gemini app. Available globally on desktop and mobile...
They refer (sort of) to a desktop app on the page, but I've never seen a Gemini desktop app. It seems they just mean the web app on desktop...
I don't understand why Google doesn't have a true desktop app or keyboard shortcuts. These things are so easy... (Especially the keyboard shortcuts.)
Their execution on everything but the model just seems terrible.
Damn... pretty good. Generated a 10 page booklet including high quality graphics and cohesive story right on point with my prompt. It would've taken me at least an hour mucking around with LLMs and image generators to get the same result that it spit out in ~30 seconds.
Are you just asking it to make a Storybook? I can't get it to work. Just an image and two paragraphs.
Try this link: https://gemini.google.com/gem/storybook