TokenVerse: Multi-Concept Personalization in Token Modulation Space by Google

(token-verse.github.io)

106 points | by 037 4 days ago

22 comments

  • jasonjmcghee a day ago

    Feels like a moodboarding multiplier for some design disciplines, if these aren't cherry-picked / transfer to other domains.

    Pretty interesting.

    Seems like you could apply similar ideas to text too.

    • test6554 11 hours ago

      Something like this could make creatives feel more in control and less averse to AI image generation.

    • mettamage a day ago

      Not just a moodboard if you can highlight the words you want in your output.

  • sdenton4 a day ago

    This looks like an excellent step towards being able to apply consistency to generated images across a series.

    • basch a day ago

      It looks as if it would trivially integrate into Whisk, which already has a similar feature for defining an output's “subjects”, “scene”, and “style”.

  • eichi 18 hours ago

    Seems like a good tool for automated PowerPoint generation, but I believe the content is much more important.

    So why not automate the attachment of images to concentrate on the more important tasks? I hope the code will be available soon!

  • PoignardAzur 17 hours ago

    Looks like it lets you transfer textures, poses, and general "vibe of object" stuff, but still doesn't let you control composition.

    Overall the example images still look like overly corporate slop.

  • ziofill 20 hours ago

    Storytelling with characters is a huge selling point

  • jiggawatts a day ago

    “Code coming soon” from Google means basically never.

    I don’t understand why they keep making these announcements and then just sitting on the results.

    This is an immediately commercially useful product even as an API. You could make a mobile app for kids to “create their own cartoon story”.

    Someone else will have to reproduce this for it to see the light of day.

    • wildermuthn 14 hours ago

      We might eventually see the code for the models and inference, but I do doubt we’ll see training code or be granted access to training data. Google is pretty bad at this.

    • Kiro 21 hours ago

      I have the same feeling but I wonder if it's actually true. Do we have any examples of announcements where the code was never released?

  • HenryBemis 20 hours ago

    My eyes are so tired... I read it as "TolkienVerse" (wishful thinking took over actual letters I guess) and I thought WOW!!!!! that is amazing!!!! When I clicked and saw "a dog wearing a hat".. uff...

  • doctorpangloss a day ago

    If it worked with people, they’d show people.

    • SeanAnderson a day ago

      The second example in the "Results" section includes a human.

      • doctorpangloss a day ago

        They do not show a realistic photo face transfer. They blur out the faces even.

        It would be a huge invention, but they did not achieve that.

        • drewbeck a day ago

          Below the first Results header is a carousel of images. If you tap the arrows you can explore. I believe there are three examples where the final image is a person whose face was applied from a reference photo.

          • doctorpangloss a day ago

            Yes. But. The reference photo is blurred. The smallest details matter for faces! That's the whole point. I have no doubt you can do kind-of-looks-like faces. But this is the same issue since Dreambooth. All the IP transfer approaches, even the best like Ideogram's, are failing on faces.

            • rprwhite a day ago

              There are two images where the face is transferred to the final image. The reference images with blurred faces are all being used for a different reference: the pose, or "necklace", etc. The faces are blurred in every image unless they explicitly want the face transferred to the final image, at least that's how it seems.

              • doctorpangloss a day ago

                I know. But there are no unblurred source images of faces. This isn't complicated.

                • danpalmer a day ago

                  https://token-verse.github.io/results/multi_concepts/25.png

                  https://token-verse.github.io/results/multi_concepts/06.png

                  Both of these show a man's face in a source image being used in a newly generated image. I agree that it isn't complicated, but you seem to be drawing different conclusions to everyone else here.

                  If your point is that it can't perform face transfer, you seem to be wrong - that's what's happening here. If your point is that the blurred photos used for other parts of the input mean that this suggests the model may get confused by other faces, then that's a fair point, but it seems clear they have demonstrated face transfer, and requiring blurring irrelevant faces seems a minor point compared to transferring the face that's intended. I'm not sure how that would really impact use-cases.

                  • doctorpangloss a day ago

                    Well. If they had working face / human character transfer, listen, my dude, every single image would show a face transfer. It's one of the biggest challenges.

                    • rhet0rica 18 hours ago

                      Hot take: there are no legitimate use cases for human face transfer.