URAvatar: Universal Relightable Gaussian Codec Avatars

(junxuan-li.github.io)

91 points | by mentalgear 14 hours ago

12 comments

  • dwallin 10 hours ago

    Given the complete lack of any actual details about performance, I would hazard a guess that this approach is likely barely real-time, requiring top-end hardware, and/or delivering an unimpressive fps. I would love to get more details, though.

    • ladberg 4 hours ago

      Gaussian splats can pretty much be rendered in any off-the-shelf 3D engine with reasonable performance, and the focus of the paper is generating the splats, so there's no real reason for them to mention runtime details.
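
      For anyone curious what that rendering step actually involves, here is a minimal NumPy sketch of the core splatting loop (a toy simplification of the standard formulation, not the paper's CUDA rasterizer): take Gaussians already projected to 2D pixel space, sort them by depth, and alpha-composite them front to back.

        import numpy as np

        def render_splats(means2d, covs2d, colors, opacities, depths, H, W):
            """means2d: (N,2) pixel centers, covs2d: (N,2,2) projected covariances,
            colors: (N,3), opacities: (N,), depths: (N,) camera-space depths."""
            image = np.zeros((H, W, 3))
            transmittance = np.ones((H, W))          # light not yet absorbed, per pixel
            ys, xs = np.mgrid[0:H, 0:W]
            pix = np.stack([xs, ys], axis=-1).astype(float)

            for i in np.argsort(depths):             # front-to-back order
                d = pix - means2d[i]                 # offset of each pixel from the center
                inv_cov = np.linalg.inv(covs2d[i])
                # Gaussian falloff: exp(-0.5 * d^T Sigma^-1 d)
                power = -0.5 * np.einsum('hwi,ij,hwj->hw', d, inv_cov, d)
                alpha = np.clip(opacities[i] * np.exp(power), 0.0, 0.99)
                image += (transmittance * alpha)[..., None] * colors[i]
                transmittance *= 1.0 - alpha
            return image

      Real renderers tile the image and cull splats per tile, but the compositing math is the same, which is why splats run fine in ordinary engines.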

      • dwallin 4 hours ago

        Relightable Gaussian Codec Avatars are very, very far from your off-the-shelf splatting tech. It's fair to say that this paper is more about a more efficient way of generating the avatars, but in the original paper from the Codec Avatars team (https://arxiv.org/pdf/2312.03704) they required an A100 to run at just above 60 fps at 1024x1024.

        Nothing here seems to have moved that needle.

  • jy14898 8 hours ago

    Interesting that under the "URAvatar from Phone Scan" section, the first example shows a lady with blush/flush, which only appears in the center video when viewed straight on - the other angles remove it.

  • michaelt 10 hours ago

    Those demo videos look great! Does anyone know how this compares to the state of the art in generating realistic, relightable models of things more broadly? For example, for video game assets?

    I'm aware of traditional techniques like photogrammetry - which is neat, but the lighting always looks a bit off to me.

    • zitterbewegung 7 hours ago

      I don't do video game programming, but from what I have heard about engines, lighting is controlled by the game engine and is one step in the pipeline that renders the game. Ray tracing is one such technique, where simulated light rays are traced between the light source and the 3D model to work out how the model is lit.

      They are probably rendering with a simple lighting model, since in a game the lighting would be handled by a separate algorithm in the engine anyway.
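
      To make "simple lighting model" concrete, here is a toy sketch of the kind of per-point shading a renderer might apply (Lambertian diffuse from a single point light with inverse-square falloff); this is generic shading math, nothing specific to this paper.

        import numpy as np

        def lambert_shade(position, normal, albedo, light_pos, light_color):
            """Diffuse (Lambertian) reflection at a surface point lit by a point light."""
            to_light = light_pos - position
            dist = np.linalg.norm(to_light)
            l = to_light / dist                        # unit vector toward the light
            n = normal / np.linalg.norm(normal)
            ndotl = max(float(np.dot(n, l)), 0.0)      # surfaces facing away get no light
            falloff = 1.0 / (dist * dist)              # inverse-square attenuation
            return albedo * light_color * ndotl * falloff

        # A point facing straight up, one unit below a white light: fully lit.
        print(lambert_shade(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                            np.array([0.8, 0.6, 0.5]), np.array([0.0, 0.0, 1.0]),
                            np.ones(3)))

      Relightable avatars like these instead bake how the face responds to light into the learned model, so the engine can plug in whatever environment lighting it wants.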

  • mentalgear 11 hours ago

    With the computational efficiency of Gaussian splatting, this could be ground-breaking for photorealistic avatars, possibly driven by LLMs and generative audio.

  • chpatrick 11 hours ago

    Wow that looks pretty much solved! Is there code?

    • mentalgear 11 hours ago

      Unfortunately not yet. Also, code alone without the training data and weights might still require considerable effort. I also wonder how diverse their training data is, i.e. how well the solution will generalize.

      • vessenes 7 hours ago

        I'll note that they had pretty good diversity in the test subjects shown - weight, gender, some racial diversity. I thought it was above average compared to many AI papers that aren't specifically focused on diversity as a training goal or metric. I'm curious to try this. Something tells me this is more likely to get bought and turned into a product or an offering than to be open sourced, though.

  • petesergeant 8 hours ago

    This is great work, although I note that the longer you look at them, and the more examples you look at on the page, the more the wow factor drops off. The first example is exceptional, but when you get down to the "More from Phone Scan" video and look at any individual avatar, you find yourself deep in the uncanny valley very quickly.

    • brk 6 hours ago

      I noticed that too. It also doesn't always seem to know how to map (or remove) certain things, like the hair bun in the input image, onto the generated avatars once you get outside the facial region.