44 comments

  • gwd 19 minutes ago

    This reminds me of a scene in "A Fire Upon the Deep" (1992) where they're on a video call with someone on another spaceship; but something seems a bit "off". Then someone notices that the actual bitrate they're getting from the other vessel is tiny -- far lower than they should be getting given the conditions -- and so most of what they're seeing on their own screens isn't actual video feed, but their local computer's reconstruction.

  • red0point 2 hours ago

    > But one overlooked use case of the technology is (talking head) video compression.

    > On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130m parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still quite a ways ahead.

    It’s very cool that this is possible, but the compression use case is indeed ... a bit far-fetched. An insanely large model requiring the most expensive consumer GPU to run on both ends and at the same time being limited in bandwidth so much (22kbps) is a _very_ limited scenario.

    • gambiting an hour ago

      One cool use would be communication in space - where it's feasible that both sides would have access to high-end compute units but have a very limited bandwidth between each other.

      • bityard 33 minutes ago

        Bandwidth is not the limitation in space comms, latency is.

        • cogman10 8 minutes ago

          Underwater communications, on the other hand, could use this.

          Though, I somewhat doubt even 22kbps is available generally.

      • bliteben an hour ago

        Wonder if it's better than a single color channel hologram though

      • JamesLeonis 36 minutes ago

        Increasingly mobile networks are like this. There are all kinds of bandwidth issues, especially when customers are subject to metered pricing for data.

    • omh an hour ago

      One use case might be if you have limited bandwidth, perhaps only a voice call, and want to join a video conference. I could imagine dialling in to a conference with a virtual face as an improvement over no video at all.

    • jl6 2 hours ago

      130m parameters isn’t insanely large, even for smartphone memory. The high GPU usage is a barrier at the moment, but I wouldn’t put it past Apple to have 4090-level GPU performance in an iPhone before 2030.
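A rough way to check the memory side of this claim is to multiply the parameter count by the bytes per parameter. The sketch below uses the 130 million and 20 million figures quoted above; the precisions (fp32, fp16, int8) are assumptions for illustration, and the estimate covers weights only, not activations or the cost of the warping operations.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# The precisions listed here are illustrative assumptions, not what
# LivePortrait or DCVC actually ship with.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in MB (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Parameter counts as quoted in the thread.
models = {"LivePortrait": 130_000_000, "DCVC": 20_000_000}

for name, params in models.items():
    for dtype in BYTES_PER_PARAM:
        print(f"{name:>12} {dtype}: {weight_memory_mb(params, dtype):.0f} MB")
```

Under these assumptions the weights fit in a few hundred megabytes, which is consistent with the point that compute, not parameter storage, is the barrier.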

  • LeoPanthera 2 hours ago

    This is very impressive, but “perceptually lossless” isn’t a thing and doesn’t make sense. It means “lossy”.

    • Bjartr 17 minutes ago

      It may sound like marketing wank, but it does appear to be an established term of art in academia as far back as 1997 [1]

      It just means that a person can't readily distinguish between the compressed image and the uncompressed image. Usually because it takes some aspect(s) of the human visual system into account.

      [1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=per...

    • tatersolid 23 minutes ago

      I read “perceptually lossless” to be equivalent to “transparent”, a more common phrase used in the audio/video codec world. It’s the bitrate/quality at which some large fraction of human viewers can’t distinguish a losslessly-encoded sample from the lossy-encoded sample, for some large fraction of content (constants vary in research papers).

      As an example, crf=18 in libx264 is considered “perceptually lossless” for most video content.

    • high_byte 2 hours ago

      why not? if you change one pixel by one brightness unit it is perceptually the same.

      for the record, I found liveportrait to be well within the uncanny valley. it looks great for ai generated avatars, but the difference is very perceptually noticeable on familiar faces. still it's great.

      • codeflo an hour ago

        GP is correct, that’s the definition of “lossy”. We don’t need to invent ever new marketing buzzwords for well-established technical concepts.

        • AndrewDucker 29 minutes ago

          GP is incorrect.

          There are "is identical", "looks identical" and "has lost sufficient detail to clearly not be the original." Being able to differentiate between these three states is useful.

    • rob74 37 minutes ago

      Yeah, all lossy compression could be called "perceptually lossless" if the perception is bad enough...

    • _ZeD_ 2 hours ago

      so are .mp3s, yet they are hardly discernible from the originals

      • bityard 20 minutes ago

        Ability to tell MP3 from the original source was always dependent on encoder quality, bitrate, and the source material. In the mid-2000s, I tried to encode all of my music as MP3. Most of it sounded just fine because pop/rock/alt/etc are busy and "noisy" by design. But some songs (particularly with few instruments, high dynamic range, and female vocals) were just awful no matter how high I cranked the bitrate. And I'm not even an "audiophile," whatever that means these days.

        No doubt encoders and the codecs themselves have improved vastly since then. It would be interesting to see if I could tell the difference in a double-blind test today.

      • rini17 41 minutes ago

        not at 22kbit :)

    • lifthrasiir 2 hours ago

      It is definitely a thing given a good perceptual metric. The metric doesn't even have to be very accurate if the distortion is highly bounded, like only altering the lowermost bit. It is unfortunate that most commonly used distortion metrics like PSNR are not really that, though.
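To make the bounded-distortion point concrete, here is a minimal sketch in plain Python. It computes PSNR with the standard definition, 10 * log10(MAX^2 / MSE), for two distortions of a made-up 8-bit signal: flipping the lowest bit of every sample (error of at most 1 everywhere) and changing a single sample by 16. The test signal and the helper names are assumptions for illustration only.

```python
import math


def psnr(original, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB for two equal-length sample lists."""
    if len(original) != len(distorted):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


def flip_lowest_bit(samples):
    """Alter only the lowermost bit of every sample (error is exactly 1)."""
    return [s ^ 1 for s in samples]


# Hypothetical test signal: every 8-bit value once.
pixels = list(range(256))

# Distortion A: bounded, every sample off by exactly 1.
bounded = flip_lowest_bit(pixels)

# Distortion B: unbounded in spirit, one sample off by 16, the rest untouched.
concentrated = list(pixels)
concentrated[0] += 16

max_err_bounded = max(abs(a - b) for a, b in zip(pixels, bounded))
max_err_concentrated = max(abs(a - b) for a, b in zip(pixels, concentrated))

print(f"bounded:      max error {max_err_bounded}, PSNR {psnr(pixels, bounded):.2f} dB")
print(f"concentrated: max error {max_err_concentrated}, PSNR {psnr(pixels, concentrated):.2f} dB")
```

Both distortions have a mean squared error of 1 and therefore the same PSNR, even though one has a worst-case error of 1 and the other of 16. That is one way to see why PSNR on its own neither bounds per-sample distortion nor tracks what a viewer would notice.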

      • rini17 41 minutes ago

        But that's mathematically impossible, to restore a signal from an extremely low-bitrate stream with any highly bounded distortion. Perhaps only if you have a highly restricted set of possible inputs, which online meetings aren't.

        • lifthrasiir 23 minutes ago

          > Perhaps only if you have a highly restricted set of possible inputs, which online meetings aren't.

          Are you sure? After all, you can effectively summarize meetings in plain text, which is extremely restricted in comparison to the original input. Granted, the exact manner of speech, the motions, and all the subtleties should also be included to be fair, but that information is still far too limited to fill the 20 kbps bandwidth.

          We need far more bandwidth only because we don't yet have an efficient way to reconstruct the input faithfully from such highly condensed information. Whenever we actually could, we ended up with a very efficient lossy algorithm that still preserves enough information for us humans. Unless you are talking strictly about lossless compression (which is hardly relevant to this particular topic), we should expect much more compression in the future, even though that might not be feasible today.

  • Vecr 2 hours ago

    Fire Upon the Deep had more or less this. Story important, so I won't say more. That series in general had absolutely brutal bandwidth limitations.

  • JimDabell an hour ago

    I got some interesting replies when I suggested this technique here:

    https://news.ycombinator.com/item?id=22907718

  • pastelsky an hour ago

    Did not expect to see Emraan Hashmi in this post!

    • shaan7 an hour ago

      Indeed! Bollywood makes it to HN xD

  • antiquark 34 minutes ago

    Not quite lossless... look at the bicycle seat behind him. When he tilts his head, the seat moves with his hair.

    • manmal 2 minutes ago

      His gaze also doesn’t quite match.

  • AndrewVos 2 hours ago

    Elon weirdly looks more human than usual in the AI version!

  • andrewstuart 3 hours ago

    The more magic AI makes, the less magical the world becomes.

    • xyzsparetimexyz 2 hours ago

      Oh shut up. There's plenty of awful uses for ai but this isn't one of them

    • EarlKing 3 hours ago

      Clearly Sauron is a jealous ringmaker and doesn't like hobbits using his ring to shitpost.

      • Joel_Mckay 2 hours ago

        Probably just disappointed at the wasted bandwidth:

        24fps * 52 facial 3D markers * 16bit packed delta planar projected offsets (x,y) = 19.968 kbps

        And this is done in Unreal games on a potato graphics card all the time:

        https://apps.apple.com/us/app/live-link-face/id1495370836

        I am sure calling modern heuristics "AI" gets people excited, but it doesn't seem "Magical" when trivial implementations are functionally equivalent. =3
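The bitrate arithmetic in the comment above can be reproduced with a short sketch. It assumes "16bit packed" means 16 bits per marker in total (for example 8 bits each for the x and y deltas); that reading, and the function name, are assumptions. The figure is raw payload only, with no entropy coding, packet headers, or audio.

```python
def landmark_bitrate_bps(fps: int, markers: int, bits_per_marker: int) -> int:
    """Raw payload bitrate for streaming per-frame landmark offsets."""
    return fps * markers * bits_per_marker


# Reading from the comment: 24 fps, 52 markers, 16 bits per marker (x and y packed).
packed = landmark_bitrate_bps(fps=24, markers=52, bits_per_marker=16)

# Alternative reading (assumption): 16 bits for each of x and y, so 32 bits per marker.
unpacked = landmark_bitrate_bps(fps=24, markers=52, bits_per_marker=32)

print(f"packed:   {packed} bit/s = {packed / 1000:.3f} kbps")
print(f"unpacked: {unpacked} bit/s = {unpacked / 1000:.3f} kbps")
```

With the packed reading the result matches the 19.968 kbps in the comment, which is in the same range as the 22 kbps figure discussed elsewhere in the thread.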

        • scotty79 2 hours ago

          I think the point here is to make it photorealistic, which everything apart from AI still fails at super hard.

          • Joel_Mckay 2 hours ago

            Take a minute to look something up first, and then formulate a more interesting opinion for us to discuss:

            https://www.unrealengine.com/en-US/metahuman

            The artifacts in raster image data are nowhere near what a reasonable model can achieve even at low resolutions. =3

            • scotty79 2 hours ago

              I know metahuman. As impressive as it is, when you judge by the standards of game graphics, if you are ever misled into thinking metahumans are real humans or even real physically existing things it's time to see your eye doctor (and/or get an MRI head scan).

              On the other hand AI videos can be easily mistaken for people or hyper realistic physical sculptures.

              https://img-9gag-fun.9cache.com/photo/aYQ776w_460svvp9.webm

              There's something basic about how light works that traditional computer graphics still fails to grasp. Looking at its productions and comparing them to what AI generates is like looking at the output of an amateur and an artist. Sure, maybe the artist doesn't always draw all 5 fingers but somehow captures the essence of the image in a seemingly random arrangement of light and dark strokes, while the amateur just tries to do their best but fails in some very significant ways.

              • Joel_Mckay 2 hours ago

                "AI" videos make many errors all the time, but most people are not aware of what to look for... Undetectable CGI is done in film/games all the time, and indeed it takes talent to hide the fact it is fake.

                One could rely on the media encoder to garble output enough to look more plausible (people on potato devices are used to looking at garbage content.) However, at the end of the day the "uncanny valley" effect takes over every time, even for live action data in an auto-generated asset, as the missing data can't be "Magically" recovered with 100% certainty.

                Bye =3

                • scotty79 an hour ago

                  Undetectable CGI in games ... right. I don't think you are a gamer.

                  In movies it can be done with enough manual tweaking by artists and a lot of photographic content around to borrow a sense of reality from.

                  "Potato" devices by which I assume you mean average phones, currently have better resolutions than PCs had very recently and a lot still do (1080p).

                  And a photo at 480p still looks more real than anything CGI (not AI).

                  Your signature is hilarious. I won't comment about the reasons because I don't want this whole thread to get flagged.

    • psychoslave 2 hours ago

      The greatest feat ever: let magic disappear before the wonder of understanding.

    • andai an hour ago

      What did you mean by this?

    • satvikpendem 3 hours ago

      > Any sufficiently advanced technology is indistinguishable from magic.

      - Arthur C. Clarke

    • HPsquared 3 hours ago

      This is the power of numerical methods.

      • andrewstuart 3 hours ago

        There’s a finite amount of magic and if AI borrows it here then it must be repaid there.

    • andai 3 hours ago

      ?