5 comments

  • amd92 16 hours ago

    The LLM/deterministic split is the smart call here. You can iterate on a script without the rest of the pipeline drifting under you. Curious how far the vowel-per-word heuristic holds before you wish you had Rhubarb, but "regenerates instantly" sounds like the right tradeoff for a studio loop.
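    (For readers wondering what a "vowel-per-word" mouth-timing heuristic might look like: the post doesn't show the actual implementation, but a minimal sketch, assuming each vowel becomes one open-mouth beat spread evenly across the word's share of the audio, could be:)

    ```python
    # Hypothetical sketch of a vowel-per-word lip-sync heuristic.
    # Not the author's actual code: assumes each vowel in a word maps to
    # one open/closed mouth beat within that word's slice of the audio.

    VOWELS = set("aeiouAEIOU")

    def mouth_keyframes(script: str, duration_s: float):
        """Return (time_s, mouth_open) keyframes for a line of dialogue."""
        words = script.split()
        if not words:
            return []
        per_word = duration_s / len(words)  # naive even split per word
        frames = []
        t = 0.0
        for word in words:
            beats = max(1, sum(ch in VOWELS for ch in word))
            beat_len = per_word / (2 * beats)  # alternate open/closed
            for _ in range(beats):
                frames.append((round(t, 3), True))   # open on the vowel
                t += beat_len
                frames.append((round(t, 3), False))  # close between beats
                t += beat_len
        return frames
    ```

    Crude next to Rhubarb's phoneme-level visemes, but it's fully deterministic and regenerates instantly when the script changes, which fits the tradeoff described above.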

  • vaporaviatorlab a day ago

    This looks great. Curious about the lip-sync — viseme set or just open/closed mouths? The South Park style is super forgiving but HyperFrames quality seems like it'd need more.

  • comicink 21 hours ago

    Very cool! I will definitely try this out - cartoons are something I have been interested in for a while.

  • mdrzn a day ago

    static video with text2speech audio and two moving circles representing the mouths: "OMG I might have a show on my hands"

  • fractallyte a day ago

    I went into this imagining something like Synfig Studio (https://www.synfig.org/) or Moho (https://moho.lostmarble.com/). The "Studio" in the name promises quite a bit more than what it actually is: lip-syncing on static characters.

    Also, Moho offers far more comprehensive (and comprehensible!) lip-sync: https://lostmarble.com/papagayo/

    I get that you're using AI to boost capability with less effort, but at the moment, I think the more specialized tools are still a better avenue for this.

    Lastly, I followed the link to Jellypod (https://www.jellypod.com/). It's pretty good, but falls into a vocal "uncanny valley". Even a human reading from a script wouldn't sound that perfect; the enunciations immediately come across as artificial.

    Now, if this was an extension to Synfig (also open source!), it would be a much more interesting venture...