This seems like a massive improvement for openly available local ASR. Even the 300M model outperforms whisper-large-v3 according to the paper's benchmarks.
How hard would it be to build TTS from this? A few independent journalists from Belarus asked me for TTS in their language, but I'm no expert; I was thinking about reusing Mozilla's work. What's the easiest way to get working TTS for a language?
TFA says it's extremely easy to add new languages with just a few examples. I didn't see specifics on how "few" it really is, though.
Any insights on latency?
HF Demo: https://huggingface.co/spaces/facebook/omniasr-transcription...
GitHub: https://github.com/facebookresearch/omnilingual-asr
Thanks! I've added those links to the toptext as well.