5 comments

  • vessenes 15 minutes ago

    The title is dense and the paper is short, but the demo is outstanding (https://huggingface.co/spaces/aiola/whisper-ner-v1). The sample audio is submitted with "entity labels" set to "football-club, football-player, referee", and WhisperNER tags Arsenal and Juventus as football-club. They suggest "personal information" as a label to try on audio.

    Impressive, very impressive. I wonder if it could listen for credit cards or passwords.

  • timbilt 8 hours ago

    GitHub repo: https://github.com/aiola-lab/whisper-ner

    Hugging Face Demo: https://huggingface.co/spaces/aiola/whisper-ner-v1

    Pretty good article that focuses on the privacy/security aspect of this, i.e. having a single model that does both ASR and NER:

    https://venturebeat.com/ai/aiola-unveils-open-source-ai-audi...

    • Tsarp 3 hours ago

      Wouldn't it be better to run normal Whisper and NER on top of the transcription before streaming a response or writing anything to disk?

      What advantage does this offer?
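
      For concreteness, a minimal sketch of the two-step approach described above, with masking done in memory before anything is returned or written. Everything here is an assumption for illustration: transcribe() is a hypothetical stand-in that returns a fixed string instead of calling Whisper, and a simple regex for card-like digit runs stands in for a real NER model.

```python
import re

# Assumed pattern: 13 to 16 digits, each optionally preceded by one space or hyphen.
# A real system would use an NER model rather than a regex; this is only a stand-in.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def transcribe(audio_path: str) -> str:
    """Hypothetical ASR step. A real pipeline would run a speech model here."""
    return "My card number is 4111 1111 1111 1111 and I support Arsenal."


def mask_cards(text: str) -> str:
    """Replace card-like digit runs with a placeholder tag."""
    return CARD_PATTERN.sub("[CARD]", text)


def transcribe_and_mask(audio_path: str) -> str:
    """Two-step pipeline: the raw transcript lives only in memory, then is masked."""
    raw = transcribe(audio_path)
    return mask_cards(raw)


if __name__ == "__main__":
    print(transcribe_and_mask("sample.wav"))
```

      In this sketch the unmasked transcript still exists as an intermediate value in memory, which is the exposure the single-model approach claims to avoid.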

    • wanderingmind 5 hours ago

      Looks like only inference code is available, with no fine-tuning code.

  • clueless 3 hours ago

    "The model processes audio files and simultaneously applies NER to tag or mask specific types of sensitive information directly within the transcription pipeline. Unlike traditional multi-step systems, which leave data exposed during intermediary processing stages, Whisper-NER eliminates the need for separate ASR and NER tools, reducing vulnerability to breaches."