The title is dense and the paper is short, but the demo (https://huggingface.co/spaces/aiola/whisper-ner-v1) is outstanding. The sample audio is submitted with "entity labels" set to "football-club, football-player, referee", and WhisperNER tags Arsenal and Juventus as football-club. They suggest "personal information" as a tag to try on audio.
Impressive, very impressive. I wonder if it could listen for credit card numbers or passwords.
"The model processes audio files and simultaneously applies NER to tag or mask specific types of sensitive information directly within the transcription pipeline. Unlike traditional multi-step systems, which leave data exposed during intermediary processing stages, Whisper-NER eliminates the need for separate ASR and NER tools, reducing vulnerability to breaches."
GitHub repo: https://github.com/aiola-lab/whisper-ner
Hugging Face Demo: https://huggingface.co/spaces/aiola/whisper-ner-v1
Pretty good article that focuses on the privacy/security aspect of this, i.e. having a single model that does both ASR and NER:
https://venturebeat.com/ai/aiola-unveils-open-source-ai-audi...
Wouldn't it be better to run normal Whisper and NER on top of the transcription before streaming a response or writing anything to disk?
What advantage does this offer?
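For comparison, here is a minimal sketch of that two-step alternative, covering only the second step: masking card-like numbers in a transcript string before it is stored or streamed. This is not how WhisperNER works, and everything in it is an assumption for illustration: the `CARD_RE` pattern, the `luhn_ok` and `mask_cards` helpers, and the `[CARD]` placeholder are hypothetical names, and the transcript is assumed to contain digits rather than spelled-out number words.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(transcript: str, placeholder: str = "[CARD]") -> str:
    """Replace Luhn-valid card-like numbers in an ASR transcript with a placeholder."""

    def repl(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Leave digit runs that fail the checksum untouched (phone numbers, IDs, ...).
        return placeholder if luhn_ok(digits) else match.group()

    return CARD_RE.sub(repl, transcript)
```

With this sketch, `mask_cards("my card is 4111 1111 1111 1111 thanks")` yields `"my card is [CARD] thanks"`. Note that the raw transcript still exists in memory between the two steps, which is the exposure window the single-model approach claims to avoid.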
Looks like only inference code is available, with no fine-tuning code.