15 points | by noahkay13 2 hours ago ago
2 comments
I built a C++ inference engine for NVIDIA's Parakeet speech recognition models using Axiom(https://github.com/Frikallo/axiom) my tensor library.
What it does: - Runs 7 model families: offline transcription (CTC, RNNT, TDT, TDT-CTC), streaming (EOU, Nemotron), and speaker diarization (Sortformer) - Word-level timestamps - Streaming transcription from microphone input - Speaker diarization detecting up to 4 speakers
Off topic but if anyone is looking for a nice web-GUI frontend for a locally-hosted transcription engine, Scriberr is nice
https://github.com/rishikanthc/Scriberr
I built a C++ inference engine for NVIDIA's Parakeet speech recognition models using Axiom(https://github.com/Frikallo/axiom) my tensor library.
What it does: - Runs 7 model families: offline transcription (CTC, RNNT, TDT, TDT-CTC), streaming (EOU, Nemotron), and speaker diarization (Sortformer) - Word-level timestamps - Streaming transcription from microphone input - Speaker diarization detecting up to 4 speakers
Off topic but if anyone is looking for a nice web-GUI frontend for a locally-hosted transcription engine, Scriberr is nice
https://github.com/rishikanthc/Scriberr