OpenAI’s Whisper: 7 must-know libraries and add-ons built on top of it
In September 2022, OpenAI released Whisper, an open-source automatic speech recognition (ASR) model that can transcribe speech in 97 languages and translate it into English!
While it touted state-of-the-art speech-to-text accuracy in many languages, it lacked several features that practical speech-to-text systems rely on, such as word-level timestamps and speaker diarisation.
Here are the top 7 community projects built on Whisper that address some of these shortcomings!
1. WhisperX — Word-level time stamps with Whisper
Whisper's transcriptions are accurate, but its timestamps are only segment-level, not word-level.
This repo runs Whisper for transcription, then uses a phoneme-based ASR model (wav2vec2) to force-align the transcript to the audio, producing word-level timestamps!
GitHub: m-bain/whisperX
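To make "forced alignment" concrete, here is a minimal sketch of the core idea. It is not WhisperX's code. It assumes we already have a matrix of per-frame log-probabilities from some phoneme or character model, and it simplifies real CTC alignment by dropping the blank token: every audio frame is assigned to exactly one token, tokens stay in transcript order, and each token gets at least one frame. The function names `force_align` and `spans_to_times`, the toy emissions, and the 20 ms frame stride are all illustrative assumptions.

```python
import math
from typing import List, Tuple

NEG_INF = float("-inf")


def force_align(log_probs: List[List[float]], tokens: List[int]) -> List[Tuple[int, int]]:
    """Viterbi forced alignment of a known token sequence to audio frames.

    log_probs[t][v]: assumed log-probability of label v at frame t (finite values).
    Simplification: no CTC blank token, so every frame belongs to one token.
    Returns the inclusive (first_frame, last_frame) span of each token.
    """
    T, J = len(log_probs), len(tokens)
    if J == 0 or T < J:
        raise ValueError("need at least one token and one frame per token")
    # score[t][j]: best log-prob of frames 0..t with frame t assigned to token j
    score = [[NEG_INF] * J for _ in range(T)]
    advanced = [[False] * J for _ in range(T)]  # True if token j starts at frame t
    score[0][0] = log_probs[0][tokens[0]]
    for t in range(1, T):
        for j in range(min(t + 1, J)):
            stay = score[t - 1][j]                               # same token continues
            move = score[t - 1][j - 1] if j > 0 else NEG_INF     # next token begins
            advanced[t][j] = move > stay
            score[t][j] = max(stay, move) + log_probs[t][tokens[j]]
    # Backtrack from the last frame, which must belong to the last token
    starts, ends = [0] * J, [0] * J
    ends[J - 1] = T - 1
    j = J - 1
    for t in range(T - 1, 0, -1):
        if advanced[t][j]:
            starts[j] = t
            j -= 1
            ends[j] = t - 1
    return list(zip(starts, ends))


def spans_to_times(spans, frame_stride_s: float = 0.02):
    """Convert frame spans to (start_s, end_s); the 20 ms stride is an assumption."""
    return [(s * frame_stride_s, (e + 1) * frame_stride_s) for s, e in spans]


if __name__ == "__main__":
    # Toy emissions for 3 words over 6 frames (made-up numbers, not model output)
    hi, lo = math.log(0.8), math.log(0.1)
    frames = [[hi, lo, lo]] * 2 + [[lo, hi, lo]] * 2 + [[lo, lo, hi]] * 2
    spans = force_align(frames, tokens=[0, 1, 2])
    print(spans)  # [(0, 1), (2, 3), (4, 5)]
    print(spans_to_times(spans))
```

Because the transcript is already known from Whisper, the aligner only has to decide where each token starts and ends, which is a much easier problem than open-ended recognition.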
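2. Buzz: Transcribe and translate audio offline on your computer
Buzz transcribes and translates audio offline on your personal computer using OpenAI's Whisper.
It supports real-time transcription and translation from your computer's microphone by splitting the incoming audio into time chunks and passing each chunk to Whisper!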
GitHub: chidiwilliams/buzz
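The time-chunking idea can be sketched as follows. This is not Buzz's implementation: microphone input is simulated as lists of float samples, and `chunk_stream` and `transcribe_chunk` are hypothetical names, the latter standing in for an actual Whisper call. The one grounded detail is the 16 kHz sample rate, which is what Whisper models expect.

```python
from typing import Iterable, Iterator, List

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def chunk_stream(blocks: Iterable[List[float]], chunk_seconds: float = 5.0,
                 sample_rate: int = SAMPLE_RATE) -> Iterator[List[float]]:
    """Regroup microphone blocks of any size into fixed-length chunks."""
    chunk_len = int(chunk_seconds * sample_rate)
    buffer: List[float] = []
    for block in blocks:
        buffer.extend(block)
        # Emit every complete chunk that has accumulated so far
        while len(buffer) >= chunk_len:
            yield buffer[:chunk_len]
            buffer = buffer[chunk_len:]
    if buffer:
        yield buffer  # flush the final, shorter chunk when the stream ends


def transcribe_chunk(chunk: List[float]) -> str:
    """Hypothetical stand-in for passing one chunk to Whisper."""
    return f"<{len(chunk) / SAMPLE_RATE:.1f} s of audio>"


if __name__ == "__main__":
    # Simulate a microphone delivering ten 0.25 s blocks of silence
    mic_blocks = ([0.0] * 4_000 for _ in range(10))
    for chunk in chunk_stream(mic_blocks, chunk_seconds=1.0):
        print(transcribe_chunk(chunk))
```

Fixed, non-overlapping chunks keep latency predictable but can cut a word in half at a chunk boundary; overlapping chunks are one common way to soften that trade-off.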
3. Fine-Tune Whisper For Multilingual ASR with Hugging Face Transformers
Whisper is great, but can you push its accuracy further by fine-tuning it on your own dataset?
This blog post walks you through it!
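To tell whether fine-tuning actually improved accuracy, ASR results are usually compared by word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. Below is a minimal, dependency-free sketch; `word_error_rate` is a hypothetical helper, and it assumes whitespace tokenization with no text normalization (real evaluations typically normalize casing and punctuation first, and a tested library metric is preferable in practice).

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the current reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 edits / 6 words
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Computing WER on the same held-out set before and after fine-tuning gives a direct read on whether the custom data helped.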