This seems to be quite impressive!
Apparently, NVIDIA first released the Parakeet ASR models in January 2024, after publishing a research paper on the approach in September 2023 (see below).
"NVIDIA has unveiled Parakeet TDT 0.6B, a state-of-the-art automatic speech recognition (ASR) model that is now fully open-sourced on Hugging Face. With 600 million parameters, a commercially permissive CC-BY-4.0 license, and a staggering real-time factor (RTF) of 3386, this model sets a new benchmark for performance and accessibility in speech AI.
Blazing Speed and Accuracy
At the heart of Parakeet TDT 0.6B’s appeal is its unmatched speed and transcription quality. The model can transcribe 60 minutes of audio in just one second, a performance that’s over 50x faster than many existing open ASR models. On Hugging Face’s Open ASR Leaderboard, Parakeet V2 achieves a 6.05% word error rate (WER)—the best-in-class among open models. ..."
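To put the quoted speed figure in perspective, here is a quick back-of-the-envelope check. It assumes the "RTF of 3386" in the quote is an inverse real-time factor (seconds of audio processed per second of compute, often written RTFx), which is my reading and not something the excerpt states. The function name and the "50x slower" baseline are made up for illustration.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Estimate wall-clock transcription time from an inverse real-time factor.

    Assumption: rtfx = audio duration / processing time, so higher is faster.
    """
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


RTFX_QUOTED = 3386          # figure taken from the quoted announcement
ONE_HOUR = 60 * 60          # 60 minutes of audio, in seconds

# Time for one hour of audio at the quoted throughput.
fast = processing_seconds(ONE_HOUR, RTFX_QUOTED)

# Hypothetical baseline that is 50x slower, matching the "over 50x faster" claim.
slow = processing_seconds(ONE_HOUR, RTFX_QUOTED / 50)

print(f"Quoted model:        {fast:.2f} s per hour of audio")
print(f"50x slower baseline: {slow:.1f} s per hour of audio")
```

Under that assumption, an hour of audio takes a little over one second, which lines up with the "60 minutes of audio in just one second" claim. Real throughput would depend on the GPU and batch size, neither of which the excerpt specifies.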
Figure 2. Architecture of the NVIDIA Parakeet encoder with blocks of downsampling and subsampling, conformer encoder blocks with limited context attention (LCA), and global token (GT)