Microsoft Open-Sources Production-Grade Speech AI
Microsoft open-sources VibeVoice, a 7B-parameter speech AI model with 60-minute audio processing, speaker diarization, and support for 50+ languages. ASR is available; TTS was removed due to abuse risks.
VibeVoice is Microsoft’s open-source speech AI model. It has two core capabilities: Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). However, the TTS code has since been removed because of potential abuse risks, so only the speech recognition part is currently usable.
I’ll put the project link at the end. First, let me explain what the project can actually do.



