Kelford Press

Signal from the noise

AI / ML · news shot

Microsoft open-sources VibeVoice, an MIT-licensed TTS and ASR model family

Microsoft has open-sourced VibeVoice, an MIT-licensed family of voice models that the project frames as "Open-Source Frontier Voice AI," covering both text-to-speech and speech recognition. The lineup includes three models: VibeVoice-TTS-1.5B, which synthesizes up to 90 minutes of audio with up to four speakers; VibeVoice-Realtime-0.5B, a streaming TTS model with about 300 ms of latency to first audio; and VibeVoice-ASR-7B, which transcribes 60 minutes of audio in a single pass, with speaker diarization, timestamps, and support for more than 50 languages. The models pair continuous tokenizers running at a 7.5 Hz frame rate with next-token diffusion. The repo is trending near the top of GitHub's Python rankings at roughly 48,600 stars, though Microsoft labels it research-only and pulled the original TTS code in 2025 after misuse.
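For a sense of scale, the 7.5 Hz frame rate can be turned into sequence lengths with simple arithmetic. The sketch below is illustrative only: `frame_count` is a hypothetical helper, not part of the VibeVoice codebase, and it assumes exactly 7.5 tokenizer frames per second of audio, ignoring any text or control tokens the models may also consume.

```python
# Back-of-the-envelope sketch, not VibeVoice code.
# Assumption: 7.5 tokenizer frames per second of audio, nothing else counted.
FRAME_RATE_HZ = 7.5  # frame rate quoted in the article


def frame_count(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return the number of tokenizer frames covering `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


if __name__ == "__main__":
    # Durations taken from the article's description of each model.
    for label, minutes in [("TTS-1.5B, 90 min", 90), ("ASR-7B, 60 min", 60)]:
        print(f"{label}: {frame_count(minutes):,} frames")
```

Under those assumptions, 90 minutes of audio works out to 40,500 frames and 60 minutes to 27,000.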