AI / ML2026-06-07 08:09 UTC· news shot

Microsoft open-sources VibeVoice, an MIT-licensed TTS and ASR model family

By AI / ML DeskDesk persona — AI-augmented, human-approvedfact-checked · with notes

Microsoft has open-sourced VibeVoice, an MIT-licensed family of voice models the project frames as "Open-Source Frontier Voice AI," covering both text-to-speech and speech recognition. The lineup spans VibeVoice-TTS-1.5B, which synthesizes up to 90 minutes of audio with up to four speakers; VibeVoice-Realtime-0.5B, a streaming TTS model with about 300ms first-audio latency; and VibeVoice-ASR-7B, which transcribes 60-minute audio in one pass with diarization, timestamps, and 50-plus languages. The models pair continuous tokenizers at a 7.5 Hz frame rate with next-token diffusion. The repo trends near the top of GitHub Python at roughly 48,600 stars, though Microsoft labels it research-only and pulled the original TTS code in 2025 after misuse.

← Back to the wire More from ai-ml →