Voxtral transcribes at the speed of sound

Overview

Mistral released Voxtral Transcribe 2, a new family of audio transcription models available both as an open-weight model and through an API. In a live demo the models transcribe in real time and handle technical jargon accurately, positioning Mistral as a direct competitor to OpenAI’s Whisper in the speech-to-text space.

Key Facts

Open-weight Voxtral Realtime model available under Apache 2.0 license - developers can run transcription locally without API dependencies
Live demo shows real-time transcription of technical terms like Django and WebAssembly - handles specialized vocabulary that often trips up generic models
API model priced at $0.003/minute ($0.18/hour) - significantly cheaper than many existing transcription services
Includes speaker diarization and context biasing features - can distinguish between speakers and improve accuracy for domain-specific terms
Console provides transcript export in text, SRT, and JSON formats - streamlines workflow from audio to usable text formats
Mistral claims the models transcribe ‘at the speed of sound’ - meaning near-instantaneous results rather than batch-processing delays
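
To make the pricing figure concrete, here is a minimal sketch of the arithmetic behind the $0.003/minute rate quoted above. It assumes billing is strictly proportional to audio duration, with no minimum charge and no rounding up to whole minutes; the source does not confirm how billing is actually metered, and the function name transcription_cost is purely illustrative.

```python
from decimal import Decimal

# Per-minute API price quoted in the list above (USD).
PRICE_PER_MINUTE = Decimal("0.003")


def transcription_cost(duration_seconds: float) -> Decimal:
    """Estimate the USD cost of transcribing audio of the given length.

    Assumption: cost scales linearly with duration (no minimum charge,
    no rounding to whole minutes). Actual billing rules may differ.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    # Round to a hundredth of a cent for display.
    return (minutes * PRICE_PER_MINUTE).quantize(Decimal("0.0001"))


print(transcription_cost(3600))     # one hour of audio
print(transcription_cost(90 * 60))  # a 90-minute recording
```

Under that assumption, one hour of audio comes to $0.18, matching the hourly figure above, and a 90-minute recording to $0.27. Decimal is used instead of float so that small per-minute amounts do not pick up binary rounding error.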

Why It Matters

The release challenges OpenAI’s position in speech-to-text on two fronts: an Apache 2.0 open-weight model that developers can run locally, and an API priced at $0.003/minute. Together these could make high-quality transcription accessible to more developers and businesses.