Overview
Mistral has released Voxtral Transcribe 2, a new family of audio transcription models that includes both an open-weight model and a hosted API model. In a live demo, the models transcribe in real time and stay accurate on technical jargon, positioning Mistral as a direct competitor to OpenAI’s Whisper in the speech-to-text space.
Key Facts
- Open-weight Voxtral Realtime model available under the Apache 2.0 license - developers can run transcription locally without depending on an API
- Live demo shows real-time transcription of technical terms like Django and WebAssembly - handles specialized vocabulary that often trips up generic models
- API model priced at $0.003/minute ($0.18/hour) - significantly cheaper than many existing transcription services
- Includes speaker diarization and context biasing features - can distinguish between speakers and improve accuracy for domain-specific terms
- Console provides transcript export in text, SRT, and JSON formats - streamlines workflow from audio to usable text formats
- Claims to transcribe ‘at the speed of sound’ - near-instantaneous results rather than batch processing delays
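To make the pricing above concrete, here is a minimal sketch that estimates API cost from audio duration. It uses only the $0.003/minute figure quoted in this summary and assumes simple linear billing with no rounding to whole minutes, minimum charge, or volume discount; the function name `transcription_cost` is illustrative and is not part of any Mistral SDK.

```python
from decimal import Decimal

# Per-minute API price quoted in the summary (USD).
PRICE_PER_MINUTE = Decimal("0.003")


def transcription_cost(duration_seconds: float) -> Decimal:
    """Estimate the API cost in USD for a clip of the given length.

    Assumption: billing is linear in duration, with no per-minute
    rounding or minimum charge. Actual billing rules may differ.
    """
    minutes = Decimal(str(duration_seconds)) / Decimal(60)
    # Round to a hundredth of a cent for display purposes.
    return (minutes * PRICE_PER_MINUTE).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    examples = [
        ("1 hour", 3600),
        ("45-minute meeting", 2700),
        ("100 hours", 360000),
    ]
    for label, seconds in examples:
        print(f"{label}: ${transcription_cost(seconds)}")
```

Under these assumptions, one hour of audio comes to $0.18, matching the hourly figure in the list above, and 100 hours comes to $18.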
Why It Matters
The release is a significant challenge to OpenAI’s position in speech-to-text: it pairs an Apache 2.0 open-weight model with low API pricing, a combination that could make high-quality transcription more accessible to developers and businesses.