Mistral AI News | Jun 8, 2026, 9:02 AM | Importance: 78

Voxtral

Original: "Voxtral" (Research), Mistral AI, July 15, 2025

Mistral AI releases open Voxtral speech understanding models for transcription, audio Q&A, summaries, and voice-triggered actions.

Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants released under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, automatic language detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.

In a research article, Mistral AI introduces Voxtral, which it positions as a family of open, low-cost, production-ready speech understanding models. The release includes two models: Voxtral Small (24B), aimed at larger-scale cloud or enterprise applications, and Voxtral Mini (3B), suited to on-device and edge deployment. Both are released under the Apache 2.0 license, can be downloaded from Hugging Face, and are available through the Mistral API. The API also offers Voxtral Mini Transcribe, a transcription-optimized version focused on cost and latency efficiency, priced from $0.001 per minute.

Functionally, Voxtral is not a pure ASR system: it integrates speech transcription and semantic understanding in the same model pipeline. The capabilities the company lists include a 32k-token context, enough for up to about 30 minutes of audio for transcription or 40 minutes for understanding tasks; asking questions about audio content directly and generating structured summaries; automatic language detection with multilingual support; and function calling triggered by the user's spoken intent, which connects voice interaction directly to backend workflows or APIs.

Mistral claims that Voxtral outperforms Whisper large-v3 on English and multilingual transcription benchmarks, beats GPT-4o mini Transcribe and Gemini 2.5 Flash on some tasks, and is competitive in speech translation and audio understanding. The company adds that Voxtral retains the text understanding capabilities of its language model backbone, Mistral Small 3.1, so it can also be used for downstream applications such as summarization, Q&A, analysis, and insights.
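
The speech-to-action capability described above can be sketched roughly as follows, assuming the model returns the user's intent as a function name plus JSON-encoded arguments. The `ToolCall` shape, the `create_ticket` handler, and the `HANDLERS` registry are hypothetical names chosen for illustration; no Mistral API call is made here, and the real response format may differ.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ToolCall:
    """Hypothetical shape of a function call extracted from speech."""
    name: str
    arguments: str  # JSON-encoded arguments


def create_ticket(customer: str, issue: str) -> Dict[str, Any]:
    """Hypothetical backend action; a real system would call its own API."""
    return {"status": "created", "customer": customer, "issue": issue}


# Registry mapping function names to backend handlers (an assumption).
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "create_ticket": create_ticket,
}


def dispatch(call: ToolCall) -> Dict[str, Any]:
    """Route a tool call to its handler, rejecting unknown or malformed calls."""
    handler = HANDLERS.get(call.name)
    if handler is None:
        return {"status": "error", "reason": f"unknown function: {call.name}"}
    try:
        kwargs = json.loads(call.arguments)
    except json.JSONDecodeError:
        return {"status": "error", "reason": "arguments are not valid JSON"}
    if not isinstance(kwargs, dict):
        return {"status": "error", "reason": "arguments must be a JSON object"}
    try:
        return handler(**kwargs)
    except TypeError:
        # Raised when the argument names do not match the handler signature.
        return {"status": "error", "reason": "arguments do not match handler"}
```

Validating the function name and arguments before calling the backend keeps a misheard or malformed request from reaching production systems.
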

For Taiwanese developers and product teams, the key points are the open-source license, the option to self-host, the low pricing, and the speech-to-action integration. Together these could lower the barrier to adopting voice AI for voice customer service, meeting summaries, multilingual content processing, and privacy-sensitive industries.
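
For teams sizing up a use case such as meeting summaries, a back-of-the-envelope estimate can be sketched from the figures quoted in this summary. The constants below take the $0.001 per minute starting price and the approximate 30 and 40 minute limits at face value; the function names are hypothetical, and actual pricing and limits may vary by model and endpoint.

```python
import math
from decimal import Decimal

# Figures quoted in the summary; treat them as approximate assumptions.
PRICE_PER_MINUTE_USD = Decimal("0.001")  # "from $0.001 per minute"
MAX_MINUTES = {
    "transcription": 30,  # about 30 minutes per request
    "understanding": 40,  # about 40 minutes per request
}


def segments_needed(duration_min: float, task: str) -> int:
    """Number of requests needed if audio is split at the per-task limit."""
    if duration_min <= 0:
        raise ValueError("duration_min must be positive")
    return math.ceil(duration_min / MAX_MINUTES[task])


def estimate_cost_usd(duration_min: float) -> Decimal:
    """Estimated API cost at the starting per-minute price."""
    if duration_min < 0:
        raise ValueError("duration_min must not be negative")
    return Decimal(str(duration_min)) * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    minutes = 90  # e.g. a 90-minute meeting recording
    print(segments_needed(minutes, "transcription"), "transcription segments")
    print(f"${estimate_cost_usd(minutes)} estimated cost")
```

Under these assumptions, a 90-minute recording would need three transcription segments and cost roughly $0.09 at the starting price.
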

Summaries are AI-generated; the original article is authoritative.