Voxtral TTS

Mistral AI released Voxtral TTS, a multilingual low-latency text-to-speech model for voice agents.

Mistral AI introduced Voxtral TTS, its first text-to-speech model, targeting natural multilingual voice generation across nine languages. The 4B-parameter model supports voice adaptation from short references, emotional expressiveness, dialect handling, and low-latency streaming. It is available through the API, Mistral Studio, and Le Chat, with open weights on Hugging Face under the non-commercial CC BY-NC 4.0 license.

Mistral AI has released Voxtral TTS, the company's first text-to-speech model, positioned as a multilingual, low-latency, voice-customizable speech output layer for enterprise use. The company says the model has roughly 4B parameters and supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic. It stresses that the model does more than read text aloud: it can produce more natural rhythm, pauses, intonation, and emotional expression based on context. For voice agents, customer service, automated announcements, and voice-interaction products, this means Mistral is extending its existing speech-understanding capabilities to speech generation, completing the speech-to-speech workflow.

Summaries are AI-generated; the original article is authoritative.