Introducing Scribe v2 Realtime

ElevenLabs launched Scribe v2 Realtime for low-latency multilingual speech-to-text and voice agent workflows.

ElevenLabs introduced Scribe v2 Realtime, a low-latency speech-to-text model built for live transcription, voice agents, meeting assistants, and real-time captions. The company says it transcribes with latency under 150 ms in several major languages and supports 90 languages. Key features include automatic language detection, voice activity detection (VAD), manual commit, text conditioning, multiple audio formats, API access, ElevenLabs Agents integration, and enterprise compliance options.

ElevenLabs announced Scribe v2 Realtime on its official blog. The model targets real-time speech-to-text, with a focus on low latency, live-use scenarios, and voice-agent integration. The company positions it as speech-to-text technology suited to voice agents, meeting assistants, and real-time captioning, and states that it can achieve transcription latency below 150 ms in languages such as English, French, German, Italian, Spanish, and Portuguese, while supporting 90 languages. For developers and product teams in Taiwan, this means it is not merely an offline transcription tool but something closer to a real-time speech-understanding component that can be embedded in interactive flows.

Summaries are AI-generated; the original article is authoritative.