Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

ServiceNow AI benchmarks leading ASR models on code-switched (mixed-language) speech to assess whether voice agents are ready for real-world bilingual customers.

Code-switching, where bilingual speakers blend two languages in a single utterance, is common in markets like Taiwan, Singapore, and India, yet most ASR benchmarks focus on monolingual audio. ServiceNow AI evaluates frontier speech recognition models specifically on this mixed-language scenario. The findings help enterprise teams choose ASR models when deploying voice agents for multilingual, customer-facing applications.

Code-switching is common in bilingual communities worldwide: speakers naturally switch between two languages within the same sentence, or even within the same semantic unit. Taiwanese users know the scenario well; workplace phrases like "We have a meeting this afternoon to discuss the Q3 roadmap and KPI achievement rate" are routine. This mixed-language pattern poses a significant challenge for Automatic Speech Recognition (ASR) systems: the model must detect language switches in real time within a single speech stream and correctly transcribe vocabulary from both languages, with no explicit separator marking where one language ends and the other begins.
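
To make the transcription challenge concrete, here is a minimal sketch of one common way to score a code-switched transcript: a mixed error rate that counts Mandarin errors per character and English errors per word. The summary does not say which metric the ServiceNow AI benchmark uses, so the metric choice, the function names (`tokenize_mixed`, `edit_distance`, `mixed_error_rate`), and the sample sentence are all illustrative assumptions rather than the article's method.

```python
import re

# Assumption: CJK ideographs are scored per character, while Latin-letter and
# digit runs are scored as whole words. Punctuation and spaces are ignored.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def tokenize_mixed(text):
    """Split mixed Mandarin/English text into scoring tokens (lowercased)."""
    return [tok.lower() for tok in _TOKEN_RE.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (sub/ins/del cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def mixed_error_rate(reference, hypothesis):
    """Token-level edit distance divided by the reference token count."""
    ref = tokenize_mixed(reference)
    hyp = tokenize_mixed(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical sentence pair, invented for illustration only.
reference = "我們下午有個 meeting 討論 Q3 roadmap"
hypothesis = "我們下午有個 meeting 討論 Q3 road map"
print(f"mixed error rate: {mixed_error_rate(reference, hypothesis):.3f}")
```

In this invented pair the only mistake is the English word "roadmap" being split in two, which costs one substitution and one insertion against eleven reference tokens. Scoring Mandarin per character avoids depending on a word segmenter, which is why this style of metric is often preferred for Mandarin-English audio.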

Summaries are AI-generated; the original article is authoritative.