Code-switching, where bilingual speakers blend two languages in a single utterance, is common in markets like Taiwan, Singapore, and India, yet most ASR benchmarks focus on monolingual audio. ServiceNow AI evaluates frontier speech recognition models specifically on this mixed-language scenario. The findings help enterprise teams choose ASR models when deploying voice agents for multilingual customer-facing applications.
Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, multilingual detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.
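To put the quoted starting price in perspective, here is a rough cost estimate in Python. It is only a sketch: it assumes a flat per-minute rate of $0.001 with no minimum charge, rounding, or tiering, none of which the summary confirms, and the function name `estimate_cost_usd` is made up for illustration.

```python
def estimate_cost_usd(audio_seconds: float, price_per_minute: float = 0.001) -> float:
    """Estimate transcription cost, assuming flat per-minute billing.

    The default price is the "starting at" figure quoted in the summary.
    Real billing may round up, apply minimums, or differ by model; those
    details are not covered here.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 60.0 * price_per_minute


# One hour of audio under the assumed flat rate.
one_hour = estimate_cost_usd(3600)

# A hypothetical archive of 10,000 hours of call recordings.
archive = estimate_cost_usd(10_000 * 3600)

print(f"1 hour: ${one_hour:.2f}, 10,000 hours: ${archive:.2f}")
```

Under these assumptions an hour of audio costs about six cents, so the price is unlikely to be the deciding factor until volumes reach many thousands of hours.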
This Hugging Face Blog post appears to be a practical tutorial for fine-tuning NVIDIA Nemotron 3.5 ASR. Based on the title, it focuses on adapting speech recognition to a target language, specialized domain, or accent. The original text was not provided, so implementation details, datasets, commands, metrics, and hardware requirements cannot be confirmed.
Hugging Face recently made a major update to its popular Open ASR (Automatic Speech Recognition) Leaderboard, aimed at combating the increasingly serious…
Hugging Face recently made a major upgrade to its flagship "Open ASR Leaderboard," officially launching two brand-new evaluation tracks: "Multilingual" and…
This technical blog post from Hugging Face shows how to build a powerful and efficient speech processing system using Hugging Face Inference Endpoints, a…
This technical blog post from Hugging Face provides a detailed walkthrough of how to use the `transformers` library to fine-tune Meta's open-source W2V2-BERT…
Meta's MMS (Massively Multilingual Speech) project, released in 2023, extends speech technology to over 1,000 languages, covering automatic speech recognition…
Microsoft's SpeechT5 model has been officially integrated into Hugging Face's Transformers library. This represents a significant advancement in the field of…
OpenAI's Whisper is a powerful automatic speech recognition (ASR) model. While its zero-shot capabilities are impressive, there remains significant room for…
In the field of automatic speech recognition (ASR), Wav2Vec2 is a revolutionary model, but it faces a significant challenge when processing long audio files…
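The summary is cut off before it names a remedy. One common way to handle long recordings is to split the waveform into overlapping windows, transcribe each window separately, and merge the results. The sketch below shows only the windowing step in plain Python. The function name `chunk_spans`, the 10-second window, and the 2-second overlap are illustrative assumptions, not values taken from the post; the 16 kHz sample rate matches what Wav2Vec2 checkpoints typically expect.

```python
from typing import List, Tuple


def chunk_spans(num_samples: int, chunk_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows.

    Consecutive windows share `overlap` samples so that words cut at a
    window edge appear whole in at least one neighbouring window.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("overlap must be in [0, chunk_len)")

    step = chunk_len - overlap
    spans: List[Tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + chunk_len, num_samples)
        spans.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step
    return spans


SAMPLE_RATE = 16_000  # Hz, assumed input rate

# A 60-second recording cut into 10-second windows with 2 seconds of overlap.
spans = chunk_spans(60 * SAMPLE_RATE, 10 * SAMPLE_RATE, 2 * SAMPLE_RATE)
for start, end in spans:
    print(f"{start / SAMPLE_RATE:5.1f}s -> {end / SAMPLE_RATE:5.1f}s")
```

Each window can then be passed to the model on its own, which keeps memory use bounded by the window length rather than the full recording. How the overlapping transcripts are merged is a separate design choice that this sketch does not cover.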
This technical blog post from Hugging Face explains how adding n-gram language models (LMs) can significantly improve the performance of Wav2Vec2…
Automatic speech recognition (ASR) has achieved remarkable success for resource-rich languages such as English and standard Mandarin, but building…
This is a landmark technical tutorial published by the Hugging Face team in 2021, detailing how to fine-tune Meta AI's Wav2Vec2 model using the Hugging Face…