As multimodal large language models (such as GPT-4o, Gemini, and various open-source audio models) continue to proliferate, AI's ability to process audio has…
Automatic speech recognition (ASR) has achieved remarkable success for high-resource languages such as English and Standard Mandarin, but building…