Latest in AI

Showing: speech-recognition, Researchers

Benchmarking Google Eloquent Exposes Major On-Device Dictation Reliability Issues
r/LocalLLaMA top day · 3 days ago · Benchmark
A LocalLLaMA user tried to benchmark Google’s new fully local dictation app, Eloquent, against open ASR models such as Qwen3-ASR and NVIDIA Parakeet V3. The tester reported that roughly half of dictations returned only fragments, even during manual use. When Eloquent produced complete transcripts, its word error rate was competitive, but the missing-output behavior made the app unreliable for evaluation and practical use.
Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech
Hugging Face Blog · 4 days ago · Benchmark
Code-switching—where bilingual speakers blend two languages in a single utterance—is common in markets like Taiwan, Singapore, and India, yet most ASR benchmarks focus on monolingual audio. ServiceNow AI evaluates frontier speech recognition models specifically on this mixed-language scenario. The findings help enterprise teams make informed ASR model choices when deploying voice agents for multilingual customer-facing applications.
Introducing Cohere Transcribe: A New State-of-the-Art in Open-Source Speech Recognition ★ 80
Cohere Blog · 6 days ago · Release
Cohere has announced "Cohere Transcribe," a new open-source speech recognition model that it describes as state-of-the-art. Designed to deliver accurate and efficient speech-to-text, it marks Cohere's expansion into open-source audio AI. The model aims to challenge established systems such as OpenAI's Whisper by offering superior multilingual performance.
How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent
Hugging Face Blog · 10 days ago · Tutorial
This Hugging Face Blog post appears to be a practical tutorial for fine-tuning NVIDIA Nemotron 3.5 ASR. Based on the title, it focuses on adapting speech recognition to a target language, specialized domain, or accent. The original text was not provided, so implementation details, datasets, commands, metrics, and hardware requirements cannot be confirmed.
Hugging Face Introduces an Anti-Gaming Mechanism for the Open ASR Leaderboard, Using Private Test Data to Counter Benchmaxxers ★ 75
Hugging Face Blog · 39 days ago · Release
Hugging Face has recently made a major update to its popular Open ASR (Automatic Speech Recognition) leaderboard, aimed at combating the increasingly serious…
A Complete Guide to Hugging Face Audio Datasets: From Loading and Preprocessing to Streaming ★ 70
Hugging Face Blog · 1,277 days ago · Tutorial
With the rapid growth of voice AI (such as Whisper), efficiently handling audio datasets has become critically important. This guide from the official Hugging…
Boosting Wav2Vec2 Speech Recognition Performance with n-grams in 🤗 Transformers
Hugging Face Blog · 1,614 days ago · Tutorial
This technical blog post from Hugging Face shows how combining n-gram language models (LMs) can significantly improve the performance of Wav2Vec2…