Latest in AI

Showing: speech-to-text · Researchers

Offline CPU Voice Loop for Ollama and LM Studio Agents
r/LocalLLaMA top day · 3 days ago · New Tool
An r/LocalLLaMA post introduces an offline voice loop for talking to local models through Ollama, LM Studio, or vLLM. The stack uses Silero VAD, Parakeet TDT 0.6B v3 STT, and Supertonic TTS 3, all running on CPU so GPU memory stays available for the LLM. The author reports measured CPU-only benchmarks, agent integrations, cross-platform installers, and an MIT-licensed GitHub release.
Omi Med STT v1: Open-Weight Medical ASR Fine-Tuned from Parakeet 0.6B ★ 72
r/LocalLLaMA top day · 5 days ago · Release
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. In the author’s held-out medical benchmark, it reports 2.37% medical-WER and 145× realtime on local A10 compute.
Research: Voxtral transcribes at the speed of sound
Mistral AI News · 6 days ago · Paper
The title says Mistral AI’s Voxtral can transcribe “at the speed of sound,” suggesting a focus on fast speech-to-text. No article body is available, so details such as benchmarks, languages, pricing, API access, or release status cannot be confirmed. The item is most relevant to developers and researchers tracking Mistral’s work in speech and transcription models.
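The AI-Native Healthcare Revolution: Saving Doctors 10-20 Hours a Year and Completing Prior Authorizations in Minutes (Interview with the Abridge Team) ★ 80
Latent Space · 30 days ago · Commentary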
Abridge is an AI-native startup focused on the healthcare sector. Its core product uses "Ambient Clinical Intelligence" technology to record clinical…
Latest Open Artifacts (#20): New Organizations and Model Types Debut, Covering Nemotron Super, Sarvam, Cohere Transcribe, and More
Interconnects (Nathan L.) · 76 days ago · Release
Prominent AI scholar and commentator Nathan Lambert, in his latest edition of Latest Open Artifacts (#20), has compiled the major recent developments in the…
EVA: ServiceNow AI Launches a New Evaluation Framework for Voice Agents ★ 75
Hugging Face Blog · 82 days ago · Release
With the proliferation of GPT-4o, Gemini Live, and various end-to-end voice models, Voice Agents have become an important frontier in AI applications. However…
Hugging Face Adds New Tracks to the Open ASR Leaderboard: Focusing on Trends in Multilingual and Long-Form Speech Recognition ★ 75
Hugging Face Blog · 205 days ago · Release
Hugging Face recently made a major upgrade to its flagship "Open ASR Leaderboard," officially launching two brand-new evaluation tracks: "Multilingual" and…
Hugging Face Launches an Ultra-Fast Whisper Speech-to-Text Deployment Option on Inference Endpoints ★ 75
Hugging Face Blog · 397 days ago · New Tool
Hugging Face recently announced a brand-new, ultra-fast optimized deployment solution for OpenAI's open-source speech recognition model Whisper on its hosted…
Replicate Intelligence #4: Exploring Concepts in GPT Models, Real-Time Speech-to-Text in the Browser, and H100 GPUs Coming Soon
Replicate Blog · 730 days ago · Release
Replicate has published its technical newsletter, Replicate Intelligence #4, summarizing recent major developments in the AI field as well as the latest…
Fine-Tuning W2V2-BERT for Low-Resource Speech Recognition (ASR) with 🤗 Transformers ★ 75
Hugging Face Blog · 877 days ago · Tutorial
This technical blog post from Hugging Face provides a detailed walkthrough of how to use the `transformers` library to fine-tune Meta's open-source W2V2-BERT…
Speeding Up Whisper Inference 2x with Speculative Decoding ★ 75
Hugging Face Blog · 907 days ago · Tutorial
The official Hugging Face blog explains how to use "Speculative Decoding" to more than double the inference speed of OpenAI's Whisper speech-to-text model…
Fine-Tuning Whisper for Multilingual Speech Recognition (ASR) with 🤗 Transformers ★ 80
Hugging Face Blog · 1,319 days ago · Tutorial
OpenAI's Whisper is a powerful automatic speech recognition (ASR) model. While its zero-shot capabilities are impressive, there remains significant room for…
Automatic Speech Recognition (ASR) on Very Long Audio Files with Wav2Vec2 in 🤗 Transformers
Hugging Face Blog · 1,594 days ago · Tutorial
In the field of automatic speech recognition (ASR), Wav2Vec2 is a revolutionary model, but it faces a significant challenge when processing long audio files…
Fine-Tuning Wav2Vec2 for English Speech Recognition (ASR) with 🤗 Transformers on Hugging Face ★ 70
Hugging Face Blog · 1,920 days ago · Tutorial
This is a landmark technical tutorial published by the Hugging Face team in 2021, detailing how to fine-tune Meta AI's Wav2Vec2 model using the Hugging Face…