An r/LocalLLaMA post introduces an offline voice loop for talking to local models through Ollama, LM Studio, or vLLM. The stack uses Silero VAD, Parakeet TDT 0.6B v3 STT, and Supertonic TTS 3, all running on CPU so GPU memory stays available for the LLM. The author reports CPU-only benchmark measurements, agent integrations, cross-platform installers, and an MIT-licensed GitHub release.
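The VAD, STT, LLM, and TTS stages described in the post can be pictured as a simple loop. The sketch below is a minimal illustration under stated assumptions, not the project's code: `segment_utterances`, `voice_loop`, and the four callables (`is_speech`, `transcribe`, `generate`, `synthesize`) are hypothetical stand-ins for Silero VAD, Parakeet, the local LLM server, and Supertonic respectively.

```python
from typing import Callable, Iterable, Iterator, List

Frame = bytes  # one fixed-size chunk of raw audio (assumption)


def segment_utterances(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    max_silence_frames: int = 3,
) -> Iterator[List[Frame]]:
    """Group speech frames into utterances, closing one after enough silence."""
    current: List[Frame] = []
    silence = 0
    for frame in frames:
        if is_speech(frame):
            current.append(frame)
            silence = 0
        elif current:
            silence += 1
            if silence >= max_silence_frames:
                yield current
                current, silence = [], 0
    if current:  # flush a trailing utterance when the stream ends
        yield current


def voice_loop(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],   # hypothetical VAD wrapper
    transcribe: Callable[[bytes], str],   # hypothetical STT wrapper
    generate: Callable[[str], str],       # hypothetical LLM client
    synthesize: Callable[[str], bytes],   # hypothetical TTS wrapper
) -> Iterator[bytes]:
    """Yield synthesized audio replies, one per detected utterance."""
    for utterance in segment_utterances(frames, is_speech):
        text = transcribe(b"".join(utterance))
        if not text.strip():
            continue  # skip utterances that transcribe to nothing
        yield synthesize(generate(text))
```

Keeping each stage behind a plain callable means the three speech components can run on CPU while `generate` talks to whichever GPU-backed server is in use.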
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. On the author’s held-out medical benchmark, the reported figures are 2.37% medical-WER and 145× realtime on local A10 compute.
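For readers unfamiliar with the two metrics, here is a small sketch of how word error rate and a realtime speed multiple are conventionally computed. It assumes the standard word-level edit-distance definition of WER; the post's "medical-WER" may be scored differently (for example, on medical terms only), and the function names are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute."""
    return audio_seconds / processing_seconds
```

Under these definitions, a 145× realtime figure would correspond to roughly 145 seconds of audio transcribed per second of compute.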
The title says Mistral AI’s Voxtral can transcribe “at the speed of sound,” suggesting a focus on fast speech-to-text. No article body is available, so details such as benchmarks, languages, pricing, API access, or release status cannot be confirmed. The item is most relevant to developers and researchers tracking Mistral’s work in speech and transcription models.
Abridge is an AI-native startup focused on the healthcare sector. Its core product uses "Ambient Clinical Intelligence" technology to record clinical…
Prominent AI scholar and commentator Nathan Lambert, in issue #20 of Latest Open Artifacts, has compiled the major recent developments in the…
With the proliferation of GPT-4o, Gemini Live, and various end-to-end voice models, voice agents have become an important frontier in AI applications. However…
Hugging Face recently made a major upgrade to its flagship "Open ASR Leaderboard," officially launching two brand-new evaluation tracks: "Multilingual" and…
Hugging Face recently announced a brand-new, ultra-fast optimized deployment solution for OpenAI's open-source speech recognition model Whisper on its hosted…
Replicate has published its technical newsletter, Replicate Intelligence #4, summarizing recent major developments in the AI field as well as the latest…
This technical blog post from Hugging Face provides a detailed walkthrough of how to use the `transformers` library to fine-tune Meta's open-source W2V2-BERT…
The Hugging Face official blog introduces how to use "Speculative Decoding" to more than double the inference speed of OpenAI's Whisper speech-to-text model…
OpenAI's Whisper is a powerful automatic speech recognition (ASR) model. While its zero-shot capabilities are impressive, there remains significant room for…
In the field of automatic speech recognition (ASR), Wav2Vec2 is a revolutionary model, but it faces a significant challenge when processing long audio files…
This is a landmark technical tutorial published by the Hugging Face team in 2021, detailing how to fine-tune Meta AI's Wav2Vec2 model using the Hugging Face…