Latest in AI

Showing: speech-to-text, Developers

Offline CPU Voice Loop for Ollama and LM Studio Agents
r/LocalLLaMA top day · 3 days ago · New Tool
An r/LocalLLaMA post introduces an offline voice loop for talking to local models through Ollama, LM Studio, or vLLM. The stack uses Silero VAD, Parakeet TDT 0.6B v3 STT, and Supertonic TTS 3, all running on CPU so GPU memory stays available for the LLM. The author reports measured CPU-only benchmarks, agent integrations, cross-platform installers, and an MIT-licensed GitHub release.
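As a rough illustration of how such a VAD, STT, LLM, and TTS loop fits together, here is a minimal sketch. It is not the author's code: the `VoiceLoop` class and its four callables are hypothetical placeholders, and a real setup would plug in Silero VAD, Parakeet, a local LLM client, and a TTS engine in their place.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class VoiceLoop:
    """Hypothetical wiring of the four stages; each field is a stand-in callable."""
    is_speech: Callable[[bytes], bool]   # VAD: does this audio frame contain speech?
    transcribe: Callable[[bytes], str]   # STT: utterance audio -> text
    generate: Callable[[str], str]       # LLM: user text -> reply text
    synthesize: Callable[[str], bytes]   # TTS: reply text -> audio

    def run(self, frames: Iterable[bytes]) -> List[bytes]:
        """Buffer speech frames; on silence, answer the buffered utterance."""
        replies: List[bytes] = []
        buffer = bytearray()
        for frame in frames:
            if self.is_speech(frame):
                buffer.extend(frame)
            elif buffer:
                # Silence after speech marks the end of an utterance.
                replies.append(self._respond(bytes(buffer)))
                buffer.clear()
        if buffer:
            # Flush a final utterance that was not followed by silence.
            replies.append(self._respond(bytes(buffer)))
        return replies

    def _respond(self, utterance: bytes) -> bytes:
        text = self.transcribe(utterance)
        reply = self.generate(text)
        return self.synthesize(reply)


# Toy stand-ins so the sketch runs without any models installed.
loop = VoiceLoop(
    is_speech=lambda frame: frame != b"\x00",
    transcribe=lambda audio: audio.decode(),
    generate=lambda text: text.upper(),
    synthesize=lambda text: text.encode(),
)
replies = loop.run([b"h", b"i", b"\x00", b"o", b"k"])
```

Keeping each stage behind a plain callable is one way to swap backends (for example Ollama versus LM Studio) without touching the loop itself.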
Omi Med STT v1: Open-Weight Medical ASR Fine-Tuned from Parakeet 0.6B ★ 72
r/LocalLLaMA top day · 5 days ago · Release
Omi Health’s founder says he fine-tuned NVIDIA Parakeet TDT 0.6B v2 for clinical speech and released Omi Med STT v1 under CC-BY-4.0. The runtime supports Mac, Windows, and Linux, auto-selecting MLX, NeMo, or GGUF/parakeet.cpp backends. On the author’s held-out medical benchmark, the model reportedly reaches 2.37% medical-WER and runs at 145× realtime on a local A10.
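For readers unfamiliar with the two figures, the sketch below shows how a standard word error rate and a realtime factor are usually computed. It assumes plain whitespace tokenization and the conventional WER definition (word-level edit distance divided by reference length); the author's "medical-WER" may be defined differently, and the example sentence is made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute (higher is faster)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Made-up example: "atrial" is misheard as "a trial" (one substitution, one insertion).
wer = word_error_rate(
    "the patient has atrial fibrillation",
    "the patient has a trial fibrillation",
)
# 145 seconds of audio processed in 1 second corresponds to 145x realtime.
rtf = realtime_factor(145.0, 1.0)
```

Under this definition a 2.37% WER corresponds to roughly 2 to 3 word errors per 100 reference words.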
Research: Voxtral transcribes at the speed of sound
Mistral AI News · 6 days ago · Paper
The title says Mistral AI’s Voxtral can transcribe “at the speed of sound,” suggesting a focus on fast speech-to-text. No article body is available, so details such as benchmarks, languages, pricing, API access, or release status cannot be confirmed. The item is most relevant to developers and researchers tracking Mistral’s work in speech and transcription models.
Introducing Scribe v2 Realtime ★ 72
ElevenLabs Blog · 6 days ago · Release
ElevenLabs introduced Scribe v2 Realtime, a low-latency speech-to-text model built for live transcription, voice agents, meeting assistants, and real-time captions. The company says it transcribes in under 150 ms across several major languages and supports 90 languages. Key features include automatic language detection, VAD, manual commit, text conditioning, multiple audio formats, API access, ElevenLabs Agents integration, and enterprise compliance options.
Introducing Scribe v2
ElevenLabs Blog · 6 days ago · Release
ElevenLabs published a blog post titled “Introducing Scribe v2.” With no source text provided, the only confirmed information is that it introduces Scribe v2. It likely concerns an updated transcription or speech-to-text product, but features, accuracy claims, pricing, API access, language support, and rollout details cannot be verified from the title alone.
ElevenAPI
ElevenLabs Blog · 6 days ago · New Tool
ElevenAPI is a developer category on the ElevenLabs blog rather than a single detailed article. It collects updates and tutorials around speech, music, conversational agents, API keys, web components, and integrations. Listed posts mention Lovable, ElevenLabs UI, Music API, Claude 3.7 Sonnet, Gemini 2.0 Flash, DeepSeek R1, Voice Isolator API, timestamped TTS endpoints, and Speech-to-Speech API.
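The AI-Native Healthcare Revolution: Saving Doctors 10-20 Hours a Year and Completing Prior Authorization in Minutes (Interview with the Abridge Team) ★ 80
Latent Space · 30 days ago · Commentary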
Abridge is an AI-native startup focused on the healthcare sector. Its core product uses "Ambient Clinical Intelligence" technology to record clinical…
Open-Source AI Resources Weekly (#20): New Organizations and Model Types Debut, Including Nemotron Super, Sarvam, Cohere Transcribe, and More
Interconnects (Nathan L.) · 76 days ago · Release
Prominent AI scholar and commentator Nathan Lambert, in his latest edition of Latest Open Artifacts (#20), has compiled the major recent developments in the…
EVA: ServiceNow AI Launches a New Evaluation Framework for Voice Agents ★ 75
Hugging Face Blog · 82 days ago · Release
With the proliferation of GPT-4o, Gemini Live, and various end-to-end voice models, Voice Agents have become an important frontier in AI applications. However…
Nova 2 Lite Is Now Available on Vercel AI Gateway
Vercel Changelog · 193 days ago · Release
Vercel has released an update announcing that its AI Gateway service now officially supports the Nova 2 Lite model. Vercel AI Gateway is an AI middleware layer…
Hugging Face Adds New Open ASR Leaderboard Tracks Focused on Multilingual and Long-Form Speech Recognition ★ 75
Hugging Face Blog · 205 days ago · Release
Hugging Face recently made a major upgrade to its flagship "Open ASR Leaderboard," officially launching two brand-new evaluation tracks: "Multilingual" and…
Hugging Face Launches Ultra-Fast Whisper Speech-to-Text Deployment on Inference Endpoints ★ 75
Hugging Face Blog · 397 days ago · New Tool
Hugging Face recently announced a brand-new, ultra-fast optimized deployment solution for OpenAI's open-source speech recognition model Whisper on its hosted…
Replicate Intelligence #4: Exploring Concepts in GPT Models, Real-Time Speech-to-Text in the Browser, and H100 GPUs Coming Soon
Replicate Blog · 730 days ago · Release
Replicate has published its technical newsletter, Replicate Intelligence #4, summarizing recent major developments in the AI field as well as the latest…
Fine-Tuning W2V2-BERT for Low-Resource Speech Recognition (ASR) with 🤗 Transformers ★ 75
Hugging Face Blog · 877 days ago · Tutorial
This technical blog post from Hugging Face provides a detailed walkthrough of how to use the `transformers` library to fine-tune Meta's open-source W2V2-BERT…
Speeding Up Whisper Inference 2x with Speculative Decoding ★ 75
Hugging Face Blog · 907 days ago · Tutorial
The Hugging Face official blog introduces how to use "Speculative Decoding" to more than double the inference speed of OpenAI's Whisper speech-to-text model…
AI Speech Recognition in Unity: Easily Integrating the Whisper Model with the Hugging Face API
Hugging Face Blog · 1,108 days ago · Tutorial
This official Hugging Face blog post details how to quickly implement AI speech recognition (Automatic Speech Recognition, ASR) functionality in the Unity game…
Fine-Tuning Whisper for Multilingual Speech Recognition (ASR) with 🤗 Transformers ★ 80
Hugging Face Blog · 1,319 days ago · Tutorial
OpenAI's Whisper is a powerful automatic speech recognition (ASR) model. While its zero-shot capabilities are impressive, there remains significant room for…
Automatic Speech Recognition (ASR) on Very Long Audio Files with Wav2Vec2 in 🤗 Transformers
Hugging Face Blog · 1,594 days ago · Tutorial
In the field of automatic speech recognition (ASR), Wav2Vec2 is a revolutionary model, but it faces a significant challenge when processing long audio files…
Fine-Tuning Wav2Vec2 for English Speech Recognition (ASR) with 🤗 Transformers on Hugging Face ★ 70
Hugging Face Blog · 1,920 days ago · Tutorial
This is a landmark technical tutorial published by the Hugging Face team in 2021, detailing how to fine-tune Meta AI's Wav2Vec2 model using the Hugging Face…
