According to AINews, the AI research team Thinking Machines (affectionately nicknamed "Team Thinky" by the community) has recently unveiled a new native…
OpenAI has continued to expand the reach of its GPT-5 technology, officially launching three new voice and audio APIs: GPT-Realtime-2, GPT-Translate, and…
Google DeepMind has officially unveiled its latest voice model, "Gemini 3.1 Flash Live." This model is positioned to deliver lower-latency, higher-precision…
With the proliferation of GPT-4o, Gemini Live, and various end-to-end voice models, Voice Agents have become an important frontier in AI applications. However…
Vercel officially released "AI Voice Elements," a brand-new UI component library and toolkit designed for frontend developers, aimed at simplifying the…
With the rapid advancement of voice cloning technology, generating hyper-realistic synthetic voices has become remarkably easy. However, this has also…
As real-time voice interaction technologies like GPT-4o become more widespread, the open-source community is also actively developing speech-to-speech (S2S)…
This technical blog post from Hugging Face introduces how to build a powerful and efficient speech processing system using Hugging Face Inference Endpoints — a…