NVIDIA has officially launched a new lightweight multimodal model, "Nemotron 3 Nano Omni." This model is designed to deliver powerful multimodal intelligence…
Google DeepMind has announced a major upgrade to its Gemini audio models, aimed at delivering a more natural, fluid, and low-latency voice interaction…
Google DeepMind has officially released a preview of its new open model "Gemma 3n." This is a cutting-edge open model purpose-built for mobile devices and…
Hugging Face recently announced a brand-new, ultra-fast optimized deployment solution for OpenAI's open-source speech recognition model Whisper on its hosted…
As multimodal large language models (such as GPT-4o, Gemini, and various open-source audio models) continue to proliferate, AI's ability to process audio has…
This technical blog post from Hugging Face provides a detailed walkthrough of how to use the `transformers` library to fine-tune Meta's open-source W2V2-BERT…
AI cloud deployment and runtime platform Replicate has announced official support for fine-tuning Meta's open-source music generation model MusicGen. This new…
With the rapid growth of voice AI (such as Whisper), efficiently handling audio datasets has become critically important. This guide from the official Hugging…
Hugging Face announced new official Audio and Vision documentation guides for its core open-source library `datasets`. As multimodal AI models continue to…
This is a landmark technical tutorial published by the Hugging Face team in 2021, detailing how to fine-tune Meta AI's Wav2Vec2 model using the Hugging Face…