Google DeepMind has unveiled Gemma 4 12B, a next-generation open-weight model with a unified, encoder-free multimodal architecture. Instead of relying on a separate vision encoder (such as a ViT), it processes all supported modalities directly within a single Transformer network. This design simplifies training, reduces inference latency, and improves cross-modal alignment, marking a significant milestone for open-source AI.
NVIDIA has officially launched a new lightweight multimodal model, "Nemotron 3 Nano Omni." This model is designed to deliver powerful multimodal intelligence…
Google and Hugging Face have jointly announced a new generation of open-weight models: "Gemma 4." This release represents a major breakthrough in on-device AI…
Hugging Face has published its Spring 2026 "State of Open Source AI" report, offering a comprehensive review of the explosive growth and paradigm shifts that…
Google DeepMind has officially announced a highly specialized new addition to its open-source model family: Gemma 3 270M. This…
This technical article from Hugging Face explains how to deploy a state-of-the-art (SOTA) optical character recognition (OCR) model called dots.ocr using…
Google's open-source model family welcomes a new member! The all-new Gemma 3n model series is now fully available within the Hugging Face ecosystem. Gemma 3n…
Google DeepMind has released the "Gemini Robotics On-Device" model, a significant breakthrough that brings advanced Gemini AI capabilities directly to local…
As generative AI technology becomes more widespread, AI sound generation has become an indispensable part of modern multimedia creation, game development, and…
Google DeepMind has officially released a preview of its new open model "Gemma 3n." This is a cutting-edge open model purpose-built for mobile devices and…
Hugging Face has introduced SmolVLM2, the latest addition to its Smol family of lightweight models. SmolVLM2 is designed to bring advanced vision-language…
Hugging Face has officially introduced the newest members of the SmolVLM family, pushing vision-language model (VLM) sizes even further down to 256M (256…
Meta has officially introduced the Llama 3.2 family of open-source models, marking a significant architectural upgrade with two major breakthroughs: multimodal…
Hugging Face has officially announced the release of a new open-source Swift package — `swift-transformers`. This tool is designed specifically for developers…
While Stable Diffusion (SD) 1.5 has demonstrated powerful image generation capabilities, its parameter count of 860 million still presents challenges for edge…