Google DeepMind Blog | Jun 9, 2026, 2:10 PM

Google Introduces Gemma 4 12B: A Unified, Encoder-Free Multimodal Model

Original: Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Google launched Gemma 4 12B, a unified, encoder-free multimodal open model that simplifies cross-modal processing.

Google DeepMind has unveiled Gemma 4 12B, a next-generation open-weights model with a unified, encoder-free multimodal architecture. Instead of routing images through a separate vision encoder (such as a ViT), it processes all modalities directly within a single Transformer network. According to the announcement, this design simplifies training, reduces inference latency, and improves cross-modal alignment.
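To make the "encoder-free" idea concrete, here is a minimal sketch of one common way such designs feed images to a single Transformer: raw pixel patches are flattened, passed through one linear projection, and placed in the same sequence as the text token embeddings. This is an illustration only. The summary does not describe how Gemma 4 12B actually ingests images, and every name and size here (`patchify`, `build_sequence`, `PATCH`, `D_MODEL`, the random weights) is a hypothetical stand-in.

```python
import random

PATCH = 2      # patch side length in pixels (illustrative assumption)
D_MODEL = 4    # shared embedding width (illustrative assumption)

def patchify(image, patch=PATCH):
    """Split an H x W grayscale image (list of rows) into flattened patches."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            patches.append([image[r + i][c + j]
                            for i in range(patch) for j in range(patch)])
    return patches

def linear(vec, weight):
    """Project vec (length n) with weight (n x d) into a length-d vector."""
    return [sum(v * row[k] for v, row in zip(vec, weight))
            for k in range(len(weight[0]))]

def build_sequence(image, token_ids, patch_proj, token_table):
    """Return one sequence of embeddings: image patches followed by text tokens.

    No separate vision encoder is involved; a single linear projection maps
    each raw patch into the same space as the text embeddings.
    """
    image_embeds = [linear(p, patch_proj) for p in patchify(image)]
    text_embeds = [token_table[t] for t in token_ids]
    return image_embeds + text_embeds

# Random stand-in weights; a real model would learn these during training.
rng = random.Random(0)
patch_proj = [[rng.uniform(-1, 1) for _ in range(D_MODEL)]
              for _ in range(PATCH * PATCH)]
token_table = [[rng.uniform(-1, 1) for _ in range(D_MODEL)]
               for _ in range(10)]

image = [[0.0, 0.1, 0.2, 0.3],
         [0.4, 0.5, 0.6, 0.7],
         [0.8, 0.9, 1.0, 0.0],
         [0.1, 0.2, 0.3, 0.4]]
sequence = build_sequence(image, [3, 1, 4], patch_proj, token_table)
print(len(sequence), len(sequence[0]))  # 4 patches + 3 tokens, each of width 4
```

In a conventional pipeline, the `linear` call on patches would be replaced by a full pretrained vision encoder with its own layers; the sketch shows why dropping that stage leaves only one network to train and run.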

Google DeepMind today announced Gemma 4 12B, the latest member of its open model family. The model has 12 billion parameters and is presented as a major architectural advance for open multimodal models.



Summaries are AI-generated; the original article is authoritative.