Hugging Face Blog | Feb 19, 2025, 12:00 AM | important 80

Google Launches PaliGemma 2 Mix: New Instruction-Tuned Vision-Language Models

Original: PaliGemma 2 Mix - New Instruction Vision Language Models by Google

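Google, in collaboration with Hugging Face, has released the PaliGemma 2 Mix model series. These are lightweight vision-language models (VLMs) designed for instruction following, combining the SigLIP vision encoder with the Gemma 2 language decoder. PaliGemma 2 Mix comes in several parameter sizes (3B, 10B, and 28B) and has been fine-tuned on a mixture of tasks, including visual question answering, image captioning, and object detection, so it delivers strong multimodal understanding out of the box. Developers can download the weights directly from Hugging Face and deploy or fine-tune the models with the Transformers library.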

Google has officially launched the PaliGemma 2 Mix model series, a new family of open-source instruction-tuned vision-language models (VLMs) now available on the Hugging Face platform. PaliGemma 2 Mix combines a SigLIP vision encoder with Google's Gemma 2 language model and is designed specifically for multimodal instruction-following tasks.
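As a rough illustration of what deployment through Transformers might look like, here is a minimal Python sketch. It is not taken from the summary above: the checkpoint name `google/paligemma2-3b-mix-224`, the task prefixes (`caption`, `answer`, `detect`), and the helper names `build_prompt` and `run_paligemma` are assumptions based on the publicly documented PaliGemma conventions, so check the model card before relying on them.

```python
# Assumed checkpoint name; other sizes and resolutions may use different IDs.
MODEL_ID = "google/paligemma2-3b-mix-224"


def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task-prefixed prompt (hypothetical helper, assumed prefixes)."""
    if task == "caption":
        return f"caption {lang}"
    if task == "vqa":
        return f"answer {lang} {text}"
    if task == "detect":
        return f"detect {text}"
    raise ValueError(f"unsupported task: {task!r}")


def run_paligemma(image_path: str, prompt: str,
                  model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Run one image + prompt through the model and return the decoded text."""
    # Heavy imports are kept local so build_prompt works without them installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    ).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    inputs = inputs.to(torch.bfloat16)  # casts only the floating-point tensors
    input_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Drop the prompt tokens and decode only the newly generated ones.
    return processor.decode(output[0][input_len:], skip_special_tokens=True)
```

A call such as `run_paligemma("photo.jpg", build_prompt("vqa", "How many cats are there?"))` would download the weights on first use, which typically requires accepting the model license on Hugging Face. Keeping the prompt builder separate from the model call makes it easy to switch between captioning, question answering, and detection without reloading the model code.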

#open-source #vlm #multimodal #computer-vision #open-weights #gemma-2

Summaries are AI-generated; the original article is authoritative.