Hugging Face Blog, Feb 19, 2025, 12:00 AM

Google Launches PaliGemma 2 Mix: New Instruction-Tuned Vision-Language Models

Original: PaliGemma 2 Mix - New Instruction Vision Language Models by Google

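Google, in collaboration with Hugging Face, has released the PaliGemma 2 Mix model series. These are lightweight vision-language models (VLMs) designed for instruction following, pairing a SigLIP vision encoder with a Gemma 2 language decoder. PaliGemma 2 Mix comes in several parameter sizes (3B, 10B, and 28B) and is fine-tuned on a mixture of tasks, including visual question answering, image captioning, and object detection, so it offers strong multimodal understanding out of the box. Developers can download the weights directly from Hugging Face and deploy or fine-tune the models with the Transformers library.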

Google has officially launched the PaliGemma 2 Mix model series, a new family of open-source instruction-tuned vision-language models (VLMs) now available on the Hugging Face platform. PaliGemma 2 Mix combines a SigLIP vision encoder with Google's Gemma 2 language model and is designed specifically for multimodal instruction-following tasks.
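As a rough illustration of what loading one of these models through Transformers might look like, here is a minimal sketch. The checkpoint name `google/paligemma2-3b-mix-224`, the task prefixes (`caption`, `detect`), and the helper names `build_prompt` and `generate` are assumptions made for this example, not details taken from the summary above. The weights are gated, so the license likely needs to be accepted on Hugging Face before the download works.

```python
# Assumed checkpoint name; the summary above does not name a specific model ID.
MODEL_ID = "google/paligemma2-3b-mix-224"


def build_prompt(task: str, argument: str = "") -> str:
    """Join a task prefix (assumed, e.g. "caption" or "detect") with its argument."""
    return f"{task} {argument}".strip()


def generate(image_path: str, prompt: str, max_new_tokens: int = 32) -> str:
    """Run one image + text prompt through the model and return the decoded text."""
    # Heavy imports stay local so build_prompt works without them installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    ).eval()

    image = Image.open(image_path).convert("RGB")
    # The processor expands the "<image>" placeholder into image tokens.
    inputs = processor(text="<image>" + prompt, images=image, return_tensors="pt")
    inputs = inputs.to(torch.bfloat16).to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens and decode only the newly generated ones.
    return processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


if __name__ == "__main__":
    # "example.jpg" is a placeholder path for a local image.
    print(generate("example.jpg", build_prompt("caption", "en")))
```

Loading in `bfloat16` is a memory-saving choice for the sketch; the larger 10B and 28B variants would likely need more GPU memory or quantization.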

Summaries are AI-generated; the original article is authoritative.