Hugging Face Blog, Dec 5, 2024

Google Launches PaliGemma 2, a New Vision-Language Model: Lightweight Multimodal Models Built on Gemma 2

Original: Welcome PaliGemma 2 – New vision language models by Google

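Google has released PaliGemma 2, a new generation of lightweight vision-language models built from the SigLIP vision encoder and the Gemma 2 text decoder. The release comes in three parameter sizes (3B, 10B, and 28B) and several input resolutions (224x224, 448x448, and 896x896). PaliGemma 2 performs strongly on tasks such as image captioning, visual question answering, object detection, and document understanding, and it is fully integrated into the Hugging Face ecosystem for fast fine-tuning and deployment.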
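Object detection is one of the listed tasks, and its answers arrive as text that must be turned back into boxes. The sketch below assumes PaliGemma 2 keeps the output format of the original PaliGemma: each box is four location tokens (`<loc0000>` to `<loc1023>`) in (y_min, x_min, y_max, x_max) order, quantized into 1024 bins over the image, followed by a label, with multiple boxes separated by ` ; `. The `Box` type and `parse_detections` helper are hypothetical illustrations, not part of any library.

```python
import re
from dataclasses import dataclass

# Assumed PaliGemma-style detection output: four <locNNNN> tokens per box in
# (y_min, x_min, y_max, x_max) order, each a bin index in [0, 1023],
# followed by a label; multiple boxes are separated by " ; ".
LOC_BINS = 1024
_BOX_RE = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]*)")
_LOC_RE = re.compile(r"<loc(\d{4})>")


@dataclass
class Box:
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_detections(text: str, width: int, height: int) -> list[Box]:
    """Hypothetical helper: map location tokens to pixel coordinates."""
    boxes = []
    for locs, label in _BOX_RE.findall(text):
        # Tokens come in y, x, y, x order; rescale bins to the image size.
        y0, x0, y1, x1 = (int(v) for v in _LOC_RE.findall(locs))
        boxes.append(
            Box(
                label=label.strip(),
                x_min=x0 / LOC_BINS * width,
                y_min=y0 / LOC_BINS * height,
                x_max=x1 / LOC_BINS * width,
                y_max=y1 / LOC_BINS * height,
            )
        )
    return boxes


if __name__ == "__main__":
    sample = (
        "<loc0256><loc0128><loc0768><loc0896> cat ; "
        "<loc0000><loc0000><loc0512><loc0512> dog"
    )
    for box in parse_detections(sample, width=640, height=480):
        print(box)
```

Because the bins are normalized, the same token string can be rescaled to whatever width and height the original image had.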

Google and Hugging Face have jointly announced the release of PaliGemma 2, a new generation of open-weight vision-language models (VLMs). It continues the design of its predecessor, PaliGemma, by pairing the SigLIP vision encoder with Gemma 2, Google's latest lightweight language model, with the goal of giving the community more efficient and more accurate multimodal understanding.
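To make the encoder/decoder pairing concrete, here is a minimal sketch under two assumptions carried over from the original PaliGemma design: SigLIP (So400m/14) cuts a square image into 14x14-pixel patches and emits one token per patch, and the Gemma 2 decoder uses a prefix-LM attention mask, in which the image tokens and text prompt attend to each other fully while generated answer tokens attend causally. The function names are illustrative only.

```python
# Assumption: SigLIP So400m/14 splits a square image into 14x14-pixel patches.
PATCH_SIZE = 14


def num_image_tokens(resolution: int, patch_size: int = PATCH_SIZE) -> int:
    """One image token per patch for a square resolution x resolution input."""
    if resolution % patch_size:
        raise ValueError("resolution must be a multiple of the patch size")
    side = resolution // patch_size
    return side * side


def prefix_lm_mask(num_prefix: int, num_suffix: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Prefix = image tokens + text prompt: full attention within the prefix.
    Suffix = generated answer: sees the whole prefix plus earlier suffix tokens.
    """
    total = num_prefix + num_suffix
    mask = []
    for i in range(total):
        if i < num_prefix:
            mask.append([j < num_prefix for j in range(total)])
        else:
            mask.append([j <= i for j in range(total)])
    return mask


if __name__ == "__main__":
    for res in (224, 448, 896):
        print(f"{res}x{res}: {num_image_tokens(res)} image tokens")
    # Toy sequence: 3 prefix positions, 2 generated positions.
    for row in prefix_lm_mask(num_prefix=3, num_suffix=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Because the token count grows with the square of the side length, an 896x896 input gives the decoder 16 times as many image tokens as a 224x224 input, which raises compute and memory cost accordingly.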
