Hugging Face Blog, May 14, 2024

Google Launches PaliGemma: An Open Vision-Language Model Combining SigLIP and Gemma

Original: PaliGemma – Google's Cutting-Edge Open Vision Language Model

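Google has released PaliGemma, a new open vision-language model that combines the SigLIP vision encoder with the Gemma-2B language model. PaliGemma handles image captioning, visual question answering (VQA), object detection, and OCR, and is available in several input-resolution variants. The model is integrated into the Hugging Face ecosystem and is well suited to fine-tuning on specific downstream tasks.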

Google has released PaliGemma, a lightweight open Vision-Language Model (VLM) and a notable addition to the set of openly available multimodal models. It combines two Google components, the SigLIP vision encoder and the Gemma language model, with the goal of giving developers and researchers a capable base model that is straightforward to fine-tune.
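As a rough illustration of the Hugging Face integration mentioned above, here is a minimal sketch of how the tasks listed in the summary might be prompted. The task prefixes in `TASK_PREFIXES`, the checkpoint name `google/paligemma-3b-mix-224`, and the helper names `build_prompt` and `run_paligemma` are assumptions for illustration and are not taken from this summary; check the model card before relying on them. `run_paligemma` also assumes `transformers` and `torch` are installed and that access to the model weights has been granted.

```python
# Assumed task prefixes for PaliGemma "mix" checkpoints; these are illustrative
# and should be verified against the model card.
TASK_PREFIXES = {
    "caption": "caption {lang}",
    "vqa": "answer {lang} {text}",
    "detect": "detect {text}",
    "ocr": "ocr",
}


def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a text prompt for one of the tasks named in the summary (hypothetical helper)."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    if task in ("vqa", "detect") and not text:
        raise ValueError(f"task {task!r} needs a question or an object name")
    # str.format ignores keyword arguments the template does not use.
    return TASK_PREFIXES[task].format(lang=lang, text=text)


def run_paligemma(image, prompt: str,
                  model_id: str = "google/paligemma-3b-mix-224",  # assumed checkpoint
                  max_new_tokens: int = 32) -> str:
    """Run one image + prompt through PaliGemma via transformers (hypothetical helper).

    `image` is expected to be a PIL image. The import is done lazily so the
    prompt helper above can be used without transformers installed.
    """
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

For example, `run_paligemma(Image.open("photo.jpg"), build_prompt("vqa", "What is shown?"))` would ask a question about a local image, where `photo.jpg` is a placeholder path and `Image` comes from Pillow.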

Summaries are AI-generated; the original article is authoritative.