Hugging Face 推出 Idefics2:強大的 8B 開源視覺語言模型
Original: Introducing Idefics2: A Powerful 8B Vision-Language Model for the community
Hugging Face has announced the launch of Idefics2, the next generation of its open-source Vision Language Model (VLM). With 8 billion (8B)…
Hugging Face 正式發布 Idefics2,這是一款擁有 80 億參數的開源視覺語言模型(VLM)。它基於 Mistral-7B 與 SigLIP 構建,顯著提升了 OCR、圖表理解及多圖對話能力。Idefics2 支援原生解析度與長寬比,並以 Apache 2.0 授權釋出,極適合開發者進行微調與商業部署。
Hugging Face has announced the launch of Idefics2, the next generation of its open-source Vision Language Model (VLM). With 8 billion (8B) parameters, this model aims to provide the community with a multimodal solution that strikes a balance between size and performance. Idefics2's architecture is built on the Mistral-7B-v0.1 language model and the SigLIP-SO400M-14 vision encoder, fused through a new projection layer (Projector). Compared to its predecessor Idefics1, Idefics2 represents a dramatic leap forward in parameter efficiency, visual question answering, document understanding (OCR), and multi-image reasoning.
Free shows the 3-line summary; Pro unlocks the full deep summary (~300 words) so you never have to click through.
See Pro plans →Want the original English / full article?
Read on Hugging Face Blog →Summaries are AI-generated; the original article is authoritative.