Hugging Face Launches Idefics2: A Powerful 8B Open-Source Vision-Language Model

Original: Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

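Hugging Face has officially released Idefics2, an open-source vision-language model (VLM) with 8 billion parameters. Built on Mistral-7B and SigLIP, it significantly improves OCR, chart understanding, and multi-image conversation. Idefics2 supports native image resolutions and aspect ratios and is released under the Apache 2.0 license, which makes it well suited to fine-tuning and commercial deployment.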

Hugging Face has announced the launch of Idefics2, the next generation of its open-source Vision Language Model (VLM). With 8 billion (8B) parameters, this model aims to provide the community with a multimodal solution that strikes a balance between size and performance. Idefics2's architecture is built on the Mistral-7B-v0.1 language model and the SigLIP-SO400M-14 vision encoder, fused through a new projection layer (Projector). Compared to its predecessor Idefics1, Idefics2 represents a dramatic leap forward in parameter efficiency, visual question answering, document understanding (OCR), and multi-image reasoning.
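
To make the "vision encoder, projector, language model" arrangement concrete, here is a toy sketch of the fusion step only. It is not Idefics2's actual projector: it assumes a single linear map, made-up toy dimensions, and random numbers standing in for real encoder outputs, and all function and variable names are hypothetical. The idea it illustrates is that image features are mapped into the language model's embedding size and then placed in the same sequence as the text token embeddings.

```python
import random


def linear_project(features, weight):
    """Map each vision feature vector (length d_vision) to length d_text.

    `weight` holds d_text rows, each of length d_vision.
    """
    return [
        [sum(x * w for x, w in zip(vec, row)) for row in weight]
        for vec in features
    ]


def fuse(image_features, text_embeddings, weight):
    """Project the image features, then prepend them to the text embeddings."""
    projected = linear_project(image_features, weight)
    return projected + text_embeddings


random.seed(0)

# Toy sizes chosen for illustration only (assumption, not the real model sizes).
D_VISION, D_TEXT = 4, 6
NUM_PATCHES, NUM_TOKENS = 3, 2

# Random stand-ins for vision-encoder outputs and text token embeddings.
image_features = [
    [random.random() for _ in range(D_VISION)] for _ in range(NUM_PATCHES)
]
text_embeddings = [
    [random.random() for _ in range(D_TEXT)] for _ in range(NUM_TOKENS)
]

# Hypothetical projector weights (a trained model would learn these).
weight = [
    [random.gauss(0.0, 0.02) for _ in range(D_VISION)] for _ in range(D_TEXT)
]

sequence = fuse(image_features, text_embeddings, weight)
print(len(sequence), len(sequence[0]))  # prints "5 6"
```

The language model then processes `sequence` as one input, which is why the projected image vectors must match the text embedding width.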

Summaries are AI-generated; the original article is authoritative.