Hugging Face Launches IDEFICS: An Open-Source Reproduction of Flamingo, the SOTA Multimodal Vision-Language Model
Original: Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Langage Model
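Hugging Face has released IDEFICS, an open-source multimodal vision-language model that aims to reproduce the capabilities of Flamingo, DeepMind's closed-source model. Built on LLaMA and OpenCLIP, it comes in 9B- and 80B-parameter versions and accepts interleaved text and image inputs. The open release gives the community a strong foundation for multimodal research, and the large-scale dataset OBELICS was released alongside it.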
Hugging Face has officially launched IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-source multimodal vision-language model (VLM). It is an open reproduction of Flamingo, DeepMind's well-known closed-source model, and offers vision-language capabilities comparable to Flamingo's.
Summaries are AI-generated; the original article is authoritative.