Hugging Face Blog | Aug 22, 2023, 12:00 AM | important 78

Hugging Face Launches IDEFICS: An Open-Source Reproduction of the SOTA Multimodal Vision-Language Model Flamingo

Original: Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model


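Hugging Face has released IDEFICS, an open-source multimodal vision-language model that aims to reproduce the capabilities of DeepMind's closed-source model Flamingo. The model is built on LLaMA and OpenCLIP, comes in 9B and 80B parameter versions, and accepts interleaved text and image inputs. The open release gives the community a strong foundation for multimodal research, and the large-scale dataset OBELICS was released alongside it.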

Hugging Face has officially launched IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-source multimodal vision-language model (VLM). The model is an open reproduction of DeepMind's closed-source model Flamingo and aims to deliver vision-language capabilities comparable to Flamingo's.
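To make "interleaved text and image input" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from the original article: the prompt layout (a flat list of strings where URL entries stand for images), the `example.com` URLs, and the helper names `is_image`, `summarize`, and `generate` are all hypothetical. The `generate` function uses the `transformers` classes `AutoProcessor` and `IdeficsForVisionText2Text` with the assumed checkpoint name `HuggingFaceM4/idefics-9b`; it is not executed here because it would download a very large model.

```python
from typing import List

# Assumption: an interleaved prompt is a flat list of strings in which
# entries starting with http(s):// are image URLs and the rest is text.
Prompt = List[str]


def is_image(segment: str) -> bool:
    """Treat URL-looking entries as images (a simplification for this sketch)."""
    return segment.startswith(("http://", "https://"))


def summarize(prompt: Prompt) -> str:
    """Render the prompt with <image> placeholders to show the interleaving."""
    return " ".join("<image>" if is_image(s) else s for s in prompt)


# Hypothetical few-shot prompt: one captioned example, then a new image.
prompt: Prompt = [
    "https://example.com/cat.jpg",  # placeholder URL, not a real image
    "Caption: a cat on a sofa.",
    "https://example.com/dog.jpg",  # placeholder URL, not a real image
    "Caption:",
]


def generate(prompt: Prompt, checkpoint: str = "HuggingFaceM4/idefics-9b") -> str:
    """Sketch of a model call; defined but not run in this example."""
    # Imported lazily so the rest of the sketch runs without transformers.
    from transformers import AutoProcessor, IdeficsForVisionText2Text

    processor = AutoProcessor.from_pretrained(checkpoint)
    model = IdeficsForVisionText2Text.from_pretrained(checkpoint)
    # The processor takes a batch of prompts, hence the outer list.
    inputs = processor([prompt], return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]


print(summarize(prompt))
```

The few-shot layout mirrors how Flamingo-style models are typically prompted: earlier image and text pairs act as in-context examples, and the model continues the text after the final image.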


llama open-source huggingface transformers #vlm #multimodal #open-source-model #computer-vision #dataset

Summaries are AI-generated; the original article is authoritative.