Hugging Face Launches SmolVLM: A Lightweight yet Powerful Open-Source Vision Language Model That Runs Efficiently on Local Devices

Original: SmolVLM - small yet mighty Vision Language Model

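Hugging Face has released SmolVLM, a new lightweight vision language model (about 2.2B parameters) designed for local and edge devices. It pairs a SigLIP vision encoder with the SmolLM2 language model, supports multi-image input and video analysis, and performs competitively with larger models on several benchmarks. SmolVLM is released under the open-source Apache 2.0 license, and its very low memory footprint makes it well suited for developers who want to deploy a VLM on end-user devices.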

Hugging Face has officially launched a lightweight vision language model (VLM) called **SmolVLM**, designed to bring strong multimodal understanding to consumer-grade hardware and mobile devices. SmolVLM has approximately 2.2 billion parameters, using SmolLM2-1.7B as its language backbone and SigLIP as its vision encoder. This compact design keeps the memory footprint very low and inference fast while still delivering solid visual understanding.
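To make "about 2.2 billion parameters" concrete in memory terms, here is a back-of-the-envelope sketch that multiplies the parameter count by the bytes each parameter occupies at a few common precisions. It estimates weight storage only; real usage also includes activations, the KV cache, image tokens, and runtime overhead. The int8 and int4 rows are illustrative assumptions, not formats the article says SmolVLM ships in, and `weight_memory_gib` is a hypothetical helper name.

```python
# Back-of-the-envelope estimate of the memory needed just to hold model weights.
# Assumption: ~2.2e9 parameters (the figure quoted for SmolVLM). Activations,
# the KV cache, and framework overhead are NOT included.

PARAM_COUNT = 2.2e9  # approximate parameter count stated for SmolVLM

# Bytes per parameter for common storage precisions (int8/int4 are hypothetical
# quantized variants used only for comparison).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    # Print the weights-only estimate for each precision.
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gib(PARAM_COUNT, precision):.2f} GiB")
```

At bf16 this comes to roughly 4.1 GiB of weights, which is why a model of this size can fit on a single consumer GPU or a well-equipped laptop, while further quantization would shrink it proportionally.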

Summaries are AI-generated; the original article is authoritative.