Hugging Face Blog, Jun 3, 2025, 12:00 AM

SmolVLA: Efficient Vision-Language-Action Model Trained on LeRobot Community Data

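Hugging Face has released SmolVLA, a new open-source model designed for embodied AI and robot control. Part of the lightweight "Smol" family, the model combines vision, language, and action (VLA) capabilities and is trained entirely on real robot manipulation data from the LeRobot community. Its efficiency and small size let developers run low-latency visual decision-making and control for robots on edge devices.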

**SmolVLA** is a lightweight Vision-Language-Action (VLA) model from Hugging Face, designed specifically for robot control. It continues the lightweight, high-performance approach of Hugging Face's "Smol" series (such as SmolLM and SmolVLM) and targets a specific pain point: traditional VLA models are too large to run in real time on edge devices or directly on robot hardware.
