Welcome aMUSEd: An Efficient, Lightweight Text-to-Image Model

Original: Welcome aMUSEd: Efficient Text-to-Image Generation

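Hugging Face has released aMUSEd, an open-source text-to-image model based on Google's MUSE architecture. Unlike mainstream diffusion models, aMUSEd uses masked image modeling (MIM) and can generate an image in as few as 12 steps. At roughly 800 million parameters, it is well suited to fast inference and fine-tuning on consumer-grade hardware, and it also supports image-to-image generation and inpainting.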

The Hugging Face blog introduced aMUSEd, a new open-source text-to-image model that reproduces and optimizes the MUSE architecture previously proposed by Google. Unlike mainstream diffusion models such as Stable Diffusion, aMUSEd uses Masked Image Modeling (MIM). Because this architecture is non-autoregressive, it does not need the dozens of denoising steps that diffusion models run; it can predict the complete set of image features in as few as 12 inference steps, which makes generation very fast.
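To make the "few steps, all positions predicted in parallel" idea concrete, here is a minimal toy sketch of masked iterative decoding. It is not aMUSEd's actual implementation: `toy_predict` is a hypothetical random stand-in for the transformer, and the 256-token grid, the 8192-entry vocabulary, and the cosine unmasking schedule are assumptions chosen for illustration, not details taken from the article.

```python
import math
import random

MASK = -1  # sentinel for a token position that is still masked


def toy_predict(tokens, vocab_size, rng):
    """Hypothetical stand-in for the model: propose (token, confidence) for every position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def masked_decode(num_tokens=256, vocab_size=8192, steps=12, seed=0):
    """Start fully masked, then commit the most confident predictions at each step."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    masked_per_step = []
    for step in range(1, steps + 1):
        # Assumed cosine schedule: how many tokens remain masked after this step.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * step / steps))
        # All positions are predicted in one parallel pass (non-autoregressive).
        proposals = toy_predict(tokens, vocab_size, rng)
        masked_positions = [i for i, t in enumerate(tokens) if t == MASK]
        # Commit the highest-confidence proposals; the rest stay masked for later steps.
        masked_positions.sort(key=lambda i: proposals[i][1], reverse=True)
        num_to_commit = len(masked_positions) - keep_masked
        for i in masked_positions[:num_to_commit]:
            tokens[i] = proposals[i][0]
        masked_per_step.append(keep_masked)
    return tokens, masked_per_step


if __name__ == "__main__":
    tokens, history = masked_decode()
    print("masked tokens remaining after each step:", history)
```

In a real model the committed tokens would then be decoded to pixels by an image tokenizer's decoder; the sketch only shows why a fixed, small step count is enough to fill every position.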

Summaries are AI-generated; the original article is authoritative.