NVIDIA has released DiffusionGemma 26B A4B IT NVFP4 on Hugging Face, a quantized version of Google DeepMind's open-weights multimodal model. Built on a Mixture-of-Experts architecture with 25.2B total but only 3.8B active parameters, it generates text in parallel 256-token blocks using discrete diffusion, exceeding 1,100 tokens per second on H100 hardware. The model supports a 256K-token context, text/image/video inputs, native function calling, reasoning mode, and 35+ languages.
In late 2022, while continuous-space diffusion models represented by Stable Diffusion were stealing the spotlight, diffusion models operating in discrete space…