Google’s DiffusionGemma is an Apache 2.0 experimental open model using text diffusion instead of standard autoregressive decoding. The 26B MoE model activates 3.8B parameters during inference and is designed for low-latency local workflows. Google claims up to 4x faster generation on dedicated GPUs, while noting that output quality is below standard Gemma 4 and production-quality use cases should still prefer Gemma 4.
A r/LocalLLaMA post says a Bilibili creator has shown a single-slot, half-height PCIe V100 with NVLink on a custom PCB. The card is described as 16 cm long, passively cooled by default, capped at 75W, with another version supporting up to 300W. The 16GB model is expected around or below ¥1500, with a 32GB version reportedly planned, but it is not yet available for purchase.
Hugging Face's official blog has announced exciting news for the open-source AI community: Hugging Face has formed a deep partnership with Unsloth — the…