An r/LocalLLaMA post notes that Unsloth’s Gemma 4 QAT MTP assistant models are now available in GGUF format. Each repository’s root directory includes q8_0 files named mtp-gemma-4-*.gguf, while the MTP folders contain q8_0 and larger quantized variants. The listed releases cover the 12B, 26B-A4B, 31B, E2B, E2B mobile, E4B, and E4B mobile it-qat-GGUF repositories.
Luce Spark is an open-source MoE offload system for running 33B-35B A3B models on 16GB-class GPUs. It keeps frequently routed experts on the GPU, stores the long tail in system RAM, and swaps cold experts in through a bounded async cache. The author reports 13.3 GiB for Qwen3.6 35B-A3B and about 100 tok/s with Spark optimizations, but notes that testing on a real 16GB GPU is still missing.
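
The hot/cold split described for Luce Spark can be illustrated with a toy cache. This is a minimal sketch and not Luce Spark's code: the `ExpertCache` class, its parameters, and the `load_from_ram` callback are hypothetical names, the LRU eviction policy is an assumption (the post only says the cache is bounded), and the transfer is modeled synchronously rather than asynchronously.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable, List


class ExpertCache:
    """Toy model: pinned hot experts plus a bounded LRU cache for cold ones.

    Hypothetical illustration only; not taken from Luce Spark.
    """

    def __init__(self, pinned: Iterable[int], capacity: int,
                 load_from_ram: Callable[[int], object]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._load = load_from_ram
        # Frequently routed experts stay resident and are never evicted.
        self._pinned: Dict[int, object] = {e: load_from_ram(e) for e in pinned}
        # Cold experts share a fixed number of slots (assumed LRU order).
        self._cold: "OrderedDict[int, object]" = OrderedDict()
        self._capacity = capacity
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: int) -> object:
        """Return an expert's weights, loading from system RAM on a miss."""
        if expert_id in self._pinned:
            self.hits += 1
            return self._pinned[expert_id]
        if expert_id in self._cold:
            self.hits += 1
            self._cold.move_to_end(expert_id)  # mark as most recently used
            return self._cold[expert_id]
        self.misses += 1
        weights = self._load(expert_id)  # stands in for a RAM-to-GPU copy
        self._cold[expert_id] = weights
        if len(self._cold) > self._capacity:
            self._cold.popitem(last=False)  # evict the least recently used
        return weights

    def resident(self) -> List[int]:
        """Expert ids currently held: pinned first, then cold in LRU order."""
        return sorted(self._pinned) + list(self._cold)


if __name__ == "__main__":
    cache = ExpertCache(pinned=[0, 1], capacity=2,
                        load_from_ram=lambda e: f"weights-{e}")
    for expert in [0, 5, 6, 5, 7, 1, 6]:  # a made-up routing trace
        cache.get(expert)
    print(cache.hits, cache.misses, cache.resident())
```

In a real system the miss path would overlap the copy with compute, which is presumably where the "async" part of the described cache comes in; the sketch only shows why a bounded cold cache keeps memory use fixed regardless of how many experts the model has.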