Google's Official Gemma 4 QAT Q4_0 GGUFs Have Higher Precision Than Unsloth's Q4_K_XL
Original: QATs Q4_0 from Google have more precision than Q4_K_XL from Unsloth (at least some)
A Reddit user found that Google's official Gemma 4 QAT Q4_0 GGUFs use mixed precision, making them larger and more precise than Unsloth's Q4_K_XL.
An analysis of Gemma 4 QAT GGUF files reveals that Google's official 'Q4_0' releases actually employ a mixed-precision strategy. For smaller models like E2B and E4B, Google keeps critical token embeddings in Q6_K and certain projection weights in F16. This makes Google's Q4_0 files larger and more precise than Unsloth's 'Q4_K_XL' versions, which default to standard Q4_0 for almost all tensors.
After Google released the Quantization-Aware Training (QAT) version of Gemma 4, GGUF builds appeared on Hugging Face from both Google itself and Unsloth, a team known for its lightweight fine-tuning tools. Comparing the two, Reddit user alex20_202020 on the LocalLLaMA subreddit noticed something counterintuitive: Google's official GGUF file labeled `Q4_0` was larger and more precise than Unsloth's file labeled `Q4_K_XL`.
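To see why a file labeled `Q4_0` can grow once a few tensors are kept at higher precision, here is a minimal sketch of the size arithmetic. The block layouts (bytes per block, weights per block) follow the standard GGML formats. The tensor names and element counts are hypothetical placeholders chosen for round numbers, not measurements from either Google's or Unsloth's files.

```python
# (bytes per block, weights per block) for a few GGML tensor types.
BLOCK_LAYOUT = {
    "F16": (2, 1),       # 2 bytes per weight, no blocking
    "Q4_0": (18, 32),    # fp16 scale + 32 four-bit values
    "Q4_K": (144, 256),  # 256-weight super-block
    "Q6_K": (210, 256),  # 256-weight super-block
}


def bits_per_weight(qtype: str) -> float:
    """Effective storage cost of one weight, including block scales."""
    block_bytes, block_weights = BLOCK_LAYOUT[qtype]
    return block_bytes * 8 / block_weights


def total_bytes(tensors) -> int:
    """Sum tensor data sizes for (name, qtype, n_weights) entries."""
    total = 0
    for _name, qtype, n_weights in tensors:
        block_bytes, block_weights = BLOCK_LAYOUT[qtype]
        n_blocks = -(-n_weights // block_weights)  # round up to whole blocks
        total += n_blocks * block_bytes
    return total


# Hypothetical inventory: names and sizes are illustrative assumptions only.
EMBED, PROJ, REST = 256_000_000, 64_000_000, 1_024_000_000

uniform_q4_0 = [
    ("token_embd", "Q4_0", EMBED),
    ("some_proj", "Q4_0", PROJ),
    ("everything_else", "Q4_0", REST),
]
mixed_precision = [
    ("token_embd", "Q6_K", EMBED),  # embeddings kept in Q6_K
    ("some_proj", "F16", PROJ),     # selected projections kept in F16
    ("everything_else", "Q4_0", REST),
]

if __name__ == "__main__":
    for qtype in BLOCK_LAYOUT:
        print(f"{qtype}: {bits_per_weight(qtype)} bits per weight")
    print("uniform Q4_0 bytes:", total_bytes(uniform_q4_0))
    print("mixed precision bytes:", total_bytes(mixed_precision))
```

Under these assumed sizes the mixed file comes out larger even though most weights are still stored as `Q4_0`, which is the pattern the Reddit post describes. To check real files, the per-tensor types would need to be read from the GGUF metadata rather than inferred from the file name.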