r/LocalLLaMA top day · Jun 8, 2026, 4:26 AM · /u/alex20_202020

Google's Official Gemma 4 QAT Q4_0 GGUFs Have Higher Precision Than Unsloth's Q4_K_XL

Original: QATs Q4_0 from Google have more precision than Q4_K_XL from Unsloth (at least some)

A Reddit user found that Google's official Gemma 4 QAT Q4_0 GGUFs use mixed precision, making them larger and more precise than Unsloth's Q4_K_XL.

An analysis of the Gemma 4 QAT GGUF files shows that Google's official `Q4_0` releases actually use a mixed-precision strategy. For smaller models such as E2B and E4B, Google keeps critical token embeddings in Q6_K and certain projection weights in F16. This makes Google's `Q4_0` files larger and more precise than Unsloth's `Q4_K_XL` versions, which default to standard Q4_0 for almost all tensors.
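To see why keeping a few tensors at higher precision makes a file labeled `Q4_0` noticeably larger, here is a minimal sketch of the size arithmetic. The block layouts are the standard GGML ones (Q4_0: 18 bytes per 32 weights, Q6_K: 210 bytes per 256 weights, F16: 2 bytes per weight). The tensor names and element counts are hypothetical placeholders chosen for illustration; they are not Gemma 4's real shapes, and the helper functions are not part of any library.

```python
# (bytes per block, weights per block) for the GGML types mentioned above.
BLOCK_LAYOUT = {
    "Q4_0": (18, 32),    # fp16 scale (2 bytes) + 32 four-bit values (16 bytes)
    "Q6_K": (210, 256),  # 6-bit values plus per-group scales in a 256-weight super-block
    "F16": (2, 1),       # plain half precision, no blocking
}


def bits_per_weight(qtype):
    """Effective storage cost of one weight, in bits."""
    block_bytes, block_weights = BLOCK_LAYOUT[qtype]
    return 8 * block_bytes / block_weights


def size_bytes(tensors, overrides=None):
    """Total size if every tensor is Q4_0 except those listed in overrides."""
    overrides = overrides or {}
    total = 0
    for name, n_weights in tensors.items():
        block_bytes, block_weights = BLOCK_LAYOUT[overrides.get(name, "Q4_0")]
        total += (n_weights // block_weights) * block_bytes
    return total


# Hypothetical element counts (assumption, NOT real Gemma 4 shapes).
# All are multiples of 256 so every block layout divides evenly.
TENSORS = {
    "token_embd": 262_144 * 1024,
    "attn_proj": 1024 * 1024,
    "ffn": 64 * 1024 * 1024,
}

uniform = size_bytes(TENSORS)
mixed = size_bytes(TENSORS, {"token_embd": "Q6_K", "attn_proj": "F16"})

for qtype in BLOCK_LAYOUT:
    print(f"{qtype}: {bits_per_weight(qtype)} bits per weight")
print(f"uniform Q4_0: {uniform / 2**20:.1f} MiB, mixed: {mixed / 2**20:.1f} MiB")
```

Because an embedding table can hold a large share of a small model's weights, moving just that tensor from 4.5 to 6.5625 bits per weight can shift the total file size substantially, which is consistent with the effect described for E2B and E4B.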

After Google released the Quantization-Aware Training (QAT) version of Gemma 4, GGUF builds appeared on Hugging Face from both Google itself and Unsloth, a team known for lightweight fine-tuning tooling. A user (alex20_202020) on Reddit's LocalLLaMA subreddit noticed something counterintuitive when comparing the two: for at least some models, Google's official GGUF file labeled `Q4_0` was larger and more precise than Unsloth's version labeled `Q4_K_XL`.

grok gemini llama other koboldcpp #quantization #gguf #qat #gemma-4 #local-llm

Summaries are AI-generated; the original article is authoritative.