r/LocalLLaMA top day · Jun 7, 2026, 5:29 PM · /u/pftbest

Gemma-4-26B-A4B QAT Variant Performs Poorly in llama.cpp Compared to Non-QAT Version

Original: QAT variant of Gemma4 26B A4B is not working well for me

A user reports that the QAT (Quantization-Aware Training) variant of Gemma-4-26B-A4B performs worse than the standard version in llama.cpp.

A LocalLLaMA user reported that the newly released QAT variant of Google's Gemma-4-26B-A4B model underperforms its non-QAT predecessor. In a chessboard SVG generation test run through llama.cpp, the QAT version produced significant rendering errors, while the non-QAT GGUF version produced highly accurate results under identical settings.
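
For readers who want to repeat this kind of comparison, a rough structural check on each model's SVG output can be sketched as follows. This is only an illustration under stated assumptions: the post does not give the exact prompt or say how the rendering errors were judged, the helper names check_chessboard_svg and make_reference_board are hypothetical, and counting 64 <rect> elements is just one possible heuristic, since a model may legitimately draw the board with paths or patterns instead.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def check_chessboard_svg(svg_text: str) -> dict:
    """Hypothetical helper: report whether the SVG parses and how many <rect> it has."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as exc:
        # Malformed XML (e.g. unclosed tags) is itself a rendering failure.
        return {"well_formed": False, "rects": 0, "error": str(exc)}
    # Count <rect> elements whether or not the SVG namespace was declared.
    rects = [el for el in root.iter() if el.tag in (SVG_NS + "rect", "rect")]
    return {"well_formed": True, "rects": len(rects), "error": None}


def make_reference_board(size: int = 40) -> str:
    """Build a plain 8x8 board, used only to sanity-check the checker itself."""
    squares = []
    for row in range(8):
        for col in range(8):
            # Assumed colors; any two alternating fills would do.
            fill = "#f0d9b5" if (row + col) % 2 == 0 else "#b58863"
            squares.append(
                f'<rect x="{col * size}" y="{row * size}" '
                f'width="{size}" height="{size}" fill="{fill}"/>'
            )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{8 * size}" height="{8 * size}">' + "".join(squares) + "</svg>"
    )


if __name__ == "__main__":
    print(check_chessboard_svg(make_reference_board()))
```

Running the same check on the SVG text saved from the QAT and non-QAT runs gives a quick first signal, but it does not replace opening both files and comparing them visually.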

The discussion suggests that the QAT version performs worse than expected in actual deployment and inference, falling behind even the conventional non-QAT version quantized after training (Post-Training Quantization, PTQ).

Summaries are AI-generated; the original article is authoritative.