Gemma-4-26B-A4B QAT Variant Performs Poorly in llama.cpp Compared to Non-QAT Version
Original: QAT variant of Gemma4 26B A4B is not working well for me
A LocalLLaMA user reports that the QAT (Quantization-Aware Training) variant of Gemma-4-26B-A4B performs worse than the standard version in llama.cpp.
A LocalLLaMA user highlighted that the newly released QAT (Quantization-Aware Training) variant of Google's Gemma-4-26B-A4B model underperforms compared to its non-QAT predecessor. Testing via llama.cpp on a chessboard SVG generation task showed significant rendering errors in the QAT version. The non-QAT GGUF version, however, produced highly accurate results under identical settings.
This discussion from Reddit's r/LocalLLaMA community reports that the QAT version of Google's Gemma-4-26B-A4B performs worse than expected in actual inference. It even trails the conventional non-QAT version, which is quantized after training (post-training quantization, PTQ).
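The summary does not say how the chessboard SVGs were judged, so the following is only a minimal sketch of one way to check such output automatically rather than by eye. It assumes the model draws the board as 64 square `<rect>` elements with numeric `x`, `y`, `width`, `height`, and a `fill` attribute; a board drawn as a background plus 32 dark squares, or with `<path>` elements, would need a different check. The function name `check_chessboard_svg` is hypothetical and not part of llama.cpp or the original post.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "{http://www.w3.org/2000/svg}"


def check_chessboard_svg(svg_text, size=8):
    """Return a list of problems; an empty list means the board passed.

    Assumption: every board square is its own square <rect> with a fill.
    """
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    rects = []
    for el in root.iter():
        if el.tag not in (SVG_NS + "rect", "rect"):
            continue
        try:
            x, y = float(el.get("x", "0")), float(el.get("y", "0"))
            w, h = float(el.get("width", "0")), float(el.get("height", "0"))
        except ValueError:
            continue  # skip percentage or unit-suffixed values
        if w > 0 and w == h:
            rects.append((x, y, w, el.get("fill")))

    if not rects:
        return ["no square <rect> elements found"]

    # Treat the most common side length as the board-square size, so a
    # single full-board background rect is ignored.
    side = Counter(r[2] for r in rects).most_common(1)[0][0]
    squares = [r for r in rects if r[2] == side]

    problems = []
    if len(squares) != size * size:
        problems.append(f"expected {size * size} squares, found {len(squares)}")

    # Map each square to a (row, col) cell relative to the top-left square.
    min_x = min(s[0] for s in squares)
    min_y = min(s[1] for s in squares)
    cells = {}
    for x, y, _, fill in squares:
        col, row = round((x - min_x) / side), round((y - min_y) / side)
        cells[(row, col)] = fill

    missing = [(r, c) for r in range(size) for c in range(size)
               if (r, c) not in cells]
    if missing:
        problems.append(f"{len(missing)} grid cells have no square")

    # Each parity class should use exactly one fill, and the two must differ.
    fills = {0: set(), 1: set()}
    for (row, col), fill in cells.items():
        fills[(row + col) % 2].add(fill)
    if len(fills[0]) != 1 or len(fills[1]) != 1 or fills[0] == fills[1]:
        problems.append("square colors do not alternate")
    return problems
```

To compare the two variants, one would send the same prompt to the QAT and non-QAT GGUF files with identical sampling settings, save each SVG response, and run the checker on both. A structural check like this catches missing or miscolored squares but says nothing about piece placement or visual quality.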
Summaries are AI-generated; the original article is authoritative.