Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency★ 72
Hacker News (AI keywords)·9 days ago·Release
Google released new Gemma 4 checkpoints optimized with Quantization-Aware Training to preserve quality after compression. The release includes Q4_0 checkpoints and a mobile-focused quantization format that can reduce Gemma 4 E2B memory use to about 1GB, or below 1GB for a text-only configuration. The models are available through Hugging Face and supported across llama.cpp, Ollama, LM Studio, LiteRT-LM, Transformers.js, SGLang, vLLM, MLX, and Unsloth.