Latest in AI

Showing: model-compression

An Implementation of NanoQuant: A Flexible Binary Quantization Method
r/LocalLLaMA top day | 6 days ago | New Tool
A r/LocalLLaMA post presents an unofficial PyTorch implementation of NanoQuant, a 2026 post-training quantization method for dense transformers. The method factorizes weights into scaling vectors and binary matrices, then quantizes and fine-tunes blocks sequentially to reduce hardware requirements. Early Qwen3-0.6B and Qwen3-4B experiments are promising for base models, but instruct quality remains weak and highly dependent on calibration data.
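The idea of splitting weights into scaling vectors and binary matrices can be illustrated with a much simpler scheme than the one the post implements. The sketch below is not NanoQuant itself: it assumes a single per-row scale and a sign matrix, and the helper names (`binarize_rows`, `reconstruct`, `squared_error`) are hypothetical. For a fixed sign matrix, the least-squares optimal scale for a row is the mean absolute value of that row.

```python
import random


def binarize_rows(weight):
    """Split a matrix into per-row scales and a {-1, +1} sign matrix."""
    scales, binary = [], []
    for row in weight:
        binary.append([1 if w >= 0 else -1 for w in row])
        # With signs fixed, mean |w| minimizes the squared error for this row.
        scales.append(sum(abs(w) for w in row) / len(row))
    return scales, binary


def reconstruct(scales, binary):
    """Rebuild the approximation scale_i * sign_ij."""
    return [[s * b for b in row] for s, row in zip(scales, binary)]


def squared_error(a, b):
    """Sum of squared element-wise differences between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


random.seed(0)
rows, cols = 4, 8
weight = [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

scales, binary = binarize_rows(weight)
approx = reconstruct(scales, binary)

zeros = [[0.0] * cols for _ in range(rows)]
rel_error = (squared_error(weight, approx) / squared_error(weight, zeros)) ** 0.5
print(f"relative reconstruction error: {rel_error:.3f}")
```

Each weight is stored as one sign bit plus a shared scale per row, which is where the memory saving comes from. The reconstruction error of such a naive split is large, which is consistent with the summary's point that block-wise fine-tuning and calibration data matter for quality.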
Cohere Announces Strategic MOUs with Indra Group and Multiverse Computing
Cohere Blog | 6 days ago | Business
Cohere has signed strategic Memorandums of Understanding (MOUs) with Spanish multinational tech giant Indra Group and quantum software leader Multiverse Computing. The collaborations aim to accelerate enterprise AI adoption in Europe, combining Cohere's LLMs with Indra's digital transformation expertise and Multiverse's quantum-inspired model optimization capabilities.
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency ★ 72
Hacker News (AI keywords) | 9 days ago | Release
Google released new Gemma 4 checkpoints optimized with Quantization-Aware Training to preserve quality after compression. The release includes Q4_0 checkpoints and a mobile-focused quantization format that can reduce Gemma 4 E2B memory use to about 1GB, or below 1GB for a text-only configuration. The models are available through Hugging Face and supported across llama.cpp, Ollama, LM Studio, LiteRT-LM, Transformers.js, SGLang, vLLM, MLX, and Unsloth.
Hugging Face Introduces Quanto: A New PyTorch Quantization Backend for Optimum ★ 75
Hugging Face Blog | 818 days ago | Release
Hugging Face has officially introduced Quanto, a brand-new quantization library designed for PyTorch, which has been integrated as a backend into the Hugging…