Latest in AI

Google Quietly Releases a Faster Model in Mythos’ Shadow
量子位 QbitAI | 3 days ago | Release
According to the QbitAI headline, Google released a model quietly while attention was focused on Mythos. The only concrete performance claim is a 4x speed increase; the model name, task scope, benchmark method, and availability are not stated. On the headline alone, this is a model-release item of interest to developers and AI practitioners tracking latency and throughput improvements.
DiffusionGemma: 4x Faster Text Generation
r/LocalLLaMA top day | 4 days ago | Release
Google has announced DiffusionGemma, a text-generation model that applies diffusion-based techniques to the Gemma architecture, claiming speeds four times faster than standard autoregressive generation. Unlike conventional language models that predict tokens one at a time, diffusion-based methods generate text through iterative denoising, enabling parallel output. The release, published on Google's official blog, drew immediate attention from the local-LLM community for its potential inference-efficiency gains.
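The contrast between one-token-at-a-time decoding and iterative denoising can be illustrated with a toy sketch that only counts sequential model calls. Everything here is an assumption for illustration: the function names are hypothetical, the "model" simply copies from a known target, and the 16-token, 4-step setting is chosen so the ratio comes out at 4. It is not DiffusionGemma's actual algorithm or configuration, and a real speedup also depends on the cost of each call.

```python
MASK = "_"

def autoregressive_decode(target):
    """Emit tokens left to right: one sequential model call per token."""
    out, calls = [], 0
    for token in target:
        out.append(token)  # stand-in for predicting the next token
        calls += 1
    return out, calls

def diffusion_decode(target, steps):
    """Start fully masked; each call fills in many positions at once."""
    out = [MASK] * len(target)
    calls = 0
    for step in range(steps):
        # Stand-in for one denoising pass: reveal every `steps`-th position,
        # so several positions are updated in parallel per call.
        for i in range(step, len(target), steps):
            out[i] = target[i]
        calls += 1
    return out, calls

if __name__ == "__main__":
    target = list("abcdefghijklmnop")  # 16 toy "tokens"
    _, ar_calls = autoregressive_decode(target)
    text, diff_calls = diffusion_decode(target, steps=4)
    assert text == target  # both strategies reach the same output
    print(f"autoregressive calls: {ar_calls}, diffusion calls: {diff_calls}")
    print(f"sequential-call ratio: {ar_calls / diff_calls:.1f}x")
```

The sketch shows why parallel refinement can cut the number of sequential steps; whether that translates into the claimed wall-clock gain depends on details the summary does not give.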
[3090] Gemma4 QAT + MTP quick TPS numbers
r/LocalLLaMA top day | 6 days ago | Benchmark
An r/LocalLLaMA user shared quick throughput numbers for Gemma 4 QAT with MTP speculative decoding on an RTX 3090 (24 GB). They report a roughly 1.2-1.8x improvement in tokens per second (TPS), with Gemma 4 31B moving from about 40 tok/s to 70-80 tok/s. The author frames this as a rough benchmark covering 11 task categories and notes stochastic variation from sampling at temperature 1.0.
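The reported figures can be sanity-checked with simple arithmetic. The sketch below uses only the numbers quoted in the summary (about 40 tok/s before, 70-80 tok/s after); the helper names are hypothetical. For this one model the ratio works out to about 1.75x to 2.0x, which sits at or slightly above the top of the quoted 1.2-1.8x range, so the figures are best read as approximate.

```python
def speedup(baseline_tps: float, new_tps: float) -> float:
    """Throughput ratio: how many times faster the new setup generates tokens."""
    return new_tps / baseline_tps

def ms_per_token(tps: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tps

if __name__ == "__main__":
    baseline = 40.0          # approximate tok/s without MTP, from the summary
    low, high = 70.0, 80.0   # approximate tok/s with MTP, from the summary

    print(f"speedup: {speedup(baseline, low):.2f}x to {speedup(baseline, high):.2f}x")
    print(f"latency: {ms_per_token(baseline):.1f} ms/token before, "
          f"{ms_per_token(high):.1f} to {ms_per_token(low):.1f} ms/token after")
```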