A popular Reddit thread on r/LocalLLaMA discusses the potential of 2-bit Quantization Aware Training (QAT) for large MoE models (120B to 400B). While current QAT efforts focus on 4-bit, users speculate whether a 2-bit QAT model could fit into consumer hardware (64GB/128GB RAM) and outperform a 4-bit model of half its size. This approach is proposed as a practical alternative to training ternary (1.58-bit) LLMs from scratch.
According to AINews, the AI research team Thinking Machines (affectionately nicknamed "Team Thinky" by the community) has recently unveiled a new native…
This blog post from Hugging Face reviews the full year of technical evolution since the "DeepSeek Moment" at the start of 2025 — the release of DeepSeek-V3 and…
The DeepSeek-V3 and R1 models released in January 2025 have been hailed as the "DeepSeek Moment" in the AI world. This upheaval not only shattered the myth…
The Technology Innovation Institute (TII) of Abu Dhabi has officially launched the new Falcon 3 open-source model family on Hugging Face. This marks a major…
Looking back on 2023, the most notable trend in the AI landscape was the explosive growth of open-source large language models (Open LLMs). In this annual…
French AI startup Mistral AI officially released its highly anticipated open-source Mixture of Experts (MoE) model — Mixtral 8x7B. The model caused a sensation…