Google DeepMind has recently unveiled a new distributed AI training technique called "Decoupled DiLoCo." This technology represents a major upgrade to its…
This issue of Import AI (No. 449) dives deep into several core frontier topics in the current AI landscape, spanning technical breakthroughs and broad…
As large language models (LLMs) push the demand for long context toward the million-token scale, the VRAM of a single GPU can no longer accommodate the…
As the parameter counts of generative AI and large language models (LLMs) push into the tens and hundreds of billions, the memory of a single GPU has long been…
Hugging Face has officially released version 1.0.0 of its core open-source library, Accelerate. This is a milestone update, signifying that since the project's…
In the era of large language models (LLMs), the VRAM of a single GPU is often insufficient to hold models with tens of billions of parameters. To overcome this…
When fine-tuning very large open-source models like Llama 2 70B (70 billion parameters), developers frequently encounter a bottleneck that goes…
This technical guide from Hugging Face provides a detailed walkthrough of how to efficiently train language models by combining TensorFlow, the Hugging Face…
This case study introduces a deep technical collaboration between Databricks and Hugging Face, aimed at addressing the efficiency and cost challenges…
As privacy awareness grows and regulatory requirements tighten, training machine learning models without centralizing sensitive data has become a critical…
This classic technical blog post from Hugging Face systematically guides developers in understanding and mastering distributed training techniques within the…
As the parameter counts of large language models (LLMs) grow exponentially, loading and running these models on limited hardware has become a major pain point…
As language models continue to scale, the memory (VRAM) of a single GPU has long been unable to accommodate models with tens or hundreds of billions of…
This article documents in detail how the BigScience project trained BLOOM, an open-source multilingual large language model with 176 billion parameters. This…
This official Hugging Face blog post provides a detailed walkthrough of how to combine the `Accelerate` library with Microsoft's `DeepSpeed` deep learning…
As AI model scale has grown exponentially, training large models with billions of parameters has become the norm, but this also presents enormous hardware…
While GPUs dominate deep learning training today, a collaboration between Intel and Hugging Face demonstrates that through software and hardware optimization…
Hugging Face has officially released a new open-source library called `Accelerate`, a lightweight helper library for PyTorch that aims to solve the…
This technical guide, published by Hugging Face in 2021, details how to use Amazon SageMaker's managed infrastructure and distributed training capabilities to…
As the parameter scale of Transformer models (such as GPT, T5, etc.) grows exponentially, deep learning faces a severe "Memory Wall" challenge. With limited…