As the demand for computational efficiency in deep learning models continues to grow, writing custom CUDA kernels (functions that run on the GPU) has become a key…
Hugging Face recently announced a major update for AMD GPU users and developers, aimed at simplifying the process of building, packaging, and sharing ROCm…
As deep learning models (such as large language models, or LLMs) continue to grow in scale and architectural complexity, standard PyTorch operators sometimes fall…
As AMD Instinct MI300 series GPUs (such as the MI300X) gradually gain share in the AI compute market, how to perform low-level optimization…
The official Hugging Face blog published a "Get Started with Hugging Face Kernel Hub in 5 Minutes" tutorial, formally introducing this new platform to the…