As large language models (LLMs) such as Llama 2 become more widely adopted, achieving efficient and cost-effective inference in production environments has…
Mistral 7B is a milestone open-source large language model (LLM) released by the Mistral AI team in the autumn of 2023. Despite having only 7 billion…
As the parameter count of large language models (LLMs) has grown dramatically, running and fine-tuning these models on consumer-grade GPUs or limited hardware…
The Technology Innovation Institute (TII) in Abu Dhabi, UAE has officially released what is currently the largest openly accessible large language model on…
Meta has officially launched Code Llama, a family of state-of-the-art open-source code generation models fine-tuned on Llama 2. Code Llama achieves leading…
Replicate announced that its API now officially supports streaming output for large language models (LLMs). This update addresses one of the most common pain points…
Meta's Llama 2 is a landmark in the history of open-source large language model (LLM) development. Its performance was regarded at the time…
Meta officially launched the highly anticipated open-source large language model Llama 2 on July 18, 2023, immediately triggering a tsunami of cascading…
Meta and Microsoft jointly announced Llama 2, a new generation of open-source large language models. Compared to the original Llama, Llama 2 increases training…
This official Hugging Face blog post systematically maps out the complete ecosystem it has built around open-source large language models (LLMs). As…
This official Hugging Face blog post introduces how to use their hosted service "Inference Endpoints" to deploy large language models (LLMs). With the rapid…
The Falcon series of large language models (including Falcon-40B and Falcon-7B), developed by Abu Dhabi's Technology Innovation Institute (TII), have…
This official Hugging Face blog post introduces a deep integration with the `bitsandbytes` library, formally adding 4-bit quantization support to…
This article introduces the latest outcome of a collaboration between Hugging Face and Intel: "Q8-Chat," a project designed to demonstrate how to efficiently…
This blog post from Hugging Face provides a detailed walkthrough of how to deploy and run an open-source ChatGPT-like chatbot on a single AMD GPU using AMD's…
Large language models (LLMs) typically generate text using an "autoregressive" mechanism, meaning the model must generate one token at a time. Each generation…
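The one-token-at-a-time loop described in this entry can be sketched as follows. This is a minimal illustration, not any library's API: `generate`, `toy_next_token`, and the `TOY_BIGRAMS` table are hypothetical names, and a lookup table stands in for a real model's forward pass.

```python
from typing import Callable, Dict, List

def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new_tokens: int,
    eos: str = "<eos>",
) -> List[str]:
    """Autoregressive decoding: each step sees every token produced so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # One "model call" per generated token; this sequential dependency
        # is why generation cannot be parallelised across output positions.
        token = next_token(tokens)
        if token == eos:
            break
        tokens.append(token)
    return tokens

# Hypothetical stand-in for a model: predicts the next token from the last one.
TOY_BIGRAMS: Dict[str, str] = {"the": "cat", "cat": "sat", "sat": "<eos>"}

def toy_next_token(tokens: List[str]) -> str:
    return TOY_BIGRAMS.get(tokens[-1], "<eos>")

if __name__ == "__main__":
    print(generate(toy_next_token, ["the"], max_new_tokens=10))
```

Because every new token requires another call on the extended sequence, latency grows with output length.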
Hugging Face has announced the launch of StarChat Alpha, a conversational AI assistant designed specifically for programming. The model is based on StarCoder…
The BigCode community project, led jointly by Hugging Face and ServiceNow, has officially released StarCoder (along with its base version, StarCoderBase) — a…
The spring of 2023 was a golden era for open-source large language model (LLM) development. In April 2023, Replicate — the well-known AI model hosting platform…
Replicate, the well-known AI model hosting platform, has announced official support for large language models (LLMs) on its platform. Previously, Replicate was…
This article presents the results of a collaboration between Hugging Face and the Intel Habana team, focusing on how to leverage Intel's Habana Gaudi2 deep…
Open-source AI community leader Hugging Face and cloud computing giant Amazon Web Services (AWS) have announced an expanded partnership aimed at making…
As the parameter scale of large language models (LLMs) continues to grow, full fine-tuning has become prohibitively expensive and impractical. To lower the…
### A New Dimension of Game Storytelling: AI-Powered Dynamic Story Generation

In traditional game development, writing rich, branching narratives is an…
Amid the generative AI wave sparked by ChatGPT, Hugging Face published this in-depth article exploring how to transform "base language models" — which can only…
Hugging Face Inference Endpoints is a fully managed service designed for developers and enterprises, built to solve the pain points of deploying machine…
In late 2022, as massive language models like BLOOM and OPT emerged one after another, the AI community faced a core pain point: how to effectively and…
As the parameter counts of large language models (LLMs) grow exponentially, how to load and run these models on limited hardware has become a major pain point…
BLOOM is a massive open-source multilingual model with 176 billion parameters. Running BLOOM at FP16 precision requires at least 352 GB of video memory (VRAM)…
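The 352 GB figure follows from simple arithmetic: 176 billion parameters at 2 bytes each. The sketch below reproduces that calculation for the weights only. It ignores activations and other runtime overhead, uses decimal gigabytes, and the helper name `weight_memory_gb` and the non-FP16 rows are illustrative assumptions rather than figures from the entry.

```python
# Bytes needed to store one parameter at each precision.
# Only the FP16 row is taken from the text; the others are shown for comparison.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: int, precision: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

BLOOM_PARAMS = 176_000_000_000  # 176 billion parameters

if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(BLOOM_PARAMS, precision)
        print(f"{precision}: {gb:.0f} GB")
```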
In July 2022, Hugging Face and the BigScience collaborative community officially released BLOOM (BigScience Large Open-science Open-access Multilingual…