Latest in AI

Showing: quantization, Researchers

Fine-tuning LLMs to 1.58-bit: Extreme Quantization Made Easy ★ 85
Hugging Face Blog · 634 days ago · Tutorial
The deployment of large language models (LLMs) has long faced a dual bottleneck of VRAM capacity and memory bandwidth. Microsoft previously introduced the…
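The "1.58" in the title is log2(3) ≈ 1.585, the information content of a weight restricted to the three values -1, 0 and +1. The sketch below assumes a per-tensor absmean scale, in the spirit of the scheme described in the BitNet b1.58 paper; the helper names are hypothetical and this is not the post's fine-tuning code.

```python
import math


# Hypothetical helper: per-tensor absmean ternary quantization.
def absmean_ternary(weights, eps=1e-8):
    """Map weights to {-1, 0, +1} plus one float scale for the whole tensor."""
    # Scale = mean absolute value of the weights, guarded against zero.
    scale = max(sum(abs(w) for w in weights) / len(weights), eps)
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the ternary codes."""
    return [v * scale for v in q]


# Information content of one ternary value: log2(3) bits.
BITS_PER_WEIGHT = math.log2(3)

weights = [0.4, -0.05, -0.9, 0.12, 0.0, 0.7]
codes, scale = absmean_ternary(weights)
print(codes)            # [1, 0, -1, 0, 0, 1]
print(round(scale, 4))  # 0.3617
print(dequantize(codes, scale))
```

Because every code is -1, 0 or +1, a matrix-vector product with such weights reduces to additions and subtractions followed by a single multiplication by `scale`.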
Introduction to GGML: The Key Technology for Running Large Language Models Efficiently on Consumer Hardware ★ 80
Hugging Face Blog · 670 days ago · Tutorial
GGML is a lightweight, zero-dependency C/C++ tensor library developed by Georgi Gerganov. It was originally designed to enable efficient local inference of the…
Building Memory-Efficient Diffusion Transformers (DiT) with Quanto and Diffusers ★ 80
Hugging Face Blog · 684 days ago · Release
Background and challenges: As generative AI technology evolves, image and video generation models are increasingly transitioning from traditional UNet…
Meta Releases Llama 3.1: 405B, 70B, and 8B Flagship Open Models with Multilingual Support and 128K Long Context ★ 95
Hugging Face Blog · 691 days ago · Release
Meta's Llama 3.1 represents a major milestone in the open-source AI landscape. The most notable model is the 405B (405 billion parameter) version — the first…
WWDC 24: Running Mistral 7B on Apple Devices with Core ML ★ 75
Hugging Face Blog · 692 days ago · Tutorial
Following Apple's major Core ML updates announced at WWDC 24, Hugging Face published a practical guide detailing how to convert the popular open-source large…
Unlocking Longer Generation: A Deep Dive into Key-Value (KV) Cache Quantization ★ 80
Hugging Face Blog · 759 days ago · Tutorial
During the inference process of large language models (LLMs), the self-attention mechanism needs to store the Key and Value vectors of historical tokens (i.e…
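The KV cache grows linearly with sequence length: for every layer and every attention head, one Key vector and one Value vector is stored per token. A back-of-the-envelope calculator, using a hypothetical 7B-class configuration (32 layers, 32 KV heads of dimension 128) chosen only for illustration, shows why lowering the cache precision from 16 to 4 bits matters. It ignores the extra storage for quantization scales and zero-points.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bits_per_value):
    """Approximate KV-cache size in bytes (quantization metadata is ignored)."""
    # Factor 2: one Key vector and one Value vector per token, head and layer.
    num_values = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size
    return num_values * bits_per_value // 8


GIB = 2 ** 30

# Hypothetical 7B-class configuration, batch of 1, 4,096-token context.
config = dict(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096, batch_size=1)

fp16 = kv_cache_bytes(**config, bits_per_value=16)
int4 = kv_cache_bytes(**config, bits_per_value=4)
print(fp16 / GIB, int4 / GIB)  # 2.0 0.5
```

Doubling `seq_len` doubles the cache, so under a fixed memory budget a 4-bit cache leaves room for roughly four times as many tokens as a 16-bit one.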
Hugging Face Introduces Binary and Scalar Embedding Quantization: Much Faster Retrieval at Lower Cost ★ 85
Hugging Face Blog · 814 days ago · Tutorial
As RAG (Retrieval-Augmented Generation) and semantic search have become widespread, the maintenance costs of vector databases — especially RAM overhead — have…
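For a d-dimensional float32 embedding (4d bytes), scalar int8 quantization stores d bytes (4× smaller) and binary quantization stores d/8 bytes (32× smaller). A minimal pure-Python sketch of both ideas follows, assuming a threshold of 0 for the binary case and per-dimension calibration ranges `lo`/`hi` for the int8 case; the function names are hypothetical and are not the API of any Hugging Face library.

```python
def binary_quantize(embedding):
    """Pack an embedding into an int: bit i is 1 when dimension i is positive."""
    code = 0
    for i, x in enumerate(embedding):
        if x > 0:
            code |= 1 << i
    return code


def hamming_distance(a, b):
    """Number of differing bits; smaller means more similar binary embeddings."""
    return bin(a ^ b).count("1")


def int8_quantize(embedding, lo, hi):
    """Linearly map each dimension from its calibration range [lo, hi] onto [-128, 127]."""
    out = []
    for x, l, h in zip(embedding, lo, hi):
        step = (h - l) / 255                # 256 levels across the range
        q = round((x - l) / step) - 128
        out.append(max(-128, min(127, q)))  # clip values outside the calibration range
    return out


query = [0.2, -0.1, 0.5, -0.7]
doc = [0.3, 0.4, -0.2, -0.1]
print(hamming_distance(binary_quantize(query), binary_quantize(doc)))  # 2
```

Comparing two packed binary codes needs only an XOR and a bit count, which is why binary search is far cheaper than float32 dot products.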
A Chatbot on Your Laptop: Running Phi-2 on Intel Meteor Lake ★ 70
Hugging Face Blog · 816 days ago · Tutorial
This technical blog post from Hugging Face details how to locally deploy and run Microsoft's lightweight Phi-2 language model (2.7 billion parameters) on a…
Hugging Face Introduces Quanto: A New PyTorch Quantization Backend for Optimum ★ 75
Hugging Face Blog · 818 days ago · Release
Hugging Face has officially introduced Quanto, a brand-new quantization library designed for PyTorch, which has been integrated as a backend into the Hugging…
Accelerating StarCoder on Xeon Processors with 🤗 Optimum Intel: Q8/Q4 Quantization and Speculative Decoding
Hugging Face Blog · 866 days ago · Tutorial
This Hugging Face blog post explores in detail how to use the `Optimum Intel` library to accelerate inference for the StarCoder code-generation model on Intel…
Optimum-NVIDIA: Unlock Blazing-Fast LLM Inference with Just One Line of Code ★ 80
Hugging Face Blog · 922 days ago · Release
Hugging Face announced the launch of a new open-source library called "Optimum-NVIDIA," the result of a deep collaboration with NVIDIA, aimed at seamlessly…
Optimizing Your Large Language Models (LLMs) in Production: A Hugging Face Practical Guide ★ 85
Hugging Face Blog · 1,003 days ago · Tutorial
This technical guide from Hugging Face systematically introduces the core strategies for deploying and optimizing large language models (LLMs) in production…
A Complete Overview of Native Quantization Support in Hugging Face Transformers: A Practical Guide to bitsandbytes and GPTQ ★ 75
Hugging Face Blog · 1,006 days ago · Tutorial
As the parameter count of large language models (LLMs) has grown dramatically, running and fine-tuning these models on consumer-grade GPUs or limited hardware…
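The core arithmetic behind these savings is that weight storage scales with the number of bits per parameter. A rough estimate that counts weights only (activations, optimizer states and the KV cache are ignored) and uses decimal gigabytes:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Rough weight-only memory footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative 7B-parameter model at common precisions.
for bits in (32, 16, 8, 4):
    print(f"7B params @ {bits:>2}-bit: {weight_memory_gb(7e9, bits):5.1f} GB")
```

By this estimate the 16-bit weights of a 7B model (about 14 GB) do not fit on an 8 GB GPU, while the 4-bit version (about 3.5 GB) does, before activations are taken into account.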
Making Large Language Models Lighter with AutoGPTQ and transformers ★ 85
Hugging Face Blog · 1,026 days ago · Release
This Hugging Face official blog post introduces a major update that integrates AutoGPTQ into the `transformers` and `optimum` libraries. GPTQ (Generalized…
Towards Encrypted Large Language Models: Privacy-Preserving Inference with Fully Homomorphic Encryption (FHE) ★ 75
Hugging Face Blog · 1,047 days ago · Tutorial
This blog post, co-authored by Hugging Face and Zama — a cryptography company specializing in Fully Homomorphic Encryption (FHE) — explores how to address a…
Stable Diffusion XL on Mac: Efficient Local Inference with Advanced Core ML Quantization ★ 72
Hugging Face Blog · 1,053 days ago · Release
Since the release of Stable Diffusion XL (SDXL), its exceptional image generation quality has attracted widespread attention. However, its massive 1.3 billion…
Faster Stable Diffusion on iPhone, iPad, and Mac with Core ML ★ 75
Hugging Face Blog · 1,095 days ago · Tutorial
In the era of rapidly advancing generative AI, deploying large deep learning models to users' personal devices (edge devices) has long been a major challenge…
Optimizing Stable Diffusion for Intel CPUs with NNCF and 🤗 Optimum
Hugging Face Blog · 1,116 days ago · Tutorial
In the current boom of generative AI, image generation models like Stable Diffusion have become widely popular thanks to their remarkable capabilities…
Hugging Face Integrates bitsandbytes, 4-bit Quantization, and QLoRA to Make Large Language Models More Accessible ★ 90
Hugging Face Blog · 1,117 days ago · Release
This official Hugging Face blog post introduces a deep integration with the `bitsandbytes` library, formally adding 4-bit quantization support to…
Smaller Is Better: Q8-Chat, an Efficient Generative AI Experience on Intel Xeon Processors
Hugging Face Blog · 1,125 days ago · Release
This article introduces the latest outcome of a collaboration between Hugging Face and Intel: "Q8-Chat," a project designed to demonstrate how to efficiently…
Running the DeepFloyd IF Model with 🧨 diffusers on the Free Tier of Google Colab
Hugging Face Blog · 1,145 days ago · Tutorial
Core background and challenges: DeepFloyd IF is an advanced text-to-image model released by DeepFloyd, a research lab under Stability AI. Unlike the…
Accelerating Stable Diffusion Inference on Intel CPUs
Hugging Face Blog · 1,174 days ago · Tutorial
This technical blog post from Hugging Face provides a detailed guide on optimizing and accelerating Stable Diffusion model inference on Intel CPUs…
Fine-tuning 20B LLMs with RLHF on a 24GB Consumer GPU ★ 85
Hugging Face Blog · 1,193 days ago · Release
This technical blog post from Hugging Face introduces how to combine TRL (Transformer Reinforcement Learning) and PEFT (Parameter-Efficient Fine-Tuning)…
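PEFT covers several methods. Assuming LoRA (low-rank adapters), the saving comes from freezing the base weight W (d_out × d_in) and training only two small matrices B (d_out × r) and A (r × d_in), so the adapted layer computes W x + (alpha / r) · B A x. The hypothetical helper below counts frozen and trainable parameters for one such layer; the width 4096 and rank 16 are illustrative values, not taken from the post.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (frozen, trainable) parameter counts for one LoRA-adapted linear layer."""
    frozen = d_in * d_out              # base weight W stays fixed
    trainable = rank * (d_in + d_out)  # A: rank x d_in, B: d_out x rank
    return frozen, trainable


frozen, trainable = lora_param_counts(d_in=4096, d_out=4096, rank=16)
print(frozen, trainable, trainable / frozen)  # 16777216 131072 0.0078125
```

Only about 0.8% of this layer's parameters receive gradients, so optimizer state shrinks accordingly; by the same arithmetic, a frozen 20B base model held in 8-bit needs roughly 20 GB for its weights, which leaves it within a 24 GB card.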
Accelerating PyTorch Transformer Inference with Intel Sapphire Rapids (Part 2)
Hugging Face Blog · 1,224 days ago · Tutorial
This article is the second installment of a Hugging Face series on accelerating PyTorch Transformer models on Intel's 4th-generation Xeon Scalable Processors…
Accelerating Document AI: Hugging Face Improves Inference Efficiency of Multimodal Document Understanding Models ★ 70
Hugging Face Blog · 1,301 days ago · Tutorial
"Document AI" has been a key driver of enterprise digital transformation in recent years; it aims to automate the processing of unstructured documents such as…
Accelerate Your Hugging Face Models with 🤗 Optimum Intel and OpenVINO
Hugging Face Blog · 1,320 days ago · New Tool
As Transformer models become increasingly prevalent in natural language processing (NLP) and computer vision (CV), efficiently deploying these large models in…
Optimization Story: Optimizing Inference for the Very Large BLOOM Model in Practice
Hugging Face Blog · 1,341 days ago · Tutorial
This technical blog post from Hugging Face documents in detail the practical process of optimizing inference for BLOOM, the open-source multilingual large…
A Gentle Introduction to 8-bit Matrix Multiplication: Quantizing Very Large Transformer Models with Transformers, Accelerate, and bitsandbytes ★ 80
Hugging Face Blog · 1,397 days ago · Release
This article introduces the deep integration between Hugging Face and the bitsandbytes library, aimed at solving the enormous memory challenges posed by…
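The basic building block of 8-bit matrix multiplication is absmax quantization: scale a vector so that its largest magnitude maps to 127, round to integers, multiply and accumulate in integer arithmetic, then divide the result by both scales. The pure-Python sketch below performs one int8 dot product; it omits the vector-wise scaling and the separate handling of outlier features that LLM.int8() in bitsandbytes adds on top, and the helper names are hypothetical.

```python
def absmax_quantize(xs):
    """Scale so that max |x| maps to 127, then round to int8-range integers."""
    amax = max(abs(x) for x in xs)
    scale = 127 / amax if amax > 0 else 1.0
    return [round(x * scale) for x in xs], scale


def int8_dot(a, b):
    """Approximate dot(a, b) using int8-range codes and one rescale at the end."""
    qa, sa = absmax_quantize(a)
    qb, sb = absmax_quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulation (int32 on real hardware)
    return acc / (sa * sb)                    # undo both scales


a = [1.0, -2.0, 0.5]
b = [0.25, 1.0, -4.0]
exact = sum(x * y for x, y in zip(a, b))  # -3.75
print(exact, int8_dot(a, b))
```

The error here comes only from rounding. A single large outlier inflates `amax` and pushes all other values toward zero, which is why LLM.int8() handles outlier features separately in 16-bit.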
Accelerating Model Inference with Optimum and Transformers Pipelines ★ 75
Hugging Face Blog · 1,496 days ago · Release
When deploying Transformer models in production, reducing inference latency and increasing throughput while keeping computational costs under control has…
Case Study: Millisecond Latency with Hugging Face Infinity and Modern CPUs
Hugging Face Blog · 1,613 days ago · New Tool
This case study focuses on the performance of "Hugging Face Infinity" — Hugging Face's high-performance inference container solution — on modern CPUs…
