Latest in AI

Asynchronous Robot Inference: Decoupling Action Prediction and Execution ★ 75
Hugging Face Blog · 339 days ago · Opinion
In the fields of robot learning and embodied AI, enabling controllers based on deep learning or large language/vision models (VLAs) to run in real time has…
SGLang Integrates a Hugging Face Transformers Backend: Greatly Improved Model Compatibility and Development Flexibility ★ 75
Hugging Face Blog · 356 days ago · Release
SGLang (Structured Generation Language) is a high-performance LLM inference and serving framework developed by the LMSYS team, renowned for its efficient…
Groq Officially Joins Hugging Face Inference Providers, Supporting Ultra-Fast Open-Source Model Inference ★ 75
Hugging Face Blog · 363 days ago · Release
Hugging Face announced a deep partnership with Groq, a chip company focused on ultra-fast AI inference, formally bringing Groq into the Hugging Face "Inference…
Featherless AI Officially Joins Hugging Face Inference Providers ★ 75
Hugging Face Blog · 367 days ago · Release
Hugging Face officially announced a partnership with Featherless AI, a serverless GPU inference platform, integrating it into the Hugging Face Inference…
Run More Than 30,000 LoRA Models on Hugging Face with Replicate ★ 75
Replicate Blog · 395 days ago · New Tool
The AI-managed inference platform Replicate has announced a deep partnership with Hugging Face, the giant of the open-source AI community, officially bringing…
Hugging Face Launches an Ultra-Fast Whisper Speech-to-Text Deployment Option for Inference Endpoints ★ 75
Hugging Face Blog · 397 days ago · New Tool
Hugging Face recently announced a brand-new, ultra-fast optimized deployment solution for OpenAI's open-source speech recognition model Whisper on its hosted…
Prefill and Decode Under Concurrent Requests: Key Techniques for Optimizing LLM Inference Performance ★ 82
Hugging Face Blog · 424 days ago · Tutorial
When deploying large language models (LLMs), maintaining low latency and high throughput under high concurrency (concurrent requests) is one of the greatest…
Efficient Request Queueing: A Key Strategy for Optimizing LLM Inference Performance ★ 75
Hugging Face Blog · 438 days ago · Tutorial
The Unique Challenges and Memory Bottlenecks of LLM Inference: Traditional web services primarily handle concurrent requests through multi-threading or…
Accelerating Large Language Model (LLM) Inference with TGI on Intel Gaudi ★ 75
Hugging Face Blog · 443 days ago · Release
Hugging Face's official blog has announced that its widely adopted open-source large model inference framework, Text Generation Inference (TGI), now officially…
Hugging Face Inference Endpoints Launches a New Analytics Dashboard, Improving Model Monitoring and the MLOps Experience
Hugging Face Blog · 450 days ago · Release
Hugging Face recently announced a major upgrade to its hosted model deployment service, "Inference Endpoints," introducing a brand-new and far more modern…
Groq, fal, and DeepInfra Officially Join the Vercel Marketplace ★ 75
Vercel Changelog · 453 days ago · Release
Vercel has officially announced that three prominent AI infrastructure service providers — Groq, fal, and DeepInfra — have formally joined the Vercel…
Hugging Face Adds Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita AI ★ 75
Hugging Face Blog · 481 days ago · Release
On February 18, 2025, Hugging Face announced the addition of three new partners to its serverless inference ecosystem: Hyperbolic, Nebius AI Studio, and Novita…
Welcome Fireworks.ai to the Hugging Face Hub 🎆 ★ 75
Hugging Face Blog · 485 days ago · Release
On February 14, 2025, Hugging Face — the leading open-source AI community — officially announced the integration of high-performance AI inference platform…
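Lessons from 1 Billion Classifications: Hugging Face Shares How to Run Large-Scale Classification Quickly and at Very Low Cost with Open-Source Models ★ 80
Hugging Face Blog · 486 days ago · Tutorial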
In the current era of generative AI sweeping the globe, many developers habitually feed all tasks — including simple text classification, sentiment analysis…
How to Deploy and Fine-Tune DeepSeek Models on AWS: The Official Hugging Face Guide ★ 85
Hugging Face Blog · 500 days ago · Tutorial
As DeepSeek-R1 swept through the AI landscape on the strength of its powerful reasoning capabilities, how to safely and efficiently deploy and fine-tune these…
Hugging Face Hub Launches "Inference Providers": One-Click Switching Between Multiple Third-Party High-Performance Inference Providers ★ 85
Hugging Face Blog · 502 days ago · Release
Hugging Face has officially launched the "Inference Providers" feature on the Hugging Face Hub — a major update designed to address the pain points developers…
Hugging Face TGI Announces Multi-Backend Support: Integrating TensorRT-LLM and vLLM ★ 85
Hugging Face Blog · 514 days ago · Release
Text Generation Inference (TGI), Hugging Face's open-source LLM inference and deployment framework, has received a major architectural update, officially…
Replicate Officially Supports NVIDIA L40S GPUs: Better Performance, Lower Cost
Replicate Blog · 576 days ago · New Tool
The AI deployment platform Replicate has announced the official availability of NVIDIA L40S GPU compute on its platform. This update aims to provide developers…
Fine-Tuning LLMs to 1.58-bit: Making Extreme Model Quantization Easy ★ 85
Hugging Face Blog · 634 days ago · Tutorial
The deployment of large language models (LLMs) has long faced a dual bottleneck of VRAM capacity and memory bandwidth. Microsoft previously introduced the…
An Introduction to GGML: A Key Technology for Running Large Language Models Efficiently on Consumer Hardware ★ 80
Hugging Face Blog · 670 days ago · Tutorial
GGML is a lightweight, zero-dependency C/C++ tensor library developed by Georgi Gerganov. It was originally designed to enable efficient local inference of the…
Hugging Face Teams Up with NVIDIA NIM to Launch Serverless Inference ★ 82
Hugging Face Blog · 685 days ago · Release
Hugging Face and NVIDIA announced a major partnership in late July 2024, officially launching a serverless inference service powered by NVIDIA NIM (NVIDIA…
TGI Multi-LoRA: Deploy Once, Serve 30 Fine-Tuned Models ★ 80
Hugging Face Blog · 696 days ago · Release
The Hugging Face official blog has introduced a major update to its open-source text generation inference engine, Text Generation Inference (TGI): the…
Google Cloud TPUs Officially Arrive on Hugging Face, with Support for Inference Endpoints and Spaces ★ 75
Hugging Face Blog · 705 days ago · Release
Hugging Face announced a deep partnership with Google Cloud, officially integrating Google Cloud TPUs (Tensor Processing Units) into the Hugging Face platform…
NVIDIA H100 GPUs Coming Soon to Replicate: Faster Model Inference and Training
Replicate Blog · 732 days ago · Release
The official blog of Replicate, the popular AI model hosting and deployment platform, has announced that NVIDIA H100 Tensor Core GPUs will soon be officially…
Benchmarking Text Generation Inference (TGI): How to Measure and Optimize LLM Inference Performance ★ 75
Hugging Face Blog · 746 days ago · Tutorial
This official Hugging Face blog post takes an in-depth look at how to benchmark Text Generation Inference (TGI), Hugging Face's open-source LLM inference and…
Easily Deploy Models to AWS Inferentia2 Chips on Hugging Face ★ 75
Hugging Face Blog · 753 days ago · Release
Hugging Face has announced official support for AWS Inferentia2 (Inf2) instances within its hosted Inference Endpoints service. This update gives developers…
Building Cost-Effective Enterprise RAG Applications with Intel Gaudi 2 and Intel Xeon ★ 70
Hugging Face Blog · 766 days ago · Tutorial
As enterprise demand for Retrieval-Augmented Generation (RAG) technology surges, how to maintain high performance while controlling hardware costs has become…
Running Privacy-Preserving Fully Homomorphic Encryption (FHE) Inference on Hugging Face Endpoints ★ 75
Hugging Face Blog · 789 days ago · Release
This article introduces how to run privacy-preserving inference based on Fully Homomorphic Encryption (FHE) on Hugging Face Endpoints. In traditional…
Goodbye Cold Starts: How Hugging Face Sped Up LoRA Inference by 300% ★ 85
Hugging Face Blog · 922 days ago · Release
In real-world generative AI applications, fine-tuning for specific tasks or clients is a common requirement. However, deploying a full base model for every…
Optimum-NVIDIA: Unlock Ultra-Fast LLM Inference with Just One Line of Code ★ 80
Hugging Face Blog · 922 days ago · Release
Hugging Face announced the launch of a new open-source library called "Optimum-NVIDIA," the result of a deep collaboration with NVIDIA, aimed at seamlessly…
