Latest in AI

Showing: latency

DiffusionGemma: 4x Faster Text Generation ★ 76
Hacker News (AI keywords) · 4 days ago · Release
Google released DiffusionGemma, a 26B MoE experimental open model using text diffusion instead of token-by-token autoregressive decoding. It can generate blocks of text in parallel, reaching up to 4x faster output on dedicated GPUs. The model targets local, speed-sensitive workflows, but Google says its output quality is below standard Gemma 4 and recommends Gemma 4 for quality-critical production use.
Real-Time LLM Inference on Standard GPUs at 3k Tokens/s per Request
Hacker News (AI keywords) · 16 days ago · Benchmark
The post’s title indicates a performance claim for real-time LLM inference on standard GPUs, reporting 3,000 tokens per second per request. No article body is available, so the underlying model, GPU type, batch size, latency profile, precision, serving stack, and benchmark method are not stated. The item is best treated as an inference-performance benchmark claim rather than a verified deployment guide.
Vercel AI Gateway Adds Automatic Provider Sorting by Cost, Latency, or Throughput ★ 75
Vercel Changelog · 30 days ago · Release
Vercel recently announced in its Changelog that its AI Gateway service has launched an important update: developers can now automatically sort and route…
Vercel AI Gateway Now Supports Fast Mode for Opus 4.7 ★ 75
Vercel Changelog · 33 days ago · Release
Vercel announced in its official Changelog that its AI Gateway service now officially supports "Fast Mode" for the Opus 4.7 model. Vercel AI Gateway is an API…
Vercel AI Gateway Officially Supports Claude 4.6 Opus Fast Mode ★ 75
Vercel Changelog · 68 days ago · Release
Vercel published an update announcing that its AI Gateway service now officially supports "Fast Mode" for Anthropic's latest flagship model, Claude 4.6 Opus…
Vercel AI Gateway Supports Custom Provider-Level Timeouts for Faster Automatic Failover ★ 70
Vercel Changelog · 101 days ago · Release
Vercel has introduced a new feature for its AI Gateway product that allows developers to configure custom provider-level timeout settings. This update is…
Vercel Workflow Performance Upgrade: Execution Is Now Twice as Fast
Vercel Changelog · 103 days ago · Release
Vercel published an update on March 3, 2026, announcing that the execution speed of its "Vercel Workflow" service has been successfully improved to twice its…
Vercel Launches a New Region in Montréal, Canada (yul1)
Vercel Changelog · 145 days ago · Release
On January 20, 2026, Vercel published an update officially launching a new deployment region located in Montréal, Canada, with the region code `yul1`. This…
Vercel Lessons from Practice: Why We Removed 80% of Our AI Agent's Tools ★ 85
Vercel Changelog · 174 days ago · Opinion
When building AI applications, developers often fall into the trap of "more tools equals a smarter Agent." In early versions of Vercel's AI assistants and…
Vercel Blob Is Now Available in All Vercel Regions
Vercel Changelog · 339 days ago · Release
Vercel has officially announced that Vercel Blob, its object storage solution designed for web developers, is now available across all of Vercel's service…
Vercel Launches a New Dubai Region (dxb1) to Reduce Connection Latency in the Middle East
Vercel Changelog · 363 days ago · Release
Vercel has announced in its official update log the launch of a new Dubai region, with the region code `dxb1`. This infrastructure expansion is aimed at…
How Beyond Menu Combines Hypertune and Vercel for Ultra-Low-Latency Feature Releases Without Hurting Conversion Rates
Vercel Changelog · 682 days ago · Business
In modern web development, feature flags and A/B testing are core tools for product iteration. However, traditional solutions often require additional network…
Benchmarking Text Generation Inference (TGI): How to Measure and Optimize LLM Inference Performance ★ 75
Hugging Face Blog · 746 days ago · Tutorial
This official Hugging Face blog post takes an in-depth look at how to benchmark Text Generation Inference (TGI), Hugging Face's open-source LLM inference and…
Hugging Face Partners with Artificial Analysis to Launch an LLM Performance and Cost Leaderboard ★ 75
Hugging Face Blog · 772 days ago · New Tool
Hugging Face has announced a partnership with the independent AI performance analytics firm Artificial Analysis, officially integrating its "LLM Performance…
Latency Numbers Every Frontend Developer Should Know ★ 75
Vercel Changelog · 782 days ago · Tutorial
Drawing inspiration from the classic computer science reference "Latency Numbers Every Programmer Should Know," Vercel has compiled a dedicated latency guide…
Accelerating Over 130,000 Hugging Face Models with ONNX Runtime ★ 75
Hugging Face Blog · 984 days ago · New Tool
Hugging Face officially announced a deep collaboration with Microsoft to integrate ONNX Runtime (ORT) into the Hugging Face ecosystem. This partnership enables…
Fetch Cuts Machine Learning Processing Latency by 50% with Amazon SageMaker and Hugging Face
Hugging Face Blog · 1,017 days ago · Business
This case study examines how Fetch, a leading consumer rewards platform in the United States, leveraged the collaboration between Amazon SageMaker and Hugging…
Replicate API Supports Streaming Output for Language Models, Greatly Improving Application Responsiveness ★ 70
Replicate Blog · 1,035 days ago · Release
Replicate announced that its API now officially supports streaming output for language models (LLMs). This update addresses one of the most common pain points…
Hugging Face Introduces Assisted Generation: A New Direction Toward Low-Latency Text Generation ★ 85
Hugging Face Blog · 1,130 days ago · Release
Large language models (LLMs) typically generate text using an "autoregressive" mechanism, meaning the model must generate one token at a time. Each generation…
Vercel Edge Functions Help Read.cv Deliver Profiles Worldwide with Very Low Latency
Vercel Changelog · 1,248 days ago · Release
In modern web development, balancing "dynamic content" with "ultra-fast loading" has always been a significant challenge. Read.cv, the well-known platform for…
Vercel Launches Edge Config: Very Low-Latency Data Reads at the Edge
Vercel Changelog · 1,299 days ago · Release
Vercel officially launched "Edge Config," a new feature designed to solve the latency problem of reading configuration data in edge computing. In modern web…
Vercel Launches Regional Execution for Very Low-Latency Edge Rendering
Vercel Changelog · 1,333 days ago · Release
Vercel has officially announced a new feature called "Regional Execution," aimed at solving the latency issues that arise between edge rendering and database…
Case Study: Millisecond Latency with Hugging Face Infinity and Modern CPUs
Hugging Face Blog · 1,613 days ago · New Tool
This case study focuses on the performance of "Hugging Face Infinity" — Hugging Face's high-performance inference container solution — on modern CPUs…
