Simon Willison's Weblog, May 20, 2026, 5:57 PM

How fast is 10 tokens per second, really? The web tool "tokenspeed" visually simulates LLM output speed

Original: How fast is 10 tokens per second really?

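Developer Mike Veerman built an HTML simulation tool called "tokenspeed" that shows how fast a large language model (LLM) generates output at rates from 5 to 800 tokens per second. When vendors claim their models reach a particular token speed, users often struggle to picture what that speed feels like in practice. The tool helps developers and designers judge, at a glance, the user experience and UI design implications of different speeds.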

In the current generative AI landscape, a model's output speed — typically measured in tokens per second (t/s) — is one of the key factors determining user experience (UX). Yet when a new model or inference solution claims to deliver "30 tokens per second" or "100 tokens per second," it can be hard to intuitively picture just how fast that actually is from the numbers alone.
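
To make the numbers concrete, here is a minimal Python sketch of the arithmetic behind a token-speed simulation. It is not the tokenspeed tool's implementation, which this summary does not describe. The function names (`token_delay`, `seconds_for`, `simulate_stream`) are hypothetical, and the sketch assumes a steady rate and treats each whitespace-separated word as one token, which real tokenizers do not do.

```python
import sys
import time


def token_delay(tokens_per_second: float) -> float:
    """Seconds between consecutive tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1.0 / tokens_per_second


def seconds_for(token_count: int, tokens_per_second: float) -> float:
    """Total seconds to emit token_count tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return token_count / tokens_per_second


def simulate_stream(text: str, tokens_per_second: float,
                    out=sys.stdout, sleep=time.sleep) -> int:
    """Write text one word at a time, pausing after each word.

    Assumption: one whitespace-separated word stands in for one token.
    Returns the number of words written.
    """
    delay = token_delay(tokens_per_second)
    words = text.split()
    for i, word in enumerate(words):
        # Separate words with a space; end the stream with a newline.
        out.write(word + (" " if i < len(words) - 1 else "\n"))
        out.flush()
        sleep(delay)
    return len(words)


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog"
    for rate in (5, 10, 30, 100):
        print(f"--- {rate} tokens per second ---")
        simulate_stream(sample, rate)
```

Under these assumptions, a hypothetical 300-token reply takes `seconds_for(300, 10)` = 30 seconds at 10 tokens per second, but only 0.375 seconds at 800 tokens per second, the top of the range the tool covers.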

Summaries are AI-generated; the original article is authoritative.