Faster Text Generation with Self-Speculative Decoding: Meta Introduces LayerSkip
Hugging Face Blog·571 days ago·Release
The slow autoregressive generation speed of large language models (LLMs) has long been a major bottleneck in real-world deployment. While "speculative…