A LocalLLaMA post benchmarks five Bonsai LM models, ranging from 1.7B to about 8B parameters, on a $250 Jetson Orin Nano Super 8GB using llama.cpp's CUDA backend. The tests compare the 7W, 15W, 25W, and MAXN power modes on latency, throughput, energy per token, and thermals. The main takeaway is that 25W usually offers the best balance of efficiency and performance for models up to 4B, while Bonsai-8B may be better run at 15W when lower power draw is the priority.
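Energy per token is what lets power modes be compared on equal footing. A common way to derive it is average power divided by decode throughput (W ÷ tokens/s = J/token). The sketch below assumes that formula, since the post's exact measurement method isn't described here. The function names are hypothetical, and the wattage and throughput figures are placeholders, not results from the post.

```python
# Illustrative sketch: energy per token = average power / decode throughput.
# Assumption: this is one common definition, not necessarily the post's method.
# All numbers below are hypothetical placeholders, not measured results.

def energy_per_token(avg_power_w: float, tokens_per_s: float) -> float:
    """Return joules per generated token (W / (tokens/s) = J/token)."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return avg_power_w / tokens_per_s


def most_efficient_mode(runs: dict[str, tuple[float, float]]) -> str:
    """Return the power-mode name with the lowest J/token.

    `runs` maps a mode name to (average power in W, throughput in tokens/s).
    """
    return min(runs, key=lambda mode: energy_per_token(*runs[mode]))


if __name__ == "__main__":
    # Hypothetical placeholder measurements, NOT results from the post.
    runs = {
        "15W": (12.0, 10.0),
        "25W": (18.0, 16.0),
    }
    for mode, (watts, tps) in runs.items():
        print(f"{mode}: {energy_per_token(watts, tps):.3f} J/token")
    print("lowest energy per token:", most_efficient_mode(runs))
```

With these placeholder figures, the 25W mode draws more power but still uses less energy per token, because its throughput rises proportionally more than its power draw. The same arithmetic explains why a mid-range mode can beat both the lowest and the highest power cap on efficiency.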
TSMC senior vice president Cliff Hou said customers in both smartphones and AI data centers are increasingly focused on improving performance without increasing power consumption. The remark reflects mounting energy pressure as AI workloads expand. For chipmakers and infrastructure buyers, energy efficiency is becoming a central metric alongside raw computing performance.