A LocalLLaMA post benchmarks five Bonsai LM models, ranging from 1.7B to about 8B parameters, on a $250 Jetson Orin Nano Super 8GB using llama.cpp's CUDA backend. The tests compare the 7W, 15W, 25W, and MAXN power modes on latency, throughput, energy per token, and thermals. The main takeaway is that 25W is usually the best balance of efficiency and performance for models up to 4B, while Bonsai-8B may be better run at 15W to reduce power draw.
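For readers unfamiliar with the energy-per-token metric, here is a minimal Python sketch of how it is commonly derived: average power draw in watts divided by throughput in tokens per second gives joules per token. The function names and the `measurements` dictionary layout are hypothetical, and this is not necessarily how the post's author computed the figures.

```python
def joules_per_token(avg_power_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: watts are joules/second, so W / (tok/s) = J/tok."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return avg_power_watts / tokens_per_second


def most_efficient_mode(measurements: dict[str, tuple[float, float]]) -> str:
    """Return the power mode with the lowest joules per token.

    `measurements` maps a mode name to (avg_power_watts, tokens_per_second).
    This layout is an assumption made for illustration only.
    """
    return min(measurements, key=lambda mode: joules_per_token(*measurements[mode]))
```

A higher power mode only wins on this metric when throughput rises faster than power draw, which is why a mid-range setting can beat both the lowest and the highest mode.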
AI infrastructure startups Fireworks and Baseten have reportedly reached massive valuations, reflecting intense investor interest in developer-focused inference and deployment platforms. OpenRouter, the popular LLM API aggregator, is also growing rapidly. Together, the funding wave points to a major shift of capital toward cost-effective, developer-friendly API and hosting solutions.