Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs
Latent Space interviews Andon Labs on VendingBench and building durable frontier AI evals.
Latent Space talks with Lukas Petersson and Axel Backlund of Andon Labs, the authors of VendingBench. The episode covers evaluating Claude models from Haiku to Mythos and how the pair build frontier evals from scratch, with an emphasis on benchmarks that stay useful and meaningful over time.
In this Latent Space interview, Lukas Petersson and Axel Backlund of Andon Labs discuss AI model evaluation, drawing in particular on their experience as the authors of VendingBench. According to the original summary, the focus is not on introducing a new model or product but on how to design evals that can measure the capabilities of frontier models, the kind of benchmarks that are becoming increasingly important in AI R&D and productization.