Latest in AI

First GPT-5.6 tests arrive, targeting Mythos
量子位 QbitAI, 4 days ago, Benchmark
The title indicates that QbitAI is covering the first hands-on tests of GPT-5.6, framed around a comparison with Mythos. Because the article body is unavailable, the testing setup, metrics, task types, and actual performance gap cannot be verified. The item is best treated as an early benchmark or model-comparison report that needs the original article for proper evaluation.
Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs
Latent Space, 10 days ago, Benchmark
Latent Space talks with Lukas Petersson and Axel Backlund of Andon Labs, the authors of Vending-Bench. The episode focuses on evaluating Claude models ranging from Haiku to Mythos. It also covers how they build frontier evals from scratch, with an emphasis on benchmarks that stay useful and meaningful over time.