Cognition launched FrontierCode, a coding benchmark focused on mergeability rather than only functional correctness. It evaluates correctness, tests, scope discipline, style, and repository-specific quality standards. Built with open-source maintainers and extensive quality control, it shows current frontier models still struggle: Claude Opus 4.8 scores 13.4% on the hardest Diamond subset, ahead of GPT-5.5 and Gemini 3.1 Pro.
Paseo provides one interface for tools such as Claude Code, Codex, Copilot, OpenCode, and Pi. It runs agents through a local daemon on the user's own machine and supports desktop, mobile, web, and CLI clients. Its appeal is multi-agent orchestration and cross-device control, though real adoption depends on workflow fit, security, and reliability.