Latest in AI

FrontierCode: Benchmarking for Code Quality over Slop
Latent Space · 5 days ago · Benchmark
Latent Space briefly announced FrontierCode with the line “We made a thing!” From the title, FrontierCode appears to be a benchmark for frontier coding systems that prioritizes code quality rather than sheer code generation volume. The provided excerpt does not include methodology, model results, datasets, or tooling details, so conclusions should remain cautious.
Introducing FrontierCode ★ 78
Hacker News (AI keywords) · 5 days ago · Benchmark
Cognition launched FrontierCode, a coding benchmark focused on mergeability rather than only functional correctness. It evaluates correctness, tests, scope discipline, style, and repository-specific quality standards. The benchmark was built with open-source maintainers and extensive quality control. On it, current frontier models still struggle: Claude Opus 4.8 scores 13.4% on the hardest Diamond subset, ahead of GPT-5.5 and Gemini 3.1 Pro.
Coders are refusing to work without AI — and that could come back to bite them
TechCrunch AI · 15 days ago · Commentary
TechCrunch reports that developers have become so attached to AI coding tools that METR struggled to repeat a no-AI control study. Earlier research found developers felt more productive with AI, while measured task completion could be slower due to debugging, steering, and waiting. The article warns that token usage and code volume are weak productivity proxies if AI-generated code creates more bugs, review work, and long-term maintenance costs.
Show HN: AISlop, a CLI for catching AI generated code smells
Hacker News (AI keywords) · 16 days ago · New Tool
AISlop appeared on Hacker News as a Show HN project. From the title, it is a command-line tool for catching code smells associated with AI-generated code. The original post and documentation were not available, so its exact rules, supported languages, accuracy, and workflow integrations cannot be confirmed. It is relevant to developers who use AI coding tools.