r/LocalLLaMA (top, day) · Jun 7, 2026, 8:13 PM · u/SteppenAxolotl

Qwen 3.6 27B DeepSWE Benchmark Results Highlight Gap Between Local and Closed-Source Models

Original: Qwen 3.6 27B on DeepSWE

Qwen 3.6 27B scored 1.79% on the DeepSWE benchmark, highlighting the persistent performance gap between local open-source and closed-source models.

A community benchmark of Qwen 3.6 27B on DeepSWE yielded a score of 1.79%, placing it 18th of 20 and slightly ahead of Haiku 4.5. The run used a single RTX 6000 Blackwell GPU via vLLM with reasoning enabled and averaged 32 minutes and 44k output tokens per task. The author calls Qwen 3.6 27B a 'poor man's local SOTA' but notes that the large gap to frontier closed models suggests local LLMs are struggling to keep pace on complex coding tasks.
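
To put the per-task averages in perspective, here is a rough back-of-the-envelope sketch of the effective output rate they imply. It assumes the 32 minutes is wall-clock time per task, which the summary does not state, so the figure would also absorb prompt processing, tool calls, and test runs and is not a pure decoding speed. The function and constant names are illustrative only.

```python
# Rough throughput estimate from the averages quoted in the summary.
# Assumption: 32 minutes is wall-clock time per task (not stated in the post),
# so this is an effective rate, not the GPU's raw decoding speed.

AVG_MINUTES_PER_TASK = 32
AVG_OUTPUT_TOKENS_PER_TASK = 44_000


def effective_tokens_per_second(output_tokens: int, minutes: float) -> float:
    """Average output tokens per wall-clock second over one task."""
    return output_tokens / (minutes * 60)


rate = effective_tokens_per_second(AVG_OUTPUT_TOKENS_PER_TASK, AVG_MINUTES_PER_TASK)
print(f"{rate:.1f} output tokens/s")
```

Under that assumption the averages work out to roughly 23 output tokens per second across a task.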

This is a popular evaluation report from the Reddit r/LocalLLaMA community. The author shared the complete data and insights from running the Qwen 3.6 27B model on DeepSWE (a benchmark that evaluates an LLM's software engineering capabilities).

qwen vllm runpod modal #swe-bench #coding #reasoning #local-llm #benchmark

Summaries are AI-generated; the original article is authoritative.