Latest in AI

Qwen3.7-Plus launches as a multimodal agent base for recreating desktop software
量子位 QbitAI · 6 days ago · Release
QbitAI’s headline says Qwen3.7-Plus has launched and positions it as a new foundation for multimodal agents. The highlighted capability is one-click recreation of professional desktop software, suggesting UI understanding and app-generation workflows. Since no article body is available, technical details, availability, benchmarks, licensing, and real-world reliability cannot be verified from the provided source.
Gemma 4 31B FP8 Matches Claude Sonnet 4.6 Medium in Custom Benchmark ★ 75
r/LocalLLaMA top day · 6 days ago · Benchmark
A Reddit user shared benchmark results showing Google's Gemma 4 31B (FP8) performing on par with Claude Sonnet 4.6 Medium. The custom evaluation harness tested complex tasks including Neo4j Cypher queries, entity extraction, agentic tool calling, Python coding, and multi-vector retrieval synthesis. This highlights how quantized mid-sized open-source models are closing the gap with leading proprietary frontier models.
club-3090 Adds Experimental FP8 Support for Qwen3.6-27B
r/LocalLLaMA top day · 6 days ago · New Tool
The open-source project club-3090 has rolled out experimental FP8 quantization support for Qwen3.6-27B. This update is highly anticipated by dual RTX 3090 users, allowing them to run the model with significantly reduced VRAM requirements. According to reports, the official Qwen3.6-27B-FP8 model performs virtually identically to the original unquantized BF16 version.
Qwen 3.6 27B DeepSWE Benchmark Results Highlight Gap Between Local and Closed-Source Models
r/LocalLLaMA top day · 6 days ago · Benchmark
A community benchmark of Qwen 3.6 27B on DeepSWE yielded a score of 1.79%, placing it 18th of 20 and slightly ahead of Haiku 4.5. Run on a single RTX 6000 Blackwell GPU via vLLM with reasoning enabled, the test averaged 32 minutes and 44k output tokens per task. The author notes that while Qwen 3.6 27B represents a 'poor man's local SOTA,' the massive gap compared to frontier closed models suggests local LLMs are struggling to keep pace in complex coding.
Qwen3.6 35B-A3B on a Laptop: A Local LLM "Zero to One" Milestone
r/LocalLLaMA top day · 7 days ago · Opinion
A Reddit user detailed running Qwen3.6 35B-A3B (IQ3_XXS quantization) on an ASUS Zenbook Pro 14 (RTX 4060 8GB VRAM, 64GB RAM). Using llama.cpp, they achieved 27 TPS at 32k context and 18 TPS at 256k context. This setup serves as a highly capable, fully private local agent for file operations, CLI execution, and brainstorming, bypassing cloud privacy concerns.
Qwen 3.6 27B KV Cache Quantization Benchmarks: KVarN, Turbo, and TCQ Evaluated
r/LocalLLaMA top day · 7 days ago · Benchmark
Reddit user Anbeeld shared comprehensive KV cache quantization benchmarks for Qwen 3.6 27B across 75 configuration pairs. Using BeeLlama.cpp (a custom llama.cpp fork), the test evaluates q8, q6, q5, and q4 quantization levels. It specifically highlights advanced implementations like KVarN, TurboQuant, and TCQ to optimize long-context inference efficiency.
LLM Research Papers: The 2026 List (January to May)
Ahead of AI (Raschka) · 8 days ago · Paper
Sebastian Raschka compiles a curated reference list of LLM papers he bookmarked from January through May 2026. The list is not comprehensive; it is organized around topics useful for future articles, lectures, code examples, and research work. Public sections emphasize reasoning, RL, efficient inference, long context, agent systems, tool use, coding agents, diffusion language models, and serving infrastructure.
Did Claude Increase Bugs in rsync?
Hacker News (AI keywords) · 9 days ago · Benchmark
The article analyzes rsync releases to test whether versions containing Claude commits had unusually high bug rates. It uses severity-weighted bugs per 10 commits, exact permutation testing, and Fisher's exact test. With only two Claude-exposed releases, the evidence is limited, but both releases appear within normal historical variation rather than clear negative outliers.
Arithmetic Without Numbers: How LLMs Do Math
Hacker News (AI keywords) · 9 days ago · Commentary
The article asks whether LLM arithmetic is memorization, heuristics, real computation, or experimental assistance. It summarizes Rune experiments that decode operations and operands from frozen Llama activations, then route them to Python under a no-parser rule. The strongest supported claim is narrow: activation-derived tool arguments worked in scoped audits, while residual-state JIT replacement, long-number generation, and cross-model transfer remain brittle.
Fine-tuning an LLM to write docs like it's 1995
Hacker News (AI keywords) · 9 days ago · Tutorial
The author builds a corpus from old Microsoft manuals, cleans OCR text, generates instruction-style JSONL examples, and fine-tunes Llama 3.1 8B and Qwen 2.5 7B with QLoRA. Tests cover malloc(), a fictional Win32 API, and a deliberately anachronistic REST API prompt. Qwen fine-tunes transfer the period documentation style best, but the experiment also shows hallucination risks, tuning complexity, and why these models augment rather than replace technical writers.
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
Hacker News (AI keywords) · 10 days ago · Benchmark
The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT 5.5 solved 7 of 10 runs, while DeepSeek and Claude variants solved fewer. Many other models failed due to refusals, API-focused tunnel vision, false positives, or inability to use the exposed Firebase path correctly.
How LLMs Actually Work
Hacker News (AI keywords) · 10 days ago · Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
Microsoft Build: MAI-Thinking-1 and MAI Family Models ★ 78
Latent Space · 11 days ago · Release
Microsoft used Build to present itself as both an AI platform and a first-party model lab, announcing seven MAI models across reasoning, code, image, transcription, and voice. The standout was MAI-Thinking-1, described as a 35B active MoE with 256K context and clean data lineage. The recap also ties the launches to GitHub Copilot, Windows agent runtime ambitions, Web IQ grounding APIs, Foundry distribution, and MAIA 200 hardware.
Qwen 3.7 Plus now available on AI Gateway
Vercel Changelog · 13 days ago · Release
Vercel announced that Qwen 3.7 Plus is now available through AI Gateway. The provided source contains only the headline, so supported features, pricing, limits, and performance details cannot be confirmed. Developers using Vercel AI Gateway can consider adding the model to their evaluation list and verify its documented API capabilities before adoption.
CAPTCHAs can still detect AI agents ★ 72
Hacker News (AI keywords) · 16 days ago · Paper
Roundtable argues that CAPTCHA image recognition is largely solved, but process-level behavior still separates humans from AI agents. Their CogCAPTCHA30 benchmark combines CAPTCHA with cognitive psychology tasks to test not only outputs, but how answers are produced. Results suggest frontier models like Claude, GPT, and Gemini are not necessarily more humanlike than smaller or cognition-trained models.
LLMs believe false statements even after explicit warnings that they're false ★ 74
Ars Technica AI · 16 days ago · Paper
A new study describes “Negation Neglect,” where LLMs fine-tuned on documents that explicitly mark claims as false still learn the claims as true. Experiments with fabricated statements found models often absorb entity-event associations more strongly than surrounding warnings or negations. The finding raises concerns for fine-tuning pipelines, misinformation handling, and AI safety datasets that include harmful or false content with disclaimers.
Reachy Mini goes fully local
Hugging Face Blog · 18 days ago · Hardware
Hugging Face published a tutorial for running Reachy Mini conversations without cloud audio processing or API keys. The setup uses its speech-to-speech library as a cascaded VAD, STT, LLM, and TTS pipeline exposed through a Realtime API-compatible WebSocket. Recommended defaults include llama.cpp with Gemma 4, Silero VAD, Parakeet-TDT, and Qwen3-TTS, while allowing swaps to vLLM, MLX, Transformers, or hosted Responses API providers.
Some ideas for what comes next, May 2026
Interconnects (Nathan L.) · 19 days ago · Commentary
Nathan Lambert argues that 2026 AI progress is becoming higher-stakes, with model capabilities, work patterns, economics, and real-world risks all escalating. He says open models still lack a true Claude Code and Opus 4.5-style agent moment, and Gemini has no clear competitor to Claude Code or Codex yet. The essay also tracks Mythos, American open-model momentum, frontier-lab competition, and mounting intervention from governments and other power structures.
