Latest in AI

Showing: Llama

Lemonade v10.7 Adds Omni Models, Benchmarks, and Cross-Vendor GPU Support
r/LocalLLaMA top day · 4 days ago · Release
Lemonade v10.7 marks a project-level shift toward working-group-driven development, with 19 contributors involved in the release. The update improves LMX-Omni virtual models for Open WebUI and OpenAI-compatible multimedia clients, introduces the `lemonade bench` CLI, and expands backend support. CUDA, Vulkan, llama.cpp, stable-diffusion.cpp, FastFlowLM, and vLLM are part of the broader push toward cross-vendor local AI performance.
Bonsai LM 1-bit and 1.58-bit Benchmarks on Jetson Orin Nano Super
r/LocalLLaMA top day · 4 days ago · Benchmark
A LocalLLaMA post benchmarks five Bonsai LM models, from 1.7B to about 8B parameters, on a $250 Jetson Orin Nano Super 8GB using llama.cpp CUDA. The tests compare 7W, 15W, 25W, and MAXN modes across latency, throughput, energy per token, and thermals. The main takeaway is that 25W is usually the best efficiency/performance point for models up to 4B, while Bonsai-8B may favor 15W for lower power.
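The energy-per-token metric mentioned above can be derived from average power draw and decode throughput. The following is a minimal sketch of that arithmetic, not the post's own measurement script; the function name and the sample readings are hypothetical and are not figures from the benchmark.

```python
def energy_per_token_joules(avg_power_watts: float, tokens_per_second: float) -> float:
    """Joules per generated token: power (J/s) divided by throughput (tokens/s)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return avg_power_watts / tokens_per_second


# Hypothetical (average watts, tokens/s) readings per power mode.
# These are illustrative placeholders, NOT results from the post.
readings = {"15W": (12.0, 10.0), "25W": (20.0, 20.0)}

for mode, (watts, tps) in readings.items():
    joules = energy_per_token_joules(watts, tps)
    print(f"{mode}: {joules:.3f} J/token")
```

With numbers shaped like these, a higher power mode can still cost less energy per token if throughput rises faster than power draw, which is the kind of trade-off the post compares across modes.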
Furiosa AI inference chip could be a game changer for local LLMs
r/LocalLLaMA top day · 4 days ago · Hardware
An r/LocalLLaMA post discusses Furiosa AI’s RNGD inference chip, citing TSMC 5nm, Hynix HBM3, 48GB VRAM, 1.5TB/s bandwidth, and 180W TDP. The author argues it could matter for local LLM users if Furiosa opens its programming interface and works with llama.cpp on a GGML backend. The post later clarifies that Furiosa is not selling to consumers; this is a wish and market commentary, not a launch.
A llama.cpp CLI Command Builder
r/LocalLLaMA top day · 5 days ago · New Tool
An r/LocalLLaMA post introduces a llama.cpp CLI Command Builder with no accounts, email, pop-ups, cookies, or ads. It stores information locally in the browser and includes editable fields for flags and arguments found in the documentation. Users can build CLI or server commands, log run information, and compare which configurations work best for their hardware; only Linux is currently supported.
Arguing with an AI bot posting outdated Llama 3.1 takes
r/LocalLLaMA top day · 5 days ago · Commentary
An r/LocalLLaMA post jokes about arguing with an AI bot that posted outdated commentary involving Llama 3.1. The author says such bots should enable web search instead of relying on stale knowledge. The post also mocks exaggerated model testimonial posts, using Qwen3.6 27B as a sarcastic example, which makes it more of a community quality complaint than technical news.
When every other post is an AI benchmark, best-model question, or slop app
r/LocalLLaMA top day · 5 days ago · Commentary
This r/LocalLLaMA post is a meme-like complaint about the subreddit’s recent content quality. The author points to repeated AI-generated benchmark reports, recurring “best model” questions, and hastily built apps or engines presented as groundbreaking. It is not a technical release or evidence-based analysis, but it reflects frustration with noise, hype, and low-effort AI-generated discussion in local model communities.
Import AI 460: Reward hacking society, RSI data, and RL quadcopter racing ★ 76
Import AI (Jack Clark) · 6 days ago · Commentary
Import AI 460 covers SocioHack, a benchmark where RL-trained LLMs discover loopholes in institutional rule systems. It also discusses Anthropic evidence for a practical form of recursive self-improvement, reflected in sharply increased code merged during 2026. Other sections examine multi-agent RL drones outperforming a champion human pilot, plus research showing state-controlled media can shape LLM responses in local languages.
Google's Official Gemma 4 QAT Q4_0 GGUFs Have Higher Precision Than Unsloth's Q4_K_XL
r/LocalLLaMA top day · 6 days ago · Commentary
An analysis of Gemma 4 QAT GGUF files reveals that Google's official 'Q4_0' releases actually employ a mixed-precision strategy. For smaller models like E2B and E4B, Google keeps critical token embeddings in Q6_K and certain projection weights in F16. This makes Google's Q4_0 files larger and more precise than Unsloth's 'Q4_K_XL' versions, which default to standard Q4_0 for almost all tensors.
llama-server Router Mode: Pinned Model Grabs CUDA Context on All GPUs, Causing OOM
r/LocalLLaMA top day · 6 days ago · Commentary
A Reddit user highlighted a limitation in llama-server's router mode (`--models-preset`): child processes spawn and initialize CUDA contexts on all available GPUs, even when pinned to a single card. When other GPUs are fully utilized by a large model, launching a smaller model fails with a CUDA OOM error because it cannot allocate the context stub on the maxed-out cards. Currently, child processes inherit the base environment, preventing per-model `CUDA_VISIBLE_DEVICES` configuration.
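The limitation described above comes down to child processes sharing one inherited environment. As a rough illustration of the general technique (not llama-server's router implementation, and not a fix for it), the sketch below launches each child with its own `CUDA_VISIBLE_DEVICES` value using Python's standard `subprocess` module. The helper names `child_env` and `launch_pinned` are hypothetical, and the probe command is a stand-in for a real server binary.

```python
import os
import subprocess
import sys


def child_env(gpu_ids, base_env=None):
    """Copy the base environment and restrict the GPUs visible to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch_pinned(cmd, gpu_ids):
    """Run cmd in a child process that only sees the listed GPU indices."""
    return subprocess.run(
        cmd,
        env=child_env(gpu_ids),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in child: prints the variable it inherited. A real launcher would
# pass its own server command here instead (assumption, not shown).
probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = launch_pinned(probe, [1])
print(result.stdout.strip())
```

Because the variable is set before the child starts, the CUDA runtime in that process never enumerates the other cards, so it has no reason to allocate a context on them. Running one independent server process per model in this way is a workaround outside router mode, under the assumption that the models do not need to share a single router.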
start-llama: A Handy CLI Launcher for llama-server with Easy Customization
r/LocalLLaMA top day · 7 days ago · New Tool
A developer has released 'start-llama', a command-line utility designed to simplify launching llama-server (llama.cpp). It lets users manage sensible default configurations, work with multiple server binaries, and apply per-model or command-line overrides. The tool reduces local LLM deployment to a single, easily configurable step.
Arithmetic Without Numbers: How LLMs Do Math
Hacker News (AI keywords) · 9 days ago · Commentary
The article asks whether LLM arithmetic is memorization, heuristics, real computation, or experimental assistance. It summarizes Rune experiments that decode operations and operands from frozen Llama activations, then route them to Python under a no-parser rule. The strongest supported claim is narrow: activation-derived tool arguments worked in scoped audits, while residual-state JIT replacement, long-number generation, and cross-model transfer remain brittle.
Fine-tuning an LLM to write docs like it's 1995
Hacker News (AI keywords) · 9 days ago · Tutorial
The author builds a corpus from old Microsoft manuals, cleans OCR text, generates instruction-style JSONL examples, and fine-tunes Llama 3.1 8B and Qwen 2.5 7B with QLoRA. Tests cover malloc(), a fictional Win32 API, and a deliberately anachronistic REST API prompt. Qwen fine-tunes transfer the period documentation style best, but the experiment also shows hallucination risks, tuning complexity, and why these models augment rather than replace technical writers.
How LLMs Actually Work
Hacker News (AI keywords) · 10 days ago · Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
New AI Infra Decacorns: Fireworks, Baseten, and OpenRouter ★ 78
Latent Space · 18 days ago · Business
AI infrastructure startups Fireworks and Baseten have reportedly reached massive valuations, reflecting intense investor interest in developer-focused inference and deployment platforms. OpenRouter, the popular LLM API aggregator, is also on a rapid growth trajectory. This funding wave highlights a major capital shift toward cost-effective, developer-friendly API and hosting solutions.
Reachy Mini goes fully local
Hugging Face Blog · 18 days ago · Hardware
Hugging Face published a tutorial for running Reachy Mini conversations without cloud audio processing or API keys. The setup uses its speech-to-speech library as a cascaded VAD, STT, LLM, and TTS pipeline exposed through a Realtime API-compatible WebSocket. Recommended defaults include llama.cpp with Gemma 4, Silero VAD, Parakeet-TDT, and Qwen3-TTS, while allowing swaps to vLLM, MLX, Transformers, or hosted Responses API providers.
[AINews] The End of Fine-Tuning? Exploring the Future and Transformation of Fine-Tuning in the Era of Large Models ★ 75
Latent Space · 32 days ago · Opinion
As AI technology continues to iterate at a rapid pace, the developer community is confronting a profound rethinking of the question: "Is fine-tuning heading…
Vercel Launches AI Gateway Production Metrics, Improving LLM Monitoring and Performance Analysis ★ 70
Vercel Changelog · 33 days ago · Release
Vercel recently released an update to its Changelog regarding "AI Gateway production index" metrics. As enterprises and developers push an increasing number of…
Distillation Panic: Why Calling "Knowledge Distillation" a Security Attack Is a Terrible Trend ★ 75
Interconnects (Nathan L.) · 41 days ago · Opinion
In the field of machine learning, "knowledge distillation" is a well-established technique that generally refers to using the output data generated by a…
DeepInfra Officially Joins Hugging Face's Inference Providers Lineup 🔥 ★ 72
Hugging Face Blog · 46 days ago · Release
Hugging Face's official blog has announced that DeepInfra — a well-known high-performance, low-cost serverless inference platform — has officially joined…
Reading the Current Performance Gap Between Open and Closed AI Models: Beyond the Myth of a Single Evaluation Metric ★ 75
Interconnects (Nathan L.) · 54 days ago · Opinion
In today's AI landscape, the performance gap between open-weights models (such as Meta's Llama family) and closed-source models (such as OpenAI's GPT and…
Predicting Mid-2026: My Bets on Open-Source AI Models and an Analysis of the Open-Closed Gap ★ 75
Interconnects (Nathan L.) · 59 days ago · Opinion
In this forward-looking article on the state of AI in mid-2026, Interconnects founder Nathan Lambert takes a deep dive into the dynamic gap between open-weight…
Liberate Your OpenClaw: Build an Autonomous CLI Coding Agent with Open-Source Models ★ 75
Hugging Face Blog · 79 days ago · Tutorial
With the launch of agent-oriented CLI coding tools like Claude Code from Anthropic, developer demand for "collaborating with AI directly inside the terminal"…
Vercel Chat SDK Gains Agent Support: Easily Build Interactive AI Agent Experiences for Users ★ 80
Vercel Changelog · 87 days ago · Release
Vercel recently rolled out a major update to its AI SDK — specifically the Chat SDK — aimed at lowering the barrier for developers to build and deploy AI…
Hugging Face Open-Source Ecosystem Report: Spring 2026 Edition ★ 85
Hugging Face Blog · 89 days ago · Commentary
Hugging Face has published its Spring 2026 "State of Open Source AI" report, offering a comprehensive review of the explosive growth and paradigm shifts that…
The Next Phase of Open-Source Models: Market, Capability, and Ecosystem Responses in the Industrialization Era ★ 80
Interconnects (Nathan L.) · 90 days ago · Opinion
This article, from Nathan Lambert's well-known AI newsletter Interconnects, offers a deep examination of the critical turning point that open-source language…
Vercel Officially Supports Deploying LiteLLM Servers: One-Click Hosting for a Unified Multi-Model API Gateway ★ 75
Vercel Changelog · 90 days ago · Release
Vercel has officially announced support for deploying and hosting LiteLLM servers. LiteLLM is a highly popular open-source LLM proxy and API gateway tool in…
Train AI Models for Free! Hugging Face Teams Up with Unsloth to Launch Free Fine-Tuning on Hugging Face Jobs ★ 85
Hugging Face Blog · 114 days ago · New Tool
Hugging Face's official blog has announced exciting news for the open-source AI community: Hugging Face has formed a deep partnership with Unsloth — the…
GGML and llama.cpp Officially Join Hugging Face to Safeguard the Long-Term Development of Local AI ★ 95
Hugging Face Blog · 114 days ago · Business
A historic milestone has arrived in the open-source AI world: GGML and llama.cpp — the open-source projects founded by Georgi Gerganov that laid the…
Open-Source Models Stuck in "Permanent Catch-Up": The Open-Closed Gap, Distillation, Innovation Cycles, and Open Source's Odds ★ 80
Interconnects (Nathan L.) · 116 days ago · Opinion
This article by Nathan Lambert takes a deep dive into the tangled competitive dynamics between open-source and closed-source AI models. Lambert argues that…
Transformers.js v4 Lands on NPM! Web-Based WebGPU AI Gets a Major Performance Upgrade ★ 85
Hugging Face Blog · 125 days ago · Release
Hugging Face officially published Transformers.js v4 on NPM, marking a major milestone for running local AI models within the JavaScript ecosystem…
