Latest in AI

Lemonade v10.7 Adds Omni Models, Benchmarks, and Cross-Vendor GPU Support
r/LocalLLaMA top day · 4 days ago · Release
Lemonade v10.7 marks a project-level shift toward working-group-driven development, with 19 contributors involved in the release. The update improves LMX-Omni virtual models for Open WebUI and OpenAI-compatible multimedia clients, introduces the `lemonade bench` CLI, and expands backend support. CUDA, Vulkan, llama.cpp, stable-diffusion.cpp, FastFlowLM, and vLLM are part of the broader push toward cross-vendor local AI performance.
Bonsai LM 1-bit and 1.58-bit Benchmarks on Jetson Orin Nano Super
r/LocalLLaMA top day · 4 days ago · Benchmark
A LocalLLaMA post benchmarks five Bonsai LM models, from 1.7B to about 8B parameters, on a $250 Jetson Orin Nano Super 8GB using llama.cpp CUDA. The tests compare 7W, 15W, 25W, and MAXN modes across latency, throughput, energy per token, and thermals. The main takeaway is that 25W is usually the best efficiency/performance point for models up to 4B, while Bonsai-8B may favor 15W for lower power.
Furiosa AI inference chip could be a game changer for local LLMs
r/LocalLLaMA top day · 4 days ago · Hardware
An r/LocalLLaMA post discusses Furiosa AI’s RNGD inference chip, citing TSMC 5nm, Hynix HBM3, 48GB VRAM, 1.5TB/s bandwidth, and 180W TDP. The author argues it could matter for local LLM users if Furiosa opens its programming interface and works with llama.cpp on a GGML backend. The post later clarifies that Furiosa is not selling to consumers; this is a wish and market commentary, not a launch.
A llama.cpp CLI Command Builder
r/LocalLLaMA top day · 5 days ago · New Tool
An r/LocalLLaMA post introduces a llama.cpp CLI Command Builder with no accounts, email, pop-ups, cookies, or ads. It stores information locally in the browser and includes editable fields for flags and arguments found in the documentation. Users can build CLI or server commands, log run information, and compare which configurations work best for their hardware; only Linux is currently supported.
Arguing with an AI bot posting outdated Llama 3.1 takes
r/LocalLLaMA top day · 5 days ago · Commentary
An r/LocalLLaMA post jokes about arguing with an AI bot that posted outdated commentary involving Llama 3.1. The author says such bots should enable web search instead of relying on stale knowledge. The post also mocks exaggerated model testimonial posts, using Qwen3.6 27B as a sarcastic example, which makes it more of a community quality complaint than technical news.
When every other post is an AI benchmark, best-model question, or slop app
r/LocalLLaMA top day · 6 days ago · Commentary
This r/LocalLLaMA post is a meme-like complaint about the subreddit’s recent content quality. The author points to repeated AI-generated benchmark reports, recurring “best model” questions, and hastily built apps or engines presented as groundbreaking. It is not a technical release or evidence-based analysis, but it reflects frustration with noise, hype, and low-effort AI-generated discussion in local model communities.
Import AI 460: Reward hacking society, RSI data, and RL quadcopter racing ★ 76
Import AI (Jack Clark) · 6 days ago · Commentary
Import AI 460 covers SocioHack, a benchmark where RL-trained LLMs discover loopholes in institutional rule systems. It also discusses evidence from Anthropic for a practical form of recursive self-improvement, reflected in a sharp increase in merged code during 2026. Other sections examine multi-agent RL drones outperforming a champion human pilot, plus research showing state-controlled media can shape LLM responses in local languages.
Google's Official Gemma 4 QAT Q4_0 GGUFs Have Higher Precision Than Unsloth's Q4_K_XL
r/LocalLLaMA top day · 6 days ago · Commentary
An analysis of Gemma 4 QAT GGUF files reveals that Google's official 'Q4_0' releases actually employ a mixed-precision strategy. For smaller models like E2B and E4B, Google keeps critical token embeddings in Q6_K and certain projection weights in F16. This makes Google's Q4_0 files larger and more precise than Unsloth's 'Q4_K_XL' versions, which default to standard Q4_0 for almost all tensors.
Arithmetic Without Numbers: How LLMs Do Math
Hacker News (AI keywords) · 9 days ago · Commentary
The article asks whether LLM arithmetic is memorization, heuristics, real computation, or experimental assistance. It summarizes Rune experiments that decode operations and operands from frozen Llama activations, then route them to Python under a no-parser rule. The strongest supported claim is narrow: activation-derived tool arguments worked in scoped audits, while residual-state JIT replacement, long-number generation, and cross-model transfer remain brittle.
Fine-tuning an LLM to write docs like it's 1995
Hacker News (AI keywords) · 9 days ago · Tutorial
The author builds a corpus from old Microsoft manuals, cleans OCR text, generates instruction-style JSONL examples, and fine-tunes Llama 3.1 8B and Qwen 2.5 7B with QLoRA. Tests cover malloc(), a fictional Win32 API, and a deliberately anachronistic REST API prompt. Qwen fine-tunes transfer the period documentation style best, but the experiment also shows hallucination risks, tuning complexity, and why these models augment rather than replace technical writers.
How LLMs Actually Work
Hacker News (AI keywords) · 10 days ago · Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
Reachy Mini goes fully local
Hugging Face Blog · 18 days ago · Hardware
Hugging Face published a tutorial for running Reachy Mini conversations without cloud audio processing or API keys. The setup uses its speech-to-speech library as a cascaded VAD, STT, LLM, and TTS pipeline exposed through a Realtime API-compatible WebSocket. Recommended defaults include llama.cpp with Gemma 4, Silero VAD, Parakeet-TDT, and Qwen3-TTS, while allowing swaps to vLLM, MLX, Transformers, or hosted Responses API providers.
[AINews] The End of Fine-Tuning? Exploring the Future and Transformation of Fine-tuning in the Era of Large Models ★ 75
Latent Space · 32 days ago · Opinion
As AI technology continues to iterate at a rapid pace, the developer community is confronting a profound rethinking of the question: "Is fine-tuning heading…
Distillation Panic: Why Calling "Knowledge Distillation" a Security Attack Is an Extremely Bad Trend ★ 75
Interconnects (Nathan L.) · 41 days ago · Opinion
In the field of machine learning, "knowledge distillation" is a well-established technique that generally refers to using the output data generated by a…
DeepInfra Officially Joins Hugging Face's Inference Providers Lineup 🔥 ★ 72
Hugging Face Blog · 46 days ago · Release
Hugging Face's official blog has announced that DeepInfra — a well-known high-performance, low-cost serverless inference platform — has officially joined…
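Interpreting the Current Performance Gap Between Open-Source and Closed-Source AI Models: Beyond the Myth of a Single Evaluation Metric ★ 75
Interconnects (Nathan L.) · 55 days ago · Opinion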
In today's AI landscape, the performance gap between open-weights models (such as Meta's Llama family) and closed-source models (such as OpenAI's GPT and…
Predicting Mid-2026: My Bets on Open-Source AI Models and an Analysis of the Open-Closed Gap ★ 75
Interconnects (Nathan L.) · 60 days ago · Opinion
In this forward-looking article on the state of AI in mid-2026, Interconnects founder Nathan Lambert takes a deep dive into the dynamic gap between open-weight…
Liberate Your OpenClaw: Building an Autonomous CLI Development Agent with Open-Source Models ★ 75
Hugging Face Blog · 79 days ago · Tutorial
With the launch of agent-oriented CLI coding tools like Claude Code from Anthropic, developer demand for "collaborating with AI directly inside the terminal"…
Hugging Face Open-Source Ecosystem Report: Spring 2026 Edition ★ 85
Hugging Face Blog · 89 days ago · Commentary
Hugging Face has published its Spring 2026 "State of Open Source AI" report, offering a comprehensive review of the explosive growth and paradigm shifts that…
The Next Phase of Open-Source Models: Market, Capability, and Ecosystem Responses in the Industrialization Era ★ 80
Interconnects (Nathan L.) · 90 days ago · Opinion
This article, from Nathan Lambert's well-known AI newsletter Interconnects, offers a deep examination of the critical turning point that open-source language…
Train AI Models for Free! Hugging Face Teams Up with Unsloth to Launch Free Fine-Tuning on Hugging Face Jobs ★ 85
Hugging Face Blog · 114 days ago · New Tool
Hugging Face's official blog has announced exciting news for the open-source AI community: Hugging Face has formed a deep partnership with Unsloth — the…
GGML and llama.cpp Officially Join Hugging Face to Secure the Long-Term Development of Local AI ★ 95
Hugging Face Blog · 114 days ago · Business
A historic milestone has arrived in the open-source AI world: GGML and llama.cpp — the open-source projects founded by Georgi Gerganov that laid the…
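Open-Source Models Stuck in "Permanent Catch-Up": The Open-Closed Gap, Distillation, Innovation Cycles, and Open Source's Odds of Winning ★ 80
Interconnects (Nathan L.) · 117 days ago · Opinion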
This article by Nathan Lambert takes a deep dive into the tangled competitive dynamics between open-source and closed-source AI models. Lambert argues that…
Transformers.js v4 Officially Lands on NPM! In-Browser WebGPU AI Gets a Major Performance Upgrade ★ 85
Hugging Face Blog · 125 days ago · Release
Hugging Face officially published Transformers.js v4 on NPM, marking a major milestone for running local AI models within the JavaScript ecosystem…
CUGA Lands on Hugging Face: Bringing Configurable AI Agents to the Masses ★ 75
Hugging Face Blog · 181 days ago · Release
IBM Research has officially launched the CUGA (Configurable User-Guided Agents) framework on Hugging Face, aiming to democratize advanced AI Agent technology…
Google DeepMind Launches the FACTS Benchmark Suite: Systematically Evaluating the Factuality of Large Language Models ★ 80
Google DeepMind Blog · 187 days ago · Release
As large language models (LLMs) are deployed across a wide range of industries, ensuring the "factuality" of model outputs and reducing "hallucination" has…
OVHcloud Officially Joins Hugging Face's Inference Providers, Emphasizing European Data Sovereignty and Cost-Effective Compute ★ 72
Hugging Face Blog · 202 days ago · Release
Hugging Face has announced a new partnership with OVHcloud, Europe's leading cloud infrastructure provider, officially incorporating OVHcloud into Hugging Face…
Dramatically Improve Your OCR Workflow Efficiency with Open-Source Models ★ 80
Hugging Face Blog · 236 days ago · Tutorial
Traditional OCR systems (such as Tesseract) often struggle with complex layouts, multi-column tables, handwriting, and mathematical formulas, while using…
Hugging Face Inference Providers Welcomes a New Partner: Public AI Goes Live 🔥 ★ 70
Hugging Face Blog · 270 days ago · Release
Hugging Face continues to expand its "Inference Providers" program, aimed at enabling developers to run open-source models from Hugging Face Hub in the…
Arm and ExecuTorch 0.7 Join Forces: Bringing Generative AI to the Mass Market ★ 80
Hugging Face Blog · 305 days ago · Release
As generative AI advances rapidly, deploying massive models to resource-constrained edge devices — such as smartphones, smart hardware, and AI PCs — has become…
