Latest in AI

Showing: edge-ai

Bonsai LM 1-bit and 1.58-bit Benchmarks on Jetson Orin Nano Super
r/LocalLLaMA top day · 4 days ago · Benchmark
A LocalLLaMA post benchmarks five Bonsai LM models, from 1.7B to about 8B parameters, on a $250 Jetson Orin Nano Super 8GB using llama.cpp CUDA. The tests compare 7W, 15W, 25W, and MAXN modes across latency, throughput, energy per token, and thermals. The main takeaway is that 25W is usually the best efficiency/performance point for models up to 4B, while Bonsai-8B may favor 15W for lower power.
From Desk-Side to Data Center: Leadtek Showcases On-Prem Agentic AI Computing Strategy at COMPUTEX 2026
INSIDE 硬塞 AI · 4 days ago · Hardware
The article says enterprise AI adoption is entering a new phase as security concerns, cloud latency, and model changes push compute needs on premises. At COMPUTEX 2026, Leadtek presented an AI compute spectrum from factory edge environments to data centers. The focus is on helping companies keep tighter control over the confidential data their agentic AI handles and over inference responsiveness.
Jetson Orin NX Build for Hermes Agent + Benchmarking
r/LocalLLaMA top day · 5 days ago · Hardware
The post describes turning an unused Jetson Orin NX into a compact local LLM server for Hermes Agent testing. The goals were low noise, over 10 tok/s generation, 300 tok/s prompt processing, at least 65K context, and a custom case. After testing Gemma 4, Qwen 3.6, and many quant variants, the author reports Gemma 4 26B A4B UD Q2_K_XL reaching 66K context and 10.21 tok/s near 60K context.
A 4B Edge-Deployable Cognitive Model Built in China
量子位 QbitAI · 5 days ago · Release
QbitAI’s headline says a domestic Chinese team has built a 4B-parameter “cognitive model” suitable for edge deployment. The framing links it to a model direction previously associated with Andrej Karpathy. Since the article body was not provided, details such as the model name, architecture, benchmark results, hardware requirements, open-source status, and licensing remain unverified.
llama.cpp PR adds MTP support for Gemma-4 E2B and E4B assistants
r/LocalLLaMA top day · 5 days ago · Release
The Reddit post links to ggml-org/llama.cpp Pull Request #24282, which adds MTP support for Gemma-4 E2B and E4B assistants. The submitter frames it as useful for tiny Gemma models on phones, low-end machines, Raspberry Pi, or similarly constrained devices. The post does not include benchmarks, merge status, or setup instructions, so it should be treated as a development signal rather than a finished release.
Introducing Mistral 3 ★ 84
Mistral AI News · 6 days ago · Release
Mistral AI introduced Mistral 3, a new open model family under Apache 2.0. It includes Mistral Large 3, a 675B-parameter sparse MoE with 41B active parameters, plus Ministral 3 models at 3B, 8B, and 14B. The release targets frontier open-weight use, multimodal and multilingual workflows, enterprise customization, and efficient local or edge deployments.
Introducing Mistral 3 ★ 78
Mistral AI News · 6 days ago · Release
Mistral AI introduced Mistral 3, a new open model family including Mistral Large 3 and Ministral 3 models at 3B, 8B, and 14B sizes. Large 3 is a 675B-parameter sparse MoE model with 41B active parameters, while Ministral 3 targets local and edge use cases. The models are released under Apache 2.0 and are available through Mistral AI Studio, Hugging Face, Amazon Bedrock, and other platforms.
Best Local TTS Solution
r/LocalLLaMA top day · 6 days ago · Commentary
An r/LocalLLaMA user says they have tested many local TTS tools, but none match ElevenLabs for expressiveness, voices, and cloning. They list moss-nano and Kokoro as the best edge-device candidates so far, with edgeTTS as a free/cloud option. The post asks for community experience connecting agents such as Hermes, openclaw, or opencode to Telegram voice notes or real-time voice conversations.
Clustering 3x Jetson Nano Orin Supers for Distributed AI
r/LocalLLaMA top day · 7 days ago · Tutorial
A developer has shared a practical guide to clustering three NVIDIA Jetson Orin Nano Super boards, leveraging their Ampere CUDA cores and unified memory. The project is part of 'smolcluster,' an initiative to make distributed AI training and inference accessible on everyday hardware such as Macs, Raspberry Pis, and Jetsons. The series explores whether heterogeneous clusters (mixing different hardware architectures) can effectively run local LLMs.
Launch HN: General Instinct (YC P26) - Frontier models on edge devices
Hacker News (AI keywords) · 9 days ago · New Tool
General Instinct is a YC P26 company introduced through a Launch HN post. Its headline positioning is bringing frontier models to edge devices, suggesting local or embedded AI deployment rather than purely cloud-based inference. Since no article body is available, details such as supported models, hardware, benchmarks, pricing, and developer tooling cannot be verified from the provided source.
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency ★ 72
Hacker News (AI keywords) · 9 days ago · Release
Google released new Gemma 4 checkpoints optimized with Quantization-Aware Training to preserve quality after compression. The release includes Q4_0 checkpoints and a mobile-focused quantization format that can reduce Gemma 4 E2B memory use to about 1GB, or below 1GB for a text-only configuration. The models are available through Hugging Face and supported across llama.cpp, Ollama, LM Studio, LiteRT-LM, Transformers.js, SGLang, vLLM, MLX, and Unsloth.
QNAP Showcases Ready & Recovery and Edge AI Enterprise IT Architecture at COMPUTEX 2026
INSIDE 硬塞 AI · 11 days ago · Hardware
QNAP appeared at COMPUTEX 2026 with “Ready & Recovery” and “Edge AI” as its two main themes. The showcase covered backup and recovery, anti-ransomware protection, high availability, on-prem generative AI, 100G networking, smart surveillance, and media workflows. The company also revealed multiple AI NAS products and enterprise switches, positioning its portfolio around data resilience, AI computing, and security.
NXP Computex 2026 Keynote: Neural Axis for Physical AI Hardware
INSIDE 硬塞 AI · 11 days ago · Hardware
At Computex 2026, NXP focused on Physical AI and introduced its Neural Axis architecture for edge devices. The architecture emphasizes low latency, high security, and hardware-based trust for real-time responses. The article frames this as important for robotics, autonomous vehicles, and other physical-world AI deployments where safe operation is essential.
Z-COM to Officially Launch NEW Platform at Computex 2026
INSIDE 硬塞 AI · 11 days ago · Hardware
Z-COM will officially introduce NEW Platform at Computex 2026. The edge-native infrastructure combines network control, AI operations, and energy management in a single architecture. Its stated goal is to support local AI computing and help enterprises reduce dependence on cloud providers and avoid cloud lock-in.
Microsoft Build 2026 unveils MAI-Thinking-1, Scout, and Project Solara ★ 76
INSIDE 硬塞 AI · 11 days ago · Release
At Build 2026, Microsoft introduced an agent-first architecture that combines software and hardware into a broader AI platform. The announcement includes a unified Copilot app, self-developed MAI models, the persistent Scout agent, and the Project Solara device platform. The move frames AI agents as an end-to-end execution layer running from cloud services to user devices.
Nvidia chases $200B CPU market with AI agent PCs from Microsoft, Dell, and HP
TechCrunch AI · 12 days ago · Hardware
Nvidia is pursuing the $200 billion CPU market through AI agent PCs associated with Microsoft, Dell, and HP. The potential impact depends on whether AI agents can reach mainstream users in a simple, safe, and useful way. The provided excerpt does not specify hardware models, pricing, release dates, or performance details.
Qualcomm Unveils Dragonfly Data Center Brand for the Agentic AI Era
INSIDE 硬塞 AI · 13 days ago · Hardware
At Computex 2026, Qualcomm described AI agents as a major driver of cross-device hardware upgrades. The company unveiled Dragonfly, a new data center brand focused on inference computing. The announcement outlines a broader strategy spanning endpoint devices and cloud infrastructure, although the source does not provide specifications, performance figures, or deployment timelines.
NVIDIA Space Computing Gets First Hardware Case as Aitech Integrates IGX Thor
INSIDE 硬塞 AI · 17 days ago · Hardware
Aitech announced it will integrate NVIDIA IGX Thor into its space supercomputer for low Earth orbit missions. The goal is to provide onboard AI edge computing and enable real-time inference directly in orbit. By processing more data in space, the system aims to reduce dependence on ground communications and extend AI compute beyond Earth-based infrastructure.
Business Owners, Do You Really Know How to Use AI?
INSIDE 硬塞 AI · 18 days ago · Opinion
The article argues that many companies use AI mainly to improve efficiency, without creating meaningful revenue or strategic advantage. It proposes distributed AI, placing intelligence closer to where data is generated to reduce latency and support faster decisions. The key message is that firms should balance centralized and distributed architectures to strengthen competitiveness while preserving greater control over data and digital sovereignty.
Ukrainian Drone Founder Yaroslav Azhnyuk on the Autonomous Drone Tech Stack and Drone Economics: The West Is Asleep
Latent Space · 27 days ago · Commentary
In this episode of the Latent Space podcast, the hosts and guest host Noah Smith (author of the well-known economics and technology blog Noahpinion)…
Google Releases Gemma 4: Frontier Multimodal Open Models Designed for On-Device Use ★ 85
Hugging Face Blog · 73 days ago · Release
Google and Hugging Face have jointly announced a new generation of open-weight models — "Gemma 4." This model represents a major breakthrough in on-device AI…
IBM Launches Granite 4.0 3B Vision: A Lightweight Multimodal AI Model Built for Enterprise Documents ★ 75
Hugging Face Blog · 75 days ago · Release
IBM has officially launched its new lightweight multimodal model on Hugging Face — the Granite 4.0 3B Vision. With 3 billion (3B) parameters, this model is…
Import AI 448: AI R&D Trends, ByteDance's CUDA-Writing Agent, Satellite Edge AI, and the Future of AI Warfare ★ 75
Import AI (Jack Clark) · 97 days ago · Commentary
This issue of Import AI 448, written by Jack Clark, takes a deep dive into the latest developments in AI R&D, automated hardware optimization, and the…
Hugging Face Teams Up with NXP: Bringing Robotics AI to Embedded Platforms (Dataset Recording, VLA Fine-Tuning, and On-Device Optimization) ★ 75
Hugging Face Blog · 101 days ago · Tutorial
Hugging Face has entered into a deep collaboration with semiconductor giant NXP (NXP Semiconductors), aimed at solving the challenge of deploying advanced…
GGML and llama.cpp Officially Join Hugging Face to Secure the Long-Term Development of Local AI ★ 95
Hugging Face Blog · 114 days ago · Business
A historic milestone has arrived in the open-source AI world: GGML and llama.cpp — the open-source projects founded by Georgi Gerganov that laid the…
Open Evaluation Standards: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator ★ 70
Hugging Face Blog · 179 days ago · Tutorial
As large language models (LLMs) develop in two divergent directions — with extremely large cloud-based models at one end and lightweight "Nano"-scale models…
Examining the Shift in the Global Compute Landscape: Hugging Face Analyzes the Future of AI Infrastructure ★ 75
Hugging Face Blog · 228 days ago · Opinion
Against the backdrop of explosive global growth in artificial intelligence, compute has become the core resource that determines technological competitiveness…
How to Build Medical Robots with the NVIDIA Isaac Healthcare Platform: A Complete Guide from Simulation to Deployment ★ 70
Hugging Face Blog · 228 days ago · Tutorial
As healthcare demands increase and medical staffing shortages worsen, the development of medical robots — such as robots for ward supply delivery, assisted…
Granite 4.0 Nano: Exploring the Limits of On-Device AI. Just How Small Can a Model Get? ★ 75
Hugging Face Blog · 229 days ago · Release
This article, jointly published by IBM and Hugging Face, delves into the technical details and application scenarios of the brand-new ultra-lightweight model…
Google DeepMind Introduces Gemma 3 270M: An Ultra-Lightweight Model Designed for Highly Efficient AI ★ 72
Google DeepMind Blog · 233 days ago · Release
Google DeepMind has officially announced the addition of a highly distinctive and specialized new member to its open-source model family — Gemma 3 270M. This…
