Latest in AI

Showing: tool-use, Developers

Releasing Cohere North Mini Code
r/LocalLLaMA top day · 5 days ago · Release
Cohere’s Jay Alammar announced the official release of North Mini Code after early community feedback from r/LocalLLaMA. Weights are available on Hugging Face, including an fp8 version, and the model can be tried for free through OpenCode. For vLLM deployment, Cohere recommends using vLLM main for now and installing cohere_melody for accurate response parsing, while noting community requests for quantization and llama.cpp support.
Cohere Introduces Command A+: Next-Gen Enterprise Model Optimized for Agentic Workflows ★ 80
Cohere Blog · 6 days ago · Release
Cohere has introduced Command A+, its latest enterprise-grade model tailored for agentic workflows. Stepping beyond traditional RAG, Command A+ excels in multi-step reasoning, complex tool use, and multilingual capabilities. It is designed to seamlessly integrate with enterprise APIs, enabling highly autonomous and reliable AI agents.
Arithmetic Without Numbers: How LLMs Do Math
Hacker News (AI keywords) · 9 days ago · Commentary
The article asks whether LLM arithmetic is memorization, heuristics, real computation, or experimental assistance. It summarizes Rune experiments that decode operations and operands from frozen Llama activations, then route them to Python under a no-parser rule. The strongest supported claim is narrow: activation-derived tool arguments worked in scoped audits, while residual-state JIT replacement, long-number generation, and cross-model transfer remain brittle.
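The routing step described above can be illustrated with a minimal sketch. It assumes the decoder has already produced an operation label and integer operands; the `OPS` table, the `route_to_python` helper, and the `decoded` dictionary are hypothetical names for illustration and are not taken from the Rune experiments. The point is only that the arguments come from a decoded structure and the prompt text is never parsed.

```python
import operator

# Hypothetical dispatch table: labels a probe might decode from activations.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
}


def route_to_python(op_label: str, operands: tuple[int, int]) -> int:
    """Run a decoded operation on decoded operands without parsing any text."""
    if op_label not in OPS:
        raise ValueError(f"unsupported operation: {op_label!r}")
    left, right = operands
    return OPS[op_label](left, right)


# Stand-in for decoder output (assumed shape, not the article's format).
decoded = {"op": "mul", "operands": (12, 34)}
print(route_to_python(decoded["op"], decoded["operands"]))  # 408
```

Because Python performs the arithmetic, the result is exact whenever the decoded operation and operands are right, so any errors come from the decoding step.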
EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios
Hugging Face Blog · 10 days ago · Benchmark
ServiceNow AI published a Hugging Face Blog post titled “EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios.” Based only on the title, it appears to be a benchmark dataset update involving tool-use or scenario-based AI evaluation. The exact domains, tools, scenario design, licensing, supported models, and evaluation methodology cannot be confirmed without the full article.
Adding MCP Tools to Reachy Mini
Hugging Face Blog · 11 days ago · Tutorial
Based on the available title, this Hugging Face Blog post appears to cover adding MCP tools to Reachy Mini. The likely focus is connecting the open-source desktop robot with Model Context Protocol-based tool integrations. Since the original article text is not provided, implementation details, supported servers, models, and limitations cannot be confirmed.
Introducing Claude Opus 4.8 ★ 78
Hacker News (AI keywords) · 17 days ago · Release
Anthropic introduced Claude Opus 4.8 as an upgrade over Opus 4.7, emphasizing benchmark gains, sharper judgment, and more reliable agentic work. The launch also adds dynamic workflows in Claude Code, effort controls in claude.ai and Cowork, and Messages API support for system entries inside messages. Standard pricing remains unchanged, while fast mode is faster and substantially cheaper than before.
Ecom-RLVE: An Adaptive, Verifiable Reinforcement Learning Environment for E-Commerce Conversational Agents ★ 75
Hugging Face Blog · 59 days ago · Release
As large language models (LLMs) become increasingly widespread, more and more companies are attempting to deploy AI agents in e-commerce customer service and…
A Deep Dive into VAKRA: IBM Research's New Benchmark for Evaluating AI Agent Reasoning, Tool Calling, and Failure Modes ★ 75
Hugging Face Blog · 60 days ago · Release
As generative AI technology has evolved, the industry's focus has shifted from pure "Large Language Models (LLMs)" to "AI Agents" capable of autonomously…
OpenEnv in Practice: Evaluating Tool-Using AI Agents in Real-World Environments ★ 75
Hugging Face Blog · 122 days ago · New Tool
As AI agent technology advances rapidly, evaluating how these agents perform in the real world has become one of the greatest challenges…
Hugging Face Unifies the Tool Use Standard: Simplifying Open-Source LLM Agent Development ★ 85
Hugging Face Blog · 671 days ago · Release
Background and Pain Points: In AI agent development, "tool use" (also known as function calling) is the core capability that allows large language models…
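For readers unfamiliar with the term, tool use (function calling) can be sketched in a few lines. This is a generic illustration, not the Hugging Face API or the standard the post describes: the `get_weather` function, the `TOOLS` registry, the JSON shape of the call, and the `execute_tool_call` helper are all assumptions made for the example.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# Registry mapping tool names to Python callables (assumed layout).
TOOLS = {"get_weather": get_weather}


def execute_tool_call(raw_call: str) -> str:
    """Parse a JSON tool call emitted by a model and run the named tool."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Stand-in for model output; the exact format varies by model and framework.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(execute_tool_call(model_output))  # Sunny in Paris
```

The model only names the tool and supplies its arguments. The surrounding application runs the function and returns the result to the model as a new message.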
How NuminaMath Won the First AIMO Progress Prize (AI Mathematical Olympiad) and Announced a Full Open-Source Release ★ 80
Hugging Face Blog · 703 days ago · Release
Background and Achievement: The AI Mathematical Olympiad (AIMO) Progress Prize aims to advance AI models capable of solving Olympiad-level mathematical…