Hugging Face has recently introduced a new benchmark called "TextQuests," designed to evaluate the performance of large language models (LLMs) in text-based…
Replicate has officially launched a remote MCP (Model Context Protocol) server. MCP is an open standard created by Anthropic that enables large language models…
As the Model Context Protocol (MCP) proposed by Anthropic gradually becomes the open standard for connecting large language models (LLMs) with external tools…
As AI applications become more widespread, how to give large language models (LLMs) secure and efficient access to internal enterprise data or external…
Vercel has announced a major update to its AI development tooling, launching a new service based on the Model Context Protocol (MCP) that allows developers to…
### What is FutureBench?

As large language models (LLMs) and AI agents have rapidly advanced, traditional static benchmarks (such as MMLU and GSM8K) face a…
The Model Context Protocol (MCP) is an open standard introduced by Anthropic, designed to allow AI assistants (such as Claude) to interact securely and…
With the rise of Anthropic's Claude 3.5 Sonnet "Computer Use" and various GUI-oriented multimodal models, "desktop agents" have become one of the hottest areas…
Hugging Face has officially announced the launch of its dedicated MCP (Model Context Protocol) server — a major step in ecosystem integration. The Model…
With Anthropic's introduction of the Model Context Protocol (MCP) open standard, the way AI agents connect to external tools and data sources has become…
University of Pennsylvania Wharton School professor Ethan Mollick recently published an extremely practical AI quick guide, "Using AI Right Now: A Quick…
Hugging Face recently published a highly practical technical tutorial demonstrating how to build a fully functional miniature AI agent in roughly 70 lines…
Vercel has officially announced support for deploying MCP (Model Context Protocol) servers. This update allows developers to use Vercel's Serverless…
Wharton School professor Ethan Mollick, in his latest article "Personality and Persuasion," delves into AI's persuasive power and the psychological mechanisms…
Since Anthropic introduced the Model Context Protocol (MCP) open standard, connecting large language models (LLMs) to external tools has never been easier. The…
In this Hugging Face blog post, the team demonstrates how to implement a fully functional, lightweight AI agent (referred to as a "Tiny Agent") that supports…
### Background and Pain Points: Moving Beyond the Overly Simple "Needle in a Haystack" Test

In recent years, the context window length supported by large…
At the 2025 Google Cloud Next conference, Google dropped two bombshells regarding the AI Agent ecosystem. The CEOs of Google and Google DeepMind jointly…
As large language model (LLM) technology has evolved, AI has transformed from a simple question-answering assistant into an "AI agent" capable of proactively…
On January 24, 2025, Hugging Face announced that smolagents — its open-source library designed for building lightweight, high-performance AI agents — now…
Hugging Face officially launched a lightweight AI agent development framework called `smolagents` at the end of 2024. The core philosophy of this tool is "Code…
### Background and Challenges: The Difficulty of Evaluating Non-English LLMs

In the current landscape of large language model (LLM) development, evaluating…
This article from the Hugging Face blog introduces "The First Multilingual LLM Debate Competition." As large language models (LLMs) have rapidly advanced…
As large language models (LLMs) have rapidly advanced, traditional static benchmarks (such as MMLU) have increasingly faced saturation and gaming problems. As…
As generative AI applications become more widespread, one of the biggest challenges developers face is the "non-deterministic" output of large language models…
### Background and Challenges

Document Visual Question Answering (DocVQA) is an important application of multimodal AI, requiring models to simultaneously…
As large language models (LLMs) have made tremendous strides in code generation, the long-standing industry gold standard — the HumanEval benchmark — has…
This Replicate technical digest (Intelligence #1) compiles three of the most talked-about technical breakthroughs and open-source projects in the AI community…
As large language models for code (Code LLMs) develop rapidly, fairly and accurately evaluating their capabilities has become a major challenge. Traditional…
Hugging Face has announced the launch of a new multimodal benchmark and leaderboard called "ConTextual," aimed at addressing the shortcomings of existing…