ServiceNow AI published a Hugging Face Blog post titled “EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios.” Based only on the title, it appears to be a benchmark dataset update involving tool-use or scenario-based AI evaluation. The exact domains, tools, scenario design, licensing, supported models, and evaluation methodology cannot be confirmed without the full article.
As "Sovereign AI" becomes a global trend, countries around the world are actively seeking to build AI models that reflect their own culture, values, and…
NVIDIA has released a new synthetic dataset on Hugging Face called "Nemotron-Personas-Japan," a critical resource designed specifically to advance Japan's…
SandboxAQ, an AI and quantum technology pioneer spun out of Alphabet, has officially launched an open-source dataset called SAIR (Structurally Augmented IC50 Repository)…
NVIDIA has officially released a massive "Multi-Lingual Reasoning Dataset" containing 6 million samples on the Hugging Face platform. This significant…
The "Virtual Cell" is one of the ultimate goals at the intersection of systems biology and artificial intelligence, aiming to fully simulate the physiological…
In the history of artificial intelligence, the ImageNet dataset, introduced in 2009 and brought to prominence by the 2012 ImageNet challenge results, is widely recognized as the key catalyst that ignited the deep…
### Hugging Face LeRobot Enters New Territory: Launches the World's Largest Open-Source Autonomous Driving Dataset

Hugging Face's open-source robotics project…
With the rise of open-source video generation models such as LTX-Video, HunyuanVideo, and CogVideoX, building high-quality training datasets has become the…
Hugging Face launched a brand-new "Synthetic Data Generator" in December 2024 — a web-based, no-code tool designed to allow anyone to create high-quality AI…
### Introduction: An Important Piece of the Open-Source Image Generation Puzzle

As text-to-image (T2I) technology advances rapidly, ensuring that AI-generated…
The open-source data curation and annotation platform Argilla has officially released version 2.4, with the core of this update being deep integration with…
CinePile is a multimodal question-answering dataset focused on movie and long-video understanding. In traditional dataset construction, researchers commonly…
The Hugging Face team and its collaborators have jointly launched a new benchmark called "BenCzechMark," designed to evaluate the understanding and generation…
With the explosion of video generation and understanding models such as Sora and Gen-3, high-quality video training data has become a key battleground for…
The Hugging Face official blog has announced the release of a new, massive dataset called "Docmatix," specifically designed for training and fine-tuning…
In the current wave of generative AI, the industry's attention is gradually shifting from "fine-tuning model architectures" to "improving data quality." Issue…
### Background

In the current development of large language models (LLMs), high-quality alignment data (such as the preference data required for RLHF and DPO)…
Replicate's technical newsletter, Replicate Intelligence #2, takes a deep dive into three of the most hotly discussed trends in the open-source AI community…
### Background and Challenges

In the field of code generation, instruction tuning is the key to improving a model's practical utility and alignment with human…
Hugging Face has officially released Cosmopedia, currently the largest fully open-source synthetic dataset, designed for the pre-training of large language…
The Hugging Face official blog has published a post introducing WebSight, a brand-new open-source dataset designed to address the bottleneck that multimodal…
The BigCode community, jointly led by Hugging Face and ServiceNow, together with NVIDIA, has officially announced the launch of a new generation of open-source…
Prodigy, the well-known machine learning data annotation tool from Explosion (the company behind the popular NLP library spaCy), has officially released a…
This article introduces the integration between Hugging Face and the open-source data exploration tool Renumics Spotlight, aimed at addressing the pain point…
Hugging Face has officially launched IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-source multimodal vision-language model…
The Hugging Face Ethics and Society team has published the fourth edition of its newsletter, this time focusing on the problem of "bias" in text-to-image (T2I)…
This second issue of the newsletter from Hugging Face's Ethics and Society team centers on the theme of "Biases in Machine Learning." As AI technology becomes…
In the fields of artificial intelligence and computer vision, collecting high-quality, labeled image datasets is typically a time-consuming and tedious task…