Latest in AI

Google Releases PaliGemma 2, a New Vision-Language Model: A Lightweight Multimodal Model Built on Gemma 2 ★ 80
Hugging Face Blog · 556 days ago · Release
Google and Hugging Face have jointly announced the release of a new-generation open-weight vision-language model (VLM), PaliGemma 2. This model continues…
Hugging Face Releases SmolVLM: A Lightweight yet Powerful Open-Source Vision-Language Model That Runs Efficiently on Local Machines ★ 80
Hugging Face Blog · 565 days ago · Release
Hugging Face has officially launched a lightweight vision language model (VLM) called **SmolVLM**, designed to bring powerful multimodal understanding…
CinePile 2.0: Building a Stronger Long-Video Question-Answering Dataset with Adversarial Refinement ★ 75
Hugging Face Blog · 599 days ago · Release
CinePile is a multimodal question-answering dataset focused on movie and long-video understanding. In traditional dataset construction, researchers commonly…
Meta Releases Llama 3.2: Lightweight Models with Multimodal Vision Support and Edge-Device Deployment, Fully Supported by Hugging Face ★ 95
Hugging Face Blog · 627 days ago · Release
Meta has officially introduced the Llama 3.2 family of open-source models, marking a significant architectural upgrade with two major breakthroughs: multimodal…
Behind the Scenes of FineVideo: How Hugging Face Built a High-Quality Open-Source Video Dataset ★ 75
Hugging Face Blog · 629 days ago · Release
With the explosion of video generation and understanding models such as Sora and Gen-3, high-quality video training data has become a key battleground for…
Hugging Face Releases Docmatix: A Massive Open-Source Dataset for Document Visual Question Answering (DocVQA) ★ 75
Hugging Face Blog · 696 days ago · Release
The Hugging Face official blog has announced the release of a new, massive dataset called "Docmatix," specifically designed for training and fine-tuning…
A Guide to Preference Optimization for Vision-Language Models (VLMs): DPO Fine-Tuning with TRL ★ 75
Hugging Face Blog · 704 days ago · Tutorial
As vision-language models (VLMs) are increasingly applied to multimodal tasks, how to make these models produce outputs that better align with human…
Going Multimodal: How Prezi Uses the Hugging Face Hub and the Expert Support Program to Accelerate Its Machine Learning Roadmap
Hugging Face Blog · 725 days ago · Business
In this case study, Prezi — the well-known company behind the non-linear presentation software of the same name — shares how it is embracing the "multimodal…
Abu Dhabi's TII Releases Falcon 2 11B: Pretrained Language and Vision-Language Models Trained on 5 Trillion Tokens ★ 75
Hugging Face Blog · 751 days ago · Release
The Technology Innovation Institute (TII) of Abu Dhabi has officially released a new open-source model family on Hugging Face — Falcon 2 11B. This model, with…
Google Releases PaliGemma: An Open-Source Vision-Language Model Combining SigLIP and Gemma ★ 80
Hugging Face Blog · 761 days ago · Release
Google has officially launched PaliGemma, a powerful yet lightweight open-source Vision-Language Model (VLM). The release of PaliGemma represents a significant…
Hugging Face Releases Idefics2: A Powerful 8B Open-Source Vision-Language Model ★ 80
Hugging Face Blog · 790 days ago · Release
Hugging Face has announced the launch of Idefics2, the next generation of its open-source Vision Language Model (VLM). With 8 billion (8B) parameters, this…
Vision-Language Models (VLMs) Explained: A Guide from Architecture and Training to Applications ★ 80
Hugging Face Blog · 794 days ago · Tutorial
This technical blog post published by Hugging Face provides an accessible yet thorough breakdown of the core principles and applications of Vision Language…
Hugging Face Launches the ConTextual Leaderboard: Evaluating Multimodal Models on Joint Image-Text Reasoning in Text-Rich Scenes ★ 75
Hugging Face Blog · 831 days ago · Release
Hugging Face has announced the launch of a new multimodal benchmark and leaderboard called "ConTextual," aimed at addressing the shortcomings of existing…
Hugging Face Releases IDEFICS: An Open-Source Reproduction of the SOTA Multimodal Vision-Language Model Flamingo ★ 78
Hugging Face Blog · 1,027 days ago · Release
Hugging Face has officially launched IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-source multimodal vision-language model…
Building an AI WebTV: How to Use Hugging Face to Create a 24/7 AI-Generated Live Channel
Hugging Face Blog · 1,063 days ago · Tutorial
This official Hugging Face blog post details how to build an "AI WebTV" (AI web television channel) from scratch — a system capable of automatically generating…
Accelerating Vision-Language Models on Habana Gaudi2: A Hands-On Guide to BridgeTower
Hugging Face Blog · 1,081 days ago · Tutorial
This technical blog post from Hugging Face details how to accelerate the vision-language model (VLM) "BridgeTower" on Intel's Habana Gaudi2 deep learning…
Kakao Brain Releases New Open-Source ViT and ALIGN Models on Hugging Face
Hugging Face Blog · 1,196 days ago · Release
Kakao Brain, the AI research arm of South Korean tech giant Kakao, has officially released newly trained ViT (Vision Transformer) and ALIGN (A Large-scale…
A Deep Dive into the Principles and Architecture of Vision-Language Models ★ 80
Hugging Face Blog · 1,227 days ago · Tutorial
This is a classic technical guide written by the Hugging Face team, designed to help developers and researchers gain a deep understanding of how…
The State of Computer Vision at Hugging Face and a Guide to Its Ecosystem
Hugging Face Blog · 1,231 days ago · Commentary
Although Hugging Face rose to prominence in the field of natural language processing (NLP), it has made tremendous strides in computer vision (CV) in recent…
Hugging Face Datasets Introduces New Audio and Computer Vision Documentation Guides
Hugging Face Blog · 1,417 days ago · Release
Hugging Face announced new official Audio and Vision documentation guides for its core open-source library `datasets`. As multimodal AI models continue to…
Putting Ethical Principles at the Core of the Research Lifecycle: Hugging Face's Ethical Charter for Multimodal Research
Hugging Face Blog · 1,487 days ago · Opinion
As multimodal AI (combining text, images, audio, and other media) advances rapidly, the ethical challenges brought about by the technology are growing…
Perceiver IO: A Scalable, Fully Attentional Model for Any Modality ★ 70
Hugging Face Blog · 1,642 days ago · Release
This article introduces DeepMind's Perceiver IO model and its integration into the Hugging Face Transformers library. Traditional Transformer models, while…
