In this case study, Prezi — the well-known company behind the non-linear presentation software of the same name — shares how it is embracing the "multimodal…
The Technology Innovation Institute (TII) of Abu Dhabi has officially released Falcon 2 11B, part of its new open-source Falcon 2 model family, on Hugging Face. This model, with…
Google has officially launched PaliGemma, a powerful yet lightweight open-source Vision-Language Model (VLM). The release of PaliGemma represents a significant…
Hugging Face has announced the launch of Idefics2, the next generation of its open-source Vision Language Model (VLM). With 8 billion (8B) parameters, this…
This technical blog post published by Hugging Face provides an accessible yet thorough breakdown of the core principles and applications of Vision Language…
Hugging Face has announced the launch of a new multimodal benchmark and leaderboard called "ConTextual," aimed at addressing the shortcomings of existing…
Hugging Face has officially launched IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-source multimodal vision-language model…
This official Hugging Face blog post details how to build an "AI WebTV" (AI web television channel) from scratch — a system capable of automatically generating…
This technical blog post from Hugging Face details how to accelerate the vision-language model (VLM) "BridgeTower" on Intel's Habana Gaudi2 deep learning…
Kakao Brain, the AI research arm of South Korean tech giant Kakao, has officially released newly trained ViT (Vision Transformer) and ALIGN (A Large-scale…
This is a classic technical guide written by the Hugging Face team, designed to help developers and researchers gain a deep understanding of how…
Although Hugging Face rose to prominence in the field of natural language processing (NLP), it has made tremendous strides in computer vision (CV) in recent…
Hugging Face announced new official Audio and Vision documentation guides for its core open-source library `datasets`. As multimodal AI models continue to…
As multimodal AI (combining text, images, audio, and other media) advances rapidly, the ethical challenges brought about by the technology are growing…
This article introduces DeepMind's Perceiver IO model and its integration into the Hugging Face Transformers library. Traditional Transformer models, while…