Latest in AI

SCAIL-2: Open-Source End-to-End Character Animation Without Intermediate Pose Representations
r/LocalLLaMA top day · 5 days ago · Release
SCAIL-2 by zai-org removes the reliance on skeleton maps and inpainting masks common in prior character animation pipelines, driving characters directly from video in an end-to-end manner. Trained on 60K synthesized motion pairs using SCAIL-Preview, Wan-Animate, and MoCha via a Unified Motion Transfer Interface with RoPE design, the model develops emergent abilities beyond its teacher models. Capabilities include cross-identity character replacement, animal-driving scenarios, and zero-shot support for SAM3D-Body mesh rendering.
Unlocking VLM Potential on Satellite Imagery Through Fine-Tuning
Mistral AI News · 6 days ago · Tutorial
Mistral AI demonstrates how LoRA fine-tuning adapts Pixtral-12B to satellite imagery, a specialized visual domain where prompting alone is unreliable. Using the Aerial Image Dataset, the post compares a prompt-based baseline against a fine-tuned model across 30 scene classes. Accuracy rose from 0.56 to 0.91, while invalid label hallucinations dropped from 5% to 0.1%.
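The two figures quoted in this summary, accuracy and the rate of invalid labels, are simple to compute for any closed-set classifier. The following is a minimal sketch, not code from the Mistral AI post: the function name `evaluate`, the three-label set, and the toy predictions are all hypothetical, and an "invalid" prediction is assumed to mean any output string that is not one of the allowed class names.

```python
from typing import Dict, Sequence, Set


def evaluate(preds: Sequence[str], golds: Sequence[str],
             valid_labels: Set[str]) -> Dict[str, float]:
    """Return accuracy and invalid-label rate for closed-set classification.

    Hypothetical helper: an invalid label is any prediction that is not a
    member of `valid_labels` (for example, a class name the model made up).
    """
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not preds:
        raise ValueError("need at least one prediction")
    n = len(preds)
    correct = sum(p == g for p, g in zip(preds, golds))
    invalid = sum(p not in valid_labels for p in preds)
    return {"accuracy": correct / n, "invalid_rate": invalid / n}


# Toy data, invented for illustration only.
labels = {"airport", "beach", "forest"}
golds = ["airport", "beach", "forest", "beach"]
preds = ["airport", "beach", "jungle", "forest"]  # "jungle" is not a valid label

metrics = evaluate(preds, golds, labels)
print(metrics)
```

An invalid prediction also counts as incorrect here, so the invalid-label rate is a lower bound on the error rate.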
CVPR 2026 Highlights Guangdong as He Kaiming and GDUT Team Stand Out ★ 76
量子位 QbitAI · 6 days ago · Paper
CVPR 2026 named Google DeepMind’s D4RT as Best Paper for fast dynamic 4D scene reconstruction from video. Honorable mentions included Meta’s SAM 3D and NVIDIA’s NitroGen, while TRELLIS.2 won Best Student Paper. The article emphasizes Chinese researcher visibility, ResNet and YOLO receiving the Longuet-Higgins Prize, and a GDUT-led undergraduate-heavy ChordEdit team breaking through among major labs and elite universities.
School shooting survivor sues AI gun detection firm after system failed
Ars Technica AI · 7 days ago · Incident
A teen injured in a January 2025 Nashville high school shooting has sued Omnilert and reseller System Integrations. The lawsuit alleges the company knew or should have known its AI gun detection system could fail under real-world camera, lighting, angle, distance, and visibility limits. The case raises questions about marketing claims, public safety procurement, and accountability when AI security tools fail in emergencies.
DeFlock Hits 100k ALPRs Mapped in USA
Hacker News (AI keywords) · 14 days ago · Ethics
A Hacker News post highlights DeFlock reaching 100,000 mapped automated license plate readers in the United States. The original article text was not provided, so the confirmed facts are limited mainly to the title and public context around DeFlock. The item is most relevant to privacy, computer-vision surveillance, civic mapping, and governance rather than new AI models or developer tooling.
Aluminum Prices Up 20%: A Recycling Startup Bets on AI to Capture the Green-Metals Opportunity
TechCrunch AI · 24 days ago · Business
Global metal markets have recently seen significant volatility, with aluminum prices surging by 20%. This sharp price increase has created unprecedented…
Going Viral: Figure AI Launches 24-Hour Livestream of Humanoid Robots Moving Packages, Drawing Intense Community Attention
Ars Technica AI · 25 days ago · Release
Humanoid robot startup Figure AI recently launched a highly buzzworthy technology showcase: a 24-hour uninterrupted live stream depicting its latest humanoid…
Ukrainian Drone Founder Yaroslav Azhnyuk on the Autonomous Drone Tech Stack and Drone Economics: The West Is Asleep
Latent Space · 27 days ago · Commentary
In this episode of the Latent Space podcast, the hosts and guest host Noah Smith (author of the well-known economics and technology blog Noahpinion)…
Google DeepMind Unveils Gemini Omni: A New Natively Omnimodal Model with Ultra-Low-Latency Real-Time Audio-Video and Voice Interaction ★ 95
Google DeepMind Blog · 27 days ago · Release
Google DeepMind has officially unveiled its latest flagship AI model, "Gemini Omni." This model represents a major breakthrough by Google in the field of…
Gemini Robotics-ER 1.6 Released: Powering Real-World Robotics Tasks Through Enhanced Embodied Reasoning ★ 85
Google DeepMind Blog · 62 days ago · Release
Google DeepMind has officially announced its latest breakthrough in the field of embodied AI — **Gemini Robotics-ER 1.6**. This model is specifically designed…
Sentence Transformers Adds Support for Multimodal Embedding and Reranker Models ★ 78
Hugging Face Blog · 66 days ago · Release
The popular open-source library `sentence-transformers` from Hugging Face has received a major update, officially introducing native support for Multimodal…
TII Launches New Falcon Perception Multimodal Perception Model ★ 75
Hugging Face Blog · 74 days ago · Release
The Technology Innovation Institute (TII) of the UAE has officially announced the launch of its new "Falcon Perception" model on the Hugging Face blog. As an…
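ImportAI 449: LLMs Training LLMs, 72B Distributed Training, Why Computer Vision Is Harder Than Text Generation, and Whether AI Will Trigger a Political Transition ★ 75
Import AI (Jack Clark) · 90 days ago · Commentary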
This issue of Import AI (No. 449) dives deep into several core frontier topics in the current AI landscape, spanning technical breakthroughs and broad…
D4RT: Teaching AI to See the World in Four Dimensions, with Dynamic 4D Reconstruction and Tracking Up to 300x Faster ★ 80
Google DeepMind Blog · 149 days ago · Release
Google DeepMind has published a new technology called D4RT, designed to enable artificial intelligence to understand and reconstruct the dynamic world we live…
Run Isaac 0.1 on Replicate: A Lightweight Embodied Vision-Language Model Designed for Real-World Perception
Replicate Blog · 200 days ago · Release
The cloud AI model deployment and hosting platform Replicate has officially announced support for running the new lightweight vision-language model (VLM) —…
New Google DeepMind Research: Teaching AI to Understand and Organize the Visual World Like Humans ★ 75
Google DeepMind Blog · 215 days ago · Paper
Google DeepMind has recently published an important study examining the fundamental differences between how AI systems and humans "organize and understand the…
Google DeepMind Uses AI to Map, Model, and Understand Nature: Protecting Forests and Listening to Birdsong ★ 75
Google DeepMind Blog · 221 days ago · Opinion
Google DeepMind recently published a feature article exploring how artificial intelligence (AI) can address the dual challenges of global climate change and…
Google DeepMind Launches Experimental AI Tool "Backstory" to Help Users Explore the Context and Origin of Online Images ★ 75
Google DeepMind Blog · 233 days ago · New Tool
Google DeepMind recently unveiled a new experimental AI tool called "Backstory," designed to help internet users deeply explore and understand the background…
Hugging Face Explores "AI for Food Allergies": How Open-Source Technology Can Safeguard Food Safety
Hugging Face Blog · 240 days ago · Opinion
Hugging Face recently published a feature article on "AI for Food Allergies" in its "Hugging Science" column. Food allergies are a global health concern…
Arm and Hugging Face Jointly Launch "Neural Super Sampling": Accelerating AI Image Supersampling on Mobile and Edge Devices ★ 75
Hugging Face Blog · 306 days ago · Release
Arm and Hugging Face have announced a collaboration to launch "Neural Super Sampling (NSS)" technology and related models, officially bringing AI-driven image…
Hugging Face Releases Its 2025 Vision Language Model (VLM) Guide: A New Open-Source Era That Is Stronger, Faster, and More Practical ★ 80
Hugging Face Blog · 398 days ago · Opinion
With the explosion of multimodal technology, Vision Language Models (VLMs) have evolved from laboratory research prototypes into core tools for enterprises and…
Google Launches SigLIP 2: A More Powerful Multilingual Vision-Language Encoder ★ 80
Hugging Face Blog · 478 days ago · Release
Google has officially launched SigLIP 2, a major upgrade to its widely popular SigLIP (Sigmoid Loss for Language-Image Pre-training) vision-language encoder…
Google Launches PaliGemma 2 Mix: New Instruction-Tuned Vision-Language Models ★ 80
Hugging Face Blog · 480 days ago · Release
Google has officially launched the PaliGemma 2 Mix model series — a new family of open-source instruction-tuned vision-language models (VLMs) now available on…
Hugging Face's Lightweight Agent Framework smolagents Now Officially Supports Vision-Language Models (VLMs) ★ 80
Hugging Face Blog · 506 days ago · Release
On January 24, 2025, Hugging Face announced that smolagents — its open-source library designed for building lightweight, high-performance AI agents — now…
Timm ❤️ Transformers: Use Any timm Vision Model Directly in Transformers ★ 80
Hugging Face Blog · 514 days ago · Release
The official Hugging Face blog has announced exciting news for the computer vision (CV) community: the popular PyTorch image model library `timm` (PyTorch…
Google Launches New Vision-Language Model PaliGemma 2: A Lightweight Multimodal Model Built on Gemma 2 ★ 80
Hugging Face Blog · 556 days ago · Release
Google and Hugging Face have jointly announced the release of a new generation of open-weight vision-language model (VLM) — PaliGemma 2. This model continues…
Hugging Face Introduces TextImage Augmentation for Document Images ★ 75
Hugging Face Blog · 677 days ago · New Tool
Solving real-world Document AI pain points: in the fields of Document AI and OCR (Optical Character Recognition), datasets used in academic research or…
Fine-Tuning Microsoft Florence-2: A Hands-On Guide to Microsoft's Top Vision-Language Model ★ 80
Hugging Face Blog · 720 days ago · Tutorial
Microsoft open-sourced Florence-2 in June 2024 — a vision-language model (VLM) based on a sequence-to-sequence architecture. Despite its compact size (the Base…
Google Launches PaliGemma: An Open-Source Vision-Language Model Combining SigLIP and Gemma ★ 80
Hugging Face Blog · 761 days ago · Release
Google has officially launched PaliGemma, a powerful yet lightweight open-source Vision-Language Model (VLM). The release of PaliGemma represents a significant…
Vision Language Models (VLMs) Explained: A Guide from Architecture and Training to Applications ★ 80
Hugging Face Blog · 794 days ago · Tutorial
This technical blog post published by Hugging Face provides an accessible yet thorough breakdown of the core principles and applications of Vision Language…
