A Reddit post highlights a new infographic-specific fine-tune for SenseNova U1-8B-MoT, trained with an extended multi-task phase for structured visual output. The reported benchmarks show large gains in IGenBench infographic accuracy and chart understanding, with smaller improvement in text rendering. Aesthetic score appears roughly unchanged, suggesting the update mainly improves information structure and visual reasoning rather than overall visual polish.
ByteDance’s commercial technology team has open-sourced Bernini, a unified framework for AI video generation and editing. Its design separates semantic planning from visual rendering: an MLLM-based planner understands text, source videos, images, and video references, then a DiT-based renderer produces the final video. The released Bernini-R includes inference code and weights, while the full planner-enabled version is still being prepared.
Google recently unveiled a brand-new "anything-to-anything" multimodal AI model — Gemini Omni — whose powerful cross-modal generation and transformation…
Google DeepMind announced today (February 18, 2026) that its popular AI assistant application Gemini has officially integrated its most advanced music…
Google DeepMind has unveiled a new model called "Nano Banana Pro," which is also the Pro-tier image model of the Gemini 3 generation (Gemini 3 Pro Image…