Avataar AI has launched Varya, a video generation model built from Alibaba’s open Wan 2.2 model and distilled for faster, cheaper output. The company says Varya can generate 5-second 720p clips on an NVIDIA H200 in 45 seconds, versus 1,230 seconds for Wan 2.2. Avataar plans to release the model and training data through India’s AI Kosh portal while offering hosted access at about $0.005 per second.
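As a rough check on the figures quoted above, the reported timings work out to a speedup of about 27x over Wan 2.2. The short sketch below reproduces that arithmetic. It assumes the $0.005 rate is billed per second of generated video, which the announcement summary does not state explicitly, and the variable names are illustrative only.

```python
# Figures as quoted in the Varya item above.
varya_seconds = 45        # reported Varya time for one 5-second 720p clip on an H200
wan22_seconds = 1230      # reported Wan 2.2 time for the same clip
clip_seconds = 5          # clip length in seconds
price_per_second = 0.005  # USD; ASSUMPTION: billed per second of generated video

# Speedup of Varya over Wan 2.2 on the reported numbers.
speedup = wan22_seconds / varya_seconds

# Seconds of H200 compute per second of output video.
compute_per_video_second = varya_seconds / clip_seconds

# Hosted cost of one clip under the billing assumption above.
clip_cost = price_per_second * clip_seconds

print(f"speedup: {speedup:.1f}x")
print(f"compute per video second: {compute_per_video_second:.0f} s")
print(f"cost per clip: ${clip_cost:.3f}")
```

On these numbers Varya is still well short of real time, at 9 seconds of compute per second of video.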
SCAIL-2 by zai-org removes the reliance on skeleton maps and inpainting masks common in prior character animation pipelines, driving characters directly from video end to end. It was trained via a Unified Motion Transfer Interface with RoPE design on 60K motion pairs synthesized using SCAIL-Preview, Wan-Animate, and MoCha, and it shows emergent abilities beyond those of its teacher models. Capabilities include cross-identity character replacement, animal-driving scenarios, and zero-shot support for SAM3D-Body mesh rendering.
Vercel’s changelog indicates that Grok Imagine Video 1.5 is now available through AI Gateway. The public model page lists the preview model as xai/grok-imagine-video-1.5-preview and marks it primarily for image-to-video generation. The changelog text itself was unavailable, so no concrete claims about quality, speed, audio, editing, or text-to-video improvements should be inferred.
Latent Space interviews Ethan He, who led Grok Imagine at xAI, about building the product in three months. The episode contrasts video generation with world models and explores why video agent models may become an important next step. It also argues that Grok Imagine remains underrated. The supplied description does not include architecture details or benchmark results.
xAI has released Grok Imagine Video 1.5, a model that animates a still image into a short video clip. It generates synchronized audio during the same pass, combining visual animation and sound creation in one workflow. The Replicate Blog post focuses on prompting techniques intended to help users get more from the model.
Overworld has officially launched a new model called "Waypoint-1" on the Hugging Face platform. It is a world model focused on "Real-time Interactive Video…
Google announced new generative media models and tools at I/O 2025, led by Veo 3 for video, Imagen 4 for images, and Flow for AI filmmaking. Veo 3 adds audio generation, while Imagen 4 improves detail, typography, and aspect-ratio support, with output up to 2K. Google also expanded access to Lyria 2 and Lyria RealTime, continued SynthID watermarking, and launched SynthID Detector.
With the rise of open-source video generation models such as LTX-Video, HunyuanVideo, and CogVideoX, building high-quality training datasets has become the…