SCAIL-2 by zai-org removes the reliance on skeleton maps and inpainting masks common in prior character animation pipelines, driving characters directly from video end to end. It is trained on 60K motion pairs synthesized with SCAIL-Preview, Wan-Animate, and MoCha via a Unified Motion Transfer Interface with a RoPE design, and it develops emergent abilities beyond its teacher models. Capabilities include cross-identity character replacement, animal-driving scenarios, and zero-shot support for SAM3D-Body mesh rendering.
Latent Space interviews Ethan He, who led Grok Imagine at xAI, about building the product in three months. The episode contrasts video generation with world models and explores why video agent models may become an important next step. It also argues that Grok Imagine remains underrated. The supplied description does not include architecture details or benchmark results.
Overworld has officially launched a new model, Waypoint-1, on Hugging Face. It is a world model focused on "Real-time Interactive Video…
With the rise of open-source video generation models such as LTX-Video, HunyuanVideo, and CogVideoX, building high-quality training datasets has become the…