ElevenLabs Blog · Jun 8, 2026, 9:02 AM

Introducing ElevenLabs Image & Video


ElevenLabs launched Image & Video Beta, combining visual generation, audio, music, and editing in one creative workflow.

ElevenLabs Image & Video Beta brings image, video, voice, music, and sound effects into a single platform. It integrates models such as Veo, Sora, Kling, Wan, Seedance, GPT Image, Flux Kontext, Seedream, and Nanobanana. The product targets creators, marketers, educators, freelancers, and content teams making social content, product videos, and educational materials.

On its official blog, ElevenLabs announced ElevenLabs Image & Video Beta. The launch expands a platform previously known for AI voice and audio into a more complete multimodal content production environment. The core idea is that users can generate images and videos as well as sound, then add narration, background music, and sound effects on the same platform. The result is a finished product ready to publish.

According to the article, ElevenLabs Image & Video integrates several mainstream image and video generation models. For images, it lists Nanobanana, Flux Kontext, GPT Image, and Seedream, which can produce static images, storyboards, and thumbnails, or serve as source material for videos. For video, it supports Veo, Sora, Kling, Wan, and Seedance. These let users generate clips, combine multiple shots, adjust narrative pacing, and upscale images and videos. ElevenLabs also builds its existing strengths into the workflow. For example, it uses ElevenLabs voices for lip sync, so mouth movements in generated videos better match the narration.

Once the visual material is ready, users can import it into Studio to continue production. There they can add voice narration on a single timeline with built-in voices or their own voice clones, generate background music, layer sound effects, adjust narration timing, and export the finished video.

The article explicitly names its target audience as creators, marketers, content teams, video professionals, freelancers, and educators. Use cases include product videos, social content, and educational materials. Overall, this is not a single model release. It is a product expansion in which ElevenLabs is trying to move from an AI audio tool toward an end-to-end AI creative platform.

Its significance lies in bringing multiple models' capabilities into one creative workflow, which reduces the cost of moving material between tools. However, the product is still in Beta, and the article gives no pricing, quality benchmarks, or concrete limitations, so any assessment should remain conservative.



Summaries are AI-generated; the original article is authoritative.