How reference images work
Reference images guide the AI’s visual output. The AI analyzes your references and preserves their key visual characteristics in the generated video:
- Character appearance — face, clothing, body type, and distinctive features
- Environment style — architecture, lighting, color palette, and atmosphere
- Prop details — shape, color, texture, and proportions
- Visual style — artistic approach, rendering quality, and overall aesthetic
Creating a reference video
Select reference images
Choose 1-3 images from your canvas or asset library. These define the visual identity for your video. You can use generated images, uploaded photos, or a mix of both.
Describe the action
Tell the AI agent what should happen in the shot — camera movement, character action, environment changes. Use filmmaking language for precision: “slow dolly in”, “rack focus to background”, “character turns to face camera”.
Best practices for reference images
- Use clear, well-lit images — avoid noisy, blurry, or heavily compressed references
- Show the full subject — a full-body character shot works better than a cropped headshot
- Use a clean background — solid or simple backgrounds help the AI isolate the subject. Use Remove background if needed.
- Match your target style — if your project is photorealistic, use photorealistic references. If it’s illustrated, use illustrated references.
- One subject per reference — each reference image should feature a single character, prop, or scene, not a group
Building continuity across shots
To maintain visual consistency across multiple video clips:
- Create a reference sheet — generate or upload clear images of each key character and environment
- Reuse the same references — select the same reference images each time you generate a new shot with that character or setting
- Use frames to organize — group reference images for each character or scene in a frame on the canvas
- Be consistent with style direction — use the same style descriptions across shots, or save them as a skill
Specs
| Spec | Value |
|---|---|
| Aspect ratio | 16:9 (widescreen) |
| Resolution | 720p (default) or 1080p |
| AI-generated audio | Optional |
| Duration | 5-8 seconds per clip |
| Cost | 840 credits per clip |
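
When planning a multi-shot sequence, it can help to check each shot against the specs above and total the credits before generating. The sketch below is a planning aid only, not a product API: the `Shot` record and the `validate` and `estimate` helpers are hypothetical names, and it assumes the 840-credit cost is flat per clip regardless of resolution, duration, or audio.

```python
from __future__ import annotations

from dataclasses import dataclass

# Values taken from the Specs table and the "Choose 1-3 images" step.
CREDITS_PER_CLIP = 840
MIN_SECONDS, MAX_SECONDS = 5, 8
RESOLUTIONS = {"720p", "1080p"}
MAX_REFERENCES = 3


@dataclass
class Shot:
    """Hypothetical planning record for one clip (not a product API)."""
    description: str
    references: list[str]
    seconds: int = 5
    resolution: str = "720p"


def validate(shot: Shot) -> list[str]:
    """Return a list of spec violations; an empty list means the shot fits."""
    problems = []
    if not 1 <= len(shot.references) <= MAX_REFERENCES:
        problems.append("use 1-3 reference images")
    if not MIN_SECONDS <= shot.seconds <= MAX_SECONDS:
        problems.append("duration must be 5-8 seconds")
    if shot.resolution not in RESOLUTIONS:
        problems.append("resolution must be 720p or 1080p")
    return problems


def estimate(shots: list[Shot]) -> tuple[int, int]:
    """Return (total credits, total seconds), assuming a flat per-clip cost."""
    for shot in shots:
        problems = validate(shot)
        if problems:
            raise ValueError(f"{shot.description!r}: {'; '.join(problems)}")
    return CREDITS_PER_CLIP * len(shots), sum(s.seconds for s in shots)


shots = [
    Shot("slow dolly in on the hero", ["hero.png"], seconds=6),
    Shot("character turns to face camera", ["hero.png", "street.png"],
         seconds=8, resolution="1080p"),
]
total_credits, total_seconds = estimate(shots)
print(total_credits, total_seconds)  # 1680 14
```

Two clips come to 1,680 credits and 14 seconds of footage. Reusing `"hero.png"` in both shots mirrors the continuity advice above: the same reference is selected each time the character appears.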

