Generating Images & Videos
Your agent can generate images and short video clips on demand, including 30-second vertical ads in which a host delivers your script in their own voice with lip-sync.
Why use MorphMind for this
You don't have to learn a new tool, juggle three providers, or remember which model pairs with which. The agent handles all of that. What you get back is a workflow you keep:
- Build the recipe once. Your "30-second ad" workflow holds your brand voice, host avatar, framing, and pacing. Run it again with a new product brief and you get a new ad in the same look.
- Workflows and Specialists you can reuse. A custom step you teach the agent — a script polisher, a storyboard formatter, a brand-color enforcer — lives on and gets better with each run.
- Mass production by varying inputs. Same workflow, ten products, ten ads. Same workflow, ten languages, ten localized versions.
- Memory of what works. The agent learns which prompts and reference modes give you the result you want, and uses them next time without being asked.
This page covers the building blocks: what models are available, when to use which, and roughly how many credits each generation costs.
Pick a model
The agent picks a model automatically when you describe what you want. You can override its choice by naming a model from the tables below.
Images
| Model | Best for |
|---|---|
| Seedream 5.0 Lite | Digital avatars, especially when the same character will later appear in a video. |
| Gemini Image | Quick illustrations, banners, blog hero images. |
| GPT Image 2 — Draft | Rough sketches; cheapest and fastest. |
| GPT Image 2 — Standard | Polished general-purpose work. |
Video
| Model | Best for |
|---|---|
| Seedance Cinematic | Top quality with native synced audio. 480p / 720p / 1080p, up to 15s per clip. The default for finished work. |
| Seedance Quick Draft | Cheaper, faster iteration. Capped at 720p. Use for rough cuts before committing to Cinematic. |
The Seedream → Seedance rule
If your video has a human host that needs to stay visually consistent across multiple clips, generate the avatar with Seedream 5.0 Lite and reuse the same image as the reference for every clip. Avatars from Gemini Image or GPT Image 2 break Seedance's character consistency.
Reference modes for video
- Single image: pass one avatar or scene reference. Default for ads with a recurring character. Most reliable.
- Two images (start + end keyframes): Seedance interpolates motion between them. Works well for non-human scenes; avoid with human hosts.
- Video reference: pass a short trim of an existing clip as motion/context. As with the two-image mode, this works well for non-human scenes; avoid it with human hosts.
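If you script your own pre-flight checks, the Seedream → Seedance rule and the reference-mode guidance above can be written as a small validator. This is a rough sketch only: `ClipPlan`, `check_plan`, and the mode strings are hypothetical names made up for illustration, not MorphMind identifiers, and the agent applies this guidance for you without any code.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration; not MorphMind identifiers.
SEEDREAM = "Seedream 5.0 Lite"
REFERENCE_MODES = {"single_image", "two_images", "video_reference"}


@dataclass
class ClipPlan:
    avatar_model: str      # image model that produced the reference avatar
    reference_mode: str    # one of REFERENCE_MODES
    has_human_host: bool   # True if a recurring human host appears in the clip


def check_plan(plan: ClipPlan) -> list:
    """Return warnings derived from the guidance on this page."""
    if plan.reference_mode not in REFERENCE_MODES:
        raise ValueError(f"unknown reference mode: {plan.reference_mode}")

    warnings = []
    if plan.has_human_host:
        # Avatars from other image models break character consistency.
        if plan.avatar_model != SEEDREAM:
            warnings.append("Regenerate the avatar with Seedream 5.0 Lite.")
        # Two-image and video references are best kept for non-human scenes.
        if plan.reference_mode != "single_image":
            warnings.append("Use a single image reference for human hosts.")
    return warnings


if __name__ == "__main__":
    plan = ClipPlan("Gemini Image", "two_images", has_human_host=True)
    for warning in check_plan(plan):
        print(warning)
```

A plan with a Seedream avatar and a single image reference returns no warnings; non-human scenes pass with any reference mode.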
Roughly how many credits
Real cost varies by prompt and retries. Use these as a rough guide.
Per image
| Model | Credits |
|---|---|
| Seedream 5.0 Lite | ~10 |
| Gemini Image | ~10 |
| GPT Image 2 — Draft | ~5 |
| GPT Image 2 — Standard | ~15 |
Per Seedance clip (vertical 9:16)
| Spec | Credits |
|---|---|
| Cinematic 720p / 5s | ~150 |
| Cinematic 720p / 10s | ~300 |
| Cinematic 1080p / 5s | ~400 |
| Cinematic 1080p / 10s | ~750 |
| Cinematic 1080p / 15s | ~1,100 |
| Cinematic 1080p / 10s with video reference | ~550 |
| Quick Draft 720p / 5s | ~150 |
| Quick Draft 720p / 10s | ~250 |
Common deliverables
- 30-second vertical ad (3 × 10s Cinematic 1080p + Seedream avatar) → ~2,200 credits
- 60-second vertical ad (5 × 12s Cinematic 1080p + avatar) → ~4,300 credits
Retrying a clip (for example, after a content moderation block or silent audio) charges that clip's cost again. The other clips in the deliverable are not re-charged.
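To budget a deliverable yourself, you can add up the table values. The sketch below is an illustration, not a MorphMind tool: `estimate_credits` and the dictionary keys are hypothetical names, the numbers are the approximate figures copied from the tables above, and real cost will vary by prompt and retries.

```python
# Approximate credits per vertical 9:16 clip, copied from the table above.
CLIP_CREDITS = {
    ("Cinematic", "720p", 5): 150,
    ("Cinematic", "720p", 10): 300,
    ("Cinematic", "1080p", 5): 400,
    ("Cinematic", "1080p", 10): 750,
    ("Cinematic", "1080p", 15): 1100,
    ("Quick Draft", "720p", 5): 150,
    ("Quick Draft", "720p", 10): 250,
}

# Approximate credits per image, copied from the table above.
IMAGE_CREDITS = {
    "Seedream 5.0 Lite": 10,
    "Gemini Image": 10,
    "GPT Image 2 - Draft": 5,
    "GPT Image 2 - Standard": 15,
}


def estimate_credits(clips, images=(), retried=()):
    """Rough estimate: every clip and image once, plus each retried clip again."""
    total = sum(CLIP_CREDITS[clip] for clip in clips)
    total += sum(IMAGE_CREDITS[image] for image in images)
    total += sum(CLIP_CREDITS[clip] for clip in retried)
    return total


if __name__ == "__main__":
    beat = ("Cinematic", "1080p", 10)
    # 30-second vertical ad: three 10s beats plus one Seedream avatar.
    print(estimate_credits([beat] * 3, images=["Seedream 5.0 Lite"]))  # 2260
    # Same ad with one beat retried.
    print(estimate_credits([beat] * 3, images=["Seedream 5.0 Lite"], retried=[beat]))  # 3010
```

The first figure is in line with the ~2,200 credits listed for the 30-second ad. Specs that are not in the table, such as a 12s clip, raise a `KeyError` here rather than guess a price.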
Tips & common gotchas
- For lip-synced speech, put the line in quotes in the prompt: "...host looks at camera and says 'Coffee in 30 seconds.'" Without the quoted line, Seedance produces ambient audio only: the mouth moves, but there is no clear speech.
- Brand logos go in post, not in the prompt. Asking the model to render a wordmark produces blurry text and often trips the safety filter. Composite logos onto the finished video as an overlay.
- Avatars must be Seedream-lineage for video. Generating an avatar with another image model and then trying to use it as a Seedance reference will break consistency.
- Continuity comes from prompts, not from chaining. Match the host's pose and eyeline at the end of one beat and the start of the next; soft 0.5-second cross-dissolves at the cuts handle the rest.
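If you assemble prompts from a script, for example one beat per row of a spreadsheet, the quoting tip above is easy to apply consistently. A minimal sketch, assuming you build the prompt text yourself: `build_speech_prompt` is a hypothetical helper, not part of MorphMind, and it only formats a string.

```python
def build_speech_prompt(action: str, line: str) -> str:
    """Join a visual action with a spoken line wrapped in single quotes."""
    spoken = line.strip()
    if not spoken:
        # Without a quoted line the clip would have ambient audio only.
        raise ValueError("spoken line must not be empty")
    return f"{action.strip()} and says '{spoken}'"


if __name__ == "__main__":
    beats = [
        ("Host looks at camera", "Coffee in 30 seconds."),
        ("Host lifts the mug", "No grinder, no mess."),
    ]
    for action, line in beats:
        print(build_speech_prompt(action, line))
```

Keeping the action and the spoken line as separate inputs also makes it easier to match the host's pose and eyeline from one beat to the next.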
See also
- Pricing — how credits work in general
- What Can AI Agents Do? — other capabilities
- Specialist Skills — extend an agent with custom tools