Image Generation
Generate AI images from text descriptions. Supports multiple resolutions, aspect ratios, and optional reference images for style guidance. Images are saved as local files.
Trigger
Invoke this skill with /image-gen, or use any of these phrases:
| Phrase | Language |
|---|---|
| generate an image / generate image | English |
| draw / visualize / create picture | English |
| 生成图片 / 画一张 | Chinese |
| AI图 / 配图 | Chinese |
Requires ListenHub Skills to be installed — see Getting Started.
Quick Example
Generate an image: cyberpunk city at night, 16:9, 2K
The AI collects your preferences and generates the image.
Parameters
| Parameter | Options | Default |
|---|---|---|
| Model | 🍌 pro (gemini-3-pro-image-preview, higher quality, recommended), ⚡️ flash (gemini-3.1-flash-image-preview, faster and cheaper) | — |
| Resolution | 1K, 2K (recommended), 4K | — |
| Aspect ratio | 16:9, 1:1, 9:16, 2:3, 3:2, 3:4, 4:3, 21:9; flash also supports 1:4, 4:1, 1:8, 8:1 | — |
| Reference images | Up to 14, via image URL or base64 | None |
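The constraints in this table can be checked before a request is made. The following is a minimal sketch, not part of the skill itself: the function name `validate_options` and the short model labels `"pro"` and `"flash"` as input values are assumptions made for illustration, while the allowed values are copied from the table above.

```python
# Allowed values copied from the Parameters table.
COMMON_RATIOS = {"16:9", "1:1", "9:16", "2:3", "3:2", "3:4", "4:3", "21:9"}
FLASH_ONLY_RATIOS = {"1:4", "4:1", "1:8", "8:1"}
RESOLUTIONS = {"1K", "2K", "4K"}
MAX_REFERENCE_IMAGES = 14


def validate_options(model, resolution, aspect_ratio, reference_count=0):
    """Return a list of problems; an empty list means the options look valid.

    Hypothetical helper: the skill collects these options interactively,
    so this only illustrates the rules stated in the table.
    """
    problems = []
    if model not in ("pro", "flash"):
        problems.append(f"unknown model: {model!r}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution!r}")
    # The extreme ratios are only listed for the flash model.
    allowed = (COMMON_RATIOS | FLASH_ONLY_RATIOS) if model == "flash" else COMMON_RATIOS
    if aspect_ratio not in allowed:
        problems.append(f"aspect ratio {aspect_ratio!r} is not available for {model!r}")
    if not 0 <= reference_count <= MAX_REFERENCE_IMAGES:
        problems.append(f"reference images must be between 0 and {MAX_REFERENCE_IMAGES}")
    return problems
```

For example, `validate_options("pro", "2K", "1:8")` reports a problem because 1:8 is listed only for flash, while the same call with `"flash"` returns an empty list.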
Writing Good Prompts
A good prompt covers these elements:
- Subject — what is in the image
- Style — art style or visual treatment
- Composition — how elements are arranged
- Lighting/Mood — atmosphere and time of day
- Quality — detail level and rendering quality
Examples
Basic:
a cat sitting on a windowsill
Better:
a fluffy orange tabby cat sitting on a sunny windowsill, warm afternoon light, cozy interior, highly detailed, photorealistic
Style Keywords
| Style | Keywords |
|---|---|
| Photorealistic | photorealistic, highly detailed, 8K, professional photography |
| Cyberpunk | neon lights, futuristic, dystopian, rain-slicked streets |
| Ink painting | Chinese ink painting, traditional art style, brush strokes |
| Watercolor | watercolor painting, soft edges, flowing colors |
| Anime | anime style, Japanese animation, cel shading |
| Minimalist | minimalist, clean lines, simple composition, white space |
Always write prompts in English, because the image model is trained on English descriptions. If you describe the image in Chinese, the AI translates it to English automatically.
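The prompt elements and style keywords above can be combined mechanically. The sketch below is only an illustration: `build_prompt` and `STYLE_KEYWORDS` are hypothetical names, the keyword lists are copied from the Style Keywords table, and the ordering of the elements is an assumption rather than a documented requirement.

```python
# Keyword lists copied from the Style Keywords table.
STYLE_KEYWORDS = {
    "photorealistic": ["photorealistic", "highly detailed", "8K", "professional photography"],
    "cyberpunk": ["neon lights", "futuristic", "dystopian", "rain-slicked streets"],
    "ink painting": ["Chinese ink painting", "traditional art style", "brush strokes"],
    "watercolor": ["watercolor painting", "soft edges", "flowing colors"],
    "anime": ["anime style", "Japanese animation", "cel shading"],
    "minimalist": ["minimalist", "clean lines", "simple composition", "white space"],
}


def build_prompt(subject, style=None, composition=None, lighting=None, quality=None):
    """Join the prompt elements into one comma-separated English prompt."""
    parts = [subject]
    if style:
        # Expand a known style name into its keywords; pass unknown styles through.
        parts.extend(STYLE_KEYWORDS.get(style.lower(), [style]))
    # Skip any element that was not supplied.
    parts.extend(p for p in (composition, lighting, quality) if p)
    return ", ".join(parts)


print(build_prompt("a cat sitting on a windowsill", style="watercolor",
                   lighting="warm afternoon light"))
```

Keeping the subject first and the style keywords directly after it mirrors the "Better" example above, but the image model does not require a fixed order.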
Reference Images
Reference images guide the AI on style, not content. Your prompt still controls what appears in the image.
Using Image URLs
- Upload your reference to an image hosting service (imgbb.com, sm.ms, postimages.org)
- Copy the direct image URL (ending in .jpg, .png, .webp, or .gif)
- Provide the URL when the AI asks about references
Using Base64 Inline Data (API)
When calling the Image Generation API directly, you can also provide reference images as base64-encoded data via the inlineData field — no image hosting required. This is useful for programmatic workflows where you already have the image in memory.
Each reference image must use exactly one of fileData (URL) or inlineData (base64), not both. See the API reference for request format and code examples.
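The "exactly one of fileData or inlineData" rule can be enforced when building the reference list. This is a rough sketch under stated assumptions: only the top-level field names `fileData` and `inlineData` come from the text above, while the inner keys (`fileUri`, `mimeType`, `data`) and the helper names are hypothetical. Check the API reference for the real request format.

```python
import base64

MAX_REFERENCE_IMAGES = 14  # limit stated in the Parameters table


def make_reference(url=None, image_bytes=None, mime_type="image/png"):
    """Build one reference entry that uses exactly one of fileData or inlineData."""
    if (url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of url or image_bytes")
    if url is not None:
        # "fileUri" is a hypothetical inner field name.
        return {"fileData": {"fileUri": url}}
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # "mimeType" and "data" are hypothetical inner field names.
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}


def make_references(items):
    """Build the full list, rejecting more than the documented maximum."""
    if len(items) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    return [make_reference(**item) for item in items]
```

A call such as `make_references([{"url": "https://example.com/style.png"}, {"image_bytes": raw}])` mixes both forms in one list, which the rule permits as long as each individual entry uses only one.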
Output
Output behavior follows the outputMode set during config:
- inline (default): the image is displayed directly in the conversation
- download: saved to .listenhub/image-gen/YYYY-MM-DD-{id}/ in the current project
- both: displayed inline and saved locally
The output mode can be changed at any time by saying "reconfigure" when the AI shows your current config.
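To make the three modes and the download location concrete, here is a small sketch. The function names are hypothetical, the format of `{id}` is not specified in this page (any string is accepted here), and the skill's own implementation may differ.

```python
from datetime import date
from pathlib import Path


def actions_for(output_mode):
    """Map an outputMode value to (display_inline, save_locally)."""
    modes = {
        "inline": (True, False),
        "download": (False, True),
        "both": (True, True),
    }
    if output_mode not in modes:
        raise ValueError(f"unknown outputMode: {output_mode!r}")
    return modes[output_mode]


def output_dir(project_root, image_id, day=None):
    """Return .listenhub/image-gen/YYYY-MM-DD-{id} under the project root."""
    day = day or date.today()
    # date.isoformat() yields YYYY-MM-DD, matching the documented pattern.
    return Path(project_root) / ".listenhub" / "image-gen" / f"{day.isoformat()}-{image_id}"
```

With these assumptions, `output_dir("my-project", "abc123", date(2025, 3, 9))` resolves to `my-project/.listenhub/image-gen/2025-03-09-abc123`.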
API Reference
See the Image Generation API reference for endpoint details, request parameters, and code examples.