Image Generation


Generate AI images from text descriptions. Supports multiple resolutions, aspect ratios, and optional reference images for style guidance. Depending on the configured output mode, images are displayed inline, saved as local files, or both.

Trigger

Invoke this skill with /image-gen, or use any of these phrases:

Phrase	Language
`generate an image` / `generate image`	English
`draw` / `visualize` / `create picture`	English
`生成图片` / `画一张`	Chinese
`AI图` / `配图`	Chinese

Requires ListenHub Skills to be installed — see Getting Started.

Quick Example

Generate an image: cyberpunk city at night, 16:9, 2K

The AI asks for any preferences you have not stated (model, resolution, aspect ratio, reference images) and then generates the image.

Parameters

Parameter	Options	Default
Model	🍌 `pro` (`gemini-3-pro-image`, higher quality, recommended), ⚡️ `flash` (`gemini-3.1-flash-image`, faster and cheaper)	—
Resolution	`1K`, `2K` (recommended), `4K`	—
Aspect ratio	`16:9`, `1:1`, `9:16`, `2:3`, `3:2`, `3:4`, `4:3`, `21:9`; `flash` also supports `1:4`, `4:1`, `1:8`, `8:1`	—
Reference images	Up to 14, via image URL or base64	None
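
The constraints in the table can be checked before a request is made. The following is a minimal sketch, not part of ListenHub: the function and constant names are hypothetical, and the allowed values are simply transcribed from the table above.

```python
# Values transcribed from the Parameters table. The names below are
# hypothetical helpers for illustration, not a ListenHub API.
RESOLUTIONS = {"1K", "2K", "4K"}
COMMON_RATIOS = {"16:9", "1:1", "9:16", "2:3", "3:2", "3:4", "4:3", "21:9"}
FLASH_ONLY_RATIOS = {"1:4", "4:1", "1:8", "8:1"}
MAX_REFERENCE_IMAGES = 14


def validate_options(model, resolution, aspect_ratio, reference_count=0):
    """Return a list of problems; an empty list means the combination is allowed."""
    problems = []
    if model not in ("pro", "flash"):
        problems.append(f"unknown model: {model}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    # Only the flash model accepts the extra-wide and extra-tall ratios.
    allowed = COMMON_RATIOS | (FLASH_ONLY_RATIOS if model == "flash" else set())
    if aspect_ratio not in allowed:
        problems.append(f"aspect ratio {aspect_ratio} is not available for {model}")
    if not 0 <= reference_count <= MAX_REFERENCE_IMAGES:
        problems.append(f"reference images must number 0 to {MAX_REFERENCE_IMAGES}")
    return problems
```

For example, validate_options("pro", "2K", "1:8") reports one problem, because 1:8 is listed only for flash.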

Writing Good Prompts

A good prompt covers these elements:

Subject — what is in the image
Style — art style or visual treatment
Composition — how elements are arranged
Lighting/Mood — atmosphere and time of day
Quality — detail level and rendering quality

Examples

Basic:

a cat sitting on a windowsill

Better:

a fluffy orange tabby cat sitting on a sunny windowsill, warm afternoon light, cozy interior, highly detailed, photorealistic
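
For programmatic workflows, the five elements can be assembled into a single prompt string. This is a small sketch under the assumption that a comma-separated prompt is acceptable, as in the "Better" example; build_prompt is a hypothetical helper, not a ListenHub function.

```python
def build_prompt(subject, style="", composition="", lighting="", quality=""):
    """Join the non-empty prompt elements into one comma-separated prompt."""
    parts = [subject, style, composition, lighting, quality]
    # Skip elements that were left empty or contain only whitespace.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="a fluffy orange tabby cat sitting on a sunny windowsill",
    style="photorealistic",
    composition="cozy interior",
    lighting="warm afternoon light",
    quality="highly detailed",
)
```

Keeping each element as a separate argument makes it easy to see which part of the description is missing before sending the prompt.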

Style Keywords

Style	Keywords
Photorealistic	photorealistic, highly detailed, 8K, professional photography
Cyberpunk	neon lights, futuristic, dystopian, rain-slicked streets
Ink painting	Chinese ink painting, traditional art style, brush strokes
Watercolor	watercolor painting, soft edges, flowing colors
Anime	anime style, Japanese animation, cel shading
Minimalist	minimalist, clean lines, simple composition, white space
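
If prompts are built in code, the table can be kept as a lookup and appended to a base prompt. The sketch below copies the keywords from the table; STYLE_KEYWORDS and with_style are hypothetical names chosen for illustration.

```python
# Keywords copied from the Style Keywords table; keys are lowercased style names.
STYLE_KEYWORDS = {
    "photorealistic": "photorealistic, highly detailed, 8K, professional photography",
    "cyberpunk": "neon lights, futuristic, dystopian, rain-slicked streets",
    "ink painting": "Chinese ink painting, traditional art style, brush strokes",
    "watercolor": "watercolor painting, soft edges, flowing colors",
    "anime": "anime style, Japanese animation, cel shading",
    "minimalist": "minimalist, clean lines, simple composition, white space",
}


def with_style(prompt, style):
    """Append the keywords for a named style to a base prompt."""
    keywords = STYLE_KEYWORDS.get(style.lower())
    if keywords is None:
        raise ValueError(f"unknown style: {style}")
    return f"{prompt}, {keywords}"
```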

Write prompts in English where possible, because the image model is trained on English descriptions. If you describe the image in Chinese, the AI translates it into English automatically.

Reference Images

Reference images guide the AI on style, not content. Your prompt still controls what appears in the image.

Using Image URLs

Upload your reference to an image hosting service (imgbb.com, sm.ms, postimages.org)
Copy the direct image URL (ending in .jpg, .png, .webp, or .gif)
Provide the URL when the AI asks about references

Using Base64 Inline Data (API)

When calling the Image Generation API directly, you can also provide reference images as base64-encoded data via the inlineData field — no image hosting required. This is useful for programmatic workflows where you already have the image in memory.

Each reference image must use exactly one of fileData (URL) or inlineData (base64), not both. See the API reference for request format and code examples.
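
The "exactly one of" rule can be enforced when building a reference entry. The following is a rough sketch only: fileData and inlineData come from the text above, but the inner key names (fileUri, mimeType, data) and the helper make_reference are assumptions, so confirm the real request format in the API reference.

```python
import base64


def make_reference(url=None, image_bytes=None, mime_type="image/png"):
    """Build one reference-image entry with exactly one of fileData or inlineData."""
    # Reject both "neither given" and "both given".
    if (url is None) == (image_bytes is None):
        raise ValueError("provide exactly one of url or image_bytes")
    if url is not None:
        # "fileUri" is an assumed key name; check the API reference.
        return {"fileData": {"fileUri": url}}
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # "mimeType" and "data" are assumed key names; check the API reference.
    return {"inlineData": {"mimeType": mime_type, "data": encoded}}
```

Raising an error early keeps a malformed entry from reaching the API, where the failure would be harder to trace.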

Output

Output behavior follows the outputMode chosen during configuration:

inline (default) — the image is displayed directly in the conversation
download — saved to .listenhub/image-gen/YYYY-MM-DD-{id}/ in the current project
both — displayed inline and saved locally
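
To locate saved images from a script, the download directory can be reconstructed from the pattern above. This is a sketch under assumptions: the format of {id} is not documented here, the date is assumed to be the local generation date, and output_dir is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def output_dir(image_id, project_root=".", today=None):
    """Return the expected download directory for one generation run."""
    # isoformat() yields YYYY-MM-DD, matching the documented pattern.
    day = (today or date.today()).isoformat()
    return Path(project_root) / ".listenhub" / "image-gen" / f"{day}-{image_id}"
```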

The output mode can be changed at any time by saying "reconfigure" when the AI shows your current config.

API Reference

See the Image Generation API reference for endpoint details, request parameters, and code examples.
