# Creator
Generate multi-platform content packages — articles, image cards, or narration scripts with illustrations and audio — from any topic, URL, or audio file.
Creator is a ListenHub Skill — an AI agent workflow with access to file system, web browsing, image generation, and other tools. Give it a topic, URL, or audio file, and it produces a complete content package — WeChat articles with illustrations, Xiaohongshu image cards, or narration scripts with optional TTS. All outputs land in a local folder: text, images, and metadata, ready to publish.
## Quick Example
**WeChat article from a URL:**
```
Write a WeChat article based on this link https://mp.weixin.qq.com/s/xxx
```

Creator shows a summary and waits for your confirmation:
- Platform: WeChat
- Source: https://mp.weixin.qq.com/s/xxx (article extraction)
- Preset: Flat illustration
- Output: ./ai-trends-wechat/
- APIs used: Content Extraction, Image Generation

Once you confirm, Creator runs the full pipeline. After a few minutes, you get:
```
ai-trends-wechat/
├── article.md          # Full article with image references
├── images/
│   ├── cover.jpg       # AI-generated cover
│   ├── section-1.jpg
│   └── section-2.jpg
└── meta.json           # Title, summary, tags
```

**Xiaohongshu cards from a topic:**
```
Create Xiaohongshu cards about must-have items for solo apartment living
```

```
solo-living-xiaohongshu/
├── cards/
│   ├── 01-cover.jpg
│   ├── 02-page.jpg
│   ├── ...
│   └── prompts.json
├── long-text.md        # Post text with hashtags
└── meta.json
```

## Platforms
### WeChat Article

Creator writes a structured, long-form WeChat article with clear headings and concise paragraphs, and generates a cover image plus section illustrations to match.

Three visual presets are available for illustrations — Flat, Watercolor, and Photo-Realistic. Creator picks one based on your topic, or you can specify: "use the watercolor preset".
### Xiaohongshu

Creator produces Xiaohongshu content in two formats: image cards (5–8 designed pages with bold text and visuals) and a long-form post (hook-first, with hashtags). You get both by default, or you can request just one.

Ten visual presets are available for cards, ranging from minimal to retro to pop. Creator auto-selects based on content, or you can specify: "use the Notion preset".
### Narration Script

Creator writes a conversational, spoken-word script with natural pacing and clear structure. Optionally, it generates a TTS audio file using an AI voice.
## Supported Inputs
| Input | Example | What Creator does |
|---|---|---|
| URL (article/page) | https://mp.weixin.qq.com/s/xxx | Extracts content via API, uses it as source material |
| URL (audio/video) | A YouTube or Bilibili link | Downloads and transcribes locally, writes from the transcript |
| Local audio file | meeting.mp3 | Transcribes with local ASR, writes from the transcript |
| Text | A pasted paragraph or document | Uses the text as source material |
| Topic | "AI in education" | Generates content from scratch |
Audio/video transcription runs locally via coli. Install it with `npm i -g @marswave/coli`. No API key is needed for transcription.
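The one-time setup can be run from any shell. The version check below is an assumption about coli's CLI, not documented behavior — consult the tool's own help output for its actual commands and flags:

```shell
# One-time global install; transcription then runs locally, no API key needed
npm i -g @marswave/coli

# Assumed flag -- most Node CLIs support a version check, but verify against
# coli's own help output before relying on it
coli --version
```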
## API Key
Image generation, content extraction from URLs, and TTS require a ListenHub API key. Creator checks at the confirmation step and walks you through setup if needed.
Text-only pipelines (e.g., topic → narration script without audio) work without an API key.
## Style Learning
**Learn from a reference:** Share an article, post, or script you like — say "use this as a style reference". Creator extracts the writing style and applies it to future generations.

**Set rules directly:** After reviewing output, tell Creator what to adjust. It saves these as persistent style rules:
- "remember: keep WeChat paragraphs short"
- "narration scripts should be under 800 words"
- "小红书少用 emoji" ("use fewer emoji on Xiaohongshu")
Style rules are saved per platform in `.listenhub/creator/styles/` under your current working directory and apply automatically to future generations. To reset, say "reset WeChat style" or "重置公众号风格偏好" ("reset WeChat style preferences").
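For orientation, the per-platform rule files under `.listenhub/creator/styles/` might be laid out as follows. The file names shown are illustrative assumptions, not documented behavior:

```
.listenhub/creator/styles/
├── wechat.md
├── xiaohongshu.md
└── narration.md
```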
## Trigger
Type `/creator` to invoke directly, or describe what you want in natural language — Creator activates when it recognizes a content generation request (e.g., "write a WeChat article", "帮我写篇公众号" ("write me a WeChat article")).
## API Reference
See the OpenAPI documentation for details on the underlying content extraction, image generation, and TTS endpoints.