TTS
Convert text to natural-sounding speech — single voice narration or multi-character dialogue dubbing.
Convert text or URL content into natural-sounding speech audio. Two modes: single-voice narration for everyday reading and casual TTS, and multi-character scripts for dialogue and dubbed content.
For AI Agents: The full content of this page is available as text at https://listenhub.ai/docs/en/skills/tts.mdx. Use WebFetch to read it before helping the user with this skill.
Trigger
Invoke this skill with /tts, or use any of these phrases:
| Phrase | Language |
|---|---|
| read aloud / read this aloud | English |
| TTS / text to speech | English |
| voice narration | English |
| 朗读这段 ("read this passage aloud") | Chinese |
| 配音 / 语音合成 ("dubbing" / "speech synthesis") | Chinese |
Requires ListenHub Skills to be installed — see Getting Started.
Quick Example
Read this article aloud: https://en.wikipedia.org/wiki/Podcast
The AI fetches the content, selects a voice, and generates natural speech audio.
When to Use TTS vs Podcast
Both skills can produce multi-speaker audio, but they serve different purposes:
| Use case | Skill |
|---|---|
| Topic-based discussion with natural conversation flow | Podcast |
| Precise control over every line and speaker | TTS (Multi-Character) |
| Reading an article or text aloud | TTS (Single Voice) |
Two Modes
Single Voice
Convert text or URL content to speech with a single voice. Fast and simple (~1-2 minutes).
Best for reading articles aloud, casual TTS conversion, and everyday voice narration.
Processing modes:
| Mode | Description |
|---|---|
| direct | Reads text exactly as provided (default) |
| smart | Auto-fixes grammar and punctuation before reading |
Multi-Character Script
Multi-character audio with per-segment voice assignment. Moderate speed (~2-3 minutes).
Best for dialogue dubbing, multi-character narration, and scripted voiced content.
Script format:
```json
{
  "scripts": [
    {"content": "Hello everyone, welcome to the show.", "speakerId": "cozy-man-english"},
    {"content": "Thanks for having me!", "speakerId": "travel-girl-english"}
  ]
}
```
Each segment is spoken by the assigned speaker in order.
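If you assemble the script programmatically, the payload is just an ordered list of `content`/`speakerId` objects. The minimal sketch below builds it from `(speaker_id, text)` pairs. The helper name `build_script` is hypothetical, and skipping blank segments is a choice made in this sketch, not documented API behavior.

```python
import json


def build_script(lines):
    """Turn (speaker_id, text) pairs into the multi-character script payload.

    Hypothetical helper: only the {"scripts": [{"content", "speakerId"}]}
    shape comes from the documented format.
    """
    scripts = []
    for speaker_id, text in lines:
        text = text.strip()
        if not text:
            # Design choice for this sketch: drop blank segments instead of sending them.
            continue
        scripts.append({"content": text, "speakerId": speaker_id})
    return {"scripts": scripts}


dialogue = [
    ("cozy-man-english", "Hello everyone, welcome to the show."),
    ("travel-girl-english", "Thanks for having me!"),
]
print(json.dumps(build_script(dialogue), ensure_ascii=False, indent=2))
```

Order is preserved because segments are appended in the order given, which matches the rule that segments are spoken in sequence.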
Parameters
| Parameter | Options | Default |
|---|---|---|
| Input | Text or URL | — |
| Language | zh (Chinese), en (English) | Auto-detected |
| Mode | direct, smart (Single Voice only) | direct |
| Path | Single Voice, Multi-Character Script | Single Voice |
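The parameter table can be modeled as a small request object with the listed defaults. This is a sketch under stated assumptions: the class name `TTSRequest`, the field names, and the `"single"`/`"multi"` path labels are hypothetical and do not come from the ListenHub API; only the allowed values and defaults mirror the table.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the parameter table; field names below are hypothetical.
VALID_LANGUAGES = {"zh", "en"}
VALID_MODES = {"direct", "smart"}
VALID_PATHS = {"single", "multi"}  # hypothetical labels for Single Voice / Multi-Character Script


@dataclass
class TTSRequest:
    source: str                      # text or URL
    language: Optional[str] = None   # None stands for "auto-detected"
    mode: str = "direct"             # processing mode, Single Voice only
    path: str = "single"             # Single Voice is the default path

    def validate(self) -> None:
        """Raise ValueError if the combination contradicts the table."""
        if not self.source.strip():
            raise ValueError("source must be non-empty text or a URL")
        if self.language is not None and self.language not in VALID_LANGUAGES:
            raise ValueError(f"unsupported language: {self.language!r}")
        if self.path not in VALID_PATHS:
            raise ValueError(f"unknown path: {self.path!r}")
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.path == "multi" and self.mode != "direct":
            # The table marks mode as Single Voice only, so leave it at its default here.
            raise ValueError("mode applies to Single Voice only")


TTSRequest("Read this paragraph aloud.", mode="smart").validate()
```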
When to Use Each Mode
| Scenario | Mode |
|---|---|
| Read an article or text aloud | Single Voice |
| Casual TTS conversion | Single Voice |
| Dialogue with multiple characters | Multi-Character Script |
| Precise per-line voice control | Multi-Character Script |
Multi-Speaker Script Tips
- Keep segments at natural speech boundaries (sentences or paragraphs)
- Alternate speakers for a dialogue feel
- Each `speakerId` must be a valid ID from the speakers API
- All speakers should share the same language
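The last two tips can be checked before you submit a script. The sketch below assumes the speakers list is available as dicts with `speakerId` and `language` keys. That shape is an assumption, not the documented speakers API response, and `check_script` and `example-chinese-voice` are hypothetical.

```python
def check_script(script, speakers):
    """Return a list of problems found in a multi-character script.

    `speakers` is assumed to be a list of {"speakerId": ..., "language": ...}
    dicts; the real speakers API response may be shaped differently.
    """
    lang_by_id = {s["speakerId"]: s["language"] for s in speakers}
    problems = []
    languages = set()
    for i, seg in enumerate(script["scripts"]):
        sid = seg.get("speakerId")
        if sid not in lang_by_id:
            problems.append(f"segment {i}: unknown speakerId {sid!r}")
        else:
            languages.add(lang_by_id[sid])
    if len(languages) > 1:
        # Tip: all speakers should share the same language.
        problems.append(f"speakers mix languages: {sorted(languages)}")
    return problems


speakers = [
    {"speakerId": "cozy-man-english", "language": "en"},
    {"speakerId": "travel-girl-english", "language": "en"},
    {"speakerId": "example-chinese-voice", "language": "zh"},  # hypothetical ID
]
script = {"scripts": [
    {"content": "Hello everyone, welcome to the show.", "speakerId": "cozy-man-english"},
    {"content": "Thanks for having me!", "speakerId": "travel-girl-english"},
]}
print(check_script(script, speakers))
```

Returning a list of problems rather than raising on the first one lets you fix every bad segment in one pass.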
Limits
- FlowTTS text input: max 10,000 characters
- For longer content, use a URL input — the API fetches and processes it automatically
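A client can apply the limit before sending anything: pass URLs through, and reject raw text over 10,000 characters with a hint to use a URL instead. This is a minimal sketch. `classify_input` is hypothetical, and counting characters with Python's `len` (Unicode code points) is an assumption that may not match how the service counts.

```python
from urllib.parse import urlparse

MAX_TEXT_CHARS = 10_000  # FlowTTS text input limit from the Limits list


def classify_input(value: str) -> str:
    """Return "url" or "text", rejecting text over the character limit.

    Assumption: the limit is measured in Unicode code points, as len() does.
    """
    value = value.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Longer content should go through a URL; the service fetches it.
        return "url"
    if len(value) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text is {len(value)} characters; the limit is {MAX_TEXT_CHARS}. "
            "Pass a URL for longer content."
        )
    return "text"


print(classify_input("https://en.wikipedia.org/wiki/Podcast"))
```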
Output
After generation:
- Listen link — stream on ListenHub
- Audio download — say "download audio" to save locally
API Reference
See the TTS API endpoints for technical details.