TTS

Convert text or URL content into natural-sounding speech audio. There are two modes: single-voice narration for everyday reading and casual TTS, and multi-character scripts for dialogue and dubbed content.

Trigger

Invoke this skill with /tts, or use any of these phrases:

Phrase	Language
`read aloud` / `read this aloud`	English
`TTS` / `text to speech`	English
`voice narration`	English
`朗读这段`	Chinese
`配音` / `语音合成`	Chinese

Requires ListenHub Skills to be installed — see Getting Started.

Quick Example

Read this article aloud: https://en.wikipedia.org/wiki/Podcast

The AI fetches the content, selects a voice, and generates natural speech audio.

When to Use TTS vs Podcast

Both skills can produce multi-speaker audio, but they serve different purposes:

Use case	Skill
Topic-based discussion with natural conversation flow	Podcast
Precise control over every line and speaker	TTS (Multi-Character)
Reading an article or text aloud	TTS (Single Voice)

Two Modes

Single Voice

Convert text or URL content to speech with a single voice. Fast and simple (about 1-2 minutes).

Best for reading articles aloud, casual TTS conversion, and everyday voice narration.

Processing modes:

Mode	Description
`direct`	Reads text exactly as provided (default)
`smart`	Auto-fixes grammar and punctuation before reading

Multi-Character Script

Multi-character audio with per-segment voice assignment. Moderate speed (about 2-3 minutes).

Best for dialogue dubbing, multi-character narration, and scripted voiced content.

Script format:

{
  "scripts": [
    {"content": "Hello everyone, welcome to the show.", "speakerId": "cozy-man-english"},
    {"content": "Thanks for having me!", "speakerId": "travel-girl-english"}
  ]
}

Segments are spoken in the order listed, each by its assigned speaker.
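As a minimal sketch, a script in this format can be assembled from (speaker, line) pairs in Python. The helper name `build_script` is hypothetical; only the `scripts`, `content`, and `speakerId` keys come from the format shown above.

```python
import json


def build_script(lines):
    """Build a multi-character script from (speaker_id, text) pairs.

    `build_script` is a hypothetical helper, not part of ListenHub.
    The key names follow the script format documented on this page.
    """
    return {
        "scripts": [
            {"content": text, "speakerId": speaker_id}
            for speaker_id, text in lines
        ]
    }


payload = build_script([
    ("cozy-man-english", "Hello everyone, welcome to the show."),
    ("travel-girl-english", "Thanks for having me!"),
])

# Serialize to JSON; segment order is preserved because lists are ordered.
print(json.dumps(payload, indent=2))
```

Keeping the dialogue as an ordered list of pairs makes it easy to reorder or reassign speakers before serializing.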

Parameters

Parameter	Options	Default
Input	Text or URL	—
Language	`zh` (Chinese), `en` (English)	Auto-detected
Mode	`direct`, `smart` (Single Voice only)	`direct`
Path	Single Voice, Multi-Character Script	Single Voice
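The defaults and constraints in this table could be captured in a small Python structure like the one below. This is only an illustrative sketch: the class name `TTSOptions`, its field names, and the path strings `"single-voice"` and `"multi-character"` are assumptions made for the example, not identifiers from ListenHub.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTSOptions:
    """Hypothetical container mirroring the parameter table above."""

    source: str                      # text or URL
    language: Optional[str] = None   # None stands for auto-detection
    mode: str = "direct"             # "direct" or "smart"
    path: str = "single-voice"       # assumed label for Single Voice

    def __post_init__(self):
        if self.language not in (None, "zh", "en"):
            raise ValueError("language must be 'zh', 'en', or None")
        if self.mode not in ("direct", "smart"):
            raise ValueError("mode must be 'direct' or 'smart'")
        if self.path not in ("single-voice", "multi-character"):
            raise ValueError("unknown path")
        # The table states that `smart` applies to Single Voice only.
        if self.mode == "smart" and self.path != "single-voice":
            raise ValueError("smart mode applies to Single Voice only")


options = TTSOptions(source="https://en.wikipedia.org/wiki/Podcast")
print(options.mode, options.path)
```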

When to Use Each Mode

Scenario	Mode
Read an article or text aloud	Single Voice
Casual TTS conversion	Single Voice
Dialogue with multiple characters	Multi-Character Script
Precise per-line voice control	Multi-Character Script

Multi-Speaker Script Tips

Keep segments at natural speech boundaries (sentences or paragraphs)
Alternate speakers for a dialogue feel
Each speakerId must be a valid ID from the speakers API
All speakers should share the same language
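The last two tips can be checked before submitting a script. The sketch below assumes a local lookup table, `KNOWN_SPEAKERS`, that maps each speaker ID to a language code. That table is hypothetical: in practice the valid IDs and their languages would come from the speakers API, whose response format is not described on this page.

```python
# Hypothetical catalog; real IDs and languages come from the speakers API.
KNOWN_SPEAKERS = {
    "cozy-man-english": "en",
    "travel-girl-english": "en",
}


def check_script(script, known_speakers):
    """Return a list of problems found in a multi-character script."""
    problems = []
    languages = set()
    for index, segment in enumerate(script.get("scripts", [])):
        speaker_id = segment.get("speakerId")
        if speaker_id not in known_speakers:
            problems.append(f"segment {index}: unknown speakerId {speaker_id!r}")
        else:
            languages.add(known_speakers[speaker_id])
    # All speakers should share the same language.
    if len(languages) > 1:
        problems.append(f"speakers mix languages: {sorted(languages)}")
    return problems


script = {
    "scripts": [
        {"content": "Hello everyone, welcome to the show.", "speakerId": "cozy-man-english"},
        {"content": "Thanks for having me!", "speakerId": "travel-girl-english"},
    ]
}
print(check_script(script, KNOWN_SPEAKERS))  # prints []
```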

Limits

FlowTTS text input: max 10,000 characters
For longer content, use a URL input — the API fetches and processes it automatically
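A caller could apply this limit before submitting, as in the sketch below. The function name `choose_input` and the returned dictionary shape are hypothetical, and the sketch assumes the limit counts characters the way Python's `len` does, which this page does not confirm.

```python
MAX_TEXT_CHARS = 10_000  # text input limit stated above


def choose_input(text=None, url=None):
    """Prefer inline text when it fits the limit, otherwise fall back to a URL.

    The returned dict shape is illustrative only.
    """
    if text is not None and len(text) <= MAX_TEXT_CHARS:
        return {"type": "text", "value": text}
    if url is not None:
        return {"type": "url", "value": url}
    raise ValueError(
        f"text exceeds {MAX_TEXT_CHARS} characters and no URL was provided"
    )


print(choose_input(text="Short passage to read aloud."))
print(choose_input(text="x" * 10_001, url="https://en.wikipedia.org/wiki/Podcast"))
```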

Output

After generation:

Listen link — stream on ListenHub
Audio download — say "download audio" to save locally

API Reference

See the TTS API endpoints for technical details.
