Core Concepts
Episodes, speakers, generation modes, and data flow in ListenHub OpenAPI.
Basic Concepts
- Episode — The basic content unit in ListenHub. Each episode has a unique `episodeId` and contains audio, scripts, and metadata.
- Speaker — Defines the voice characteristics used for generation. Identified by `speakerId`, with attributes such as language and gender. Call `GET /v1/speakers/list` to browse available voices, or see the Speakers API reference.
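As a minimal sketch, a request for the speaker list could be assembled as follows. Only the path `/v1/speakers/list` comes from the text above. The base URL placeholder, the Bearer authorization scheme, and the helper name `build_speakers_request` are assumptions made for illustration; the Authentication page is the place to confirm the real values.

```python
from urllib.request import Request

# Placeholder only: the real base URL is documented on the Authentication page.
BASE_URL = "https://api.example.com"


def build_speakers_request(api_key: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a GET request for the speaker list."""
    return Request(
        url=base_url.rstrip("/") + "/v1/speakers/list",
        # Assumption: the API key is passed as a Bearer token.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


if __name__ == "__main__":
    req = build_speakers_request("YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

The request object can be passed to `urllib.request.urlopen` once the base URL and key are real; the shape of the response is not described here, so it is left unparsed.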
Generation Modes
| Mode | Sub-mode | Description | Generation Time | API Endpoint |
|---|---|---|---|---|
| Podcast | quick | Faster generation prioritizing efficiency; best for news briefs and time-sensitive content | 1-2 min | /v1/podcast/episodes |
| Podcast | debate | Two-host debate format; best for opinion discussions and multi-angle analysis | 2-4 min | /v1/podcast/episodes |
| Podcast | deep | In-depth analysis with higher content quality; best for professional knowledge sharing and deep commentary | 2-4 min | /v1/podcast/episodes |
| Text to Speech | smart | AI optimizes content before synthesis; best for fixing awkward sentences and typos | 1-2 min | /v1/flow-speech/episodes |
| Text to Speech | direct | Direct text-to-speech conversion; best for well-prepared scripts and announcements | 1-2 min | /v1/flow-speech/episodes |
| Content Extract | — | Async URL content extraction; best for article parsing, research, and content analysis | 10-30 sec | /v1/content/extract |
Podcast mode supports 1-2 speakers (single or dual host). Debate mode requires exactly 2 speakers.
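The table and the speaker rules above can be captured in a small client-side check, sketched here. The endpoint paths and sub-mode names are taken from the table; the mode labels `"podcast"`, `"tts"`, and `"extract"` and both function names are hypothetical. The sketch also assumes that the 1-2 speaker rule applies to the `quick` and `deep` sub-modes, and it does not check Text to Speech requests because no speaker limit is stated for them.

```python
# Endpoints copied from the Generation Modes table.
# The mode labels used as keys are local names, not API values.
ENDPOINTS = {
    ("podcast", "quick"): "/v1/podcast/episodes",
    ("podcast", "debate"): "/v1/podcast/episodes",
    ("podcast", "deep"): "/v1/podcast/episodes",
    ("tts", "smart"): "/v1/flow-speech/episodes",
    ("tts", "direct"): "/v1/flow-speech/episodes",
    ("extract", None): "/v1/content/extract",
}


def endpoint_for(mode, sub_mode=None):
    """Return the endpoint path for a mode and sub-mode pair."""
    try:
        return ENDPOINTS[(mode, sub_mode)]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}/{sub_mode!r}") from None


def check_podcast_speakers(sub_mode, speaker_ids):
    """Raise ValueError if the speaker count breaks the podcast rules."""
    count = len(speaker_ids)
    if sub_mode == "debate":
        if count != 2:
            raise ValueError("debate requires exactly 2 speakers")
    elif count not in (1, 2):
        # Assumption: quick and deep both accept 1 or 2 speakers.
        raise ValueError("podcast supports 1-2 speakers")
```

Running the check before sending a request surfaces a bad speaker list locally instead of waiting for the API to reject it.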
Output Types
Each generated episode provides two types of data: script text and audio files.
Script Stream (Server-Sent Events)
While audio is being generated, you can retrieve outline and script data via SSE without waiting for the audio to finish:
- Podcast: available 20-60 seconds after creation
- Text to Speech: available ~3 seconds after creation
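Reading the script stream means parsing the Server-Sent Events wire format. The sketch below is a minimal parser for that standard format: `event:` and `data:` fields, with a blank line ending each event. It is not ListenHub-specific. The function name `parse_sse` is hypothetical, and the event names and payloads the API actually sends are not described here, so the sample in the usage note is invented.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the lines of a text/event-stream body."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip events with no data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
            # Other fields (id, retry) are ignored in this sketch.
```

Feeding it the decoded lines of a streaming HTTP response yields events as they arrive, so script text can be shown before the audio is ready. For example, the lines `event: outline`, `data: hello`, and an empty line produce the single pair `("outline", "hello")`.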
Audio Files
Once generation completes, the response includes:
| Field | Format | Description |
|---|---|---|
| `audioStreamUrl` | M3U8 | Streaming playback, best for real-time use |
| `audioUrl` | MP3 | Full file download, best for offline use |
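A client might choose between the two fields depending on how the audio will be used. The helper below is a sketch under assumptions: `pick_audio_url` is a hypothetical name, and it assumes both fields sit at the top level of an already-decoded episode object, which may not match the real response layout.

```python
from typing import Optional


def pick_audio_url(episode: dict, offline: bool = False) -> Optional[str]:
    """Prefer the MP3 for offline use and the M3U8 stream otherwise."""
    if offline:
        preferred, fallback = "audioUrl", "audioStreamUrl"
    else:
        preferred, fallback = "audioStreamUrl", "audioUrl"
    # Fall back to the other field if the preferred one is missing or empty.
    return episode.get(preferred) or episode.get(fallback)
```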
Playground
ListenHub provides an online Playground for testing multi-speaker speech synthesis without writing code.
URL: Multi-speaker TTS Playground
- Multi-role dialogue — generate audio with multiple voices in a single request
- Flexible assignment — assign a different speaker to each script line
- Instant preview — edit scripts online and listen to results immediately
The Playground is well suited to audiobook and radio drama production, conversational content generation, and rapid product demos.
Next Steps
- Quick Start — Make your first API call in 5 minutes
- Authentication — Base URL, API key, and rate limits
- Podcast Generation API — Full parameters and response reference