Core Concepts
Key entities, generation modes, and data flow in ListenHub OpenAPI.
2. Core Concepts
1. Basic Terms
- Episode: The basic content unit in ListenHub
  - Each episode has a unique episodeId
  - Contains audio, scripts, and metadata
- Speaker: Defines the voice characteristics used for generation
  - Identified by speakerId
  - Includes attributes such as language and gender
  - How to get speakers: call GET /v1/speakers/list
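As a rough illustration of the speaker lookup described above, the following sketch prepares a request for GET /v1/speakers/list without sending it. Only the path comes from this page; the host `api.example.com`, the Bearer-token header, and the helper name `build_speakers_request` are assumptions made for the example and should be replaced with the values from your ListenHub account and the authentication documentation.

```python
import urllib.request

# Assumption: placeholder host. Only the path /v1/speakers/list is taken
# from the text above; the real base URL is not stated in this section.
BASE_URL = "https://api.example.com"


def build_speakers_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the speaker list."""
    return urllib.request.Request(
        url=f"{base_url}/v1/speakers/list",
        method="GET",
        # Assumption: Bearer-token authentication; check the auth docs.
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = build_speakers_request("YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

To actually send it, pass the prepared request to `urllib.request.urlopen`. The response schema is not described in this section, so inspect the returned JSON before relying on specific fields other than speakerId.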
2. Generation Modes
| Mode | Sub-mode | Characteristics | Typical use cases | Generation time | API endpoint |
|---|---|---|---|---|---|
| Podcast | deep | In-depth analysis with higher content quality | Professional knowledge sharing, deep commentary | 2-4 min | /v1/podcast/episodes |
| Podcast | quick | Faster generation with efficiency priority | News briefs, time-sensitive content | 1-2 min | /v1/podcast/episodes |
| Podcast | debate | Two-host debate style output | Opinion discussions, multi-angle analysis | 2-4 min | /v1/podcast/episodes |
| FlowSpeech | smart | AI improves readability and fixes text issues | Fix awkward sentences and typos | 1-2 min | /v1/flow-speech/episodes |
| FlowSpeech | direct | Direct text-to-speech conversion | Well-prepared scripts and announcements | 1-2 min | /v1/flow-speech/episodes |
Important: Podcast mode supports selecting 1-2 speakers.
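The mode table and the speaker limit can be captured in a small client-side check before a request is made. This is a minimal sketch: the endpoint paths and sub-mode names come from the table above, while the function name `resolve_endpoint` and the speaker IDs used in the example are hypothetical. The page does not state a speaker limit for FlowSpeech, so none is enforced here.

```python
# Endpoint paths and sub-mode names are taken from the table above.
ENDPOINTS = {
    "podcast": ("/v1/podcast/episodes", {"deep", "quick", "debate"}),
    "flowspeech": ("/v1/flow-speech/episodes", {"smart", "direct"}),
}


def resolve_endpoint(mode: str, sub_mode: str, speaker_ids: list) -> str:
    """Return the endpoint path for a mode after validating the inputs."""
    key = mode.lower()
    if key not in ENDPOINTS:
        raise ValueError(f"unknown mode: {mode}")
    path, sub_modes = ENDPOINTS[key]
    if sub_mode not in sub_modes:
        raise ValueError(f"{mode} has no sub-mode {sub_mode!r}")
    # Podcast mode supports selecting 1-2 speakers.
    if key == "podcast" and not 1 <= len(speaker_ids) <= 2:
        raise ValueError("Podcast mode supports 1-2 speakers")
    return path


# Hypothetical speaker IDs, for illustration only.
print(resolve_endpoint("Podcast", "debate", ["speaker-a", "speaker-b"]))
```

Validating locally keeps obviously invalid combinations, such as three speakers in Podcast mode, from costing a round trip to the API.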
3. Data Stream Types
- Text stream (Server-Sent Events format)
  - Podcast: outline and scripts are usually available after 20-60 seconds
  - FlowSpeech: outline and scripts are usually available after about 3 seconds
- Audio outputs
  - Streaming audio (M3U8): suitable for real-time playback, field audioStreamUrl
  - Full audio (MP3): suitable for download and offline playback, field audioUrl
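The two stream types above might be handled on the client roughly as follows. This sketch parses generic Server-Sent Events framing (`data:` lines terminated by a blank line) and picks between the audioStreamUrl and audioUrl fields. The helper names, the sample event payloads, and the sample episode object are hypothetical; the actual event contents and episode response shape are not specified in this section.

```python
from typing import Iterable, Iterator, Optional


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event with no terminating blank line is incomplete and is dropped.


def pick_audio_url(episode: dict, realtime: bool) -> Optional[str]:
    """Choose the M3U8 stream for live playback, otherwise the MP3 file."""
    return episode.get("audioStreamUrl" if realtime else "audioUrl")


# Hypothetical sample data, for illustration only.
sample_lines = [": keep-alive\n", "data: first chunk\n", "\n",
                "data: line one\n", "data: line two\n", "\n"]
events = list(iter_sse_data(sample_lines))
episode = {
    "episodeId": "ep-123",
    "audioStreamUrl": "https://example.com/ep-123.m3u8",
    "audioUrl": "https://example.com/ep-123.mp3",
}
print(events, pick_audio_url(episode, realtime=True))
```

In practice the lines would come from a streaming HTTP response rather than a list, and a field may be absent until the corresponding output is ready, which is why `pick_audio_url` returns `None` instead of raising.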
3. Playground Quick Experience
1. Multi-speaker TTS
ListenHub Playground provides an online multi-speaker speech synthesis demo that can be tested without writing code.
URL: https://assets.listenhub.ai/listenhub-public-prod/static/playgroud-tts.html
Highlights:
- Multi-role dialogue generation in a single request
- Flexible assignment of speaker per script line
- Online script editing with instant listening preview
Typical scenarios:
- Audiobook and radio drama production
- Conversational content generation
- Rapid product demo production