Core Concepts

Basic Concepts

Episode: The basic content unit in ListenHub. Each episode has a unique `episodeId` and contains audio, scripts, and metadata.
Speaker: Defines the voice characteristics used for generation. Identified by `speakerId`, with attributes such as language and gender. Call `GET /v1/speakers/list` to browse available voices, or see the Speakers API reference.
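
As a rough illustration of the speaker lookup described above, the sketch below builds the `GET /v1/speakers/list` request with Python's standard library. The host `https://api.example.com` is a placeholder, and the bearer-token `Authorization` header is an assumption; neither is stated on this page, so check the API reference for the real base URL and authentication scheme.

```python
import urllib.request

# Assumption: placeholder host. Substitute the real ListenHub API base URL.
BASE_URL = "https://api.example.com"


def build_speakers_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the speaker list."""
    return urllib.request.Request(
        url=f"{base_url}/v1/speakers/list",
        method="GET",
        # Assumption: bearer-token auth; the page does not specify the scheme.
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; the shape of the response body is not described here, so inspect it before relying on specific fields.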

Mode	Sub-mode	Description	Generation Time	API Endpoint
Podcast	quick	Faster generation prioritizing efficiency; best for news briefs and time-sensitive content	1-2 min	`/v1/podcast/episodes`
Podcast	debate	Two-host debate format; best for opinion discussions and multi-angle analysis	2-4 min	`/v1/podcast/episodes`
Podcast	deep	In-depth analysis with higher content quality; best for professional knowledge sharing and deep commentary	2-4 min	`/v1/podcast/episodes`
Text to Speech	smart	AI optimizes content before synthesis; best for fixing awkward sentences and typos	1-2 min	`/v1/flow-speech/episodes`
Text to Speech	direct	Direct text-to-speech conversion; best for well-prepared scripts and announcements	1-2 min	`/v1/flow-speech/episodes`
Content Extract	—	Async URL content extraction; best for article parsing, research, and content analysis	10-30 sec	`/v1/content/extract`

Podcast mode supports 1-2 speakers (single or dual host). Debate mode requires exactly 2 speakers.
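
The speaker-count rules above can be checked on the client before a request is sent. The following is a minimal sketch; the function name `check_podcast_speakers` and the short mode keys in `ENDPOINTS` are illustrative and not part of the API, while the sub-mode names and endpoint paths are taken from the table.

```python
from typing import Sequence

# Endpoint paths from the mode table; the dictionary keys are illustrative labels.
ENDPOINTS = {
    "podcast": "/v1/podcast/episodes",
    "tts": "/v1/flow-speech/episodes",
    "extract": "/v1/content/extract",
}

PODCAST_SUB_MODES = {"quick", "debate", "deep"}


def check_podcast_speakers(sub_mode: str, speaker_ids: Sequence[str]) -> None:
    """Raise ValueError if the speaker count does not fit the podcast sub-mode."""
    if sub_mode not in PODCAST_SUB_MODES:
        raise ValueError(f"unknown podcast sub-mode: {sub_mode!r}")
    count = len(speaker_ids)
    if sub_mode == "debate":
        # Debate requires exactly two speakers.
        if count != 2:
            raise ValueError("debate mode requires exactly 2 speakers")
    elif count not in (1, 2):
        # Other podcast sub-modes accept a single or dual host.
        raise ValueError("podcast mode supports 1-2 speakers")
```

Calling `check_podcast_speakers("debate", ["speaker-a"])` raises `ValueError`, while the same single-speaker list passes for `quick` or `deep`. The speaker IDs here are made-up strings.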

Each generated episode provides two types of data: script text and audio files.

While audio is still being generated, you can retrieve the outline and script data via SSE (Server-Sent Events) without waiting for the audio to finish.
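
To show what consuming such a stream involves, here is a minimal sketch of a parser for the standard SSE wire format (`event:` and `data:` fields, events separated by a blank line). The event names and payloads in the usage example are hypothetical; this page does not list the events ListenHub actually emits or the URL of the stream.

```python
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

For example, with made-up event names:

```python
sample = ["event: outline\n", 'data: {"title": "Intro"}\n', "\n"]
for name, payload in iter_sse_events(sample):
    print(name, payload)
```

In practice the lines would come from a streaming HTTP response decoded as UTF-8 rather than from a list.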

Once generation completes, the response includes:

Field	Format	Description
`audioStreamUrl`	M3U8	Streaming playback, best for real-time use
`audioUrl`	MP3	Full file download, best for offline use
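
A client might choose between the two fields depending on whether it plays audio as it arrives or saves the file. The sketch below assumes the completed episode is available as a dictionary with `audioStreamUrl` and `audioUrl` as top-level keys; the actual nesting of the response is not shown on this page, and `pick_audio_url` is an illustrative helper name.

```python
from typing import Optional


def pick_audio_url(episode: dict, streaming: bool) -> Optional[str]:
    """Return the M3U8 URL for streaming or the MP3 URL for download.

    Falls back to the other field if the preferred one is missing or empty.
    """
    preferred, fallback = (
        ("audioStreamUrl", "audioUrl") if streaming else ("audioUrl", "audioStreamUrl")
    )
    return episode.get(preferred) or episode.get(fallback)
```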

ListenHub provides an online Playground for testing multi-speaker speech synthesis without writing code.

URL: Multi-speaker TTS Playground

The Playground is suited to audiobook and radio drama production, conversational content generation, and rapid product demo creation.