ListenHub OpenAPI

Core Concepts

Episodes, speakers, generation modes, and data flow in ListenHub OpenAPI.

Basic Concepts

  • Episode — The basic content unit in ListenHub. Each episode has a unique episodeId and contains audio, scripts, and metadata.
  • Speaker — Defines the voice characteristics used for generation. Identified by speakerId, with attributes such as language and gender. Call GET /v1/speakers/list to browse available voices, or see the Speakers API reference.
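
As a minimal sketch of the speaker lookup, the snippet below builds (without sending) a request for GET /v1/speakers/list. The base URL and the Bearer-token header are assumptions for illustration only; this page does not specify either, so check the API reference for the real values.

```python
from urllib.parse import urljoin
from urllib.request import Request

# Assumption: placeholder base URL; substitute the real one from the API reference.
BASE_URL = "https://api.example.com/"


def build_speakers_request(api_key: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a GET request for /v1/speakers/list."""
    url = urljoin(base_url, "v1/speakers/list")
    # Assumption: Bearer-token auth; this page does not state the auth scheme.
    headers = {"Authorization": f"Bearer {api_key}"}
    return Request(url, headers=headers, method="GET")


req = build_speakers_request("YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; the shape of the response body is not described here, so parsing is left out.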

Generation Modes

| Mode | Sub-mode | Description | Generation Time | API Endpoint |
|---|---|---|---|---|
| Podcast | quick | Faster generation prioritizing efficiency; best for news briefs and time-sensitive content | 1-2 min | /v1/podcast/episodes |
| Podcast | debate | Two-host debate format; best for opinion discussions and multi-angle analysis | 2-4 min | /v1/podcast/episodes |
| Podcast | deep | In-depth analysis with higher content quality; best for professional knowledge sharing and deep commentary | 2-4 min | /v1/podcast/episodes |
| Text to Speech | smart | AI optimizes content before synthesis; best for fixing awkward sentences and typos | 1-2 min | /v1/flow-speech/episodes |
| Text to Speech | direct | Direct text-to-speech conversion; best for well-prepared scripts and announcements | 1-2 min | /v1/flow-speech/episodes |
| Content Extract | (none) | Async URL content extraction; best for article parsing, research, and content analysis | 10-30 sec | /v1/content/extract |

Podcast mode supports 1-2 speakers (single or dual host). Debate mode requires exactly 2 speakers.
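
The speaker-count rules can be checked on the client before a request is sent. The sketch below assumes the 1-2 speaker limit applies to the quick and deep sub-modes, with debate as the stricter case. The function name and the speaker IDs are hypothetical, not part of the API.

```python
from typing import List

# Podcast sub-modes as listed in the Generation Modes table.
PODCAST_SUB_MODES = {"quick", "debate", "deep"}


def check_podcast_speakers(sub_mode: str, speaker_ids: List[str]) -> None:
    """Raise ValueError if the speaker count does not fit the podcast sub-mode."""
    if sub_mode not in PODCAST_SUB_MODES:
        raise ValueError(f"unknown podcast sub-mode: {sub_mode!r}")
    count = len(speaker_ids)
    if sub_mode == "debate":
        # Debate requires exactly two speakers.
        if count != 2:
            raise ValueError("debate requires exactly 2 speakers")
    elif count not in (1, 2):
        # Assumed to cover quick and deep: single or dual host.
        raise ValueError("podcast supports 1-2 speakers")


# Hypothetical speaker IDs, for illustration only.
check_podcast_speakers("quick", ["speaker-a"])
check_podcast_speakers("debate", ["speaker-a", "speaker-b"])
```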

Output Types

Each generated episode provides two types of data: script text and audio files.

Script Stream (Server-Sent Events)

While audio is being generated, you can retrieve outline and script data via SSE without waiting for the audio to finish:

  • Podcast: available 20-60 seconds after creation
  • Text to Speech: available ~3 seconds after creation
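
For readers new to SSE, the sketch below shows how a text/event-stream body is split into events. It follows the standard SSE framing ("data:" lines, with a blank line ending each event). The JSON payloads in the sample are hypothetical, since the event schema is not described on this page.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in a text/event-stream."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # One leading space after the colon is dropped, per the SSE format.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event, id, retry) are ignored in this sketch.


# Hypothetical payloads; the real event schema may differ.
sample = [
    'data: {"type": "outline", "text": "Intro"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"type": "script", "text": "Welcome."}\n',
    "\n",
]
events = [json.loads(payload) for payload in iter_sse_data(sample)]
print(events)
```

In practice the lines would come from a streaming HTTP response rather than a list, which lets a client display the outline and script while the audio is still rendering.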

Audio Files

Once generation completes, the response includes:

| Field | Format | Description |
|---|---|---|
| audioStreamUrl | M3U8 | Streaming playback, best for real-time use |
| audioUrl | MP3 | Full file download, best for offline use |
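
A client might choose between the two fields depending on whether it plays or downloads the audio. This is a minimal sketch, assuming both fields sit at the top level of the episode object. The helper name and the sample URLs are hypothetical.

```python
from typing import Optional


def pick_audio_url(episode: dict, streaming: bool) -> Optional[str]:
    """Return the M3U8 URL for streaming or the MP3 URL for download.

    Falls back to the other field if the preferred one is missing or empty.
    """
    if streaming:
        preferred, fallback = "audioStreamUrl", "audioUrl"
    else:
        preferred, fallback = "audioUrl", "audioStreamUrl"
    return episode.get(preferred) or episode.get(fallback)


# Hypothetical response fragment; the URLs are placeholders.
episode = {
    "audioStreamUrl": "https://cdn.example.com/ep/stream.m3u8",
    "audioUrl": "https://cdn.example.com/ep/full.mp3",
}
print(pick_audio_url(episode, streaming=True))
```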

Playground

ListenHub provides an online Playground for testing multi-speaker speech synthesis without writing code.

URL: Multi-speaker TTS Playground

  • Multi-role dialogue — generate audio with multiple voices in a single request
  • Flexible assignment — assign a different speaker to each script line
  • Instant preview — edit scripts online and listen to results immediately

Suitable for audiobook/radio drama production, conversational content generation, and rapid product demo creation.
