OpenAPI Commands

Reference for every `listenhub openapi` command — the API-key namespace for scripts and CI.

The listenhub openapi namespace runs every command against your API key instead of an OAuth login. Use it on servers and in CI, where you control the environment and want a long-lived credential rather than an interactive browser flow.

Before you start, make the key available — set LISTENHUB_API_KEY or store it with listenhub openapi config set-key. See Authentication for where credentials live and how the CLI resolves them.

export LISTENHUB_API_KEY="lh_sk_..."
listenhub openapi speakers list --language en

Conventions used on this page

Every command in this namespace shares the same behavior:

  • Output. Human-readable text by default; --json / -j prints machine-readable JSON to stdout (errors go to stderr). Pipe JSON into jq.
  • Async creation. Commands that start generation submit a job and then poll until it reaches a terminal state, printing a spinner. Polling runs on a 10-second interval. --no-wait returns the ID immediately and exits 0; --timeout <seconds> caps the wait (default varies by command, noted per group below).
  • Exit codes. 0 success, 1 error, 2 auth, 3 timeout.
  • Credits. Generation consumes credits. Use the relevant estimate command before creating, and check your balance with listenhub openapi subscription. Never assume a fixed cost.

Every command accepts --help / -h. Run listenhub openapi <group> <command> --help to see the exact flags for your installed version.
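
In a CI script it can help to turn the exit codes listed above into readable log lines. The following is a minimal sketch, not part of the CLI: describe_exit and run_and_report are hypothetical helper names, and the mapping covers only the four codes documented on this page.

```shell
#!/usr/bin/env bash
# Map the documented exit codes (0 success, 1 error, 2 auth, 3 timeout)
# to a short label. Any other code is reported as "unknown".
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "error" ;;
    2) echo "auth" ;;
    3) echo "timeout" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical wrapper: run any command, log how it ended on stderr,
# and hand the original exit code back to the caller.
run_and_report() {
  local status=0
  "$@" || status=$?
  echo "exit $status: $(describe_exit "$status")" >&2
  return "$status"
}
```

For example, run_and_report listenhub openapi subscription --json would leave stdout untouched for jq and write a line such as "exit 0: success" to stderr.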

config

Manage the stored API key. The CLI reads LISTENHUB_API_KEY first, then the file written by set-key.

| Command | Description |
| --- | --- |
| config set-key | Prompt for a key and store it at ~/.config/listenhub/openapi.json (mode 0600). The key must start with lh_sk_. |
| config show | Show the configured key (masked) and its source (env or file). Add --json. Exits 1 if nothing is configured. |
| config clear | Remove the stored key file. |

listenhub openapi config set-key
listenhub openapi config show --json
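
The lookup order described above (environment variable first, then the stored file) can be mirrored in a script that wants to fail early when no key is present. This is a rough sketch under stated assumptions: key_source is a hypothetical helper, it treats an empty LISTENHUB_API_KEY as unset, and it only checks that the file exists rather than reading it.

```shell
#!/usr/bin/env bash
# Report where a key would come from, following the documented order:
# the LISTENHUB_API_KEY environment variable first, then the stored file.
# This mirrors the described lookup order; it is not the CLI's own code.
key_source() {
  local key_file="${HOME}/.config/listenhub/openapi.json"
  if [ -n "${LISTENHUB_API_KEY:-}" ]; then
    echo "env"
  elif [ -f "$key_file" ]; then
    echo "file"
  else
    # Nothing configured: signal failure, as config show is documented to do.
    echo "none"
    return 1
  fi
}
```

For an authoritative answer, listenhub openapi config show reports the source itself; the helper is only useful where the CLI is not yet installed or a cheap pre-check is wanted.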

speakers

List the voices available to your account. The ID column is the speakerId you pass to creation commands.

| Command | Options | Description |
| --- | --- | --- |
| speakers list | --language <lang>, --json | List speakers, optionally filtered by language. |

listenhub openapi speakers list --language en

Prints a table of Name, ID, Gender, and Language.

tts, audio-speech, speech

Three ways to turn text into speech. The first two stream binary audio to a file; the third returns a hosted audio URL.

| Command | Description |
| --- | --- |
| tts | Text-to-speech, streamed to a local file. |
| audio-speech | Same as tts, on the OpenAI /v1/audio/speech-compatible route. |
| speech | Create speech and get back a hosted audioUrl (plus duration, credits, and subtitles when available). |

tts and audio-speech share these options:

| Option | Default | Description |
| --- | --- | --- |
| --text <text> | required | Text to convert. |
| --voice <speakerId> | required | Speaker ID. |
| --output <file> | required | Output file path. |
| --format <format> | mp3 | One of mp3, opus, aac, flac, wav, pcm. |

speech options:

| Option | Description |
| --- | --- |
| --script <content> | Script text (required). |
| --speaker-id <id> | Speaker ID (required). |
| --json, -j | Output JSON. |

# Stream an MP3 to disk
listenhub openapi tts \
  --text "Welcome to ListenHub." \
  --voice <speaker-id> \
  --output welcome.mp3

# Get a hosted audio URL instead
listenhub openapi speech --script "Welcome to ListenHub." --speaker-id <speaker-id>

flow-speech

Flow Speech turns sources or scripts into a narrated episode. Creation commands poll until processStatus is success; the default --timeout is 300 seconds.

| Command | Description |
| --- | --- |
| flow-speech create | Create an episode from --source-url / --source-text. |
| flow-speech get <episodeId> | Fetch episode details. |
| flow-speech tts | Create an episode directly from scripts (no source extraction). |
| flow-speech text-stream <episodeId> | Stream generated text over SSE. |

flow-speech create options:

| Option | Default | Description |
| --- | --- | --- |
| --source-url <url> | | Source URL. Repeatable. |
| --source-text <text> | | Source text. Repeatable. |
| --speaker-id <id> | required | Speaker ID. Repeatable. At least one is required. |
| --mode <mode> | smart | smart or direct. |
| --lang <lang> | auto | Language code. |
| --no-wait | | Return the episode ID without polling. |
| --timeout <seconds> | 300 | Polling timeout. |
| --json, -j | | Output JSON. |

At least one --source-url or --source-text is required.

flow-speech tts options:

| Option | Default | Description |
| --- | --- | --- |
| --script <content> | required | Script content. Repeatable. At least one required. |
| --speaker-id <id> | required | Speaker ID. Repeatable. At least one required. |
| --title <title> | | Episode title. |
| --no-wait, --timeout <seconds> (300), --json | | As above. |

Scripts and speakers are paired by position: the first --script uses the first --speaker-id, and so on; if there are more scripts than speakers, the first speaker is reused.
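
To make the pairing rule concrete, the sketch below prints which speaker each script would end up with. It is an illustration of the rule as worded above, not the CLI's implementation; pair_scripts is a hypothetical helper that takes a script count followed by the speaker IDs in the order they would be passed.

```shell
#!/usr/bin/env bash
# Print which speaker each script would use under the documented rule:
# pair by position, and reuse the first speaker once speakers run out.
# Usage: pair_scripts <script-count> <speaker-id>...
pair_scripts() {
  local count="$1"; shift
  if [ "$#" -lt 1 ]; then
    echo "at least one speaker is required" >&2
    return 1
  fi
  local speakers=("$@")
  local i
  for ((i = 0; i < count; i++)); do
    if ((i < ${#speakers[@]})); then
      echo "script $((i + 1)) -> ${speakers[i]}"
    else
      # More scripts than speakers: fall back to the first speaker.
      echo "script $((i + 1)) -> ${speakers[0]}"
    fi
  done
}
```

With three scripts and two speakers, pair_scripts 3 host guest assigns the third script to host, so a strictly alternating dialogue needs one --speaker-id per --script.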

flow-speech text-stream <episodeId> requires --event <event>, one of script or outline. It writes the raw SSE stream to stdout.

# From a source URL, two voices
listenhub openapi flow-speech create \
  --source-url "https://example.com/article" \
  --speaker-id <host-id> \
  --speaker-id <guest-id>

# Directly from scripts
listenhub openapi flow-speech tts \
  --script "Hello and welcome." --speaker-id <host-id> \
  --script "Glad to be here." --speaker-id <guest-id> \
  --title "Episode 1"

podcast

Generate a podcast episode. You can produce text and audio in one step, or split the two: generate the text content first, review or stream it, then generate audio. Creation commands default to a 300-second --timeout.

| Command | Description |
| --- | --- |
| podcast create | Generate a full episode (text + audio) from --query and/or sources. |
| podcast get <episodeId> | Fetch episode details. |
| podcast text-content | Generate the script only, no audio. Polls until contentStatus is text-success. |
| podcast generate-audio <episodeId> | Generate audio for an existing text episode. Polls until contentStatus is audio-success. |
| podcast text-stream <episodeId> | Stream generated text over SSE. |

podcast create options:

| Option | Default | Description |
| --- | --- | --- |
| --query <text> | | Topic or prompt for the episode. |
| --source-url <url> | | Source URL. Repeatable. |
| --source-text <text> | | Source text. Repeatable. |
| --speaker-id <id> | required | Speaker ID. Repeatable. At least one required. Pass more than one for a multi-voice episode. |
| --mode <mode> | | Generation mode. |
| --lang <lang> | auto | Language code. |
| --no-wait, --timeout <seconds> (300), --json | | Standard async flags. |

podcast text-content takes the same source and speaker options (--query, --source-url, --source-text, --speaker-id, --mode) plus the async flags. At least one of --query, --source-url, or --source-text is required.

podcast text-stream <episodeId> requires --event <event>, one of script or outline.

# One-shot: text + audio
listenhub openapi podcast create \
  --query "AI agent trends in 2026" \
  --speaker-id <host-id> \
  --mode quick

# Two-step: text first, then audio
ID=$(listenhub openapi podcast text-content \
  --query "Weekly recap" --speaker-id <host-id> --no-wait -j | jq -r '.episodeId')
listenhub openapi podcast text-stream "$ID" --event script
listenhub openapi podcast generate-audio "$ID"
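
When a job is started with --no-wait, the script has to do its own waiting. The loop below is a generic sketch of that pattern, not the CLI's internal poller: poll_until is a hypothetical helper, and the status command it runs is supplied by the caller. It returns 3 on timeout to echo the documented timeout exit code, and it does not try to detect failed states.

```shell
#!/usr/bin/env bash
# Poll a status command until it prints the wanted status or time runs out.
# Usage: poll_until <wanted-status> <timeout-seconds> <interval-seconds> <command>...
# Returns 0 on a match, 1 if the status command fails, 3 on timeout.
poll_until() {
  local wanted="$1" timeout="$2" interval="$3"; shift 3
  local waited=0 status
  while :; do
    status="$("$@")" || return 1
    [ "$status" = "$wanted" ] && return 0
    if ((waited >= timeout)); then
      return 3
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

A caller might wrap something like listenhub openapi podcast get "$ID" --json | jq -r '.contentStatus' in a small function and pass it as the command with a 10-second interval. That the get output exposes contentStatus at the top level is an assumption here; check the JSON from your installed version first.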

storybook

Storybook produces explainer and slides episodes, optionally with video. Creation polls until processStatus is success; the default --timeout is 300 seconds.

| Command | Description |
| --- | --- |
| storybook create | Create an episode from sources. |
| storybook get <episodeId> | Fetch episode details. |
| storybook generate-video <episodeId> | Kick off video generation for an episode. |

storybook create options:

| Option | Default | Description |
| --- | --- | --- |
| --source-url <url> | | Source URL. Repeatable. |
| --source-text <text> | | Source text. Repeatable. |
| --speaker-id <id> | | Speaker ID. Repeatable. Optional. |
| --skip-audio | off | Generate without audio. |
| --style <style> | | Storybook style. |
| --mode <mode> | info | One of info, story, slides. |
| --lang <lang> | auto | Language code. |
| --no-wait, --timeout <seconds> (300), --json | | Standard async flags. |

listenhub openapi storybook create \
  --source-url "https://example.com/explainer" \
  --mode slides --skip-audio

image

Generate an AI image from a prompt, optionally conditioned on reference images. --reference accepts both local file paths and URLs; local files are read and sent inline, URLs are passed by reference.

| Option | Description |
| --- | --- |
| --prompt <text> | Image description (required). |
| --provider <provider> | Provider name (required). |
| --model <model> | Model name. |
| --size <size> | One of 1K, 2K, 4K. |
| --ratio <ratio> | One of 16:9, 4:3, 1:1, 3:4, 9:16, 21:9. |
| --reference <path-or-url> | Reference image, local path or URL. Repeatable. |
| --json, -j | Output JSON. |

listenhub openapi image create \
  --provider <provider> \
  --prompt "A neon-lit city skyline at dusk" \
  --ratio 16:9 --size 2K

video

AI video generation. This group has two surfaces: the generic video commands (text/image/reference driven) and the video pixverse subcommands (PixVerse capability API). Both poll until status is success with a default --timeout of 1200 seconds, and both take a 24-character hex task ID.

| Command | Description |
| --- | --- |
| video create | Create a generation task. |
| video get <taskId> | Fetch task details. |
| video list | List tasks. |
| video estimate | Estimate credits before creating. |
| video pixverse generate | Create a PixVerse task. |
| video pixverse estimate | Estimate credits for a PixVerse task. |

video create

The prompt is required; everything else selects an input mode. Frame mode (--first-frame / --last-frame) and reference mode (--reference-image / --reference-video / --reference-audio) are mutually exclusive.

| Option | Default | Description |
| --- | --- | --- |
| --prompt <text> | required | Video description / prompt. |
| --first-frame <url> | | First-frame image URL. |
| --last-frame <url> | | Last-frame image URL. Requires --first-frame. |
| --reference-image <url> | | Reference image URL. Repeatable, max 9. |
| --reference-video <url> | | Reference video URL. Repeatable, max 3. Requires --input-video-duration. |
| --reference-audio <url> | | Reference audio URL. Repeatable, max 3. Requires --reference-image or --reference-video. |
| --input-video-duration <seconds> | | Input video duration, 2–15. Required with --reference-video. |
| --model <model> | | Model name, e.g. doubao-seedance-2-pro. |
| --resolution <res> | | One of 480p, 720p, 1080p. |
| --ratio <ratio> | | One of 16:9, 4:3, 1:1, 3:4, 9:16, 21:9. |
| --duration <seconds> | | Output duration, 4–15. |
| --no-generate-audio | audio on | Disable audio generation. |
| --seed <number> | | Random seed, -1 to 4294967295. |
| --no-wait, --timeout <seconds> (1200), --json | | Standard async flags. |

# Text to video
listenhub openapi video create \
  --prompt "A timelapse of clouds over a mountain range" \
  --model doubao-seedance-2-pro \
  --resolution 1080p --duration 8

# First/last frame interpolation
listenhub openapi video create \
  --prompt "Smooth morph between the two frames" \
  --first-frame "https://example.com/a.jpg" \
  --last-frame "https://example.com/b.jpg"
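
The flag dependencies listed for video create are easy to get wrong in generated command lines, so a script may want to check them before spending a request. The function below is a sketch based only on the rules stated in this section; check_video_flags is a hypothetical helper, it looks at flag names only, and it assumes flags are passed as separate words rather than in --flag=value form.

```shell
#!/usr/bin/env bash
# Pre-flight check for video create flag combinations, using only the
# rules stated in the options table. Prints the first violated rule and
# returns 1, or returns 0 when none of the documented rules is violated.
check_video_flags() {
  local first=0 last=0 ref_image=0 ref_video=0 ref_audio=0 in_dur=0 arg
  for arg in "$@"; do
    case "$arg" in
      --first-frame) first=1 ;;
      --last-frame) last=1 ;;
      --reference-image) ref_image=1 ;;
      --reference-video) ref_video=1 ;;
      --reference-audio) ref_audio=1 ;;
      --input-video-duration) in_dur=1 ;;
    esac
  done
  if ((first || last)) && ((ref_image || ref_video || ref_audio)); then
    echo "frame mode and reference mode are mutually exclusive"; return 1
  fi
  if ((last && ! first)); then
    echo "--last-frame requires --first-frame"; return 1
  fi
  if ((ref_video && ! in_dur)); then
    echo "--reference-video requires --input-video-duration"; return 1
  fi
  if ((ref_audio && ! ref_image && ! ref_video)); then
    echo "--reference-audio requires --reference-image or --reference-video"; return 1
  fi
  return 0
}
```

The CLI presumably performs its own validation; the point of a local check is only to fail before a job is submitted from a larger pipeline.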

video list and estimate

video list options: --page <n> (default 1), --page-size <n> (default 20), --status <status> (one of pending, generating, uploading, success, failed), --json.

video estimate requires --model, --resolution, and --duration, and accepts --ratio, --has-video-input, and --input-video-duration (required when --has-video-input is set):

listenhub openapi video estimate \
  --model doubao-seedance-2-pro --resolution 1080p --duration 8

video pixverse

PixVerse exposes atomic generation capabilities plus a marketing agent. Pick one with --capability:

| Capability | What it does |
| --- | --- |
| text_to_video | Generate from a text prompt. |
| image_to_video | Animate a still image. |
| transition | Transition between two assets. |
| multi_transition | Transition across multiple assets. |
| fusion | Fuse multiple inputs into one clip. |
| restyle | Restyle an existing PixVerse video. |
| mimic | Mimic a reference motion/style. |
| lip_sync | Drive lip motion from audio or TTS. |
| agent | Marketing agent (ad_master, promo_mix). |

Shared enums:

  • Model (--model): pixverse, v6, v5, v4.5 (default pixverse).
  • Language / region (--language): zh, en (default en).
  • Quality (--quality): 360p, 540p, 720p, 1080p (default 720p).
  • Aspect ratio (--aspect-ratio): 9:16, 16:9, 1:1, 4:3, 3:4 (default 16:9).
  • Agent type (--agent-type, with --capability agent): ad_master, promo_mix.

video pixverse generate options:

| Option | Default | Description |
| --- | --- | --- |
| --capability <capability> | required | One of the capabilities above. |
| --model <model> | pixverse | Model. |
| --language <lang> | en | Service region. |
| --prompt <text> | | Prompt, max 2048 chars. |
| --quality <quality> | 720p | Output quality. |
| --aspect-ratio <ratio> | 16:9 | Aspect ratio. |
| --duration <seconds> | 5 | Integer 1–60. |
| --source-task-id <id> | | Reuse a prior succeeded PixVerse task (for restyle / lip_sync). |
| --image <url[:duration]> | | Image asset, optional :duration suffix. Repeatable, max 10. |
| --video <url[:duration]> | | Video asset, optional :duration suffix. Repeatable, max 2. |
| --audio <url[:duration]> | | Audio asset, optional :duration suffix. Repeatable, max 1. |
| --agent-type <type> | | ad_master or promo_mix (with --capability agent). |
| --source-video-id <id> | | PixVerse source video id (restyle). |
| --restyle-id <id> | | PixVerse restyle id (restyle). |
| --lip-sync-tts | off | Enable lip-sync TTS (--capability lip_sync). |
| --lip-sync-speaker-id <id> | | Lip-sync TTS speaker id. |
| --lip-sync-content <text> | | Lip-sync TTS content. |
| --pixverse-json <json> | | Escape hatch: raw JSON for the nested pixverse object. Merged with flag-derived fields; flags win. |
| --no-wait, --timeout <seconds> (1200), --json | | Standard async flags. |

Asset flags accept an optional trailing :duration in seconds — for example https://example.com/clip.mp4:5. Only a trailing :<integer> is treated as a duration, so URLs with their own colons are safe.
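
The trailing-duration rule can be checked locally before building a command line. The function below is one literal reading of that rule, written as a bash regular-expression match; split_asset is a hypothetical helper and may differ from how the CLI actually parses the value.

```shell
#!/usr/bin/env bash
# Split an asset argument of the form url[:duration] as the text describes:
# only a trailing :<integer> counts as a duration.
# Prints "<url> <duration>", with "-" when no duration is present.
split_asset() {
  local re='^(.*):([0-9]+)$'
  if [[ "$1" =~ $re ]]; then
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    echo "$1 -"
  fi
}
```

Under this reading the scheme colon in https:// is never mistaken for a duration, because it is not followed by digits up to the end of the string. A URL that ends in a bare port number with no path would be split, though, and whether the CLI treats that case differently is not stated here.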

# Lip-sync from TTS
listenhub openapi video pixverse generate \
  --capability lip_sync \
  --video "https://example.com/face.mp4" \
  --lip-sync-tts \
  --lip-sync-speaker-id <speaker-id> \
  --lip-sync-content "Hi, here's our product update."

# Marketing agent
listenhub openapi video pixverse generate \
  --capability agent \
  --agent-type ad_master \
  --prompt "30-second ad for a noise-cancelling headset" \
  --image "https://example.com/product.jpg"

video pixverse estimate takes --capability (required), plus --model, --language, --quality, --duration, and --agent-type:

listenhub openapi video pixverse estimate \
  --capability text_to_video --quality 1080p --duration 5

music

AI music generation, backed by Mureka. Async commands (generate, remix, instrumental, soundtrack, track) poll until status is success; the default --timeout is 600 seconds. The analysis commands (recognize, describe, stem) run synchronously.

| Command | Description |
| --- | --- |
| music generate | Generate from a prompt and/or lyrics. |
| music remix [audio] | Remix an existing song with new lyrics. |
| music instrumental | Generate a standalone instrumental. |
| music soundtrack | Generate music from an image or video. |
| music track [audio] | Generate a single instrument or vocal track. |
| music recognize | Recognize lyrics with timestamps from audio. |
| music describe | Analyze audio: description, tags, genres, instruments. |
| music stem | Separate audio into stems, returns download URLs. |
| music list | List music tasks. |
| music get <taskId> | Fetch task details. |

Model values across the generation commands: auto, mureka-7.6, mureka-8, mureka-9, mureka-o2.

music generate options:

| Option | Description |
| --- | --- |
| --prompt <text> | Music description. At least one of --prompt or --lyrics is required. |
| --lyrics <text> | Song lyrics. |
| --style <text> | Music style / mood. |
| --title <text> | Track title. |
| --model <model> | One of the model values above. |
| --instrumental | Instrumental only, no vocals. |
| --vocal-id <id> | Reusable vocal id. |
| --no-wait, --timeout <seconds> (600), --json | Standard async flags. |

music remix [audio] takes the audio as a positional file, or --audio-url <url>, or --provider-song-id <id> — exactly one. Requires --lyrics and --prompt.

music instrumental requires exactly one of --prompt or --reference-audio <file> (mp3/m4a, max 10MB); accepts --model.

music soundtrack requires exactly one of --image <file> or --video <file>; accepts --prompt and --model.

music track [audio] takes the audio as a positional file or --provider-song-id <id> (exactly one). Requires --generate-type (Vocals, Instrumental, Drums, Bass, Guitar, …) and --prompt; --lyrics is required when --generate-type is Vocals. Optional --vocal-gender <male|female>, --generate-start <seconds>, --generate-end <seconds>.

music recognize, music describe, and music stem each require --audio <file> (mp3/m4a, max 10MB). music stem also accepts --model (audio-separation-1 or audio-separation-2).

music list options: --page <n> (default 1), --page-size <n> (default 20), --status <status> (pending, generating, uploading, success, failed), --json.

# Generate from a prompt
listenhub openapi music generate \
  --prompt "Upbeat synthwave with a driving bassline" --title "Night Drive"

# Separate an mp3 into stems
listenhub openapi music stem --audio track.mp3

content

Extract readable content from a URL, optionally summarized. Async; polls until status is completed with a default --timeout of 300 seconds.

| Command | Description |
| --- | --- |
| content extract | Extract content from a URL. |
| content get <taskId> | Fetch the extraction result. |

content extract options:

| Option | Default | Description |
| --- | --- | --- |
| --url <url> | required | URL to extract from. |
| --summarize | off | Summarize the extracted content. |
| --max-length <n> | | Maximum content length. |
| --no-wait, --timeout <seconds> (300), --json | | Standard async flags. |

listenhub openapi content extract --url "https://example.com/article" --summarize

subscription

Show your subscription plan and credit balance — total available credits, the monthly allotment used/total, permanent credits, plan name, and expiry.

| Command | Options | Description |
| --- | --- | --- |
| subscription | --json | Show subscription and credits info. |

listenhub openapi subscription --json
