OpenAPI Commands

Reference for every `listenhub openapi` command — the API-key namespace for scripts and CI.

The listenhub openapi namespace runs every command against your API key instead of an OAuth login. Use it on servers and in CI, where you control the environment and want a long-lived credential rather than an interactive browser flow.

Before you start, make the key available — set LISTENHUB_API_KEY or store it with listenhub openapi config set-key. See Authentication for where credentials live and how the CLI resolves them.

export LISTENHUB_API_KEY="lh_sk_..."
listenhub openapi speakers list --language en

Conventions used on this page

Every command in this namespace shares the same behavior:

Output. Human-readable text by default; --json / -j prints machine-readable JSON to stdout (errors go to stderr). Pipe JSON into jq.
Async creation. Commands that start generation submit a job and then poll until it reaches a terminal state, printing a spinner. Polling runs on a 10-second interval. --no-wait returns the ID immediately and exits 0; --timeout <seconds> caps the wait (default varies by command, noted per group below).
Exit codes. 0 on success, 1 on a general error, 2 on an authentication failure, 3 when polling times out.
Credits. Generation consumes credits. Where a group has an estimate command (video estimate, video pixverse estimate), run it before creating, and check your balance with listenhub openapi subscription. Never assume a fixed cost.

Every command accepts --help / -h. Run listenhub openapi <group> <command> --help to see the exact flags for your installed version.
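In CI, the exit codes let a script tell an authentication problem apart from a polling timeout. The following is a minimal bash sketch; `describe_exit` and `run_listenhub` are hypothetical helper names, not part of the CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the CLI) built on the documented exit codes.

# Map an exit status to the meaning listed under "Exit codes".
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "error" ;;
    2) echo "auth" ;;
    3) echo "timeout" ;;
    *) echo "unknown" ;;
  esac
}

# Run a listenhub command, log how it ended on stderr, and pass its status through.
run_listenhub() {
  local status=0
  listenhub "$@" || status=$?
  echo "listenhub exited $status ($(describe_exit "$status"))" >&2
  return "$status"
}

# Example (commented out so the file can be sourced safely):
# rc=0
# run_listenhub openapi podcast create --query "Weekly recap" --speaker-id "$HOST_ID" || rc=$?
# case "$rc" in
#   2) echo "check LISTENHUB_API_KEY or config set-key" >&2 ;;
#   3) echo "timed out waiting; the job may still finish" >&2 ;;
# esac
```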

config

Manage the stored API key. The CLI reads LISTENHUB_API_KEY first, then the file written by set-key.

Command	Description
`config set-key`	Prompt for a key and store it at `~/.config/listenhub/openapi.json` (mode `0600`). The key must start with `lh_sk_`.
`config show`	Show the configured key (masked) and its source (`env` or `file`). Add `--json`. Exits `1` if nothing is configured.
`config clear`	Remove the stored key file.

listenhub openapi config set-key
listenhub openapi config show --json
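To see from a script which source the CLI will read, you can mirror the documented order: the environment variable first, then the file written by set-key. This bash sketch is illustrative only; `key_source` and `valid_key` are hypothetical names, and config show remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the documented key resolution; not part of the CLI.

# Report the source the CLI would use: env var first, then the set-key file.
# Assumption: an empty LISTENHUB_API_KEY is treated as unset.
key_source() {
  if [ -n "${LISTENHUB_API_KEY:-}" ]; then
    echo env
  elif [ -f "$HOME/.config/listenhub/openapi.json" ]; then
    echo file   # only checks that the file exists, not what it contains
  else
    echo none
  fi
}

# Succeed only for keys with the documented lh_sk_ prefix.
valid_key() {
  case "$1" in
    lh_sk_*) return 0 ;;
    *) return 1 ;;
  esac
}
```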

speakers

List the voices available to your account. The ID column is the speakerId you pass to creation commands.

Command	Options	Description
`speakers list`	`--language <lang>`, `--json`	List speakers, optionally filtered by language.

listenhub openapi speakers list --language en

Prints a table of Name, ID, Gender, and Language.

tts, audio-speech, speech

Three ways to turn text into speech. The first two stream binary audio to a file; the third returns a hosted audio URL.

Command	Description
`tts`	Text-to-speech, streamed to a local file.
`audio-speech`	Same as `tts`, on the OpenAI `/v1/audio/speech`-compatible route.
`speech`	Create speech and get back a hosted `audioUrl` (plus duration, credits, and subtitles when available).

tts and audio-speech share these options:

Option	Default	Description
`--text <text>`	required	Text to convert.
`--voice <speakerId>`	required	Speaker ID.
`--output <file>`	required	Output file path.
`--format <format>`	`mp3`	One of `mp3`, `opus`, `aac`, `flac`, `wav`, `pcm`.

speech options:

Option	Description
`--script <content>`	Script text (required).
`--speaker-id <id>`	Speaker ID (required).
`--json`, `-j`	Output JSON.

# Stream an MP3 to disk
listenhub openapi tts \
  --text "Welcome to ListenHub." \
  --voice <speaker-id> \
  --output welcome.mp3

# Get a hosted audio URL instead
listenhub openapi speech --script "Welcome to ListenHub." --speaker-id <speaker-id>
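When a script writes several output files, it can derive --format from each file's extension instead of relying on the mp3 default. A minimal bash sketch; `format_for` and `tts_to_file` are hypothetical helpers, and the one-to-one mapping from extension to format name is an assumption of this sketch (it does not, for example, map `.ogg` to `opus`).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the tts flags themselves come from this reference.

# Print the --format value for a file name, or fail if the extension
# is not one of the formats tts lists.
format_for() {
  local ext
  case "$1" in
    *.*) ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]') ;;
    *)   return 1 ;;
  esac
  case "$ext" in
    mp3|opus|aac|flac|wav|pcm) echo "$ext" ;;
    *) return 1 ;;
  esac
}

# Usage: tts_to_file TEXT SPEAKER_ID OUTPUT_FILE
tts_to_file() {
  local text="$1" voice="$2" file="$3" fmt
  if ! fmt=$(format_for "$file"); then
    echo "unsupported output extension: $file" >&2
    return 1
  fi
  listenhub openapi tts --text "$text" --voice "$voice" --output "$file" --format "$fmt"
}
```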

flow-speech

Flow Speech turns sources or scripts into a narrated episode. Creation commands poll until processStatus is success; the default --timeout is 300 seconds.

Command	Description
`flow-speech create`	Create an episode from `--source-url` / `--source-text`.
`flow-speech get <episodeId>`	Fetch episode details.
`flow-speech tts`	Create an episode directly from scripts (no source extraction).
`flow-speech text-stream <episodeId>`	Stream generated text over SSE.

flow-speech create options:

Option	Default	Description
`--source-url <url>`	—	Source URL. Repeatable.
`--source-text <text>`	—	Source text. Repeatable.
`--speaker-id <id>`	required	Speaker ID. Repeatable. At least one is required.
`--mode <mode>`	`smart`	`smart` or `direct`.
`--lang <lang>`	auto	Language code.
`--no-wait`	—	Return the episode ID without polling.
`--timeout <seconds>`	`300`	Polling timeout.
`--json`, `-j`	—	Output JSON.

At least one --source-url or --source-text is required.

flow-speech tts options:

Option	Default	Description
`--script <content>`	required	Script content. Repeatable. At least one required.
`--speaker-id <id>`	required	Speaker ID. Repeatable. At least one required.
`--title <title>`	—	Episode title.
`--no-wait`, `--timeout <seconds>` (`300`), `--json`	—	As above.

Scripts and speakers are paired by position: the first --script uses the first --speaker-id, the second uses the second, and so on. If there are more scripts than speakers, each extra script falls back to the first speaker.

flow-speech text-stream <episodeId> requires --event <event>, one of script or outline. It writes the raw SSE stream to stdout.

# From a source URL, two voices
listenhub openapi flow-speech create \
  --source-url "https://example.com/article" \
  --speaker-id <host-id> \
  --speaker-id <guest-id>

# Directly from scripts
listenhub openapi flow-speech tts \
  --script "Hello and welcome." --speaker-id <host-id> \
  --script "Glad to be here." --speaker-id <guest-id> \
  --title "Episode 1"
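When a dialogue lives in a file, a script can emit one --script/--speaker-id pair per row, so every script has an explicit speaker and the first-speaker fallback never applies. A bash sketch, assuming a hypothetical tab-separated format (`<speaker-id><TAB><line>` per row); `flow_speech_from_dialogue` is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a dialogue file into paired flow-speech tts flags.
# Usage: flow_speech_from_dialogue FILE TITLE
flow_speech_from_dialogue() {
  local file="$1" title="$2" speaker line
  local -a args=()
  # Read "<speaker-id><TAB><line>" rows; the || clause keeps a last row without a newline.
  while IFS=$'\t' read -r speaker line || [ -n "$speaker" ]; do
    [ -z "$speaker" ] && continue   # skip blank rows
    args+=(--script "$line" --speaker-id "$speaker")
  done < "$file"
  if [ "${#args[@]}" -eq 0 ]; then
    echo "no dialogue rows in $file" >&2
    return 1
  fi
  listenhub openapi flow-speech tts "${args[@]}" --title "$title"
}
```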

podcast

Generate a podcast episode. You can produce text and audio in one step, or split the two: generate the text content first, review or stream it, then generate audio. Creation commands default to a 300-second --timeout.

Command	Description
`podcast create`	Generate a full episode (text + audio) from `--query` and/or sources.
`podcast get <episodeId>`	Fetch episode details.
`podcast text-content`	Generate the script only, no audio. Polls until `contentStatus` is `text-success`.
`podcast generate-audio <episodeId>`	Generate audio for an existing text episode. Polls until `contentStatus` is `audio-success`.
`podcast text-stream <episodeId>`	Stream generated text over SSE.

podcast create options:

Option	Default	Description
`--query <text>`	—	Topic or prompt for the episode.
`--source-url <url>`	—	Source URL. Repeatable.
`--source-text <text>`	—	Source text. Repeatable.
`--speaker-id <id>`	required	Speaker ID. Repeatable. At least one required. Pass more than one for a multi-voice episode.
`--mode <mode>`	—	Generation mode.
`--lang <lang>`	auto	Language code.
`--no-wait`, `--timeout <seconds>` (`300`), `--json`	—	Standard async flags.

podcast text-content takes the same source and speaker options (--query, --source-url, --source-text, --speaker-id, --mode) plus the async flags. At least one of --query, --source-url, or --source-text is required.

podcast text-stream <episodeId> requires --event <event>, one of script or outline.

# One-shot: text + audio
listenhub openapi podcast create \
  --query "AI agent trends in 2026" \
  --speaker-id <host-id> \
  --mode quick

# Two-step: text first, then audio
ID=$(listenhub openapi podcast text-content \
  --query "Weekly recap" --speaker-id <host-id> --no-wait -j | jq -r '.episodeId')
listenhub openapi podcast text-stream "$ID" --event script
listenhub openapi podcast generate-audio "$ID"
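With --no-wait, waiting becomes your script's job. The loop below sketches the same idea the CLI uses (a fixed interval and an overall timeout). `poll_until` is a hypothetical helper; the status values it compares against come from whatever status command you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CLI): wait for a job started with --no-wait.
# Usage: poll_until INTERVAL TIMEOUT SUCCESS_STATUS FAILURE_STATUS STATUS_CMD [ARGS...]
# STATUS_CMD must print the job's current status. Returns 0 on success,
# 1 on failure (or if STATUS_CMD itself fails), and 3 on timeout, chosen to
# match the CLI's timeout exit code.
poll_until() {
  local interval="$1" timeout="$2" ok="$3" bad="$4" waited=0 status
  shift 4
  while :; do
    status=$("$@") || return 1
    [ "$status" = "$ok" ] && return 0
    [ "$status" = "$bad" ] && return 1
    [ "$waited" -ge "$timeout" ] && return 3
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example (assumes `podcast get --json` has a top-level contentStatus field;
# the failure value is a placeholder):
# podcast_status() { listenhub openapi podcast get "$1" --json | jq -r '.contentStatus'; }
# poll_until 10 300 text-success "<failure-status>" podcast_status "$ID"
```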

storybook

Storybook produces explainer and slides episodes, optionally with video. Creation polls until processStatus is success; the default --timeout is 300 seconds.

Command	Description
`storybook create`	Create an episode from sources.
`storybook get <episodeId>`	Fetch episode details.
`storybook generate-video <episodeId>`	Kick off video generation for an episode.

storybook create options:

Option	Default	Description
`--source-url <url>`	—	Source URL. Repeatable.
`--source-text <text>`	—	Source text. Repeatable.
`--speaker-id <id>`	—	Speaker ID. Repeatable. Optional.
`--skip-audio`	off	Generate without audio.
`--style <style>`	—	Storybook style.
`--mode <mode>`	`info`	One of `info`, `story`, `slides`.
`--lang <lang>`	auto	Language code.
`--no-wait`, `--timeout <seconds>` (`300`), `--json`	—	Standard async flags.

listenhub openapi storybook create \
  --source-url "https://example.com/explainer" \
  --mode slides --skip-audio

image

image create generates an AI image from a prompt, optionally conditioned on reference images. --reference accepts both local file paths and URLs: local files are read and sent inline, while URLs are passed by reference.

Option	Description
`--prompt <text>`	Image description (required).
`--provider <provider>`	Provider name (required).
`--model <model>`	Model name.
`--size <size>`	One of `1K`, `2K`, `4K`.
`--ratio <ratio>`	One of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`, `21:9`.
`--reference <path-or-url>`	Reference image, local path or URL. Repeatable.
`--json`, `-j`	Output JSON.

listenhub openapi image create \
  --provider <provider> \
  --prompt "A neon-lit city skyline at dusk" \
  --ratio 16:9 --size 2K

video

AI video generation. This group has two surfaces: the generic video commands (text-, image-, or reference-driven) and the video pixverse subcommands (PixVerse capability API). Both poll until status is success, with a default --timeout of 1200 seconds, and both identify tasks by a 24-character hex task ID.

Command	Description
`video create`	Create a generation task.
`video get <taskId>`	Fetch task details.
`video list`	List tasks.
`video estimate`	Estimate credits before creating.
`video pixverse generate`	Create a PixVerse task.
`video pixverse estimate`	Estimate credits for a PixVerse task.

video create

--prompt is required. The frame and reference flags choose the input mode; with neither, the task is text-to-video. Frame mode (--first-frame / --last-frame) and reference mode (--reference-image / --reference-video / --reference-audio) are mutually exclusive.

Option	Default	Description
`--prompt <text>`	required	Video description / prompt.
`--first-frame <url>`	—	First-frame image URL.
`--last-frame <url>`	—	Last-frame image URL. Requires `--first-frame`.
`--reference-image <url>`	—	Reference image URL. Repeatable, max 9.
`--reference-video <url>`	—	Reference video URL. Repeatable, max 3. Requires `--input-video-duration`.
`--reference-audio <url>`	—	Reference audio URL. Repeatable, max 3. Requires `--reference-image` or `--reference-video`.
`--input-video-duration <seconds>`	—	Input video duration, `2`–`15`. Required with `--reference-video`.
`--model <model>`	—	Model name, e.g. `doubao-seedance-2-pro`.
`--resolution <res>`	—	One of `480p`, `720p`, `1080p`.
`--ratio <ratio>`	—	One of `16:9`, `4:3`, `1:1`, `3:4`, `9:16`, `21:9`.
`--duration <seconds>`	—	Output duration, `4`–`15`.
`--no-generate-audio`	audio on	Disable audio generation.
`--seed <number>`	—	Random seed, `-1` to `4294967295`.
`--no-wait`, `--timeout <seconds>` (`1200`), `--json`	—	Standard async flags.

# Text to video
listenhub openapi video create \
  --prompt "A timelapse of clouds over a mountain range" \
  --model doubao-seedance-2-pro \
  --resolution 1080p --duration 8

# First/last frame interpolation
listenhub openapi video create \
  --prompt "Smooth morph between the two frames" \
  --first-frame "https://example.com/a.jpg" \
  --last-frame "https://example.com/b.jpg"
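Scripts that assemble video create flags from variables can check the input rules above before submitting a task, so a pipeline fails before calling the CLI. A bash sketch; `video_input_mode` is a hypothetical helper that prints the mode a combination selects, or fails with the rule it breaks.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check encoding the documented video create input rules.
# Usage: video_input_mode FIRST_FRAME LAST_FRAME N_REF_IMAGES N_REF_VIDEOS N_REF_AUDIO INPUT_VIDEO_DURATION
# Pass "" for an unset URL or duration and 0 for an absent reference flag.
# Prints text, frame, or reference; on a violation prints the reason to stderr and returns 1.
video_input_mode() {
  local first="$1" last="$2" imgs="$3" vids="$4" auds="$5" in_dur="$6"
  local refs=$((imgs + vids + auds))

  if [ -n "$last" ] && [ -z "$first" ]; then
    echo "--last-frame requires --first-frame" >&2; return 1
  fi
  if [ -n "$first" ] && [ "$refs" -gt 0 ]; then
    echo "frame mode and reference mode are mutually exclusive" >&2; return 1
  fi
  if [ "$imgs" -gt 9 ] || [ "$vids" -gt 3 ] || [ "$auds" -gt 3 ]; then
    echo "at most 9 reference images, 3 videos, and 3 audio clips" >&2; return 1
  fi
  if [ "$vids" -gt 0 ] && [ -z "$in_dur" ]; then
    echo "--reference-video requires --input-video-duration" >&2; return 1
  fi
  if [ -n "$in_dur" ] && { [ "$in_dur" -lt 2 ] || [ "$in_dur" -gt 15 ]; }; then
    echo "--input-video-duration must be between 2 and 15" >&2; return 1
  fi
  if [ "$auds" -gt 0 ] && [ $((imgs + vids)) -eq 0 ]; then
    echo "--reference-audio requires --reference-image or --reference-video" >&2; return 1
  fi

  if [ -n "$first" ]; then
    echo frame
  elif [ "$refs" -gt 0 ]; then
    echo reference
  else
    echo text
  fi
}
```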

video list options: --page <n> (default 1), --page-size <n> (default 20), --status <status> (one of pending, generating, uploading, success, failed), --json.

video estimate requires --model, --resolution, and --duration, and accepts --ratio, --has-video-input, and --input-video-duration (required when --has-video-input is set):

listenhub openapi video estimate \
  --model doubao-seedance-2-pro --resolution 1080p --duration 8

video pixverse

PixVerse exposes atomic generation capabilities plus a marketing agent. Pick one with --capability:

Capability	What it does
`text_to_video`	Generate from a text prompt.
`image_to_video`	Animate a still image.
`transition`	Transition between two assets.
`multi_transition`	Transition across multiple assets.
`fusion`	Fuse multiple inputs into one clip.
`restyle`	Restyle an existing PixVerse video.
`mimic`	Mimic a reference motion/style.
`lip_sync`	Drive lip motion from audio or TTS.
`agent`	Marketing agent (`ad_master`, `promo_mix`).

Shared enums:

Model (--model): pixverse, v6, v5, v4.5 (default pixverse).
Language / region (--language): zh, en (default en).
Quality (--quality): 360p, 540p, 720p, 1080p (default 720p).
Aspect ratio (--aspect-ratio): 9:16, 16:9, 1:1, 4:3, 3:4 (default 16:9).
Agent type (--agent-type, with --capability agent): ad_master, promo_mix.

video pixverse generate options:

Option	Default	Description
`--capability <capability>`	required	One of the capabilities above.
`--model <model>`	`pixverse`	Model.
`--language <lang>`	`en`	Service region.
`--prompt <text>`	—	Prompt, max 2048 chars.
`--quality <quality>`	`720p`	Output quality.
`--aspect-ratio <ratio>`	`16:9`	Aspect ratio.
`--duration <seconds>`	`5`	Integer `1`–`60`.
`--source-task-id <id>`	—	Reuse an earlier successful PixVerse task (for `restyle` / `lip_sync`).
`--image <url[:duration]>`	—	Image asset, optional `:duration` suffix. Repeatable, max 10.
`--video <url[:duration]>`	—	Video asset, optional `:duration` suffix. Repeatable, max 2.
`--audio <url[:duration]>`	—	Audio asset, optional `:duration` suffix. Repeatable, max 1.
`--agent-type <type>`	—	`ad_master` or `promo_mix` (with `--capability agent`).
`--source-video-id <id>`	—	PixVerse source video id (`restyle`).
`--restyle-id <id>`	—	PixVerse restyle id (`restyle`).
`--lip-sync-tts`	off	Enable lip-sync TTS (`--capability lip_sync`).
`--lip-sync-speaker-id <id>`	—	Lip-sync TTS speaker id.
`--lip-sync-content <text>`	—	Lip-sync TTS content.
`--pixverse-json <json>`	—	Escape hatch: raw JSON for the nested `pixverse` object. Merged with flag-derived fields; flags win.
`--no-wait`, `--timeout <seconds>` (`1200`), `--json`	—	Standard async flags.

Asset flags accept an optional trailing :duration in seconds, for example https://example.com/clip.mp4:5. Only a trailing :<integer> is treated as a duration, so other colons in a URL (after the scheme, or before a port that is followed by a path) are left alone.
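The suffix rule can be sketched as a small parser. `split_asset` is a hypothetical bash helper, not the CLI's own parser; it prints the URL followed by the duration, or `-` when there is none.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented rule: only a trailing ":<integer>" is a duration.
split_asset() {
  local spec="$1" tail
  case "$spec" in
    *:*) tail="${spec##*:}" ;;   # text after the last colon
    *)   tail="" ;;
  esac
  case "$tail" in
    ''|*[!0-9]*) printf '%s %s\n' "$spec" "-" ;;             # no duration suffix
    *)           printf '%s %s\n' "${spec%:*}" "$tail" ;;    # strip ":<integer>"
  esac
}
```

By the rule as stated, a URL that itself ends in `:<digits>` (a bare `host:port` with no path) would be read as carrying a duration; adding a trailing `/` avoids that.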

# Lip-sync from TTS
listenhub openapi video pixverse generate \
  --capability lip_sync \
  --video "https://example.com/face.mp4" \
  --lip-sync-tts \
  --lip-sync-speaker-id <speaker-id> \
  --lip-sync-content "Hi, here's our product update."

# Marketing agent
listenhub openapi video pixverse generate \
  --capability agent \
  --agent-type ad_master \
  --prompt "30-second ad for a noise-cancelling headset" \
  --image "https://example.com/product.jpg"

video pixverse estimate takes --capability (required), plus --model, --language, --quality, --duration, and --agent-type:

listenhub openapi video pixverse estimate \
  --capability text_to_video --quality 1080p --duration 5

music

AI music generation, backed by Mureka. Async commands (generate, remix, instrumental, soundtrack, track) poll until status is success; the default --timeout is 600 seconds. The analysis commands (recognize, describe, stem) run synchronously.

Command	Description
`music generate`	Generate from a prompt and/or lyrics.
`music remix [audio]`	Remix an existing song with new lyrics.
`music instrumental`	Generate a standalone instrumental.
`music soundtrack`	Generate music from an image or video.
`music track [audio]`	Generate a single instrument or vocal track.
`music recognize`	Recognize lyrics with timestamps from audio.
`music describe`	Analyze audio: description, tags, genres, instruments.
`music stem`	Separate audio into stems, returns download URLs.
`music list`	List music tasks.
`music get <taskId>`	Fetch task details.

Model values across the generation commands: auto, mureka-7.6, mureka-8, mureka-9, mureka-o2.

music generate options:

Option	Description
`--prompt <text>`	Music description. At least one of `--prompt` or `--lyrics` is required.
`--lyrics <text>`	Song lyrics.
`--style <text>`	Music style / mood.
`--title <text>`	Track title.
`--model <model>`	One of the model values above.
`--instrumental`	Instrumental only, no vocals.
`--vocal-id <id>`	Reusable vocal id.
`--no-wait`, `--timeout <seconds>` (`600`), `--json`	Standard async flags.

music remix [audio] takes exactly one audio source: a positional file, --audio-url <url>, or --provider-song-id <id>. It also requires --lyrics and --prompt.

music instrumental requires exactly one of --prompt or --reference-audio <file> (mp3/m4a, max 10MB); accepts --model.

music soundtrack requires exactly one of --image <file> or --video <file>; accepts --prompt and --model.

music track [audio] takes the audio as a positional file or --provider-song-id <id> (exactly one). Requires --generate-type (Vocals, Instrumental, Drums, Bass, Guitar, …) and --prompt; --lyrics is required when --generate-type is Vocals. Optional --vocal-gender <male|female>, --generate-start <seconds>, --generate-end <seconds>.

music recognize, music describe, and music stem each require --audio <file> (mp3/m4a, max 10MB). music stem also accepts --model (audio-separation-1 or audio-separation-2).

music list options: --page <n> (default 1), --page-size <n> (default 20), --status <status> (pending, generating, uploading, success, failed), --json.

# Generate from a prompt
listenhub openapi music generate \
  --prompt "Upbeat synthwave with a driving bassline" --title "Night Drive"

# Separate an mp3 into stems
listenhub openapi music stem --audio track.mp3
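The analysis commands need a local mp3 or m4a file of at most 10MB, and a pipeline can check both before calling the CLI. This bash sketch treats 10MB as 10 * 1024 * 1024 bytes, which is an assumption; `check_music_audio` and `stem_if_valid` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the documented --audio limits of
# recognize / describe / stem: mp3 or m4a, at most 10MB.
# Assumption: "10MB" means 10 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES=$((10 * 1024 * 1024))

check_music_audio() {
  local file="$1" size
  case "$file" in
    *.mp3|*.MP3|*.m4a|*.M4A) ;;
    *) echo "$file: expected an .mp3 or .m4a file" >&2; return 1 ;;
  esac
  if [ ! -f "$file" ]; then
    echo "$file: no such file" >&2; return 1
  fi
  # wc pads its output with whitespace on some platforms; strip it.
  size=$(wc -c < "$file" | tr -d '[:space:]')
  if [ "$size" -gt "$MAX_AUDIO_BYTES" ]; then
    echo "$file: $size bytes is over the 10MB limit" >&2; return 1
  fi
}

# Separate stems only when the file passes the checks above.
stem_if_valid() {
  check_music_audio "$1" && listenhub openapi music stem --audio "$1"
}
```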

content

Extract readable content from a URL, optionally summarized. Async; polls until status is completed with a default --timeout of 300 seconds.

Command	Description
`content extract`	Extract content from a URL.
`content get <taskId>`	Fetch the extraction result.

content extract options:

Option	Default	Description
`--url <url>`	required	URL to extract from.
`--summarize`	off	Summarize the extracted content.
`--max-length <n>`	—	Maximum content length.
`--no-wait`, `--timeout <seconds>` (`300`), `--json`	—	Standard async flags.

listenhub openapi content extract --url "https://example.com/article" --summarize

subscription

Show your subscription plan and credit balance — total available credits, the monthly allotment used/total, permanent credits, plan name, and expiry.

Command	Options	Description
`subscription`	`--json`	Show subscription and credits info.

listenhub openapi subscription --json