Content Extract

Extract text content asynchronously from URLs, including web articles, Twitter/X profiles and tweets, YouTube videos, and WeChat Official Account posts. Submit a URL to create a task, then poll the task for results.

Supported Sources

X / Twitter

Profile pages and individual tweets — fetch recent posts from any public account, or extract a single tweet with its full context.

YouTube

Extract video transcripts and metadata from any public YouTube video URL.

WeChat Official Accounts

Extract article content from WeChat Official Account posts (mp.weixin.qq.com).

Web Articles

Any publicly accessible web page — article text, metadata, and reference links.

Use Cases

Podcast source material -- extract articles, tweets, or WeChat posts as input for podcast generation
Content summarization -- pull long-form content and generate summaries
Social media monitoring -- batch-extract tweets from key accounts
Research aggregation -- collect and structure content from multiple URLs

Create Extraction Task

POST /v1/content/extract

Request example:

curl -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://example.com/article"
    }
  }'

const response = await fetch('https://api.marswave.ai/openapi/v1/content/extract', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    source: {
      type: 'url',
      uri: 'https://example.com/article',
    },
  }),
});
const data = await response.json();
const taskId = data.data.taskId;
console.log('Task ID:', taskId);

import os
import requests

response = requests.post(
    'https://api.marswave.ai/openapi/v1/content/extract',
    headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'},
    json={
        'source': {
            'type': 'url',
            'uri': 'https://example.com/article',
        }
    }
)
data = response.json()
task_id = data['data']['taskId']
print('Task ID:', task_id)

With options (summarize + max length):

curl -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://example.com/long-article"
    },
    "options": {
      "summarize": true,
      "maxLength": 5000
    }
  }'

const response = await fetch('https://api.marswave.ai/openapi/v1/content/extract', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    source: {
      type: 'url',
      uri: 'https://example.com/long-article',
    },
    options: {
      summarize: true,
      maxLength: 5000,
    },
  }),
});
const data = await response.json();
console.log(data);

import os
import requests

response = requests.post(
    'https://api.marswave.ai/openapi/v1/content/extract',
    headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'},
    json={
        'source': {
            'type': 'url',
            'uri': 'https://example.com/long-article',
        },
        'options': {
            'summarize': True,
            'maxLength': 5000,
        },
    }
)
data = response.json()
print(data)

Twitter/X profile URL with tweet count:

# For Twitter/X profile URLs, use the twitter option to control how many tweets to fetch
curl -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://x.com/elonmusk"
    },
    "options": {
      "twitter": {
        "count": 50
      }
    }
  }'

const response = await fetch('https://api.marswave.ai/openapi/v1/content/extract', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    source: {
      type: 'url',
      uri: 'https://x.com/elonmusk',
    },
    options: {
      twitter: {
        count: 50,
      },
    },
  }),
});
const data = await response.json();
console.log(data);

import os
import requests

response = requests.post(
    'https://api.marswave.ai/openapi/v1/content/extract',
    headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'},
    json={
        'source': {
            'type': 'url',
            'uri': 'https://x.com/elonmusk',
        },
        'options': {
            'twitter': {
                'count': 50,
            },
        },
    }
)
data = response.json()
print(data)

WeChat Official Account article:

curl -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": {
      "type": "url",
      "uri": "https://mp.weixin.qq.com/s/XXXXXXXXXXXXXXXX"
    }
  }'

const response = await fetch('https://api.marswave.ai/openapi/v1/content/extract', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    source: {
      type: 'url',
      uri: 'https://mp.weixin.qq.com/s/XXXXXXXXXXXXXXXX',
    },
  }),
});
const data = await response.json();
const taskId = data.data.taskId;

import os
import requests

response = requests.post(
    'https://api.marswave.ai/openapi/v1/content/extract',
    headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'},
    json={
        'source': {
            'type': 'url',
            'uri': 'https://mp.weixin.qq.com/s/XXXXXXXXXXXXXXXX',
        }
    }
)
data = response.json()
task_id = data['data']['taskId']

Request body:

Field	Type	Required	Description
`source`	object	Yes	Source to extract from
`source.type`	string	Yes	Must be `"url"`
`source.uri`	string	Yes	The URL to extract content from
`options`	object	No	Extraction options
`options.summarize`	boolean	No	Whether to generate a summary
`options.maxLength`	integer	No	Maximum content length in characters (default: 100,000; max: 500,000)
`options.twitter`	object	No	Twitter/X specific options
`options.twitter.count`	integer	No	Number of tweets to fetch (1-100, default 20)
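
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_extract_payload` is a hypothetical helper, and the lower bound of 1 for `maxLength` is an assumption, since the table documents only the default and the maximum.

```python
from typing import Any, Dict, Optional

MAX_LENGTH_LIMIT = 500_000     # documented maximum for options.maxLength
TWEET_COUNT_RANGE = (1, 100)   # documented range for options.twitter.count


def build_extract_payload(
    uri: str,
    summarize: Optional[bool] = None,
    max_length: Optional[int] = None,
    tweet_count: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a request body for POST /v1/content/extract (hypothetical helper)."""
    if not uri:
        raise ValueError("source.uri is required")

    # source.type must be "url" according to the request body table.
    payload: Dict[str, Any] = {"source": {"type": "url", "uri": uri}}

    options: Dict[str, Any] = {}
    if summarize is not None:
        options["summarize"] = summarize
    if max_length is not None:
        # Assumption: values below 1 are rejected; only the maximum is documented.
        if not 1 <= max_length <= MAX_LENGTH_LIMIT:
            raise ValueError(f"maxLength must be between 1 and {MAX_LENGTH_LIMIT}")
        options["maxLength"] = max_length
    if tweet_count is not None:
        low, high = TWEET_COUNT_RANGE
        if not low <= tweet_count <= high:
            raise ValueError(f"twitter.count must be between {low} and {high}")
        options["twitter"] = {"count": tweet_count}

    # Omit "options" entirely when nothing was set, since the field is optional.
    if options:
        payload["options"] = options
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post` in the Python examples above.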

Response example:

{
  "code": 0,
  "message": "success",
  "data": {
    "taskId": "{taskId}"
  }
}

Query Task Status

GET /v1/content/extract/{taskId}

Request example:

curl -X GET "https://api.marswave.ai/openapi/v1/content/extract/{taskId}" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"

const result = await fetch(`https://api.marswave.ai/openapi/v1/content/extract/${taskId}`, {
  headers: {
    'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
  },
});
const data = await result.json();
console.log('Status:', data.data.status);

import os
import requests

result = requests.get(
    f'https://api.marswave.ai/openapi/v1/content/extract/{task_id}',
    headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'}
)
data = result.json()
print('Status:', data['data']['status'])

Path parameters:

Field	Type	Description
`taskId`	string	24-character hex string returned from the create endpoint
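
Because the task ID has a fixed shape, a malformed value can be caught before the status request is made. This sketch assumes both lowercase and uppercase hex digits are acceptable; the documentation specifies only "24-character hex string", and `is_valid_task_id` is a hypothetical helper name.

```python
import re

# 24 hex characters; accepting uppercase digits is an assumption.
_TASK_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def is_valid_task_id(task_id: str) -> bool:
    """Return True if task_id looks like a 24-character hex string."""
    return _TASK_ID_RE.fullmatch(task_id) is not None
```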

Response when complete (status: "completed"):

{
  "code": 0,
  "message": "success",
  "data": {
    "taskId": "{taskId}",
    "status": "completed",
    "createdAt": "2025-04-09T12:00:00Z",
    "data": {
      "content": "The extracted article text content...",
      "metadata": {
        "title": "Article Title",
        "author": "Author Name",
        "publishedAt": "2025-04-01T08:00:00Z"
      },
      "references": [
        "https://example.com/related-article"
      ]
    },
    "credits": 5,
    "failCode": null,
    "message": null
  }
}

While the task is in progress, status is "processing" and the nested data field is null. On failure, status is "failed", and failCode and message describe the error.

Response fields:

Field	Type	Description
`taskId`	string	Task identifier
`status`	string	`processing`, `completed`, or `failed`
`createdAt`	string	ISO 8601 timestamp
`data`	object	Extracted content (null until completed)
`data.content`	string	Extracted text content
`data.metadata`	object	Page metadata (title, author, etc.)
`data.references`	array	Referenced URLs found in the content
`credits`	integer	Credits consumed
`failCode`	string	Error code (null on success)
`message`	string	Error message (null on success)
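
One way to act on these fields is to map the three documented statuses to three outcomes: keep waiting, return the content, or raise an error. The sketch below works on an already parsed response body; `ExtractionFailed` and `extracted_content` are hypothetical names, not part of the API.

```python
from typing import Any, Dict, Optional


class ExtractionFailed(Exception):
    """Raised when a task reports status "failed" (hypothetical helper type)."""

    def __init__(self, fail_code: Optional[str], message: Optional[str]) -> None:
        super().__init__(f"{fail_code}: {message}")
        self.fail_code = fail_code
        self.message = message


def extracted_content(response_body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the extracted content, or None while the task is still processing."""
    task = response_body["data"]   # outer "data" wraps the task object
    status = task["status"]
    if status == "processing":
        return None
    if status == "failed":
        raise ExtractionFailed(task.get("failCode"), task.get("message"))
    if status == "completed":
        return task["data"]        # nested "data" holds content, metadata, references
    raise ValueError(f"unexpected status: {status!r}")
```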

Notes

Twitter/X profile URLs:

When the source URL is a Twitter/X profile (e.g. https://x.com/username), the API fetches recent tweets.
Use options.twitter.count to control how many tweets to retrieve (1-100, default 20).
This option is ignored for non-Twitter URLs.

WeChat Official Accounts:

Use the full mp.weixin.qq.com/s/... article URL
Extracted content includes the article body text and metadata

Task lifecycle:

Status	Description
`processing`	Extraction is in progress
`completed`	Content extracted successfully
`failed`	Extraction failed -- check `failCode` and `message`

Polling recommendations:

Initial wait: 5 seconds after task creation
Poll interval: 5 seconds
Typical completion time: 10-30 seconds depending on URL complexity
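
The recommendations above could be sketched as a small polling loop. The 5-second initial wait and interval come from the list; the 120-second timeout is an assumption, since no upper bound is documented. `poll_until_done` is a hypothetical helper, and `fetch_task` is a caller-supplied function that returns the task object (the outer `data` field of the status response).

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_task: Callable[[], Dict[str, Any]],
    initial_wait: float = 5.0,   # documented initial wait
    interval: float = 5.0,       # documented poll interval
    timeout: float = 120.0,      # assumption: not specified in the documentation
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task leaves the "processing" state, then return it."""
    sleep(initial_wait)
    waited = initial_wait
    while True:
        task = fetch_task()
        if task["status"] != "processing":
            return task          # "completed" or "failed"
        if waited + interval > timeout:
            raise TimeoutError(f"task still processing after {waited:.0f} seconds")
        sleep(interval)
        waited += interval
```

With `requests`, `fetch_task` might be a lambda that issues the GET request shown under Query Task Status and returns `response.json()["data"]`. The `sleep` parameter exists only so the loop can be exercised without real delays.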

Credits:

Rule	Details
Pre-deduction	5 credits reserved when a task starts
Actual charge	100 credits per 100,000 characters of extracted content; if the actual charge is less than the 5 pre-deducted credits, only the actual amount is charged
Failure refund	All pre-deducted credits refunded if extraction fails
Default `maxLength`	100,000 characters (up to 100 credits)
Maximum `maxLength`	500,000 characters (up to 500 credits)

Credits are charged based on actual extracted content length. The credits field in the completed task response shows the final amount consumed.
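
The rate in the table works out to 1 credit per 1,000 characters. The sketch below estimates the charge for a given content length; rounding partial thousands up is an assumption, because the rounding rule is not documented, and `estimate_credits` is a hypothetical helper. The credits field in the task response remains the authoritative figure.

```python
import math

CHARS_PER_CREDIT = 1_000  # from "100 credits per 100,000 characters"


def estimate_credits(char_count: int) -> int:
    """Estimate credits for extracted content of char_count characters."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    # Assumption: partial thousands round up to the next whole credit.
    return math.ceil(char_count / CHARS_PER_CREDIT)
```

Passing a `maxLength` value gives the upper bound on the charge for a request, for example 100 credits at the default of 100,000 characters.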
