Content Extract
Asynchronously extract text content from URLs including web articles, Twitter/X profiles, tweets, YouTube videos, and WeChat Official Account posts.
Content Extract
Extract text content from any URL asynchronously. Submit a URL to create a task, then poll for results.
Supported Sources
X / Twitter
Profile pages and individual tweets — fetch recent posts from any public account, or extract a single tweet with its full context.
YouTube
Extract video transcripts and metadata from any public YouTube video URL.
WeChat Official Accounts
Extract article content from WeChat Official Account posts (mp.weixin.qq.com).
Web Articles
Any publicly accessible web page — article text, metadata, and reference links.
Use Cases
- Podcast source material -- extract articles, tweets, or WeChat posts as input for podcast generation
- Content summarization -- pull long-form content and generate summaries
- Social media monitoring -- batch-extract tweets from key accounts
- Research aggregation -- collect and structure content from multiple URLs
Create Extraction Task
POST /v1/content/extract
Request example:
curl -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": {
"type": "url",
"uri": "https://example.com/article"
}
}'const response = await fetch('https://api.marswave.ai/openapi/v1/content/extract', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
source: {
type: 'url',
uri: 'https://example.com/article',
},
}),
});
const data = await response.json();
const taskId = data.data.taskId;
console.log('Task ID:', taskId);import os
import requests
response = requests.post(
'https://api.marswave.ai/openapi/v1/content/extract',
headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'},
json={
'source': {
'type': 'url',
'uri': 'https://example.com/article',
}
}
)
data = response.json()
task_id = data['data']['taskId']
print('Task ID:', task_id)With options (summarize + max length):
curl -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": {
"type": "url",
"uri": "https://example.com/long-article"
},
"options": {
"summarize": true,
"maxLength": 5000
}
}'const response = await fetch('https://api.marswave.ai/openapi/v1/content/extract', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
source: {
type: 'url',
uri: 'https://example.com/long-article',
},
options: {
summarize: true,
maxLength: 5000,
},
}),
});
const data = await response.json();
console.log(data);import os
import requests
response = requests.post(
'https://api.marswave.ai/openapi/v1/content/extract',
headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'},
json={
'source': {
'type': 'url',
'uri': 'https://example.com/long-article',
},
'options': {
'summarize': True,
'maxLength': 5000,
},
}
)
data = response.json()
print(data)Twitter/X profile URL with tweet count:
# For Twitter/X profile URLs, use the twitter option to control how many tweets to fetch
curl -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": {
"type": "url",
"uri": "https://x.com/elonmusk"
},
"options": {
"twitter": {
"count": 50
}
}
}'const response = await fetch('https://api.marswave.ai/openapi/v1/content/extract', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
source: {
type: 'url',
uri: 'https://x.com/elonmusk',
},
options: {
twitter: {
count: 50,
},
},
}),
});
const data = await response.json();
console.log(data);import os
import requests
response = requests.post(
'https://api.marswave.ai/openapi/v1/content/extract',
headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'},
json={
'source': {
'type': 'url',
'uri': 'https://x.com/elonmusk',
},
'options': {
'twitter': {
'count': 50,
},
},
}
)
data = response.json()
print(data)WeChat Official Account article:
curl -X POST "https://api.marswave.ai/openapi/v1/content/extract" \
-H "Authorization: Bearer $LISTENHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"source": {
"type": "url",
"uri": "https://mp.weixin.qq.com/s/XXXXXXXXXXXXXXXX"
}
}'const response = await fetch('https://api.marswave.ai/openapi/v1/content/extract', {
method: 'POST',
headers: {
'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
source: {
type: 'url',
uri: 'https://mp.weixin.qq.com/s/XXXXXXXXXXXXXXXX',
},
}),
});
const data = await response.json();
const taskId = data.data.taskId;import os
import requests
response = requests.post(
'https://api.marswave.ai/openapi/v1/content/extract',
headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'},
json={
'source': {
'type': 'url',
'uri': 'https://mp.weixin.qq.com/s/XXXXXXXXXXXXXXXX',
}
}
)
data = response.json()
task_id = data['data']['taskId']Request body:
| Field | Type | Required | Description |
|---|---|---|---|
source | object | Yes | Source to extract from |
source.type | string | Yes | Must be "url" |
source.uri | string | Yes | The URL to extract content from |
options | object | No | Extraction options |
options.summarize | boolean | No | Whether to generate a summary |
options.maxLength | integer | No | Maximum content length (default: 100,000, max: 500,000) |
options.twitter | object | No | Twitter/X specific options |
options.twitter.count | integer | No | Number of tweets to fetch (1-100, default 20) |
Response example:
{
"code": 0,
"message": "success",
"data": {
"taskId": "{taskId}"
}
}Query Task Status
GET /v1/content/extract/{taskId}
Request example:
curl -X GET "https://api.marswave.ai/openapi/v1/content/extract/{taskId}" \
-H "Authorization: Bearer $LISTENHUB_API_KEY"const result = await fetch(`https://api.marswave.ai/openapi/v1/content/extract/${taskId}`, {
headers: {
'Authorization': `Bearer ${process.env.LISTENHUB_API_KEY}`,
},
});
const data = await result.json();
console.log('Status:', data.data.status);import os
import requests
result = requests.get(
f'https://api.marswave.ai/openapi/v1/content/extract/{task_id}',
headers={'Authorization': f'Bearer {os.environ["LISTENHUB_API_KEY"]}'}
)
data = result.json()
print('Status:', data['data']['status'])Path parameters:
| Field | Type | Description |
|---|---|---|
taskId | string | 24-character hex string returned from the create endpoint |
Response when complete (status: "completed"):
{
"code": 0,
"message": "success",
"data": {
"taskId": "{taskId}",
"status": "completed",
"createdAt": "2025-04-09T12:00:00Z",
"data": {
"content": "The extracted article text content...",
"metadata": {
"title": "Article Title",
"author": "Author Name",
"publishedAt": "2025-04-01T08:00:00Z"
},
"references": [
"https://example.com/related-article"
]
},
"credits": 5,
"failCode": null,
"message": null
}
}While in progress, status is "processing" and data is null. On failure, status is "failed" and failCode / message describe the error.
Response fields:
| Field | Type | Description |
|---|---|---|
taskId | string | Task identifier |
status | string | processing, completed, or failed |
createdAt | string | ISO 8601 timestamp |
data | object | Extracted content (null until completed) |
data.content | string | Extracted text content |
data.metadata | object | Page metadata (title, author, etc.) |
data.references | array | Referenced URLs found in the content |
credits | integer | Credits consumed |
failCode | string | Error code (null on success) |
message | string | Error message (null on success) |
Notes
Twitter/X profile URLs:
- When the source URL is a Twitter/X profile (e.g.
https://x.com/username), the API fetches recent tweets - Use
options.twitter.countto control how many tweets to retrieve (1-100, default 20) - This option is ignored for non-Twitter URLs
WeChat Official Accounts:
- Use the full
mp.weixin.qq.com/s/...article URL - Extracted content includes the article body text and metadata
Task lifecycle:
| Status | Description |
|---|---|
processing | Extraction is in progress |
completed | Content extracted successfully |
failed | Extraction failed -- check failCode and message |
Polling recommendations:
- Initial wait: 5 seconds after task creation
- Poll interval: 5 seconds
- Typical completion time: 10-30 seconds depending on URL complexity