API Reference
Complete reference for the Helix Parse API.
Base URL
https://api.feeds.onhelix.ai
Authentication
All requests require API key authentication using the Bearer token scheme:
Authorization: Bearer YOUR_API_KEY
See the Authentication Guide for details on obtaining and using API keys.
Response Envelope
All successful responses use the standard wrapper:
{
  "success": true,
  "data": { ... }
}
Parse Content
Extract structured content from a URL or raw HTML.
POST /parse
Request
Headers:
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json
Body:
{
  "url": "https://www.bbc.com/news/technology-67988517"
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string (URL) | Conditional | URL to scrape and parse. Required if html is not provided. |
| html | string | Conditional | Raw HTML to parse. Required if url is not provided. Max 2MB. |
| title | string | No | Page title hint for extraction. Max 1,000 chars. |
| jobId | string | No | Custom job ID for idempotency. Max 256 chars. |
Validation: At least one of url or html must be provided. Both may be sent in the same request; in that case the HTML is used for extraction (no scrape is performed) and the URL is stored as metadata. The url field must be a well-formed URL; malformed URLs are rejected.
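The rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_parse_body` and the three constants are hypothetical names, the URL check (absolute http or https URL) is an assumption about what the server considers valid, and "2MB" is assumed to mean 2 MiB of UTF-8 bytes.

```python
from urllib.parse import urlsplit

MAX_HTML_BYTES = 2 * 1024 * 1024  # assumption: "2MB" means 2 MiB of UTF-8 bytes
MAX_TITLE_CHARS = 1000
MAX_JOB_ID_CHARS = 256


def build_parse_body(url=None, html=None, title=None, job_id=None):
    """Return a JSON-serializable body for POST /parse, or raise ValueError."""
    if url is None and html is None:
        raise ValueError("Either url or html must be provided")

    body = {}
    if url is not None:
        parts = urlsplit(url)
        # Assumption: the API only accepts absolute http(s) URLs.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Invalid URL format: {url!r}")
        body["url"] = url
    if html is not None:
        if len(html.encode("utf-8")) > MAX_HTML_BYTES:
            raise ValueError("html exceeds the 2MB size limit")
        body["html"] = html
    if title is not None:
        if len(title) > MAX_TITLE_CHARS:
            raise ValueError("title exceeds the 1,000 character limit")
        body["title"] = title
    if job_id is not None:
        if len(job_id) > MAX_JOB_ID_CHARS:
            raise ValueError("jobId exceeds the 256 character limit")
        body["jobId"] = job_id
    return body
```

The server remains the source of truth; a local check like this only saves a round trip for requests that would be rejected with INVALID_BODY anyway.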
Response
Status: 200 OK
{
  "success": true,
  "data": {
    "jobId": "550e8400-e29b-41d4-a716-446655440000",
    "hasPrimaryContent": true,
    "consumability": {
      "isConsumable": true,
      "reason": "Page contains a full news article with headline, body text, and publication metadata."
    },
    "primaryContent": {
      "title": "Apple Vision Pro: First weekend sees steady sales at stores",
      "description": "Apple's new mixed-reality headset goes on sale in the US, with steady demand reported at stores across the country.",
      "author": "Zoe Kleinman",
      "publisher": "BBC News",
      "publishedAt": "2024-02-04T12:30:00.000Z",
      "updatedAt": "2024-02-04T15:45:00.000Z",
      "isSponsored": false,
      "isDigest": false,
      "accessRestrictionType": null,
      "text": {
        "simplifiedHtml": "<p>Apple's Vision Pro headset has seen steady sales during its first weekend on sale in the US, with reports of consistent demand at Apple stores across the country.</p><p>The $3,499 device, which Apple calls a \"spatial computer\", went on sale on Friday.</p><p>Some stores saw queues, though they were shorter than those seen for recent iPhone launches.</p>"
      },
      "video": null,
      "primaryImage": {
        "url": "https://ichef.bbci.co.uk/news/1024/branded_news/1234/production/_132567890_visionpro.jpg",
        "caption": "A customer tries on the Apple Vision Pro at an Apple Store",
        "credit": "Getty Images"
      },
      "originallyPublished": {
        "syndicated": false,
        "domain": null,
        "url": null,
        "publisher": null,
        "publishedAt": null
      }
    },
    "scrape": {
      "httpStatus": 200
    }
  }
}
Response Fields:
| Field | Type | Description |
|---|---|---|
| jobId | string | Job identifier (your provided jobId or auto-generated UUID) |
| hasPrimaryContent | boolean | Whether meaningful primary content was extracted |
| consumability | object | Content quality assessment (see below) |
| primaryContent | object or null | Extracted content (see below). Null if nothing extracted. |
| scrape | object | HTTP scrape metadata. Present only in URL mode; absent in HTML mode. |
Consumability Object:
| Field | Type | Description |
|---|---|---|
| isConsumable | boolean | Whether the page has meaningful standalone content |
| reason | string | Natural language explanation of the assessment |
Primary Content Object:
| Field | Type | Description |
|---|---|---|
| title | string or null | Page or article title |
| description | string or null | Summary or meta description |
| author | string or null | Content author |
| publisher | string or null | Publishing organization |
| publishedAt | string or null | Publication date (ISO 8601) |
| updatedAt | string or null | Last update date (ISO 8601) |
| isSponsored | boolean or null | Whether the content is sponsored |
| isDigest | boolean or null | Whether the page is a digest of other content |
| accessRestrictionType | string[] or null | Detected access restrictions (see below) |
| text | object or null | Body content (see below) |
| video | object or null | Video content (see below) |
| primaryImage | object or null | Primary image (see below) |
| originallyPublished | object or null | Original source for syndicated content (see below) |
Text Object:
| Field | Type | Description |
|---|---|---|
| simplifiedHtml | string | Simplified HTML of the body content |
Video Object:
| Field | Type | Description |
|---|---|---|
| url | string | Video URL |
| duration | string | Video duration |
Primary Image Object:
| Field | Type | Description |
|---|---|---|
| url | string | Image URL |
| caption | string or null | Image caption |
| credit | string or null | Image credit or attribution |
Originally Published Object:
| Field | Type | Description |
|---|---|---|
| syndicated | boolean or null | Whether the content is syndicated |
| domain | string or null | Domain of the original publication |
| url | string or null | URL of the original publication |
| publisher | string or null | Name of the original publisher |
| publishedAt | string or null | Original publication date (ISO 8601) |
Access Restriction Types:
| Value | Description |
|---|---|
| subscription-required | Content behind a paywall |
| bot-detected | Bot detection challenge served |
| captcha | CAPTCHA presented |
| adblock-detected | Ad blocker detection blocked content |
| login-required | Login required to view content |
| geo | Geographic restriction |
| other | Other restriction |
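Because both primaryContent and accessRestrictionType can be null, code that inspects restrictions has to guard against missing values. A small sketch of one way to do that follows; `access_restrictions` and `is_paywalled` are hypothetical helper names, and `data` is assumed to be the `data` object from a successful response.

```python
def access_restrictions(data):
    """Return the detected restriction values, or [] when none are reported."""
    # primaryContent may be null when nothing was extracted.
    content = data.get("primaryContent") or {}
    # accessRestrictionType is either a list of strings or null.
    return list(content.get("accessRestrictionType") or [])


def is_paywalled(data):
    """True when the page was flagged as subscription-required."""
    return "subscription-required" in access_restrictions(data)
```

For example, `access_restrictions({"primaryContent": None})` returns an empty list instead of raising.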
Scrape Object:
| Field | Type | Description |
|---|---|---|
| httpStatus | number | HTTP status code from the page scrape |
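To illustrate how the nullable fields and the optional scrape object fit together, here is a rough sketch that flattens a response into a small summary. `summarize` and `parse_timestamp` are hypothetical names, and the timestamp handling assumes values look like the example above (`2024-02-04T12:30:00.000Z`); the reference only promises ISO 8601, so other variants may need a fuller parser.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse a timestamp such as '2024-02-04T12:30:00.000Z'; pass None through."""
    if value is None:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def summarize(data):
    """Flatten the `data` object of a parse response into a small dict."""
    content = data.get("primaryContent") or {}   # null if nothing was extracted
    image = content.get("primaryImage") or {}    # null if no primary image
    scrape = data.get("scrape")                  # absent in HTML mode
    return {
        "job_id": data["jobId"],
        "title": content.get("title"),
        "published_at": parse_timestamp(content.get("publishedAt")),
        "image_url": image.get("url"),
        "http_status": scrape["httpStatus"] if scrape else None,
    }
```

In HTML mode the `http_status` entry is simply None, which keeps downstream code from having to check for the `scrape` key itself.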
Errors
400 Bad Request errors use a nested error envelope:
{
  "success": false,
  "error": {
    "code": "INVALID_BODY",
    "message": "Invalid request body: Either url or html must be provided"
  }
}
401 Unauthorized errors use a flat error format:
{
  "success": false,
  "error": "Authentication failed for strategy: api-key",
  "code": "AUTHENTICATION_FAILED"
}
Error Cases:
400 Bad Request -- INVALID_BODY (validation errors):
- Missing both url and html
- Invalid URL format
- HTML exceeds 2MB size limit
- Title exceeds 1,000 character limit
400 Bad Request -- VALIDATION_FAILED (workflow errors):
- "Previous parse job failed. Please retry with a new jobId."
- "Parse job failed: {cause}. Please retry with a new jobId."
401 Unauthorized -- AUTHENTICATION_FAILED:
- Missing or invalid API key
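Since 400 and 401 responses use different shapes, client code may want to normalize them before branching on the error code. A minimal sketch, assuming only the two formats documented above; `normalize_error` and `needs_new_job_id` are hypothetical helpers, not part of the API.

```python
def normalize_error(payload):
    """Return (code, message) for either documented error shape."""
    error = payload.get("error")
    if isinstance(error, dict):
        # Nested envelope (400): {"error": {"code": ..., "message": ...}}
        return error.get("code"), error.get("message")
    # Flat format (401): {"error": "<message>", "code": "<code>"}
    return payload.get("code"), error


def needs_new_job_id(code):
    """VALIDATION_FAILED messages ask the caller to retry with a new jobId."""
    return code == "VALIDATION_FAILED"
```

Keying on the type of the `error` field rather than on the HTTP status means the helper keeps working if either shape ever appears under a different status code.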
Idempotency
The jobId parameter enables request deduplication, scoped per organization.
- When you provide a jobId, any subsequent request with the same jobId within the same organization reconnects to the existing workflow result rather than starting a new parse.
- Without a jobId, a random UUID is generated for each request.
- If a previous job with the same jobId failed, the API returns a 400 error asking you to retry with a new jobId.
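One way to take advantage of this is to derive the jobId deterministically from the input, so that a retried request reconnects to the same job. The sketch below is an illustration under assumptions: `job_id_for`, the `parse-` prefix, and the attempt suffix are hypothetical conventions, and it assumes the API accepts any string of up to 256 characters as a jobId.

```python
import hashlib


def job_id_for(url, attempt=0):
    """Build a stable jobId for a URL; bump `attempt` after a failed job."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()  # 64 hex chars
    # Well under the documented 256-character limit.
    return f"parse-{digest}-{attempt}"
```

Calling `job_id_for(url)` twice yields the same ID, so a repeated request would reconnect to the existing result. If the API responds that the previous job failed, `job_id_for(url, attempt=1)` gives a fresh ID for the retry.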
Examples
curl -- URL mode:
curl -X POST https://api.feeds.onhelix.ai/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.bbc.com/news/technology-67988517"
  }'
curl -- HTML mode:
curl -X POST https://api.feeds.onhelix.ai/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<html><head><title>Example Article</title></head><body><article><h1>Breaking News</h1><p>Article content here...</p></article></body></html>",
    "title": "Example Article"
  }'
JavaScript:
const API_KEY = process.env.HELIX_API_KEY;
const BASE_URL = 'https://api.feeds.onhelix.ai';

async function parseUrl(url) {
  try {
    const response = await fetch(`${BASE_URL}/parse`, {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ url }),
    });

    if (!response.ok) {
      const error = await response.json();
      throw new Error(
        `Parse failed (${response.status}): ${JSON.stringify(error)}`
      );
    }

    const { data } = await response.json();
    if (!data.hasPrimaryContent) {
      console.log('No primary content extracted');
      return null;
    }
    return data.primaryContent;
  } catch (error) {
    console.error('Parse request failed:', error.message);
    throw error;
  }
}
Python:
import os
import time

import requests

API_KEY = os.environ["HELIX_API_KEY"]
BASE_URL = "https://api.feeds.onhelix.ai"


def parse_url(url, max_retries=3):
    """Parse a URL, retrying network failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                f"{BASE_URL}/parse",
                json={"url": url},  # json= also sets Content-Type: application/json
                headers={"Authorization": f"Bearer {API_KEY}"},
                timeout=60,  # seconds; arbitrary example value, tune as needed
            )
            response.raise_for_status()
            data = response.json()["data"]
            if not data["hasPrimaryContent"]:
                return None
            return data["primaryContent"]
        except requests.exceptions.HTTPError as e:
            # 4xx/5xx responses are not retried. Print the raw body, since
            # it is not guaranteed to be JSON, then re-raise.
            print(f"HTTP {e.response.status_code}: {e.response.text}")
            raise
        except requests.exceptions.RequestException:
            # Connection errors and timeouts: wait 1s, 2s, 4s, ... and retry.
            if attempt == max_retries - 1:
                raise
            time.sleep(2**attempt)
    return None
Next Steps
- Quickstart: Get parsing working in under 2 minutes
- Overview: How Parse works and what it returns
- Authentication: API key best practices