Error Responses

Complete reference for HTTP status codes, error codes, retry strategy, and troubleshooting

Every AnakinScraper endpoint returns errors in a consistent JSON shape. This page is the canonical reference — what we return, what each code means, when to retry, and how to diagnose the most common failures.

For per-endpoint rate limits and the recommended request pacing, see Rate Limits.


Error response format

All errors return a JSON body with two fields:

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. Please try again later."
}
| Field | Type | Description |
|---|---|---|
| error | string | A short, machine-readable code. Stable across releases, so it is safe to switch on in code. |
| message | string | A human-readable explanation. Subject to copy changes: log it, but don't parse it. |
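Here is a minimal sketch of switching on the stable error code instead of parsing message. describe_error and its action strings are hypothetical and not part of any AnakinScraper SDK. The input is a raw JSON body like the example above.

```python
import json


def describe_error(body_text: str) -> str:
    """Map an error body to a next step, keyed on the stable `error` code."""
    body = json.loads(body_text)
    code = body.get("error")
    message = body.get("message", "")  # log it, never parse it
    if code == "rate_limit_exceeded":
        return "back off and retry"
    if code == "insufficient_credits":
        return "top up credits"
    if code in {"invalid_request", "invalid_url", "invalid_job_type"}:
        # Surface the human-readable text to the developer, but don't branch on it
        return f"fix the request: {message}"
    return f"unhandled error code {code!r}"
```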

Async jobs are different. When a polled job ends in status: "failed", the failure shape lives inside the job response (error field, free-form string). See Async job failures below.


HTTP status codes

| Status | Meaning | Retry? |
|---|---|---|
| 200 | Success: a synchronous endpoint returned a result, or a polled job is complete. | n/a |
| 202 | Accepted: an async job was queued, or a polled job is still in progress. | n/a |
| 400 | Bad Request: your request body or parameters are invalid. | No; fix the request. |
| 401 | Unauthorized: the API key is missing, malformed, or revoked. | No; check the key. |
| 402 | Payment Required: insufficient credits for the operation. | No; top up credits. |
| 403 | Forbidden: the resource exists but you don't own it. | No. |
| 404 | Not Found: the job ID, session, or scraper doesn't exist. | No. |
| 409 | Conflict: the resource is locked or already exists (e.g., session in use, duplicate name). | After resolving the conflict. |
| 422 | Unprocessable Entity: the request was valid but the resource isn't ready (e.g., session not yet saved). | After the prerequisite is met. |
| 429 | Too Many Requests: rate limit exceeded. | Yes; see Rate limit handling. |
| 500 | Internal Server Error: unexpected failure on our side. | Yes; use exponential backoff. |
| 502 | Bad Gateway: a downstream service (browser, CDP proxy) is unreachable. | Yes; use exponential backoff. |
| 503 | Service Unavailable: a required service is temporarily down or unconfigured. | Yes; wait 30–60s. |

Error code catalog

The error field uses a fixed set of codes. The table below covers every code returned by the public API.

Validation & input

| Code | HTTP | When you'll see it | What to do |
|---|---|---|---|
| invalid_request | 400 | The body is not valid JSON, a required field is missing, or a value is out of range (e.g., depth > 5, a batch with 0 or more than 10 URLs, prompt > 8KB, schema > 50KB). | Inspect message for the specific field, fix the request, and resubmit. |
| invalid_url | 400 | A URL in a batch request is malformed. | Fix the URL. The message indicates the index of the bad URL. |
| invalid_job_type | 400 | The job_type field on POST /v1/request doesn't match a registered handler. | Use a supported value (url_scraper, crawl, map, agentic_search, search, web_scraper). |
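Checking the documented limits on the client lets you fail fast instead of spending a round trip on invalid_request. The following is a sketch under stated assumptions. The payload field names (depth, urls, prompt, schema) are guesses for illustration. The sketch also treats "8KB" and "50KB" as KiB of UTF-8 / serialized JSON. The server remains the source of truth.

```python
import json

MAX_DEPTH = 5
MAX_BATCH_URLS = 10
MAX_PROMPT_BYTES = 8 * 1024   # assumption: "8KB" means 8 KiB of UTF-8 text
MAX_SCHEMA_BYTES = 50 * 1024  # assumption: "50KB" means 50 KiB of serialized JSON


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means these client-side checks passed."""
    problems = []
    # Field names below are hypothetical; adapt them to the endpoint you call
    if "depth" in payload and payload["depth"] > MAX_DEPTH:
        problems.append(f"depth must be <= {MAX_DEPTH}")
    if "urls" in payload and not 1 <= len(payload["urls"]) <= MAX_BATCH_URLS:
        problems.append(f"batch must contain 1-{MAX_BATCH_URLS} URLs")
    if len(payload.get("prompt", "").encode("utf-8")) > MAX_PROMPT_BYTES:
        problems.append("prompt exceeds 8KB")
    if "schema" in payload and len(json.dumps(payload["schema"]).encode("utf-8")) > MAX_SCHEMA_BYTES:
        problems.append("schema exceeds 50KB")
    return problems
```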

Auth & authorization

| Code | HTTP | When you'll see it | What to do |
|---|---|---|---|
| unauthorized | 401 | No API key was sent, or the key is malformed, revoked, or belongs to a deleted user. | Send a valid key in X-API-Key (or one of the accepted header variants). Generate a new key in the dashboard if needed. |
| forbidden | 403 | The resource (job, session, scraper) exists but belongs to a different user. | Use a job ID from your own account. |

Credits

| Code | HTTP | When you'll see it | What to do |
|---|---|---|---|
| insufficient_credits | 402 | Account balance is below the cost of the operation. The message includes the cost and your current balance. | Top up credits in Billing or upgrade your plan. |

Rate limiting

| Code | HTTP | When you'll see it | What to do |
|---|---|---|---|
| rate_limit_exceeded | 429 | You exceeded the per-endpoint rate limit. | See Rate limit handling. |

Resource state

| Code | HTTP | When you'll see it | What to do |
|---|---|---|---|
| not_found | 404 | The job ID, session ID, or scraper ID doesn't exist. | Verify the ID. Job IDs are valid for 30 days. |
| session_not_saved | 422 | You tried to attach a saved browser session before its storage state was uploaded. | Run the manual save flow first (see Browser Sessions). |
| session_in_use | 409 | A saved session is already attached to an active automation. | Wait for the other run to finish, or use a different session. |
| duplicate_name | 409 | A session name is already taken for this user. | Use a unique name. |

Server-side

| Code | HTTP | When you'll see it | What to do |
|---|---|---|---|
| server_error | 500 | Unhandled error in our handler, usually a database or internal service issue. | Retry with backoff. If it persists, contact support with the request ID. |
| queue_error | 500 | Failed to enqueue the job (SQS unavailable or misconfigured). | Retry with backoff. |
| configuration_error | 500 | A required service-side config is missing for this endpoint. | Retry; if persistent, contact support. |
| internal_error | 500 | Generic catch-all for unexpected failures. | Retry with backoff. |
| search_error | 500 | The upstream search provider (Perplexity) returned an error. | Retry with backoff; reword the prompt if persistent. |
| service_unavailable | 503 | A dependent service (browser AI, CDP proxy, scraper generator) is offline. | Retry after 30–60 seconds. |

Note on format consistency. A small number of older endpoints (/v1/browser-connect, /v1/ai/evaluate, and a few scraper-management routes) currently return errors using slight variations of the format above. For example, they may omit message, or use a Fiber default shape {"statusCode": 400, "message": "..."} for validation errors. Handle them defensively: the HTTP status is always authoritative, so branch on it first and read the error field when it is present.
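One defensive approach is to take the body's error string when it exists, and otherwise derive a code from the HTTP status. This is a sketch only. The function error_code and the status-to-code fallback table are hypothetical, loosely based on the catalog above. They are not codes the API itself returns for these variant shapes.

```python
from typing import Optional

# Hypothetical fallback, used only when the body lacks a string `error` field.
# The mapping is an assumption based on the code catalog, not API behavior.
FALLBACK_CODES = {
    400: "invalid_request",
    401: "unauthorized",
    402: "insufficient_credits",
    403: "forbidden",
    404: "not_found",
    429: "rate_limit_exceeded",
    500: "server_error",
    503: "service_unavailable",
}


def error_code(status: int, body: Optional[dict]) -> str:
    """Prefer the body's `error` string; otherwise derive a code from the HTTP status."""
    if isinstance(body, dict) and isinstance(body.get("error"), str):
        return body["error"]
    # Covers Fiber-style {"statusCode": 400, "message": "..."} bodies and empty bodies
    return FALLBACK_CODES.get(status, f"http_{status}")
```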

Accepted API key headers

The API accepts the key under any of the following headers (and a few query params for WebSocket endpoints), in priority order:

X-API-Key, X-Api-Key, Api-Key, API-Key, X-Access-Key, Access-Key, apikey, api_key, Authorization (with a "Bearer ", "API-Key ", or "ApiKey " prefix, or the raw key).

For /v1/browser-connect (WebSocket): ?api_key=, ?apikey=, or ?token= query parameters also work.
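The sketch below shows both forms: a header for REST calls, and a query parameter for WebSocket clients that cannot set custom headers. The key is the same placeholder used elsewhere on this page. browser_connect_url is a hypothetical helper.

```python
from urllib.parse import urlencode

API_KEY = "ak-your-key-here"  # placeholder key, as in the other examples

# REST: X-API-Key is first in the priority order
rest_headers = {"X-API-Key": API_KEY}

# Equivalent Authorization form with the Bearer prefix
bearer_headers = {"Authorization": f"Bearer {API_KEY}"}


def browser_connect_url(base: str, api_key: str) -> str:
    """Append the key as ?api_key= for WebSocket clients that cannot set headers."""
    return f"{base}?{urlencode({'api_key': api_key})}"
```

For example, you might call browser_connect_url("wss://api.anakin.io/v1/browser-connect", API_KEY). That wss:// host is an assumption based on the REST host used in the other examples.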


Async job failures

For async endpoints (/v1/url-scraper, /v1/agentic-search, /v1/map, /v1/crawl, Wire's /v1/holocron/task), HTTP status 200/202 only confirms that polling is working. The actual outcome lives in the status field of the job response:

| status | Meaning |
|---|---|
| pending | Queued, not yet picked up by a worker. |
| processing | A worker is actively running the job. |
| completed | Finished successfully; results are in the response. |
| failed | The job ran but could not produce a result. See error. |
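Because a 200/202 only means polling worked, a client loops on the status field until it reaches completed or failed. The sketch below makes some assumptions. The fetch callable is injected so the loop stays transport-agnostic. The 2-second interval and 10-minute timeout are arbitrary defaults, not documented values.

```python
import time
from typing import Callable

TERMINAL = {"completed", "failed"}


def wait_for_job(fetch: Callable[[], dict], interval: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job["status"] in TERMINAL:
            return job  # caller still has to check completed vs failed
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']!r} after {timeout}s")
        time.sleep(interval)
```

In practice, fetch would wrap a GET to the job's poll endpoint, such as GET /v1/url-scraper/{id}, and return the parsed JSON.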

A failed job response looks like this:

{
  "id": "job-abc123",
  "status": "failed",
  "error": "Blocked by website (HTTP 403)",
  "createdAt": "2025-04-30T18:12:04Z",
  "completedAt": "2025-04-30T18:12:34Z",
  "durationMs": 30000
}

The error field is a free-form, human-readable string. Common substrings to switch on if you must:

| Substring | Cause | Suggested fix |
|---|---|---|
| Blocked by website, HTTP 403, HTTP 429, bot detection, CAPTCHA | Anti-bot protection. | Set useBrowser: true and/or specify a country. Try a browser session for sites that require login. |
| Connection timeout, timeout | The page didn't finish loading in time. | For SPAs, set useBrowser: true and increase the wait time. |
| DNS resolution failed, no such host | The domain can't be resolved. | Verify the URL is reachable. |
| TLS, SSL | Certificate validation failure. | Confirm the target uses a trusted certificate. |
| Invalid URL | A malformed URL passed all the way through. | Pre-validate URLs client-side. |
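If you do switch on the free-form string, keep the matching best-effort and ordered as in the table. The following is a sketch: suggest_fix and the hint wording are illustrative. Matching is case-insensitive because the exact message text is not guaranteed to stay stable.

```python
# (substrings, hint) pairs, checked in the same order as the table above
FAILURE_HINTS = [
    (("Blocked by website", "HTTP 403", "HTTP 429", "bot detection", "CAPTCHA"),
     "retry with useBrowser: true and/or a country"),
    (("Connection timeout", "timeout"), "use useBrowser: true and a longer wait"),
    (("DNS resolution failed", "no such host"), "verify the URL is reachable"),
    (("TLS", "SSL"), "check the target's certificate"),
    (("Invalid URL",), "pre-validate URLs client-side"),
]


def suggest_fix(error: str) -> str:
    """Best-effort, case-insensitive match on a failed job's `error` string."""
    lowered = error.lower()
    for needles, hint in FAILURE_HINTS:
        if any(n.lower() in lowered for n in needles):
            return hint
    return "unrecognized failure; log the error and investigate"
```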

Batch jobs. A batch URL scraper job is completed if any child finishes — partial failures don't fail the parent. Iterate results[] and check each child's status and error.
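The sketch below separates the children of a batch that completed overall. It assumes each entry in results[] carries its own status and error, as described above. The url field in the test data is an assumption for illustration.

```python
def split_batch(job: dict) -> tuple:
    """Return (completed_children, failed_children) from a batch job's results[]."""
    results = job.get("results", [])
    ok = [c for c in results if c.get("status") == "completed"]
    failed = [c for c in results if c.get("status") == "failed"]
    return ok, failed
```

Log each failed child's error (for example with the substring matching above) rather than treating the parent's completed status as full success.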


Retry guidance

When to retry

| Status | Retry? | Why |
|---|---|---|
| 400, 401, 402, 403, 404, 409, 422 | No | The request itself is the problem. Retrying will return the same error. |
| 429 | Yes | Transient: the bucket refills. Read Retry-After if present; otherwise back off. |
| 500, 502, 503 | Yes | Transient server-side issue. Cap retries at 3–5 and use exponential backoff with jitter. |
| Network errors (no response) | Yes | Treat the same as 5xx. |

Jitter spreads retries from many clients so a thundering herd doesn't synchronize. Cap the total wait so a stuck worker fails fast instead of looping forever.

import random
import time
import requests

RETRYABLE = {429, 500, 502, 503}
MAX_ATTEMPTS = 5
BASE_DELAY = 1.0  # seconds
MAX_DELAY = 30.0

def request_with_retry(method, url, *, headers=None, json=None):
    """POST/GET with exponential backoff + jitter on retryable failures."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            response = requests.request(method, url, headers=headers, json=json, timeout=30)
        except requests.RequestException:
            if attempt == MAX_ATTEMPTS - 1:
                raise
            time.sleep(_backoff(attempt))
            continue

        if response.status_code not in RETRYABLE:
            return response

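        # Honor a server-sent Retry-After (in seconds) when present;
        # fall back to jittered backoff if it is missing or not a number.
        retry_after = response.headers.get("Retry-After")
        try:
            delay = float(retry_after) if retry_after else _backoff(attempt)
        except ValueError:
            delay = _backoff(attempt)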

        if attempt == MAX_ATTEMPTS - 1:
            return response  # caller decides what to do

        time.sleep(delay)

    return response


def _backoff(attempt: int) -> float:
    """Full-jitter exponential backoff, capped at MAX_DELAY."""
    cap = min(MAX_DELAY, BASE_DELAY * (2 ** attempt))
    return random.uniform(0, cap)


# Usage
resp = request_with_retry(
    "POST",
    "https://api.anakin.io/v1/url-scraper",
    headers={"X-API-Key": "ak-your-key-here"},
    json={"url": "https://example.com"},
)
resp.raise_for_status()
print(resp.json()["jobId"])

Rate limit handling

Per-endpoint limits are documented on the Rate Limits page. The short version: most submit endpoints allow 60 requests/min per user, AI evaluation allows 10/min, and most GET polling endpoints are not rate-limited. The exception is Wire's GET /v1/holocron/jobs/{id}, which allows 60/min.
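To stay under a per-minute limit proactively, you can space calls evenly on the client. This is fixed-interval pacing: one request every 60/N seconds. It is a simple, conservative choice and makes no claim about how the server counts requests internally. The Pacer class is hypothetical. The clock and sleep functions are injectable so the behavior can be tested without real waiting.

```python
import time


class Pacer:
    """Space calls at least 60 / per_minute seconds apart (client-side pacing)."""

    def __init__(self, per_minute: int = 60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock, self.sleep = clock, sleep
        self.next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call pacer.wait() before each submit. For AI evaluation (10/min), Pacer(per_minute=10) spaces calls 6 seconds apart.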

Response headers

When a request is rate-limited, the API returns 429 Too Many Requests with the standard error body. Some 429 responses include a Retry-After header indicating how many seconds to wait. Notably, /v1/browser-connect sends one when the concurrent CDP session limit is reached:

HTTP/1.1 429 Too Many Requests
Retry-After: 5
Content-Type: application/json

{"error": "rate_limit_exceeded", "message": "Too many requests. Please try again later."}

Heads up. AnakinScraper does not currently emit the optional X-RateLimit-Limit, X-RateLimit-Remaining, or X-RateLimit-Reset headers. Don't rely on them — drive your retry loop off Retry-After (when present) or your own backoff. Surfacing these headers is on our roadmap.
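The HTTP specification allows Retry-After to be either a number of seconds or an HTTP-date. The responses described here use seconds, but a tolerant parser costs little. The following is a sketch: retry_after_seconds is hypothetical, and returns None when the header is absent or unparseable so the caller can fall back to its own backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Parse Retry-After as delta-seconds or an HTTP-date; None if absent or unparseable."""
    if not value:
        return None
    try:
        return max(0.0, float(value))  # the form this API documents
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form, e.g. "... GMT"
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```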

Reading Retry-After

The request_with_retry helper above already honors Retry-After. If you only need to handle 429 specifically:

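import time
import requests

URL = "https://api.anakin.io/v1/url-scraper"
HEADERS = {"X-API-Key": "ak-your-key-here"}
PAYLOAD = {"url": "https://example.com"}

resp = requests.post(URL, headers=HEADERS, json=PAYLOAD, timeout=30)

if resp.status_code == 429:
    # Retry-After is in seconds; default to 5s if the header is absent
    wait = int(resp.headers.get("Retry-After", "5"))
    time.sleep(wait)
    resp = requests.post(URL, headers=HEADERS, json=PAYLOAD, timeout=30)  # single retry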

Troubleshooting

The following scenarios cover the failures we see most often in support tickets.

"I'm getting 403 from the target site"

The site has bot detection. Two levers, in order of effectiveness:

  1. Switch to the browser handler. Add "useBrowser": true to your request — this routes through Camoufox (Firefox-based, fingerprint-masked) instead of plain HTTP.
  2. Set a country. Add "country": "US" (or another ISO code) — the proxy bandit will pick a residential IP from that region.

If both fail, the site likely requires a logged-in session. Use Browser Sessions to capture cookies once, then attach the session by ID.
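The two levers can be tried in escalating order. Below is a sketch that yields progressively heavier request bodies using the useBrowser and country fields named above. The generator itself and the default "US" are illustrative choices. Session attachment is left out because its request field isn't shown on this page.

```python
def escalation_payloads(url: str, country: str = "US"):
    """Yield progressively heavier request bodies for a bot-protected target."""
    yield {"url": url}                                          # plain HTTP
    yield {"url": url, "useBrowser": True}                      # lever 1: browser handler
    yield {"url": url, "useBrowser": True, "country": country}  # lever 2: add a country
```

Submit each payload in turn (for example, with request_with_retry above), and stop at the first attempt whose job completes.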

"Timeouts on a single-page app"

Plain HTTP can't run JavaScript. Set "useBrowser": true so the scraper executes the page's JS before extracting content. For very slow SPAs, also set "waitForSelector" or increase "waitMs", if your endpoint supports them.

"Schema extraction returns the wrong fields"

Agentic Search and JSON extraction are LLM-driven — better prompts and tighter schemas produce better output:

  • Be explicit in the prompt: name each field and describe its expected shape (e.g., "extract price as a number in USD, no currency symbol").
  • Provide examples in the prompt for ambiguous fields.
  • Tighten the schema. Required JSON Schema fields force the model to produce them; optional fields tend to get omitted.
  • Cap the schema at 50KB. Larger schemas are rejected with invalid_request.
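The following is a hypothetical example of these tips applied together. The prompt names each field and its shape, every field is listed in required, and the serialized size is checked against the 50KB cap before sending. The field names are illustrative. How the schema and prompt are attached to a request depends on the endpoint and isn't shown here.

```python
import json

# Hypothetical product-page schema: every field the prompt names is required
product_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number", "description": "Price in USD, no currency symbol"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["title", "price", "in_stock"],
    "additionalProperties": False,
}

prompt = (
    "Extract title (string), price as a number in USD with no currency symbol, "
    "and in_stock (boolean)."
)

# Oversized schemas are rejected with invalid_request, so check before sending
schema_bytes = len(json.dumps(product_schema).encode("utf-8"))
assert schema_bytes <= 50 * 1024, "schema would be rejected with invalid_request"
```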

"Job stuck in pending"

A few possibilities:

  • You're polling the wrong endpoint. POST /v1/url-scraper returns a jobId you poll at GET /v1/url-scraper/{id}. The list is in Polling Jobs.
  • Worker fleet is saturated. Pending → processing usually takes <5s. If it's been >60s, retry or contact support.
  • The job died silently. Stale jobs are auto-marked failed after 1 hour. If you see this, check the error field for the cause.
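The following is a sketch of flagging a job that has sat in pending past the 60-second guidance above, using the job's createdAt timestamp. It assumes createdAt always has the whole-second "Z" format shown in the failed-job example. pending_too_long is a hypothetical helper.

```python
from datetime import datetime, timezone


def pending_too_long(job: dict, now: datetime, threshold_s: float = 60.0) -> bool:
    """True if the job is still pending more than threshold_s seconds after createdAt."""
    if job.get("status") != "pending":
        return False
    # Assumes the format of the example response, e.g. "2025-04-30T18:12:04Z"
    created = datetime.strptime(job["createdAt"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (now - created).total_seconds() > threshold_s
```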

"402 insufficient_credits when I just topped up"

Credits are deducted on completion, but checked upfront. If you submitted a batch of 10 URLs and have 8 credits, the batch is rejected immediately even though some URLs would have come from cache (which costs 0). Top up enough for the worst case.
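The upfront check amounts to simple worst-case arithmetic: assume no cache hits. The following sketch assumes a flat cost per URL, as implied by the 10-URL, 8-credit example. Real per-operation costs may differ, and the message on insufficient_credits reports the actual cost.

```python
def can_afford_batch(urls: list, balance: int, cost_per_url: int = 1) -> bool:
    """Worst-case check: assume every URL is billed (no cache hits)."""
    return balance >= len(urls) * cost_per_url
```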

"Got a 401 with a brand-new key"

API keys take a few seconds to propagate. If a freshly-created key returns 401, wait 5–10 seconds and retry. If it persists, regenerate the key in the dashboard.

"Different services return slightly different error shapes"

A small number of older endpoints (notably /v1/browser-connect, /v1/ai/evaluate, and some scraper-management routes) use minor variations on the canonical format. The HTTP status is always authoritative, so plan your error handling around the status code first, then the error code when the body includes one.

"Wire job returned a 429 even though I've only made a few requests"

GET /v1/holocron/jobs/{id} is capped at 60/min per user, unlike URL Scraper polling, which is unlimited. If you're polling many Wire jobs in parallel, stagger them or reduce poll frequency. See Rate Limits for the per-endpoint table.
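When the 60/min budget is shared across N Wire jobs, each job can be polled at most once every 60·N/limit seconds. The following is a sketch. The 0.8 safety factor is an arbitrary margin below the hard cap, not a documented value.

```python
def min_poll_interval(n_jobs: int, limit_per_min: int = 60, safety: float = 0.8) -> float:
    """Seconds between polls of each job so that n_jobs together stay under the limit."""
    budget = limit_per_min * safety  # leave headroom below the hard cap
    return 60.0 * n_jobs / budget
```

With 12 jobs and the default margin, each job should be polled no more often than every 15 seconds.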

"Browser Connect closed unexpectedly"

The CDP proxy returns 429 once a single API instance has 50 concurrent CDP sessions. Pool clients across multiple instances, close sessions when done, and wait for the Retry-After interval before retrying. If you saved a session, it must finish uploading to S3 before another connection can attach to it (otherwise you'll see session_not_saved).


Reporting unexpected errors

If you hit a 500, an unfamiliar error code, or behavior that contradicts this page:

  • Capture the request: method, URL, headers (redact the API key), body.
  • Capture the response: status, headers, body.
  • Note the time (UTC, to the second) — this lets us correlate against server logs.
  • Email support@anakin.io with the above. For Enterprise customers, see your dedicated channel.
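The following is a small sketch for preparing that report. It masks every header from the accepted API key list, and records a UTC timestamp to the second. redact_headers is a hypothetical helper. Remember to also strip any ?api_key=, ?apikey=, or ?token= query parameter from WebSocket URLs.

```python
from datetime import datetime, timezone

# Header names from "Accepted API key headers", compared case-insensitively
KEY_HEADERS = {"x-api-key", "api-key", "x-access-key", "access-key", "apikey", "api_key", "authorization"}


def redact_headers(headers: dict) -> dict:
    """Return a copy of the headers with every API-key header masked."""
    return {k: ("[REDACTED]" if k.lower() in KEY_HEADERS else v) for k, v in headers.items()}


# UTC timestamp to the second, for correlating with server logs
captured_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
```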