Python

Scrape your first page from Python using the requests library — works with any Python 3.8+ project.

Submit a scrape, poll for the result, and handle transient errors using requests, the de facto standard HTTP library for Python.


Authentication

Set your API key as an environment variable. Get a key from the Dashboard.

export ANAKIN_API_KEY=ak-your-key-here

The base URL is https://api.anakin.io/v1. Every request authenticates via the X-API-Key header.
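
For a quick sanity check that your key works, submit a job directly. This uses requests (installed in the next section) with the endpoint and header documented above:

import os
import requests

# One-off authenticated request: submit a scrape job and inspect the response.
resp = requests.post(
    "https://api.anakin.io/v1/url-scraper",
    headers={"X-API-Key": os.environ["ANAKIN_API_KEY"]},
    json={"url": "https://example.com"},
    timeout=30,
)
print(resp.status_code, resp.json())

A successful response includes a jobId; an authentication failure typically surfaces as a 401.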


Install

requests is the de facto Python HTTP library. Install once:

pip install requests

If you'd rather avoid any dependency, the same logic works with urllib.request from the standard library — but requests is cleaner and present in nearly every Python project.
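
For reference, the submit call with only the standard library looks like this. A sketch for the dependency-free case; the rest of this guide sticks with requests:

import json
import os
import urllib.request

# Stdlib-only submit: same endpoint and headers, no third-party dependency.
req = urllib.request.Request(
    "https://api.anakin.io/v1/url-scraper",
    data=json.dumps({"url": "https://example.com"}).encode(),
    headers={
        "X-API-Key": os.environ["ANAKIN_API_KEY"],
        "Content-Type": "application/json",
    },
    method="POST",
)
with urllib.request.urlopen(req, timeout=30) as resp:
    print(json.load(resp))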


Scrape a page

Save as quickstart.py:

import os
import time
import requests

BASE = "https://api.anakin.io/v1"
API_KEY = os.environ.get("ANAKIN_API_KEY")
if not API_KEY:
    raise SystemExit("ANAKIN_API_KEY is not set")

session = requests.Session()
session.headers.update({"X-API-Key": API_KEY, "Content-Type": "application/json"})


def request(method: str, path: str, json=None):
    """Return the parsed JSON body, or None on a transient failure."""
    try:
        resp = session.request(method, BASE + path, json=json, timeout=30)
    except requests.RequestException:
        return None  # network error or timeout: caller retries
    if resp.status_code == 429 or resp.status_code >= 500:
        return None  # rate limit or server error: also transient, caller retries
    resp.raise_for_status()  # any other 4xx is permanent: fail loudly
    return resp.json()


def scrape(url: str) -> dict:
    submitted = request("POST", "/url-scraper", {"url": url})
    if submitted is None:
        raise RuntimeError("could not submit scrape job")
    job_id = submitted["jobId"]

    for _ in range(60):
        job = request("GET", f"/url-scraper/{job_id}")
        if job is None:
            time.sleep(3)  # retry transient errors
            continue
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"scrape failed: {job.get('error')}")
        time.sleep(3)
    raise TimeoutError("timed out after 3 minutes")


if __name__ == "__main__":
    job = scrape("https://example.com")
    print(job["markdown"])

Run it:

python quickstart.py

What this does

  1. Submits https://example.com to /url-scraper and gets back a jobId.
  2. Polls /url-scraper/{jobId} every 3 seconds (up to 60 attempts = 3 minutes).
  3. Retries transient failures silently (network errors, timeouts, 429s, and 5xx responses) and raises only on permanent errors.
  4. Prints the final markdown when the job completes.

Most jobs finish in 3–15 seconds.


Go further

Extract structured JSON with AI

Add generateJson: True to the submit body to have AI return structured data:

submitted = request("POST", "/url-scraper", {
    "url": "https://news.ycombinator.com",
    "generateJson": True,
})

The completed response includes a generatedJson field with structured data inferred from the page.
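
Once the polled job reaches completed, read the field off the dict that scrape() returns:

print(job["generatedJson"])  # structured data inferred from the page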

Scrape JavaScript-heavy sites

For SPAs and dynamically-loaded pages, add useBrowser: True:

submitted = request("POST", "/url-scraper", {
    "url": "https://example.com/spa",
    "useBrowser": True,
})

Only use browser mode when needed — standard scraping is faster and cheaper.


Use it from Django / FastAPI

Wrap scrape() in a Celery / RQ / Dramatiq task, since the polling loop blocks for up to 3 minutes per URL and background execution is the natural fit. For FastAPI specifically, swap requests for httpx.AsyncClient and await asyncio.sleep(3) to keep the event loop free; a sketch of that follows the Celery example:

# app/tasks/scrape.py
from celery import shared_task

# Assumed project layout: scrape() from quickstart.py and a Django
# model named Page; adjust both import paths to your own app.
from app.scraping import scrape
from app.models import Page


@shared_task
def scrape_url(url: str):
    job = scrape(url)
    Page.objects.create(url=url, markdown=job["markdown"])
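
And the async variant for FastAPI: a minimal sketch assuming httpx is installed (pip install httpx). It omits the transient-error retry from quickstart.py for brevity:

# app/scrape_async.py
import asyncio
import os

import httpx

BASE = "https://api.anakin.io/v1"
HEADERS = {"X-API-Key": os.environ["ANAKIN_API_KEY"]}


async def scrape_async(url: str) -> dict:
    async with httpx.AsyncClient(base_url=BASE, headers=HEADERS, timeout=30) as client:
        submitted = (await client.post("/url-scraper", json={"url": url})).json()
        job_id = submitted["jobId"]
        for _ in range(60):
            job = (await client.get(f"/url-scraper/{job_id}")).json()
            if job["status"] == "completed":
                return job
            if job["status"] == "failed":
                raise RuntimeError(f"scrape failed: {job.get('error')}")
            await asyncio.sleep(3)  # yields the event loop between polls
        raise TimeoutError("timed out after 3 minutes")

Call it from a path operation with await scrape_async(url); for large batches, a task queue is still the better fit.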

Next steps