Python
Scrape your first page from Python using the requests library — works with any Python 3.8+ project.
Submit a scrape, poll for the result, and handle transient errors — using requests, the de facto standard Python HTTP library.
Authentication
Set your API key as an environment variable. Get a key from the Dashboard.
export ANAKIN_API_KEY=ak-your-key-here

The base URL is https://api.anakin.io/v1. Every request authenticates via the X-API-Key header.
Install
requests is the de facto Python HTTP library. Install once:
pip install requests

If you'd rather avoid any dependency, the same logic works with urllib.request from the standard library — but requests is cleaner and present in nearly every Python project.
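For reference, here is what the submit call looks like with only the standard library. This is a sketch of the POST step alone (the submit() helper name is ours); the polling loop in the next section works the same way either way:

import json
import os
import urllib.request

BASE = "https://api.anakin.io/v1"

def submit(url: str) -> dict:
    # POST /url-scraper with the X-API-Key header, no third-party deps.
    req = urllib.request.Request(
        BASE + "/url-scraper",
        data=json.dumps({"url": url}).encode(),
        headers={
            "X-API-Key": os.environ["ANAKIN_API_KEY"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)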
Scrape a page
Save as quickstart.py:
import os
import time

import requests

BASE = "https://api.anakin.io/v1"
API_KEY = os.environ.get("ANAKIN_API_KEY")
if not API_KEY:
    raise SystemExit("ANAKIN_API_KEY is not set")

# One session reuses the connection and sends the auth header on every call.
session = requests.Session()
session.headers.update({"X-API-Key": API_KEY, "Content-Type": "application/json"})

def request(method: str, path: str, json=None):
    try:
        resp = session.request(method, BASE + path, json=json, timeout=30)
        return resp.json()
    except requests.RequestException:
        return None  # caller retries on None

def scrape(url: str) -> dict:
    submitted = request("POST", "/url-scraper", {"url": url})
    if submitted is None:
        raise RuntimeError("could not reach the API to submit the job")
    job_id = submitted["jobId"]
    for _ in range(60):
        job = request("GET", f"/url-scraper/{job_id}")
        if job is None:
            time.sleep(3)  # retry transient errors
            continue
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"scrape failed: {job.get('error')}")
        time.sleep(3)
    raise TimeoutError("timed out after 3 minutes")

if __name__ == "__main__":
    job = scrape("https://example.com")
    print(job["markdown"])

Run it:

python quickstart.py

What this does
- Submits https://example.com to /url-scraper and gets back a jobId.
- Polls /url-scraper/{jobId} every 3 seconds (up to 60 attempts = 3 minutes).
- Retries transient RequestException errors silently — only surfaces real failures.
- Prints the final markdown when the job completes.
Most jobs finish in 3–15 seconds.
Go further
Extract structured JSON with AI
Add generateJson: True to the submit body to have AI return structured data:
submitted = request("POST", "/url-scraper", {
    "url": "https://news.ycombinator.com",
    "generateJson": True,
})

The completed response includes a generatedJson field with structured data inferred from the page.
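To reuse the submit-and-poll flow from quickstart.py with these options, one approach is a small wrapper that forwards extra fields into the submit body. This is a sketch, not part of the API: scrape_with() is our name for it, and it assumes request() and time from quickstart.py are in scope.

def scrape_with(url: str, **options) -> dict:
    # Same flow as scrape(), plus arbitrary extra submit fields.
    submitted = request("POST", "/url-scraper", {"url": url, **options})
    if submitted is None:
        raise RuntimeError("could not reach the API to submit the job")
    job_id = submitted["jobId"]
    for _ in range(60):
        job = request("GET", f"/url-scraper/{job_id}")
        if job is not None and job["status"] == "completed":
            return job
        if job is not None and job["status"] == "failed":
            raise RuntimeError(f"scrape failed: {job.get('error')}")
        time.sleep(3)
    raise TimeoutError("timed out after 3 minutes")

job = scrape_with("https://news.ycombinator.com", generateJson=True)
print(job["generatedJson"])  # structured data inferred from the page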
Scrape JavaScript-heavy sites
For SPAs and dynamically loaded pages, add useBrowser: True:
submitted = request("POST", "/url-scraper", {
    "url": "https://example.com/spa",
    "useBrowser": True,
})

Only use browser mode when needed: standard scraping is faster and cheaper.
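The scrape_with() sketch above covers this variant too:

job = scrape_with("https://example.com/spa", useBrowser=True)
print(job["markdown"])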
Use it from Django / FastAPI
Wrap scrape() in a Celery / RQ / Dramatiq task — the polling loop blocks for up to 3 minutes per URL, so background execution is the natural fit. The Celery example below assumes scrape() lives in an importable module and that Page is one of your Django models. For FastAPI specifically, swap requests for httpx.AsyncClient and await asyncio.sleep(3) to keep the event loop free (sketched after the Celery task):
# app/tasks/scrape.py
from celery import shared_task

from app.models import Page      # assumption: your Django model
from app.scraping import scrape  # assumption: scrape() from quickstart.py, moved into your app

@shared_task
def scrape_url(url: str):
    job = scrape(url)
    Page.objects.create(url=url, markdown=job["markdown"])
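And a minimal sketch of the FastAPI-friendly variant: same endpoints and polling cadence, but async via httpx (pip install httpx). The transient-error retry from quickstart.py is omitted here for brevity, and the module path in the first comment is illustrative.

# app/services/scrape_async.py
import asyncio
import os

import httpx

BASE = "https://api.anakin.io/v1"

async def scrape(url: str) -> dict:
    headers = {"X-API-Key": os.environ["ANAKIN_API_KEY"]}
    async with httpx.AsyncClient(base_url=BASE, headers=headers, timeout=30) as client:
        submitted = (await client.post("/url-scraper", json={"url": url})).json()
        job_id = submitted["jobId"]
        for _ in range(60):
            job = (await client.get(f"/url-scraper/{job_id}")).json()
            if job["status"] == "completed":
                return job
            if job["status"] == "failed":
                raise RuntimeError(f"scrape failed: {job.get('error')}")
            await asyncio.sleep(3)  # yields the event loop instead of blocking it
        raise TimeoutError("timed out after 3 minutes")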