Crawl

The entire website. All at once.

Give Crawl a root URL and get back every page's content, in parallel, at scale, in the format you want. Markdown for LLMs, JSON for pipelines, or raw HTML.

  • Parallel scraping: up to 50 pages at once
  • JS rendering per page, when needed
  • Configurable depth and path filters

The problem

Scraping at scale shouldn't mean building infrastructure.

Orchestrating workers, managing queues, deduplicating URLs: Crawl handles all of it. One API call starts a full-site extraction.

The old way

  • Scrape pages one by one
  • No automatic link following
  • Handle rate limiting yourself
  • Deduplicate URLs manually
  • Re-implement crawl logic per project
  • Manage worker concurrency
  • No progress tracking
  • Stop and restart manually

With Crawl

  • Parallel scraping built in
  • Link following automatic
  • Rate limits respected automatically
  • Deduplication handled
  • Single API call to start
  • 50 concurrent workers
  • Real-time job progress
  • Resume interrupted crawls

9,241 pages · 847 MB extracted

markdown + metadata · S3-ready export

Crawl at scale

Thousands of pages. All running in parallel.

crawl · docs.stripe.com · running
9,241 / ~12,000 pages (77%) · 16 parallel workers
47 pages/sec · 847 MB extracted · 3m 17s elapsed

How it works

One call. Every page scraped.

01

Start with a URL

POST the root URL with your settings: depth limit, path filters, output format. A crawl job is created and starts immediately.
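
A scoped job might look like the sketch below. The depth, page-count, and format fields mirror the quick start; the include_paths and exclude_paths names are assumptions used for illustration, so check the API reference for the exact filter parameters.

import requests

API_KEY = "your_api_key"

# Crawl only the /docs subtree, two levels deep, returned as markdown.
# "include_paths" / "exclude_paths" are illustrative field names.
job = requests.post(
    "https://api.anakin.io/v1/crawl",
    headers={"X-API-Key": API_KEY},
    json={
        "url": "https://docs.example.com",
        "max_depth": 2,
        "max_pages": 500,
        "format": "markdown",
        "include_paths": ["/docs/*"],
        "exclude_paths": ["/docs/legacy/*"],
    },
).json()

print(job["id"])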

02

We crawl in parallel

Up to 50 pages are scraped simultaneously. Links are followed automatically. Rate limits, robots.txt, and duplicate detection are all handled.
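
For a sense of what that replaces, here is a bare-bones version of the loop you would otherwise write yourself: a worker pool plus a shared seen-set for deduplication. This is an illustrative sketch of the general technique, not Crawl's internals; rate limiting and robots.txt handling are left out.

import concurrent.futures
import urllib.parse

def crawl(root, fetch, extract_links, max_workers=16, max_pages=500):
    # fetch(url) -> page content; extract_links(url, page) -> iterable of hrefs.
    seen = {root}            # dedupe: never queue the same URL twice
    queue = [root]
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        while queue and len(results) < max_pages:
            batch, queue = queue[:max_workers], queue[max_workers:]
            # scrape one batch of pages in parallel
            for url, page in zip(batch, pool.map(fetch, batch)):
                results[url] = page
                for link in extract_links(url, page):
                    absolute = urllib.parse.urljoin(url, link)
                    if absolute.startswith(root) and absolute not in seen:
                        seen.add(absolute)
                        queue.append(absolute)
    return results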

03

Get all the content

Poll the job ID for results as they stream in, or receive a webhook when complete. Each page returns its URL, status, and full content.
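
Once the job reports completed, each entry can be written straight to disk or piped onward. A minimal sketch, assuming the pages live under a "data" array with "url", "status", and "markdown" fields (the exact response shape may differ; see the API reference):

import requests

API_KEY = "your_api_key"
job_id = "crawl_abc123"  # id returned when the job was created

result = requests.get(
    f"https://api.anakin.io/v1/crawl/{job_id}",
    headers={"X-API-Key": API_KEY},
).json()

# Write each page's markdown to a local file, skipping failed pages.
for page in result.get("data", []):
    if page.get("status") != "ok":   # "ok" is an assumed status value
        continue
    name = page["url"].rstrip("/").rsplit("/", 1)[-1] or "index"
    with open(f"{name}.md", "w", encoding="utf-8") as f:
        f.write(page.get("markdown", ""))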

50 parallel workers per crawl job
Thousands of pages per job, with pagination

Quick start

Running in minutes.

Start a crawl job with one POST request. Poll for results or set a webhook. No infrastructure to manage.

1. Sign up and get your API key
2. POST a root URL to /v1/crawl to start
3. Poll the job ID for streaming results
import requests, time

API_KEY = "your_api_key"

# Start crawl job
job = requests.post(
    "https://api.anakin.io/v1/crawl",
    headers={"X-API-Key": API_KEY},
    json={
        "url": "https://docs.example.com",
        "max_pages": 500,
        "format": "markdown",
        "max_depth": 3,
    },
).json()

# Poll for results
while True:
    status = requests.get(
        f"https://api.anakin.io/v1/crawl/{job['id']}",
        headers={"X-API-Key": API_KEY},
    ).json()
    print(f"{status['completed']} / {status['total']} pages")
    if status["status"] == "completed":
        break
    time.sleep(5)
Authenticate with the X-API-Key header.
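
If you would rather not poll at all, register a webhook when you create the job and react to the completion callback. A minimal receiver sketch, assuming the job accepts a webhook_url parameter and posts a JSON payload with id and status fields (both assumptions; confirm against the API reference):

from flask import Flask, request

app = Flask(__name__)

@app.route("/crawl-webhook", methods=["POST"])
def crawl_webhook():
    # Payload shape (id, status) is assumed for illustration.
    event = request.get_json(force=True)
    if event.get("status") == "completed":
        print(f"Crawl {event['id']} finished; fetch results from /v1/crawl/{event['id']}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)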


Crawl

Every page. Every word.
Ready for your pipeline.

Stop building crawlers. One call fetches the entire site.