Crawl

The entire website. All at once.

Give Crawl a root URL and get back every page's content, in parallel, at scale, in the format you want. Markdown for LLMs, JSON for pipelines, or raw HTML.

  • Parallel scraping: up to 50 pages at once
  • JS rendering per page, when needed
  • Configurable depth and path filters

The problem

Scraping at scale shouldn't mean building infrastructure.

Orchestrating workers, managing queues, deduplicating URLs: Crawl handles all of it. One API call starts a full-site extraction.

The old way

  • Scrape pages one by one
  • No automatic link following
  • Handle rate limiting yourself
  • Deduplicate URLs manually
  • Re-implement crawl logic per project
  • Manage worker concurrency
  • No progress tracking
  • Stop and restart manually

With Crawl

  • Parallel scraping built in
  • Link following automatic
  • Rate limits respected automatically
  • Deduplication handled
  • Single API call to start
  • 50 concurrent workers
  • Real-time job progress
  • Resume interrupted crawls

9,241 pages · 847 MB extracted

markdown + metadata · S3-ready export

Crawl at scale

Thousands of pages. All running in parallel.

crawl · docs.stripe.com · running
9,241 / ~12,000 pages (77%) · 16 parallel workers
47 pages/sec · 847 MB extracted · 3m 17s elapsed

How it works

One call. Every page scraped.

01

Start with a URL

POST the root URL with your settings: depth limit, path filters, output format. A crawl job is created and starts immediately.
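
A scoped job might look like the sketch below. The depth, page-count, and format fields mirror the quick start; the include_paths and exclude_paths names are assumptions used for illustration, so check the API reference for the exact filter parameters.

import requests

API_KEY = "your_api_key"

# Crawl only the /docs subtree, two levels deep, returned as markdown.
# "include_paths" / "exclude_paths" are illustrative field names.
job = requests.post(
    "https://api.anakin.io/v1/crawl",
    headers={"X-API-Key": API_KEY},
    json={
        "url": "https://docs.example.com",
        "max_depth": 2,
        "max_pages": 500,
        "format": "markdown",
        "include_paths": ["/docs/*"],
        "exclude_paths": ["/docs/legacy/*"],
    },
).json()

print(job["id"])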

02

We crawl in parallel

Up to 50 pages are scraped simultaneously. Links are followed automatically. Rate limits, robots.txt, and duplicate detection are all handled.
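
For a sense of what that replaces, here is a bare-bones version of the loop you would otherwise write yourself: a worker pool plus a shared seen-set for deduplication. This is an illustrative sketch of the general technique, not Crawl's internals; rate limiting and robots.txt handling are left out.

import concurrent.futures
import urllib.parse

def crawl(root, fetch, extract_links, max_workers=16, max_pages=500):
    # fetch(url) -> page content; extract_links(url, page) -> iterable of hrefs.
    seen = {root}            # dedupe: never queue the same URL twice
    queue = [root]
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        while queue and len(results) < max_pages:
            batch, queue = queue[:max_workers], queue[max_workers:]
            # scrape one batch of pages in parallel
            for url, page in zip(batch, pool.map(fetch, batch)):
                results[url] = page
                for link in extract_links(url, page):
                    absolute = urllib.parse.urljoin(url, link)
                    if absolute.startswith(root) and absolute not in seen:
                        seen.add(absolute)
                        queue.append(absolute)
    return results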

03

Get all the content

Poll the job ID for results as they stream in, or receive a webhook when complete. Each page returns its URL, status, and full content.
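
Once the job reports completed, each entry can be written straight to disk or piped onward. A minimal sketch, assuming the pages live under a "data" array with "url", "status", and "markdown" fields (the exact response shape may differ; see the API reference):

import requests

API_KEY = "your_api_key"
job_id = "crawl_abc123"  # id returned when the job was created

result = requests.get(
    f"https://api.anakin.io/v1/crawl/{job_id}",
    headers={"X-API-Key": API_KEY},
).json()

# Write each page's markdown to a local file, skipping failed pages.
for page in result.get("data", []):
    if page.get("status") != "ok":   # "ok" is an assumed status value
        continue
    name = page["url"].rstrip("/").rsplit("/", 1)[-1] or "index"
    with open(f"{name}.md", "w", encoding="utf-8") as f:
        f.write(page.get("markdown", ""))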

50 parallel workers per crawl job
Thousands of pages per job, with pagination

Quick start

Running in minutes.

Start a crawl job with one POST request. Poll for results or set a webhook. No infrastructure to manage.

1. Sign up and get your API key
2. POST a root URL to /v1/crawl to start
3. Poll the job ID for streaming results
import requests, time

API_KEY = "your_api_key"

# Start crawl job
job = requests.post(
    "https://api.anakin.io/v1/crawl",
    headers={"X-API-Key": API_KEY},
    json={
        "url": "https://docs.example.com",
        "max_pages": 500,
        "format": "markdown",
        "max_depth": 3,
    },
).json()

# Poll for results
while True:
    status = requests.get(
        f"https://api.anakin.io/v1/crawl/{job['id']}",
        headers={"X-API-Key": API_KEY},
    ).json()
    print(f"{status['completed']} / {status['total']} pages")
    if status["status"] == "completed":
        break
    time.sleep(5)
Authenticate with the X-API-Key header.
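
If you would rather not poll at all, register a webhook when you create the job and react to the completion callback. A minimal receiver sketch, assuming the job accepts a webhook_url parameter and posts a JSON payload with id and status fields (both assumptions; confirm against the API reference):

from flask import Flask, request

app = Flask(__name__)

@app.route("/crawl-webhook", methods=["POST"])
def crawl_webhook():
    # Payload shape (id, status) is assumed for illustration.
    event = request.get_json(force=True)
    if event.get("status") == "completed":
        print(f"Crawl {event['id']} finished; fetch results from /v1/crawl/{event['id']}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)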


Crawl

Every page. Every word.
Ready for your pipeline.

Stop building crawlers. One call fetches the entire site.