The entire website. All at once.
Give Crawl a root URL and get back every page's content, in parallel, at scale, in the format you want. Markdown for LLMs, JSON for pipelines, or raw HTML.
The problem
Scraping at scale shouldn't mean building infrastructure.
Orchestrating workers, managing queues, handling deduplication. Crawl handles all of it. One API call starts a full-site extraction.
The old way
- Scrape pages one at a time
- No automatic link following
- Handle rate limiting yourself
- Deduplicate URLs manually
- Re-implement crawl logic per project
- Manage worker concurrency
- No progress tracking
- Stop and restart manually
With Crawl
- Parallel scraping built in
- Link following automatic
- Rate limits respected automatically
- Deduplication handled
- Single API call to start
- 50 concurrent workers
- Real-time job progress
- Resume interrupted crawls
9,241 pages · 847 MB extracted
markdown + metadata · S3-ready export
Crawl at scale
Thousands of pages. All running in parallel.
How it works
One call. Every page scraped.
Start with a URL
POST the root URL with your settings: depth limit, path filters, output format. A crawl job is created and starts immediately.
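As a rough sketch, the settings for that first request might look like the snippet below. The url, max_pages, max_depth, and format fields match the quick start further down; the path-filter field name is an assumption for illustration, not a documented parameter.

# Crawl settings sketch. "include_paths" is a hypothetical name for the
# path filter mentioned above; check the API reference for the real field.
settings = {
    "url": "https://docs.example.com",  # root URL to start from
    "max_depth": 3,                     # how many links deep to follow
    "max_pages": 500,                   # stop after this many pages
    "format": "markdown",               # or "json" / "html"
    "include_paths": ["/docs/*"],       # hypothetical path filter
}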
We crawl in parallel
Up to 50 pages are scraped simultaneously. Links are followed automatically. Rate limits, robots.txt, and duplicate detection are all handled.
Get all the content
Poll the job ID for results as they stream in, or receive a webhook when complete. Each page returns its URL, status, and full content.
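As a sketch, reading per-page results from a finished job could look like this; the response field names (pages, url, status, content) are assumptions for illustration, not a documented schema.

import requests

API_KEY = "your_api_key"
JOB_ID = "your_job_id"

# Fetch a finished crawl job and walk its pages.
# Field names ("pages", "url", "status", "content") are assumed here;
# the real response shape may differ.
job = requests.get(
    f"https://api.anakin.io/v1/crawl/{JOB_ID}",
    headers={"X-API-Key": API_KEY},
).json()

for page in job.get("pages", []):
    print(page["url"], page["status"], len(page["content"]), "characters")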
50 parallel workers per crawl job · pages per job, with pagination · median scrape time per page
Quick start
Running in minutes.
Start a crawl job with one POST request. Poll for results or set a webhook. No infrastructure to manage.
import requests
import time

API_KEY = "your_api_key"

# Start crawl job
job = requests.post(
    "https://api.anakin.io/v1/crawl",
    headers={"X-API-Key": API_KEY},
    json={
        "url": "https://docs.example.com",
        "max_pages": 500,
        "format": "markdown",
        "max_depth": 3,
    },
).json()

# Poll for results
while True:
    status = requests.get(
        f"https://api.anakin.io/v1/crawl/{job['id']}",
        headers={"X-API-Key": API_KEY},
    ).json()
    print(f"{status['completed']} / {status['total']} pages")
    if status["status"] == "completed":
        break
    time.sleep(5)

Authenticate every request with the X-API-Key header.
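If you would rather not poll, a webhook receiver can be as small as the sketch below. It assumes the crawl job was started with a webhook_url field pointing at this endpoint and that the callback carries the job id and status; both are assumptions for illustration, not documented fields.

# Minimal webhook receiver sketch (Flask). Assumes the crawl job was
# created with a "webhook_url" pointing at /crawl-complete; that field
# name and the callback payload shape are assumptions.
from flask import Flask, request

app = Flask(__name__)

@app.post("/crawl-complete")
def crawl_complete():
    payload = request.get_json()
    print("Crawl finished:", payload.get("id"), payload.get("status"))
    return "", 204

if __name__ == "__main__":
    app.run(port=8000)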
FAQ
Common questions
Every page. Every word.
Ready for your pipeline.
Stop building crawlers. One call fetches the entire site.