Get Crawl Result

Poll for crawl job status and retrieve scraped page content.

GET https://api.anakin.io/v1/crawl/{id}

Retrieve the status and results of a crawl job. Use this to poll for completion after submitting a crawl request.
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| `id` (required) | string | The job ID returned from the submit endpoint |
Response
200 OK

```json
{
  "id": "job_abc123xyz",
  "status": "completed",
  "url": "https://example.com",
  "totalPages": 3,
  "completedPages": 2,
  "results": [
    {
      "url": "https://example.com",
      "status": "completed",
      "html": "<html>...</html>",
      "markdown": "# Home page content...",
      "durationMs": 2000
    },
    {
      "url": "https://example.com/blog",
      "status": "completed",
      "html": "<html>...</html>",
      "markdown": "# Blog index...",
      "durationMs": 1500
    },
    {
      "url": "https://example.com/blog/post-1",
      "status": "failed",
      "error": "Connection timeout",
      "durationMs": 5000
    }
  ],
  "createdAt": "2024-01-01T12:00:00Z",
  "completedAt": "2024-01-01T12:00:15Z",
  "durationMs": 15000
}
```

Response Fields
| Field | Type | Description |
|---|---|---|
| `status` | string | `pending`, `processing`, `completed`, or `failed` |
| `url` | string | The starting URL submitted for crawling |
| `totalPages` | number | Total pages discovered and attempted |
| `completedPages` | number | Pages successfully scraped |
| `results` | array | Per-page results. Only present when `completed`. |
| `error` | string | Error message. Only present when the entire job failed. |
| `durationMs` | number | Total processing time in milliseconds |
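As a quick illustration of the job-level fields, a one-line summary can be built from `status`, `completedPages`, `totalPages`, and `durationMs`. A minimal sketch; the `summarize` helper and sample dict are illustrative, not part of the API:

```python
# Build a one-line summary from the job-level fields documented above.
# The sample values mirror the example response; they are illustrative.
def summarize(job):
    failed = job["totalPages"] - job["completedPages"]
    return (f"{job['status']}: {job['completedPages']}/{job['totalPages']} "
            f"pages scraped, {failed} failed, {job['durationMs']} ms total")

job = {"status": "completed", "totalPages": 3, "completedPages": 2,
       "durationMs": 15000}
print(summarize(job))  # completed: 2/3 pages scraped, 1 failed, 15000 ms total
```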
Per-Page Result Fields
| Field | Type | Description |
|---|---|---|
| `url` | string | The page URL |
| `status` | string | `completed` or `failed` |
| `html` | string | Raw HTML content. Only when the page completed. |
| `markdown` | string | Markdown version of the content. Only when the page completed. |
| `error` | string | Error message. Only when the page failed. |
| `durationMs` | number | Per-page processing time in milliseconds |
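Because `html` and `markdown` appear only on completed pages and `error` only on failed ones, consumers should branch on the per-page `status` before reading those fields. A minimal sketch; the `partition_results` helper is illustrative, not part of the API:

```python
# Split per-page results by status so that optional fields
# (html/markdown vs. error) are only read where they exist.
def partition_results(results):
    succeeded = [p for p in results if p["status"] == "completed"]
    failed = [p for p in results if p["status"] == "failed"]
    return succeeded, failed

# Illustrative sample shaped like the example response above.
sample = [
    {"url": "https://example.com", "status": "completed",
     "markdown": "# Home page content...", "durationMs": 2000},
    {"url": "https://example.com/blog/post-1", "status": "failed",
     "error": "Connection timeout", "durationMs": 5000},
]
ok, bad = partition_results(sample)
print(len(ok), len(bad))  # 1 1
```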
Code Examples
```shell
curl -X GET https://api.anakin.io/v1/crawl/job_abc123xyz \
  -H "X-API-Key: your_api_key"
```

```python
import requests
import time

job_id = "job_abc123xyz"

while True:
    result = requests.get(
        f'https://api.anakin.io/v1/crawl/{job_id}',
        headers={'X-API-Key': 'your_api_key'}
    )
    data = result.json()

    if data['status'] == 'completed':
        print(f"Crawled {data['completedPages']}/{data['totalPages']} pages:")
        for page in data['results']:
            if page['status'] == 'completed':
                print(f"  {page['url']} — {len(page['markdown'])} chars")
            else:
                print(f"  {page['url']} — FAILED: {page['error']}")
        break
    elif data['status'] == 'failed':
        print(f"Error: {data['error']}")
        break

    time.sleep(2)
```

```javascript
const jobId = 'job_abc123xyz';

const poll = async () => {
  const res = await fetch(`https://api.anakin.io/v1/crawl/${jobId}`, {
    headers: { 'X-API-Key': 'your_api_key' }
  });
  const data = await res.json();

  if (data.status === 'completed') {
    console.log(`Crawled ${data.completedPages}/${data.totalPages} pages:`);
    data.results.forEach(page => {
      if (page.status === 'completed') {
        console.log(`  ${page.url} — ${page.markdown.length} chars`);
      } else {
        console.log(`  ${page.url} — FAILED: ${page.error}`);
      }
    });
  } else if (data.status === 'failed') {
    console.error(data.error);
  } else {
    setTimeout(poll, 2000);
  }
};

poll();
```

For polling patterns, see the Polling Jobs reference.
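For longer crawls, a bounded poll with exponential backoff avoids hammering the endpoint. A sketch along the lines of the Python example above; the injected `fetch_status` callable and the timeout and interval values are client-side choices, not API requirements:

```python
import time

def wait_for_crawl(job_id, fetch_status, timeout_s=120,
                   interval_s=1.0, max_interval_s=15.0):
    """Poll until the job reaches a terminal status or timeout_s elapses.

    fetch_status(job_id) should GET /v1/crawl/{id} and return the parsed
    JSON body; injecting it keeps this loop testable without network
    access. Doubling the interval up to max_interval_s is a client-side
    backoff choice, not something the API mandates.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        data = fetch_status(job_id)
        if data["status"] in ("completed", "failed"):
            return data
        time.sleep(interval_s)
        interval_s = min(interval_s * 2, max_interval_s)
    raise TimeoutError(f"crawl {job_id} did not finish within {timeout_s}s")
```

With `requests`, `fetch_status` would be a thin wrapper around the GET call shown in the examples above (`requests.get(..., headers={'X-API-Key': ...}).json()`).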