Java

Scrape your first page from Java using the built-in HttpClient — works with any Java 11+ project.

Submit a scrape, poll for the result, and handle transient errors — using java.net.http.HttpClient from the Java 11+ standard library plus Jackson for JSON parsing.


Authentication

Set your API key as an environment variable. Get a key from the Dashboard.

export ANAKIN_API_KEY=ak-your-key-here

The base URL is https://api.anakin.io/v1. Every request authenticates via the X-API-Key header.


Install

HttpClient is in the Java 11+ standard library. Jackson is the de facto JSON library and is already on the classpath in nearly every Spring Boot, Quarkus, or Micronaut project.

Maven:

<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.17.0</version>
</dependency>

Gradle:

implementation 'com.fasterxml.jackson.core:jackson-databind:2.17.0'

Scrape a page

Save as Quickstart.java:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Map;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class Quickstart {
    static final String BASE = "https://api.anakin.io/v1";
    static final String API_KEY = System.getenv("ANAKIN_API_KEY");
    static final HttpClient HTTP = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(30)).build();
    static final ObjectMapper JSON = new ObjectMapper();

    static JsonNode request(String method, String path, Object body) throws Exception {
        var publisher = body == null
            ? HttpRequest.BodyPublishers.noBody()
            : HttpRequest.BodyPublishers.ofString(JSON.writeValueAsString(body));
        var req = HttpRequest.newBuilder(URI.create(BASE + path))
            .header("X-API-Key", API_KEY)
            .header("Content-Type", "application/json")
            .method(method, publisher).build();
        var resp = HTTP.send(req, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() >= 400) // surface HTTP errors instead of parsing an error body as a job
            throw new RuntimeException("HTTP " + resp.statusCode() + ": " + resp.body());
        return JSON.readTree(resp.body());
    }

    static JsonNode scrape(String url) throws Exception {
        var submitted = request("POST", "/url-scraper", Map.of("url", url));
        var jobId = submitted.get("jobId").asText();
        for (int i = 0; i < 60; i++) {
            JsonNode job;
            try { job = request("GET", "/url-scraper/" + jobId, null); }
            catch (Exception e) { Thread.sleep(3000); continue; } // retry transient errors
            switch (job.get("status").asText()) {
                case "completed": return job;
                case "failed":
                    throw new RuntimeException("scrape failed: " + job.path("error").asText(""));
            }
            Thread.sleep(3000);
        }
        throw new RuntimeException("timed out after 3 minutes");
    }

    public static void main(String[] args) throws Exception {
        if (API_KEY == null) throw new RuntimeException("ANAKIN_API_KEY is not set");
        var job = scrape("https://example.com");
        System.out.println(job.get("markdown").asText());
    }
}

Run it (with Maven/Gradle handling Jackson on the classpath):

mvn compile exec:java -Dexec.mainClass=Quickstart
# or with Gradle: ./gradlew run

What this does

  1. Submits https://example.com to /url-scraper and gets back a jobId.
  2. Polls /url-scraper/{jobId} every 3 seconds (up to 60 attempts = 3 minutes).
  3. Retries transient I/O errors silently — only surfaces real failures.
  4. Prints the final markdown when the job completes.

Most jobs finish in 3–15 seconds.
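Since most jobs finish in 3–15 seconds, a fixed 3-second poll is fine, but you can shave latency by starting with a shorter delay that backs off toward a cap. A minimal sketch of such a delay schedule (the initial delay, multiplier, and cap here are arbitrary choices, not API requirements):

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

public class Backoff {
    /** Delays starting at `initial`, doubling each attempt, capped at `max`. */
    static List<Duration> schedule(Duration initial, Duration max, int attempts) {
        var delays = new ArrayList<Duration>();
        var d = initial;
        for (int i = 0; i < attempts; i++) {
            delays.add(d);
            // double the delay, but never exceed the cap
            var next = d.multipliedBy(2);
            d = next.compareTo(max) > 0 ? max : next;
        }
        return delays;
    }

    public static void main(String[] args) {
        // 1s, 2s, 4s, 5s, 5s... — front-loads polls while the job is young
        for (var d : schedule(Duration.ofSeconds(1), Duration.ofSeconds(5), 5))
            System.out.println(d.toSeconds());
    }
}
```

In the polling loop, Thread.sleep(delays.get(i).toMillis()) would replace the fixed 3-second sleep, and the total wait stays bounded by the sum of the schedule.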


Go further

Extract structured JSON with AI

Add generateJson: true to the submit body to have AI return structured data:

var submitted = request("POST", "/url-scraper", Map.of(
    "url",          "https://news.ycombinator.com",
    "generateJson", true
));

The completed response includes a generatedJson field with structured data inferred from the page.
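On the Java side, generatedJson arrives as a nested JsonNode you can walk with Jackson. A sketch using a hard-coded stand-in for job.get("generatedJson") — the stories/title/points shape is purely illustrative, since the actual structure is inferred per page:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class GeneratedJsonDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for job.get("generatedJson") from a completed response.
        var sample = "{\"stories\": [{\"title\": \"Show HN: ...\", \"points\": 120},"
                   + "              {\"title\": \"Ask HN: ...\",  \"points\": 87}]}";
        JsonNode generated = new ObjectMapper().readTree(sample);
        // path() returns a "missing" node instead of null, so chained access never NPEs
        for (JsonNode story : generated.path("stories"))
            System.out.println(story.path("points").asInt() + "  " + story.path("title").asText());
    }
}
```

Preferring path() over get() when traversing inferred JSON keeps the code safe against fields that a particular page simply didn't yield.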

Scrape JavaScript-heavy sites

For SPAs and dynamically-loaded pages, add useBrowser: true:

var submitted = request("POST", "/url-scraper", Map.of(
    "url",        "https://example.com/spa",
    "useBrowser", true
));

Only use browser mode when needed — standard scraping is faster and cheaper.


Use it from Spring Boot

Wrap the request and scrape methods in a @Service and call them from an @Async method or a Spring Batch job; the polling loop blocks for up to 3 minutes per URL, so background execution is the natural fit. For non-blocking use, switch to HTTP.sendAsync() and chain the polling with CompletableFuture.
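The CompletableFuture shape can be sketched independently of the API: a generic poller that re-schedules itself on a delayed executor until the check yields a value or attempts run out. Here pollOnce is a hypothetical stand-in for an async GET of /url-scraper/{jobId} that returns empty while the job is still running:

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class AsyncPoll {
    /** Re-runs `check` after `delayMillis` until it yields a value or attempts are exhausted. */
    static <T> CompletableFuture<T> poll(Supplier<Optional<T>> check, int attempts, long delayMillis) {
        Executor delayed = CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS);
        return CompletableFuture.supplyAsync(check, delayed)
            .thenCompose(result -> result
                .map(CompletableFuture::completedFuture)      // value present: done
                .orElseGet(() -> attempts <= 1
                    ? CompletableFuture.failedFuture(new RuntimeException("timed out"))
                    : poll(check, attempts - 1, delayMillis))); // otherwise re-schedule
    }

    public static void main(String[] args) {
        // Demo: the "job" completes on the third poll.
        int[] calls = {0};
        Supplier<Optional<String>> pollOnce =
            () -> ++calls[0] < 3 ? Optional.empty() : Optional.of("completed");
        System.out.println(poll(pollOnce, 60, 50).join()); // join() only here, for the demo
    }
}
```

Because each retry is rescheduled through thenCompose rather than a loop with Thread.sleep, no thread is parked between polls, which is what makes this shape suitable for request-handling threads.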


Next steps