AnakinScraper plugin for Dify
Web scraping and AI-powered search plugin for Dify. Extract data from any website, perform intelligent web searches, and conduct deep research — all inside your Dify workflows and agents.
| Field | Value |
|---|---|
| Marketplace | Dify Plugin Store |
| Source | GitHub |
| Type | Tool Plugin |
| Version | 0.0.1 |
| Tools | 5 |
Key features
- Anti-detection — Proxy routing across 207 countries prevents blocking
- Intelligent Caching — Up to 30x faster on repeated requests
- AI Extraction — Convert any webpage into structured JSON
- Browser Automation — Full headless Chrome support for SPAs and JS-heavy sites
- Session Management — Authenticated scraping with encrypted session storage (AES-256-GCM)
- Batch Processing — Submit multiple URLs in a single request
Setup
1. Get your API key
- Sign up at anakin.io/signup
- Go to your Dashboard
- Copy your API key (starts with ask_)
2. Install in Dify
- Install the Anakin plugin in your Dify workspace from the Plugin Store
- Go to Plugins > Anakin > Configure
- Enter a name for the authorization (e.g., "Production")
- Paste your API key
- Click Save
Tools
1. URL Scraper
Scrapes a single URL, returning HTML, markdown, and optionally structured JSON.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | — | Target URL to scrape (HTTP/HTTPS) |
| country | string | No | us | Proxy location from 207 countries |
| use_browser | boolean | No | false | Enable headless Chrome for JavaScript-heavy sites |
| generate_json | boolean | No | false | Use AI to extract structured data |
| session_id | string | No | — | Browser session ID for authenticated pages |
Response includes: Raw HTML, cleaned HTML, markdown conversion, structured JSON (if generate_json enabled), cache status, timing metrics.
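As a minimal sketch of how the parameters above fit together, the helper below assembles a URL Scraper input dictionary using the defaults from the table. The function name `build_url_scraper_params` is hypothetical and not part of the plugin; it only illustrates which fields are required and which fall back to defaults.

```python
from urllib.parse import urlparse


def build_url_scraper_params(url, country="us", use_browser=False,
                             generate_json=False, session_id=None):
    """Assemble URL Scraper inputs (hypothetical helper, not a plugin API).

    Defaults mirror the parameter table: country=us, use_browser=false,
    generate_json=false, and no session_id.
    """
    # The table states that the target must be an HTTP or HTTPS URL.
    scheme = urlparse(url).scheme.lower()
    if scheme not in ("http", "https"):
        raise ValueError(f"url must be HTTP or HTTPS, got: {url!r}")

    params = {
        "url": url,
        "country": country,
        "use_browser": use_browser,
        "generate_json": generate_json,
    }
    # session_id has no default, so include it only when one is provided.
    if session_id is not None:
        params["session_id"] = session_id
    return params


if __name__ == "__main__":
    print(build_url_scraper_params("https://example.com/products",
                                   generate_json=True))
```

In a Dify workflow these values would be entered in the Tool node's fields rather than passed as a dictionary; the sketch is only a compact way to see the required and optional inputs side by side.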
2. Batch URL Scraper
Scrapes up to 10 URLs in parallel.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | string | Yes | — | Comma-separated list of URLs (1–10) |
| country | string | No | us | Proxy location from 207 countries |
| use_browser | boolean | No | false | Enable headless Chrome for JavaScript-heavy sites |
| generate_json | boolean | No | false | Use AI to extract structured data from each page |
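Because `urls` is a single comma-separated string limited to 10 entries, longer lists have to be split before they are submitted. The sketch below shows one way to do that; `join_batch_urls` and `chunk_urls` are hypothetical helper names, and joining with a bare comma (no surrounding spaces) is an assumption about the expected format.

```python
MAX_BATCH_URLS = 10  # upper limit stated in the parameter table


def join_batch_urls(urls):
    """Build the comma-separated `urls` value for one batch (hypothetical helper)."""
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not 1 <= len(cleaned) <= MAX_BATCH_URLS:
        raise ValueError(
            f"expected 1 to {MAX_BATCH_URLS} URLs, got {len(cleaned)}"
        )
    for u in cleaned:
        # A literal comma inside a URL would be read as a separator.
        if "," in u:
            raise ValueError(f"URL contains a comma: {u!r}")
    # Assumption: a bare comma with no spaces is the expected separator.
    return ",".join(cleaned)


def chunk_urls(urls, size=MAX_BATCH_URLS):
    """Split a long list into consecutive batches of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


if __name__ == "__main__":
    pages = [f"https://example.com/page/{n}" for n in range(23)]
    for batch in chunk_urls(pages):
        print(join_batch_urls(batch))
```

Each string printed by the example would be the `urls` input for one Batch URL Scraper call.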
3. AI Search
Synchronous AI-powered web search returning results with citations and relevance scoring. Results are returned immediately without polling.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | Search query or question |
| limit | number | No | 5 | Maximum results to return |
Response includes: Array of results with URLs, titles, snippets, publication dates, last updated timestamps.
4. Deep Research (Agentic Search)
Multi-stage automated research pipeline combining search, scraping, and AI synthesis. Takes 1–5 minutes.
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Research question or topic |
Response includes: AI-generated comprehensive answers, summaries, structured findings, citations with source URLs, scraped source data, processing metrics.
5. Custom Web Scraper
Execute pre-configured scraper templates for domain-specific structured data extraction.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Target URL to scrape |
| scraper_code | string | Yes | Configuration identifier |
| scraper_params | string | No | JSON string of scraper-specific parameters |
Response: Structured JSON matching the scraper's defined schema.
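Since `scraper_params` is passed as a JSON string rather than an object, it is usually safer to serialize it programmatically than to type the JSON by hand. The following is a minimal sketch; the helper name and the keys `max_items` and `category` are purely illustrative, because the actual keys depend on the scraper template, and treating the value as a JSON object is an assumption.

```python
import json


def encode_scraper_params(params):
    """Serialize scraper-specific parameters to a JSON string (hypothetical helper)."""
    # Assumption: the scraper expects a JSON object, so require a dict here.
    if not isinstance(params, dict):
        raise TypeError("scraper params must be a dict")
    # sort_keys and compact separators make the output deterministic.
    return json.dumps(params, separators=(",", ":"), sort_keys=True)


if __name__ == "__main__":
    # Illustrative keys only; real keys are defined by the scraper template.
    print(encode_scraper_params({"max_items": 20, "category": "books"}))
```

Serializing with `json.dumps` avoids common hand-written mistakes such as single quotes or trailing commas, which would likely be rejected as invalid parameters.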
Examples
In a Workflow
- Add a Tool node to your workflow
- Select Anakin and choose your tool
- Configure parameters (e.g., enter URL, enable generate_json)
- Connect to the next node for processing
In an Agent
- Create an Agent app
- Add Anakin tools to the agent's toolset
- The agent will automatically use scraping/search based on user queries
Scraping with AI extraction
Tool: URL Scraper
URL: https://example.com/products
Generate JSON: true
Returns structured product data automatically extracted by AI.
Authenticated scraping
Tool: URL Scraper
URL: https://example.com/dashboard
Session ID: your-session-id-from-dashboard
Use Browser: true
Scrapes pages that require login using your saved browser session. Learn more about Browser Sessions.
Processing times
| Tool | Type | Typical Duration |
|---|---|---|
| URL Scraper | Async | 3–15 seconds |
| Batch Scraper | Async | 5–30 seconds |
| AI Search | Sync | Immediate |
| Deep Research | Async | 1–5 minutes |
| Custom Scraper | Async | 3–15 seconds |
Troubleshooting
| Code | Meaning | Action |
|---|---|---|
| 400 | Invalid parameters | Check your input |
| 401 | Invalid API key | Verify your API key in plugin settings |
| 402 | Plan upgrade required | Upgrade your plan on the Pricing page |
| 404 | Job not found | Job may have expired |
| 429 | Rate limit exceeded | Wait and retry |
| 5xx | Server error | Retry with backoff |
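The "wait and retry" and "retry with backoff" actions in the table can be sketched as a small wrapper. This is an illustration only: `ToolError` and `call_with_backoff` are hypothetical names, and how a status code actually surfaces inside Dify depends on where the tool is called from. The sketch retries only 429 and 5xx, since the other codes point to problems a retry will not fix.

```python
import random
import time


class ToolError(Exception):
    """Hypothetical error type carrying the status code from the table."""

    def __init__(self, status):
        super().__init__(f"tool call failed with status {status}")
        self.status = status


def is_retryable(status):
    # Per the table: 429 means wait and retry, 5xx means retry with backoff.
    return status == 429 or 500 <= status <= 599


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying retryable failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ToolError as err:
            out_of_attempts = attempt == max_attempts - 1
            if out_of_attempts or not is_retryable(err.status):
                raise
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With the default settings the waits are roughly 1, 2, and 4 seconds plus jitter before the final attempt. Errors such as 400, 401, 402, and 404 are raised immediately so the underlying input, key, plan, or job can be corrected.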
Country codes
Proxy routing supports 207 countries. Common codes:
| Code | Country |
|---|---|
| us | United States (default) |
| gb | United Kingdom |
| de | Germany |
| fr | France |
| jp | Japan |
| au | Australia |