brightdata-plugin

knowledge-work-plugins

OtherClaude Codeby sun2443

Summary

Web scraping, Google search, structured data extraction, and MCP server integration powered by Bright Data. Includes 7 skills: scrape any webpage as markdown (with bot detection/CAPTCHA bypass), search Google with structured JSON results, extract data from 40+ websites (Amazon, LinkedIn, Instagram, TikTok, YouTube, and more), orchestrate Bright Data's 60+ MCP tools, built-in best practices for Web Unlocker, SERP API, Web Scraper API, and Browser API, Python SDK best practices for the brightda...

Install to Claude Code

/plugin install brightdata-plugin@knowledge-work-plugins

Run in Claude Code. Add the marketplace first with /plugin marketplace add sun2443/designer-skills if you haven't already.

README.md

<p align="center"> <img src="https://brightdata.com/wp-content/themes/brightdata/assets/images/favicon.png" alt="Bright Data" width="80" height="80"> </p>

<h1 align="center">Bright Data Plugin for Claude Code</h1>

<p align="center"> <strong>Unlock the web with AI-powered scraping, search, and structured data</strong> </p>

<p align="center"> <a href="https://brightdata.com"><img src="https://img.shields.io/badge/Powered%20by-Bright%20Data-3D7FFC?style=for-the-badge" alt="Powered by Bright Data"></a> <a href="#license"><img src="https://img.shields.io/badge/License-MIT-10b981?style=for-the-badge" alt="MIT License"></a> <a href="#skills"><img src="https://img.shields.io/badge/Skills-21-9D97F4?style=for-the-badge" alt="21 Skills"></a> <a href="#data-feeds-skill"><img src="https://img.shields.io/badge/Datasets-40+-15C1E6?style=for-the-badge" alt="40+ Datasets"></a> <a href="#bright-data-mcp-skill"><img src="https://img.shields.io/badge/MCP_Tools-60+-FF6B35?style=for-the-badge" alt="60+ MCP Tools"></a> </p>

<p align="center"> <a href="#-quick-start">Quick Start</a> • <a href="#-skills">Skills</a> • <a href="#agent-onboarding-skill">Agent Onboarding</a> • <a href="#-data-feeds">Data Feeds</a> • <a href="#bright-data-mcp-skill">MCP</a> • <a href="#brightdata-cli-skill">CLI</a> • <a href="#competitive-intel-skill">Competitive Intel</a> • <a href="#scraper-builder-skill">Scraper Builder</a> • <a href="#scraper-studio-skill">Scraper Studio</a> • <a href="#proxy-skill">Proxy</a> • <a href="#best-practices-skill">Best Practices</a> • <a href="#python-sdk-best-practices-skill">Python SDK</a> • <a href="#javascript-sdk-best-practices-skill">JavaScript SDK</a> • <a href="#discover-api-skill">Discover API</a> • <a href="#live-research-skill">Live Research</a> • <a href="#rag-pipeline-skill">RAG Pipeline</a> • <a href="#-setup">Setup</a> • <a href="#-examples">Examples</a> </p>

---

Overview

This plugin brings Bright Data's powerful web infrastructure directly into Claude Code, enabling AI agents to:

  • Scrape any webpage as clean markdown — bypassing bot detection, CAPTCHAs, and JavaScript rendering
  • Search Google with structured JSON results — titles, links, and descriptions ready for processing
  • Extract structured data from 40+ websites — Amazon, LinkedIn, Instagram, TikTok, YouTube, and more
  • Orchestrate 60+ MCP tools — search, scrape, extract structured data, and automate browsers via Bright Data's MCP server
  • Use the Bright Data CLI — scrape, search, extract data, manage zones, and check budget directly from the terminal with brightdata / bdata
  • Run competitive intelligence — real-time competitor analysis, pricing monitoring, review mining, hiring signal analysis, and market landscape mapping using live web data
  • Write correct Bright Data code — built-in best practices for Web Unlocker, SERP API, Web Scraper API, and Browser API
  • Route requests through proxies — generate working code for Datacenter, ISP, Residential, and Mobile proxy networks, with the right network, IP pool type, targeting, and SSL setup
  • Build with the Python SDK — comprehensive guide for the brightdata-sdk package with patterns for async/sync clients, platform scrapers, SERP, datasets, and more
  • Build with the JavaScript/TypeScript SDK — comprehensive guide for the @brightdata/sdk package (bdclient) with patterns for web unlocker, platform scrapers, SERP, discover, datasets, browser automation, and Scraper Studio
  • Run the Discover API — intent-ranked, AI-relevance-scored semantic web search with parsed page content, across REST/CLI/SDK (the foundation for research and RAG)
  • Produce live research briefs — decompose a question into multi-angle Discover queries, dedup and rank by relevance, and synthesize a cited report from live web data
  • Build RAG / search-engine pipelines — use Discover as the retrieval layer for an LLM, via live web-grounded retrieval or vector-store ingestion

Built on Bright Data's Web Unlocker, SERP API, and Web Data APIs, this plugin handles the complexity of web access so your AI agents can focus on what matters.

---

Skills

| Skill | Description | |-------|-------------| | agent-onboarding | Entry point for agents new to Bright Data — one install command, then routes to the right path: live CLI tools, app integration, MCP server, auth-only, or REST without installing | | search | Search Google and get structured JSON results with titles, links, and descriptions | | scrape | Scrape any webpage as clean markdown with automatic bot detection bypass | | data-feeds | Extract structured data from 40+ websites with automatic polling | | bright-data-mcp | Orchestrate 60+ Bright Data MCP tools for search, scraping, structured extraction, and browser automation | | scraper-builder | Build production-ready scrapers for any website — guides through site analysis, API selection, selector extraction, pagination, and complete implementation. Triggers on "build a scraper for..." | | scraper-studio | Build and run AI-generated Bright Data scrapers from the terminal via bdata scraper create (generate from a natural-language description) and bdata scraper run (execute against a URL). Handles async + poll, --sync fast-path, and silent auto-fallback to batch for paginated pages | | bright-data-best-practices | Built-in reference for Web Unlocker, SERP API, Web Scraper API, and Browser API — Claude consults this automatically when writing Bright Data code | | brightdata-proxy | Generate working code to route requests through Bright Data's Datacenter, ISP, Residential, and Mobile proxy networks — network and IP-pool selection, username targeting/session params, SSL CA setup, and integrations for cURL, Python (requests/httpx/aiohttp/Scrapy), Node (fetch/axios), Playwright, Puppeteer, and Selenium | | python-sdk-best-practices | Comprehensive guide for the brightdata-sdk Python package — async/sync clients, platform scrapers, SERP, datasets, Scraper Studio, Browser API, error handling, and common patterns | | js-sdk-best-practices | Comprehensive guide for the @brightdata/sdk JavaScript/TypeScript package (bdclient, Node ≥20, ESM+CJS) — web unlocker, 11 platform scrapers, SERP, Discover, datasets, Browser API, Scraper Studio, error classes, and batch/trigger orchestration | | discover-api | Use the Discover API — intent-ranked, AI-relevance-scored semantic web search with parsed page content. Covers REST (POST/GET /discover), CLI (bdata discover), the SDKs (client.discover), the standard/zeroRanking/deep/fast modes, and the trigger→poll flow. Foundation for live-research and rag-pipeline | | live-research | Produce a deep, multi-source, cited research brief — decompose a question into multiple intent-ranked Discover queries, pull page content, dedup and rank by relevance, then synthesize a structured report with inline citations | | rag-pipeline | Build a RAG pipeline or custom search engine on top of Discover — live web-grounded retrieval for an LLM, or ingestion into a vector store (discover → chunk → embed → retrieve). Provider-agnostic JS + Python code | | brightdata-cli | Guide for using the Bright Data CLI (brightdata / bdata) to scrape, search, extract structured data from 40+ platforms, manage proxy zones, and check account budget — all from the terminal | | competitive-intel | Real-time competitive intelligence using live web data — competitor snapshots, pricing comparison, review mining, hiring signal analysis, content & SEO battles, and market landscape mapping. Replaces $15K+/yr enterprise CI tools at pennies per analysis | | brand-listening | Social listening and brand reputation research — collects what people are saying about a brand across Reddit, X, Instagram, TikTok, YouTube, news, and review sites, then classifies sentiment, clusters themes, and delivers a cited digest with recommendations. Triggers on "what are people saying about us", "monitor mentions", "brand sentiment" | | price-comparison | Shopping price comparison using live retailer data — resolves a product (name, ASIN, or URL) across Amazon, Walmart, eBay, Best Buy, and Google Shopping, normalizes prices/availability/condition into one ranked table, and names the cheapest in-stock buy. Region-aware (currency, stock, local retailers). Triggers on "compare prices", "find the cheapest", "price check", "how much is X on Amazon vs Walmart" | | seo-audit | Comprehensive SEO audit using live web data — sitemap-stratified sampling, JS-injected schema detection (Yoast/RankMath/AIOSEO), hreflang validation, signal-driven SERP ranking checks, HTML-level Core Web Vitals proxies. Auto-routes between single-page and site-wide audits | | design-mirror | Replicates design system patterns, tokens, and components to build consistent, high-quality UIs | | brd-browser-debug | Debug Bright Data Scraping Browser sessions — smart triage of failures, per-session bandwidth tracking, captcha reporting, and pattern detection using the Browser Sessions API |

---

Agent Onboarding Skill

The agent-onboarding skill is the entry point for any coding agent that's new to Bright Data. It runs the install once, walks the human through OAuth, verifies the environment, and then routes the agent to the right narrower skill instead of dumping everything into one place.

How it works

One install command sets up the CLI, agent skills, and authentication:

# macOS / Linux
curl -fsSL https://cli.brightdata.com/install.sh | bash

# Cross-platform
npm install -g @brightdata/cli

# Then authenticate once (browser OAuth)
bdata login

# Headless / SSH
bdata login --device

bdata login saves the API key locally, auto-creates the cli_unlocker and cli_browser zones, and sets defaults — no token plumbing required.

Path routing

After install, the skill routes the agent based on the task:

| Situation | Path | Hands off to | |---|---|---| | Need web data during this session | A — live CLI tools | brightdata-cli, search, scrape, data-feeds, competitive-intel, seo-audit | | Need to add Bright Data to app code | B — SDK / REST integration | python-sdk-best-practices, bright-data-best-practices, scraper-builder | | Want a drop-in tool layer for an LLM agent | M — MCP server | bright-data-mcp | | Need an API key first | C — auth only | dashboard or bdata login | | Don't want to install anything | D — REST API directly | direct calls to api.brightdata.com |

Why this skill matters

Without onboarding, agents either skip auth and crash on the first request, or pick the wrong path (calling REST when the CLI would do, or scraping a site that has a pre-built dataset). This skill makes the first 60 seconds deterministic: install, log in, verify, then hand off.

Reference files

---

Quick Start

1. Get Your Credentials

1. Sign up at brightdata.com if you haven't already 2. Go to the Bright Data Dashboard 3. Create a Web Unlocker zone: Click "Add" → Select "Unlocker zone" 4. Copy your API Key from the dashboard

2. Set Environment Variables

export BRIGHTDATA_API_KEY="your-api-key"
export BRIGHTDATA_UNLOCKER_ZONE="your-zone-name"

3. Start Using

# Search Google
bash skills/search/scripts/search.sh "artificial intelligence trends"

# Scrape a webpage
bash skills/scrape/scripts/scrape.sh "https://example.com/article"

# Get LinkedIn profile data
bash skills/data-feeds/scripts/datasets.sh linkedin_person_profile "https://linkedin.com/in/satyanadella"

---

Bright Data MCP Skill

The bright-data-mcp skill teaches Claude how to optimally use Bright Data's MCP server — selecting the right tool, handling errors, and following best practices across 60+ tools.

MCP Server Connection

Connect the Bright Data MCP server using the remote URL:

https://mcp.brightdata.com/mcp?token=YOUR_BRIGHTDATA_API_TOKEN

Optional parameters:

| Parameter | Description | Example | |-----------|-------------|---------| | pro=1 | Enable all 60+ Pro tools | ...&pro=1 | | groups=<name> | Enable specific tool groups | ...&groups=social,ecommerce | | tools=<names> | Enable specific tools only | ...&tools=search_engine,scrape_as_markdown |

Available Tool Groups

| Group | Tools | Platforms | |-------|-------|-----------| | Rapid (Free) | search_engine, scrape_as_markdown | Google/Bing/Yandex search, any webpage | | ecommerce | 11 tools | Amazon, Walmart, eBay, Best Buy, Etsy, Home Depot, Zara, Google Shopping | | social | 22 tools | LinkedIn, Instagram, Facebook, TikTok, YouTube, X, Reddit | | business | 4 tools | Crunchbase, ZoomInfo, Google Maps, Zillow | | finance | 1 tool | Yahoo Finance | | research | 2 tools | Reuters, GitHub | | app_stores | 2 tools | Google Play, Apple App Store | | travel | 1 tool | Booking.com | | browser | 13 tools | Full browser automation (navigate, click, type, screenshot) | | advanced_scraping | 5 tools | HTML scraping, AI extraction, batch operations |

Claude Code Setup

Add to your MCP settings:

{
  "mcpServers": {
    "brightdata": {
      "url": "https://mcp.brightdata.com/mcp?token=YOUR_TOKEN&pro=1"
    }
  }
}

Why This Skill Matters

The bright-data-mcp skill ensures Bright Data MCP is always the default for all web data tasks — replacing built-in tools like WebFetch and WebSearch with superior alternatives that handle bot detection, CAPTCHAs, and JavaScript rendering automatically.

With this skill:

  • Default web tool — Claude always uses Bright Data MCP for any web data request, no exceptions
  • Replaces built-in toolssearch_engine replaces WebSearch, scrape_as_markdown replaces WebFetch
  • Automatic tool selection — the most specific tool is chosen based on the task
  • Structured data preferredweb_data_* tools used over raw scraping when available
  • Error handling — built-in fallback strategies and URL validation guidance
  • Workflow orchestration — multi-step workflows for research, competitive analysis, social monitoring, and lead generation

---

Competitive Intel Skill

The competitive-intel skill turns Bright Data's scraping infrastructure into real-time competitive intelligence — replacing $15K–$50K/year enterprise CI tools (Crayon, Klue, AlphaSense) with an AI-native alternative that costs pennies per analysis.

How it works

The skill combines live web data (via bdata CLI) with strategic analysis frameworks to deliver actionable competitive insights — not stale training knowledge.

Say "analyze my competitor" or "compare pricing" and Claude will: 1. Gather live data using bdata search, bdata scrape, and bdata pipelines 2. Apply the appropriate analytical framework (SWOT, Porter's Five Forces, positioning matrix, etc.) 3. Deliver a structured report with source citations and strategic recommendations

Analysis modules

| Module | What it does | Data sources | |--------|-------------|-------------| | Competitor Snapshot | Quick profile: positioning, pricing, team, funding, strengths/weaknesses | bdata scrape (website), bdata pipelines (Crunchbase, LinkedIn) | | Pricing Intelligence | Compare pricing across competitors with positioning analysis | bdata scrape (pricing pages), bdata search (third-party reviews) | | Review Intelligence | Mine customer reviews for pain points, loved features, and gaps | bdata scrape (G2, Capterra), bdata pipelines (app stores, Google Maps) | | Hiring Signal Analysis | Infer strategic direction from job postings | bdata pipelines (LinkedIn jobs), bdata scrape (careers pages) | | Content & SEO Battle | Compare content strategy and search positioning | bdata search (SERP rankings), bdata scrape (blogs) | | Market Landscape Map | Discover and categorize all players in a market | bdata search (discovery), bdata scrape (homepages), bdata pipelines (Crunchbase) |

Example usage

> Analyze Notion as a competitor. I'm building a PM tool for small teams.

# Claude will:
# 1. Search Google for Notion
# 2. Scrape notion.com, /pricing, /about
# 3. Pull Crunchbase and LinkedIn data
# 4. Deliver a structured Competitor Snapshot with
#    pricing table, messaging analysis, strengths,
#    vulnerabilities, and strategic recommendations
> Compare pricing between Slack, Teams, and Discord

# Claude will:
# 1. Scrape all three pricing pages
# 2. Normalize into comparison matrix
# 3. Identify pricing models and positioning opportunities
# 4. Deliver a Pricing Intelligence Report

Reference files

---

Scraper Builder Skill

The scraper-builder skill guides you through building production-ready web scrapers for any website. Say "build a scraper for [site]" and Claude will walk you through the entire process.

How it works

The skill follows a 6-phase workflow:

1. Understand the target — What site, what data, what scope? 2. Check for pre-built scrapers — Searches 100+ supported domains via the Dataset List API before writing any custom code 3. Site reconnaissance — Fetches HTML via Web Unlocker, analyzes rendering (SSR vs CSR), discovers hidden APIs and JSON-LD data 4. Build the extractor — Picks the right approach: Web Unlocker + parsing, direct API endpoint, or Browser API with Playwright 5. Handle pagination — URL params, next-links, cursor tokens, infinite scroll, load-more buttons, sitemap crawling 6. Assemble complete scraper — Runnable script with config, fetcher, parser, paginator, output, and error handling

API selection logic

| Scenario | API Used | |----------|----------| | Site has pre-built scraper (Amazon, LinkedIn, etc.) | Web Scraper API | | Static HTML, no interaction needed | Web Unlocker | | Site exposes JSON API internally | Web Unlocker → API endpoint | | JS-rendered (React, Vue, Angular) | Browser API | | Infinite scroll / click required | Browser API | | Search engine results | SERP API |

Example usage

> build a scraper for Amazon product pages, I have 50 ASINs

# Claude will:
# 1. Find Amazon's pre-built scraper
# 2. Use async trigger/poll/fetch for the batch
# 3. Output a complete Python script with structured JSON
> scrape all job listings from jobs.customsite.com with pagination

# Claude will:
# 1. Check for pre-built scraper (not found)
# 2. Fetch HTML to analyze structure
# 3. Pick Web Unlocker + BeautifulSoup
# 4. Handle pagination automatically
# 5. Output a complete working scraper

Reference files

---

Scraper Studio Skill

The scraper-studio skill teaches Claude how to use Bright Data's AI-generated scrapers from the terminal — turn a URL + plain-English description into a reusable scraper, then run it against any URL.

Two commands, one workflow

# 1. Build a scraper from a natural-language description
bdata scraper create https://example.com/product/1 \
    "Extract title, price, currency, image URL, and availability"

# → prints a collector_id like c_mp3tuab31lswoxvpws

# 2. Run that scraper on any URL
bdata scraper run c_mp3tuab31lswoxvpws https://example.com/product/2 \
    --pretty -o product.json

The bridge between the two is collector_id — printed by create, consumed by run. The skill ensures Claude always captures and surfaces it (it's recoverable from every failure path, even partial generations).

What the skill covers

| Behavior | How it's handled | |---|---| | AI generation flow | create chains POST /dca/collectorPOST .../automate_template → poll .../automate_template/progress for status: "done". Typical generation: 5–10 min. | | Default run mode | Async + poll: POST /dca/trigger_immediate → poll /dca/get_result. Safe for any duration. | | --sync fast path | POST /dca/crawl for pages expected to finish in < 50 s. On 202 timeout, returns a response_id — re-run without --sync to pick it up. | | --sync-timeout | Bounded to [25, 50] seconds. Skill explicitly documents the range. | | Auto-fallback to batch | When a URL expands past the realtime page limit (paginated listings, infinite scroll), the CLI silently falls back to /dca/trigger + /dca/dataset polling. No flag required. | | Failure recovery | Both collector_id and response_id are always printed on error — the skill teaches Claude to surface them and resume via the web UI (https://brightdata.com/cp/scrapers/{collector_id}) instead of restarting. | | Routing rule | For known platforms (Amazon, LinkedIn, TikTok, etc.) the skill redirects to data-feeds — pre-built scrapers are faster, cheaper, and more reliable than building a new one. |

Description quality matters

The natural-language description is the only signal the AI Flow has. A strong description names every field, disambiguates location (e.g. "the price near the title, not the recommendations sidebar"), handles missing-field cases, and pins types. The skill's prompts.md reference has worked strong-vs-weak examples for product pages, listings, and articles.

Reference files

---

Proxy Skill

The brightdata-proxy skill generates working code and configuration for Bright Data's four proxy networks. It covers usage and integration — choosing the right network, picking an IP pool type, composing the proxy username, and wiring the SSL certificate — but it does not walk through control-panel UI setup (the in-app proxy helper is the right surface for that).

The four networks

| Network | When to use | |---------|-------------| | Datacenter | Cheapest and fastest. Unprotected targets, public APIs, light scraping, geo-bypass on permissive sites | | ISP | Static IPs that look residential. Long-lived sessions, account management, when IP reputation matters but you want speed | | Residential | Real-user IPs from a 100M+ pool. Default when DC/ISP get blocked; best success rate on hard targets. Requires SSL cert or KYC | | Mobile | 3G/4G/5G carrier IPs. Most expensive — reserve for extreme targets (Instagram, TikTok, some banking) |

For Datacenter and ISP, the skill also helps choose an IP pool type — shared pool (pay-per-GB rotating), shared IPs (a leased set shared with others), or dedicated IPs (a set leased exclusively to you).

Canonical proxy facts

These are easy to get wrong, so the skill anchors on them:

host:        brd.superproxy.io
HTTP(S) port: 33335           # paired with the new CA (expires Sept 2034); legacy 22225 is deprecated
SOCKS5 port:  22228
username:    brd-customer-CUSTOMER_ID-zone-ZONE_NAME[-param-value...]
test URL:    https://geo.brdtest.com/mygeo.json   # returns exit IP, country, city, ASN

The username carries all per-request config — country/state/city targeting, sticky sessions (-session-), dedicated-IP pinning (-ip-), DNS resolution, and more — appended with hyphens, not as query-string params. State, city, zip, ASN, OS, and carrier targeting are Residential/Mobile only; Datacenter and ISP honor -country- only.

Residential & Mobile SSL access

Residential and Mobile enforce a network-access policy that DC and ISP don't. The skill defaults generated code to loading the bundled CA (assets/brightdata_proxy_ca.crt) directly in code — no system install, no separate download — and mentions KYC verification and verify=False as alternatives.

Bundled tools

  • assets/brightdata_proxy_ca.crt — the Bright Data Proxy Root CA (port 33335, valid until Sept 2034). Generated code references this path instead of telling the user to download it.
  • scripts/smoke_test.sh — cURL credential smoke test. Hits the geo endpoint through a zone and returns parsed JSON, with distinct exit codes for auth vs HTTP vs network failure.

Reference files

---

Brightdata CLI Skill

The brightdata-cli skill teaches Claude how to use the Bright Data CLI (brightdata or bdata) — a terminal tool for scraping, searching, structured data extraction, zone management, and budget monitoring.

Quick start

# One-time login (auto-creates zones, saves API key)
brightdata login

# Scrape any URL as markdown
brightdata scrape https://example.com

# Search Google with structured results
brightdata search "web scraping best practices"

# Extract LinkedIn profile data
brightdata pipelines linkedin_person_profile "https://linkedin.com/in/username"

# Check account balance
brightdata budget

Key commands

| Command | Purpose | |---------|---------| | brightdata scrape <url> | Scrape any URL as markdown, HTML, JSON, or screenshot | | brightdata search "<query>" | Search Google/Bing/Yandex with structured results | | brightdata pipelines <type> [params] | Extract structured data from 40+ platforms | | brightdata status <job-id> | Check async job status | | brightdata zones | List proxy zones | | brightdata budget | View account balance and costs | | brightdata skill add | Install AI agent skills | | brightdata config | View/set configuration |

Pipe-friendly

# Search → extract first URL → scrape it
brightdata search "top open source projects" --json \
  | jq -r '.organic[0].link' \
  | xargs brightdata scrape

# Amazon product data to CSV
brightdata pipelines amazon_product "https://amazon.com/dp/xxx" --format csv > product.csv

Reference files

---

Best Practices Skill

The bright-data-best-practices skill is a reference knowledge base that Claude consults automatically when writing or reviewing Bright Data code. It is not a slash command — it works silently in the background.

What it covers

| API | Key topics | |-----|-----------| | Web Unlocker | All request parameters, special headers (x-unblock-expect, x-unblock-data-format), async flow, billing model, anti-patterns | | SERP API | All Google params (Search, Maps, Trends, Reviews, Lens, Hotels, Flights), Bing params, brd_json=1 parsed output, async with x-response-id | | Web Scraper API | Sync /scrape vs async /trigger, progress polling, status values, snapshot lifecycle, output formats | | Browser API | Connection strings by framework, session rules, all custom CDP functions (Captcha.solve, Proxy.setLocation, Emulation.setDevice, etc.), bandwidth optimization |

How it works

The skill has user-invocable: false — it never appears in the / command menu. Instead, Claude loads its reference material when you ask it to write Bright Data integrations, ensuring generated code uses correct endpoints, real parameter names, and proper patterns.

Reference files

---

Python SDK Best Practices Skill

The python-sdk-best-practices skill is a comprehensive guide for writing correct code with the brightdata-sdk Python package. Claude consults this automatically when writing, modifying, or reviewing Python code that uses the Bright Data SDK.

What it covers

| Topic | Details | |-------|---------| | Client setup | Async (BrightDataClient) and sync (SyncBrightDataClient) clients, context managers, authentication | | Platform scrapers | Amazon, LinkedIn, Instagram, Facebook, YouTube, ChatGPT, TikTok, Reddit — URL-based and keyword search | | SERP API | Google, Bing, Yandex search with location, language, device, and async mode | | Datasets API | 175+ pre-collected datasets with filter, download, sample, and export | | Scraper Studio | Custom scraper execution with trigger/poll/fetch lifecycle | | Browser API | CDP WebSocket URLs for Playwright/Puppeteer automation | | Error handling | Full exception hierarchy with APIError.status_code, APIError.response_text | | Batch operations | Concurrent scraping with asyncio.gather() | | Common mistakes | Forgetting context managers, sync in async, missing await, hardcoded tokens |

Reference files

---

JavaScript SDK Best Practices Skill

The js-sdk-best-practices skill is a comprehensive guide for writing correct code with the @brightdata/sdk JavaScript/TypeScript package. Claude consults this automatically when writing, modifying, or reviewing Node.js/TypeScript code that uses the Bright Data SDK.

What it covers

| Topic | Details | |-------|---------| | Client setup | bdclient (single async client, Node ≥20, ESM+CJS, TS types), BRIGHTDATA_API_TOKEN, await using / close() lifecycle | | Web Unlocker | client.scrapeUrl(url \| url[], { format, dataFormat, country }) — html/markdown/json/screenshot | | Platform scrapers | 11 platforms (Amazon, LinkedIn, Instagram, Facebook, TikTok, YouTube, Reddit, Pinterest, ChatGPT, Perplexity, DigiKey) — collect / orchestrated / discover method triad | | SERP API | client.search.google/bing/yandex (SERP-only — no platform search router in JS) | | Discover | client.discover(query, { intent }) and discoverTrigger for AI-ranked entity/page discovery | | Datasets API | query → getStatus → download snapshot lifecycle, camelCase dataset names | | Browser API | client.browser.getConnectUrl() CDP URLs for Playwright/Puppeteer/Selenium | | Scraper Studio | client.scraperStudio.run/trigger/status for custom collectors | | Error handling | BRDError hierarchy (ValidationError, AuthenticationError, …) | | Batch operations | array inputs and the *Trigger + job.wait() pattern (not Promise.all of blocking calls) | | Key differences from Python | bdclient vs BrightDataClient, camelCase, SERP-only search, no platform search router |

Reference files

---

Discover API Skill

The discover-api skill is the foundation for intent-ranked semantic web search with Bright Data's Discover API. It's the building block the live-research and rag-pipeline skills sit on top of.

What it covers

| Topic | Details | |-------|---------| | Trigger → poll | Async flow: POST /discovertask_idGET ?task_id= until status: "done" | | Surfaces | REST (curl), CLI (bdata discover), JS SDK & Python SDK (client.discover) — with a per-surface parameter matrix | | Parameters | query, intent, num_results (1–20), filter_keywords, include_content, country/city/language, date bounds, format | | Modes | standard (default), deep (exhaustive), fast (low latency), zeroRanking (max raw volume) — and which are REST-only | | Result shape | { link, title, description, relevance_score, content? }; relevance-ranked | | Verification | poll-before-read, block-page filtering, relevance sanity, error codes (401/403/404/429) |

Reference files

---

Live Research Skill

The live-research skill turns one question into a cited, synthesized brief by fanning out intent-ranked Discover queries, reading the best sources, and writing findings with inline citations.

What it covers

| Topic | Details | |-------|---------| | Decomposition | Break a topic into 4–8 angles, each its own Discover call with a tailored intent (coverage via breadth, since num_results caps at 20) | | Retrieval | Parallel discover --include-content; deep mode via REST for hard topics | | Merge & rank | Dedup by normalized URL, sort by relevance_score, drop block-page content | | Synthesis | Structured brief with inline [n] citations, contradictions surfaced, gaps named | | Quality bar | Cite everything, prefer primary sources, never answer from training data |

Reference files

---

RAG Pipeline Skill

The rag-pipeline skill builds a retrieval-augmented-generation pipeline or a custom search engine using Discover as the retrieval layer for an LLM.

What it covers

| Topic | Details | |-------|---------| | Two architectures | Live retrieval (Discover at query time, always fresh) vs. ingestion (Discover → chunk → embed → vector store) | | Live retrieval | Over-fetch by relevance_score, filter block pages, assemble a cited prompt | | Ingestion | Dedup → chunk (overlap, paragraph-aware) → embed → upsert with source provenance | | Bulk corpus | REST zeroRanking for max volume, then fetch content separately | | Design rules | Store provenance, validate content before embedding, re-ingest for freshness, keep embedder/vector store pluggable |

Reference files

---

Setup

Prerequisites

| Dependency | Version | Description | |------------|---------|-------------| | curl | ≥ 7.0 | HTTP client for API requests | | jq | ≥ 1.5 | JSON processor for parsing responses |

Install on Ubuntu/Debian:

sudo apt-get install curl jq

Install on macOS:

brew install curl jq

Environment Variables

| Variable | Required | Description | |----------|----------|-------------| | BRIGHTDATA_API_KEY | Yes | Your Bright Data API key from the dashboard | | BRIGHTDATA_UNLOCKER_ZONE | Yes | Name of your Web Unlocker zone (required for search/scrape) | | BRIGHTDATA_POLLING_TIMEOUT | No | Max seconds to wait for data-feeds (default: 600) |

---

Usage

Search Skill

Search Google and receive structured JSON results.

bash skills/search/scripts/search.sh "query" [page]

Parameters: | Parameter | Required | Default | Description | |-----------|----------|---------|-------------| | query | Yes | - | Search query string | | page | No | 0 | Page number (0-indexed) for pagination |

Output Format:

{
  "organic": [
    {
      "link": "https://example.com/result",
      "title": "Result Title",
      "description": "Brief description of the page..."
    }
  ]
}

Scrape Skill

Scrape any webpage and get clean markdown content.

bash skills/scrape/scripts/scrape.sh "url"

Features:

  • Automatic bot detection bypass
  • CAPTCHA solving
  • JavaScript rendering
  • Clean markdown output

Data Feeds Skill

Extract structured data from 40+ supported websites.

bash skills/data-feeds/scripts/datasets.sh <dataset_type> <url> [params...]

Run without arguments to see all available datasets:

bash skills/data-feeds/scripts/datasets.sh

---

Data Feeds

E-Commerce

| Dataset | Command | Description | |---------|---------|-------------| | Amazon Product | datasets.sh amazon_product <url> | Product details, pricing, ratings | | Amazon Reviews | datasets.sh amazon_product_reviews <url> | Customer reviews | | Amazon Search | datasets.sh amazon_product_search <keyword> <domain> | Search results | | Walmart Product | datasets.sh walmart_product <url> | Product details | | eBay Product | datasets.sh ebay_product <url> | Listing details | | Best Buy | datasets.sh bestbuy_products <url> | Product info | | Etsy | datasets.sh etsy_products <url> | Listing data | | Home Depot | datasets.sh homedepot_products <url> | Product data | | Zara | datasets.sh zara_products <url> | Product details |

Professional Networks

| Dataset | Command | Description | |---------|---------|-------------| | LinkedIn Person | datasets.sh linkedin_person_profile <url> | Profile, experience, skills | | LinkedIn Company | datasets.sh linkedin_company_profile <url> | Company page data | | LinkedIn Jobs | datasets.sh linkedin_job_listings <url> | Job posting details | | LinkedIn Posts | datasets.sh linkedin_posts <url> | Post content | | Crunchbase | datasets.sh crunchbase_company <url> | Funding, employees | | ZoomInfo | datasets.sh zoominfo_company_profile <url> | Company profile |

Social Media

| Dataset | Command | Description | |---------|---------|-------------| | Instagram Profiles | datasets.sh instagram_profiles <url> | Bio, followers | | Instagram Posts | datasets.sh instagram_posts <url> | Post details | | Instagram Reels | datasets.sh instagram_reels <url> | Reel metrics | | TikTok Profiles | datasets.sh tiktok_profiles <url> | Creator data | | TikTok Posts | datasets.sh tiktok_posts <url> | Video details | | TikTok Shop | datasets.sh tiktok_shop <url> | Product data | | Facebook Posts | datasets.sh facebook_posts <url> | Post content | | Facebook Marketplace | datasets.sh facebook_marketplace_listings <url> | Listings | | X (Twitter) | datasets.sh x_posts <url> | Tweet data | | YouTube Profiles | datasets.sh youtube_profiles <url> | Channel data | | YouTube Videos | datasets.sh youtube_videos <url> | Video details | | YouTube Comments | datasets.sh youtube_comments <url> [num] | Comments | | Reddit Posts | datasets.sh reddit_posts <url> | Post data |

Other

| Dataset | Command | Description | |---------|---------|-------------| | Google Maps Reviews | datasets.sh google_maps_reviews <url> [days] | Business reviews | | Google Shopping | datasets.sh google_shopping <url> | Product comparison | | Google Play Store | datasets.sh google_play_store <url> | App details | | Apple App Store | datasets.sh apple_app_store <url> | iOS app data | | Yahoo Finance | datasets.sh yahoo_finance_business <url> | Stock data | | Zillow | datasets.sh zillow_properties_listing <url> | Property listings | | Booking.com | datasets.sh booking_hotel_listings <url> | Hotel data | | Reuters News | datasets.sh reuter_news <url> | Article content | | GitHub | datasets.sh github_repository_file <url> | Repository file |

---

Examples

Research Task

# Search for recent news
bash skills/search/scripts/search.sh "climate change 2024"

# Get full article content
bash skills/scrape/scripts/scrape.sh "https://example.com/climate-article"

Competitive Analysis

# With the competitive-intel skill, just ask Claude:
# "Analyze Notion as a competitor" or "Compare pricing between Slack and Teams"
# Claude will automatically run the right bdata commands:

# Search for competitor info
bdata search "Notion project management" --json

# Scrape competitor pricing page
bdata scrape https://www.notion.com/pricing

# Get company data from Crunchbase
bdata pipelines crunchbase_company "https://www.crunchbase.com/organization/notion-so"

# Get LinkedIn company profile
bdata pipelines linkedin_company_profile "https://www.linkedin.com/company/notionhq"

Social Media Monitoring

# Get Instagram profile
bash skills/data-feeds/scripts/datasets.sh instagram_profiles "https://instagram.com/natgeo"

# Get YouTube video stats
bash skills/data-feeds/scripts/datasets.sh youtube_videos "https://youtube.com/watch?v=dQw4w9WgXcQ"

Lead Generation

# Get LinkedIn profile
bash skills/data-feeds/scripts/datasets.sh linkedin_person_profile "https://linkedin.com/in/satyanadella"

# Get company funding data
bash skills/data-feeds/scripts/datasets.sh crunchbase_company "https://crunchbase.com/organization/openai"

---

Project Structure

brightdata-plugin/
├── .claude-plugin/
│   ├── plugin.json              # Plugin configuration
│   └── marketplace.json         # Marketplace metadata
├── skills/
│   ├── agent-onboarding/
│   │   └── SKILL.md             # Onboarding entry point — install, auth, path routing
│   ├── search/
│   │   ├── SKILL.md             # Search skill
│   │   └── scripts/
│   │       └── search.sh        # Google search implementation
│   ├── scrape/
│   │   ├── SKILL.md             # Scrape skill
│   │   └── scripts/
│   │       └── scrape.sh        # Web scraper implementation
│   ├── data-feeds/
│   │   ├── SKILL.md             # Data feeds skill
│   │   └── scripts/
│   │       ├── datasets.sh      # Dataset wrapper (40+ sources)
│   │       └── fetch.sh         # Core polling logic
│   ├── bright-data-mcp/
│   │   ├── SKILL.md             # MCP workflow guide
│   │   └── references/
│   │       ├── mcp-tools.md     # Complete MCP tool reference (60+ tools)
│   │       └── mcp-setup.md     # MCP server setup guide
│   ├── bright-data-best-practices/
│   │   ├── SKILL.md             # API best practices (user-invocable: false)
│   │   └── references/
│   │       ├── web-unlocker.md  # Web Unlocker API deep reference
│   │       ├── serp-api.md      # SERP API deep reference
│   │       ├── web-scraper-api.md  # Web Scraper API deep reference
│   │       └── browser-api.md   # Browser API deep reference
│   ├── python-sdk-best-practices/
│   │   ├── SKILL.md             # Python SDK patterns and best practices
│   │   └── references/
│   │       └── api-reference.md # Full API surface, payloads, constants
│   ├── js-sdk-best-practices/
│   │   ├── SKILL.md             # JavaScript/TypeScript SDK (@brightdata/sdk) patterns
│   │   └── references/
│   │       ├── scrapers.md      # Web Unlocker + per-platform method tables
│   │       ├── search.md        # SERP engines and Discover API
│   │       ├── datasets-overview.md  # Dataset names and snapshot lifecycle
│   │       └── advanced.md      # Config, batch/trigger, Browser API, Scraper Studio, errors
│   ├── discover-api/
│   │   ├── SKILL.md             # Discover API — surfaces, modes, trigger/poll
│   │   └── references/
│   │       └── api-reference.md # REST spec, per-surface param matrix, errors, limits
│   ├── live-research/
│   │   ├── SKILL.md             # Multi-query Discover → dedup → cited brief
│   │   └── references/
│   │       └── brief-template.md  # Output structure and citation rules
│   ├── rag-pipeline/
│   │   ├── SKILL.md             # RAG architectures on Discover (live + ingestion)
│   │   └── references/
│   │       └── code.md          # Runnable JS + Python pipelines
│   ├── proxy/
│   │   ├── SKILL.md             # Proxy networks, pool types, targeting, SSL, code patterns
│   │   ├── references/
│   │   │   ├── code-templates.md   # Framework integrations (httpx, Scrapy, Playwright, ...)
│   │   │   └── troubleshooting.md  # Symptom-based error catalog and escalation order
│   │   ├── scripts/
│   │   │   └── smoke_test.sh    # cURL credential smoke test
│   │   └── assets/
│   │       └── brightdata_proxy_ca.crt  # Bright Data Proxy Root CA (port 33335)
│   ├── scraper-builder/
│   │   ├── SKILL.md             # Scraper builder workflow and decision tree
│   │   └── references/
│   │       ├── supported-domains.md   # Pre-built scraper lookup + API
│   │       ├── site-analysis-guide.md # HTML analysis playbook
│   │       └── pagination-patterns.md # 7 pagination strategies
│   ├── scraper-studio/
│   │   ├── SKILL.md             # bdata scraper create / run workflow
│   │   └── references/
│   │       ├── prompts.md       # Writing descriptions for AI generation
│   │       ├── recipes.md       # End-to-end shell recipes
│   │       └── api-flow.md      # Underlying REST endpoints and artefacts
│   ├── brightdata-cli/
│   │   ├── SKILL.md             # CLI usage guide and command patterns
│   │   └── references/
│   │       ├── commands.md      # Full command reference with all flags
│   │       └── pipelines.md     # 40+ pipeline types and parameters
│   ├── competitive-intel/
│   │   ├── SKILL.md             # Competitive intelligence (6 analysis modules)
│   │   └── references/
│   │       ├── data-source-guide.md    # CI needs → bdata command mapping
│   │       ├── output-templates.md     # 8 report templates
│   │       ├── analysis-frameworks.md  # SWOT, Porter's, positioning, JTBD, Blue Ocean
│   │       └── industry-signals.md     # Signal interpretation guide
│   └── design-mirror/
│       └── SKILL.md             # Design system mirroring skill
├── sdk-python/                  # Bright Data Python SDK source
├── README.md
└── LICENSE

---

API Reference

Search & Scrape (Web Unlocker API)

POST https://api.brightdata.com/request
Authorization: Bearer <BRIGHTDATA_API_KEY>
Content-Type: application/json

{
  "url": "<target_url>",
  "zone": "<BRIGHTDATA_UNLOCKER_ZONE>",
  "format": "raw",
  "data_format": "markdown" | "parsed_light"
}

Data Feeds (Web Data API)

POST https://api.brightdata.com/datasets/v3/trigger?dataset_id=<id>
Authorization: Bearer <BRIGHTDATA_API_KEY>
Content-Type: application/json

[{"url": "<target_url>"}]

Then poll for results:

GET https://api.brightdata.com/datasets/v3/snapshot/<snapshot_id>?format=json

---

Troubleshooting

"BRIGHTDATA_API_KEY is not set"

export BRIGHTDATA_API_KEY="your-api-key"

"BRIGHTDATA_UNLOCKER_ZONE is not set"

Create a Web Unlocker zone in your dashboard:

export BRIGHTDATA_UNLOCKER_ZONE="your-zone-name"

Data feeds timing out

Increase the polling timeout:

export BRIGHTDATA_POLLING_TIMEOUT=900

Empty or malformed results

  • Verify your API key is valid
  • Check that your zone is active
  • Ensure jq is installed

---

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository 2. Create your feature branch (git checkout -b feature/amazing-feature) 3. Commit your changes (git commit -m 'Add amazing feature') 4. Push to the branch (git push origin feature/amazing-feature) 5. Open a Pull Request

---

License

This project is licensed under the MIT License - see the LICENSE file for details.

---

Links

---

<p align="center"> <sub>Built with the power of <a href="https://brightdata.com">Bright Data</a></sub> </p>

Related plugins

Browse all →