Links To Pdfs

chrisling-dev/links-to-pdfs

Run it on Hostinger →up to 70% off + an extra 10% with code ZACAARON10 Free API →

Installation

clawhub install chrisling-dev/links-to-pdfs

Summary

CLI tool that scrapes documents from various sources into local PDF files using browser automation.

SKILL.md

docs-scraper

CLI tool that scrapes documents from various sources into local PDF files using browser automation.

Installation

bash

npm install -g docs-scraper

Quick start

Scrape any document URL to PDF:

bash

docs-scraper scrape https://example.com/document

Returns local path: ~/.docs-scraper/output/1706123456-abc123.pdf

Basic scraping

Scrape with daemon (recommended, keeps browser warm):

bash

docs-scraper scrape <url>

Scrape with named profile (for authenticated sites):

bash

docs-scraper scrape <url> -p <profile-name>

Scrape with pre-filled data (e.g., email for DocSend):

bash

docs-scraper scrape <url> -D email=user@example.com

Direct mode (single-shot, no daemon):

bash

docs-scraper scrape <url> --no-daemon

Authentication workflow

When a document requires authentication (login, email verification, passcode):

Initial scrape returns a job ID:

bash

docs-scraper scrape https://docsend.com/view/xxx
# Output: Scrape blocked
#         Job ID: abc123

Retry with data:

bash

docs-scraper update abc123 -D email=user@example.com
# or with password
docs-scraper update abc123 -D email=user@example.com -D password=1234

Profile management

Profiles store session cookies for authenticated sites.

bash

docs-scraper profiles list     # List saved profiles
docs-scraper profiles clear    # Clear all profiles
docs-scraper scrape <url> -p myprofile  # Use a profile

Daemon management

The daemon keeps browser instances warm for faster scraping.

bash

docs-scraper daemon status     # Check status
docs-scraper daemon start      # Start manually
docs-scraper daemon stop       # Stop daemon

Note: Daemon auto-starts when running scrape commands.

Cleanup

PDFs are stored in ~/.docs-scraper/output/. The daemon automatically cleans up files older than 1 hour.

Manual cleanup:

bash

docs-scraper cleanup                    # Delete all PDFs
docs-scraper cleanup --older-than 1h    # Delete PDFs older than 1 hour

Job management

bash

docs-scraper jobs list         # List blocked jobs awaiting auth

Supported sources

Direct PDF links - Downloads PDF directly
Notion pages - Exports Notion page to PDF
DocSend documents - Handles DocSend viewer
LLM fallback - Uses Claude API for any other webpage

Scraper Reference

Each scraper accepts specific -D data fields. Use the appropriate fields based on the URL type.

DirectPdfScraper

Handles: URLs ending in .pdf

Data fields: None (downloads directly)

Example:

bash

docs-scraper scrape https://example.com/document.pdf

DocsendScraper

Handles: docsend.com/view/*, docsend.com/v/*, and subdomains (e.g., org-a.docsend.com)

URL patterns:

Documents: https://docsend.com/view/{id} or https://docsend.com/v/{id}
Folders: https://docsend.com/view/s/{id}
Subdomains: https://{subdomain}.docsend.com/view/{id}

Data fields:

Field	Type	Description
`email`	email	Email address for document access
`password`	password	Passcode/password for protected documents
`name`	text	Your name (required for NDA-gated documents)

Examples:

bash

# Pre-fill email for DocSend
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com

# With password protection
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D password=secret123

# With NDA name requirement
docs-scraper scrape https://docsend.com/view/abc123 -D email=user@example.com -D name="John Doe"

# Retry blocked job
docs-scraper update abc123 -D email=user@example.com -D password=secret123

Notes:

DocSend may require any combination of email, password, and name
Folders are scraped as a table of contents PDF with document links
The scraper auto-checks NDA checkboxes when name is provided

NotionScraper

Handles: notion.so/*, *.notion.site/*

Data fields:

Field	Type	Description
`email`	email	Notion account email
`password`	password	Notion account password

Examples:

bash

# Public page (no auth needed)
docs-scraper scrape https://notion.so/Public-Page-abc123

# Private page with login
docs-scraper scrape https://notion.so/Private-Page-abc123 \
  -D email=user@example.com -D password=mypassword

# Custom domain
docs-scraper scrape https://docs.company.notion.site/Page-abc123

Notes:

Public Notion pages don't require authentication
Toggle blocks are automatically expanded before PDF generation
Uses session profiles to persist login across scrapes

LlmFallbackScraper

Handles: Any URL not matched by other scrapers (automatic fallback)

Data fields: Dynamic - determined by Claude analyzing the page

The LLM scraper uses Claude to analyze the page HTML and detect:

Login forms (extracts field names dynamically)
Cookie banners (auto-dismisses)
Expandable content (auto-expands)
CAPTCHAs (reports as blocked)
Paywalls (reports as blocked)

Common dynamic fields:

Field	Type	Description
`email`	email	Login email (if detected)
`password`	password	Login password (if detected)
`username`	text	Username (if login uses username)

Examples:

bash

# Generic webpage (no auth)
docs-scraper scrape https://example.com/article

# Webpage requiring login
docs-scraper scrape https://members.example.com/article \
  -D email=user@example.com -D password=secret

# When blocked, check the job for required fields
docs-scraper jobs list
# Then retry with the fields the scraper detected
docs-scraper update abc123 -D username=myuser -D password=secret

Notes:

Requires ANTHROPIC_API_KEY environment variable
Field names are extracted from the page's actual form fields
Limited to 2 login attempts before failing
CAPTCHAs require manual intervention

Data field summary

Scraper	email	password	name	Other
DirectPdf	-	-	-	-
DocSend	✓	✓	✓	-
Notion	✓	✓	-	-
LLM Fallback	✓*	✓*	-	Dynamic*

*Fields detected dynamically from page analysis

Environment setup (optional)

Only needed for LLM fallback scraper:

bash

export ANTHROPIC_API_KEY=your_key

Optional browser settings:

bash

export BROWSER_HEADLESS=true   # Set false for debugging

Common patterns

Archive a Notion page:

bash

docs-scraper scrape https://notion.so/My-Page-abc123

Download protected DocSend:

bash

docs-scraper scrape https://docsend.com/view/xxx
# If blocked:
docs-scraper update <job-id> -D email=user@example.com -D password=1234

Batch scraping with profiles:

bash

docs-scraper scrape https://site.com/doc1 -p mysite
docs-scraper scrape https://site.com/doc2 -p mysite

Output

Success: Local file path (e.g., ~/.docs-scraper/output/1706123456-abc123.pdf) Blocked: Job ID + required credential types

Troubleshooting

Timeout: docs-scraper daemon stop && docs-scraper daemon start
Auth fails: docs-scraper jobs list to check pending jobs
Disk full: docs-scraper cleanup to remove old PDFs

Score

0–100

50/ 100

Grade

Popularity15/30

1,759 installs — growing adoption.

Completeness18/30

Documented: description, one-line install, catalog metadata. Missing: SKILL.md body.

Trust6/25

Limited provenance information — review the source before installing.

Freshness11/15

Updated within the last 6 months.

Scored automatically from popularity, completeness, trust, and freshness — computed only from data in our catalog, never fabricated.

Proud of your score? Add this badge to your README.

Paste a snippet into your GitHub README. The badge updates automatically and links back to this page.

Score badge

Markdown

[![Links To Pdfs skill](https://www.remoteopenclaw.com/skills/chrisling-dev/links-to-pdfs/badges/score.svg)](https://www.remoteopenclaw.com/skills/chrisling-dev/links-to-pdfs)

HTML

<a href="https://www.remoteopenclaw.com/skills/chrisling-dev/links-to-pdfs"><img src="https://www.remoteopenclaw.com/skills/chrisling-dev/links-to-pdfs/badges/score.svg" alt="Links To Pdfs skill"/></a>