Europe PMC Database
A skill for searching, downloading, and exploring open-access papers from Europe PMC — a comprehensive, free life-science literature database with over 43 million abstracts and 9 million full-text articles.
Prerequisites
uv: Read theuvskill and follow its Setup instructions to ensure
uv is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in
this skill directory then (1) prominently notify the user to check the terms at https://europepmc.org/ and to always check the license of the papers retrieved by the skill for any restrictions, then (2) create the file recording the notification text and timestamp.
Core Rules
- Open Access Only: This skill exclusively searches open-access content.
The script automatically appends OPEN_ACCESS:y to every search query. Do NOT remove or override this filter.
- NEVER run python3 or python3 -c directly: the system Python does not
necessarily have all key dependencies. Do not attempt to pip install or create new venvs.
- Use the Wrapper: ALWAYS use the provided script rather than calling the
API directly. The script handles rate limiting (1 req/s) and errors.
- Output Files: All subcommands require
--outputto write results to a
file. Read the output file separately to avoid context overflow.
- List Sources. If this skill is used, ensure this is mentioned in the
output AND list the URLs of all papers that were used in producing the output.
Utility Scripts
All commands are subcommands of scripts/europepmc_api.py. Rate limiting and retries are handled automatically.
1. Search (search)
Search Europe PMC by query. Supports DOI lookup, keyword search, author search, PMID lookup, and the full Europe PMC search syntax.
# Look up a paper by DOI
uv run scripts/europepmc_api.py search "DOI:10.1038/s41586-021-03819-2" --output result.json
# Keyword search
uv run scripts/europepmc_api.py search "CRISPR cancer" --max_results 5 --output results.json
# Author search
uv run scripts/europepmc_api.py search "AUTH:Jumper J" --max_results 10 --output results.json
# PMID lookup
uv run scripts/europepmc_api.py search "EXT_ID:34265844 AND SRC:MED" --output result.json
# Sorted by citations
uv run scripts/europepmc_api.py search "machine learning" \
--sort "CITED desc" --max_results 20 --output results.json
Arguments:
query(str, required) — search query using Europe PMC syntax--output(str, required) — output JSON file path--max_results(int, default 10) — maximum results per page (max 1000)--result_type(str, defaultcore) —core(full metadata) orlite--cursor(str, default*) — cursor mark for pagination; pass the
nextCursorMark value from a previous response to get the next page
--sort(str) — sort order, e.g.CITED desc,P_PDATE_D desc
(publication date descending), P_PDATE_D asc
Output: JSON file with three fields:
hitCount(int) — total number of matching articlesnextCursorMark(str) — cursor for next page; empty string if no more pagesresults(list) — array of article metadata objects
Search Syntax Quick Reference:
DOI:10.xxxx/yyyy— look up by DOIEXT_ID:12345678 AND SRC:MED— look up by PMIDAUTH:surname initials— author searchTITLE:keyword— search in title onlyJOURNAL:name— search by journalPUB_YEAR:2024or(FIRST_PDATE:[2023-01-01 TO 2023-12-31])— date filterHAS_FT:y— restrict to articles with full text in Europe PMC- Boolean operators:
AND,OR,NOT
Note:
OPEN_ACCESS:yis automatically appended to all queries. You do not need to add it manually.
2. Download PDF (download_pdf)
Download an open-access PDF from Europe PMC by PMCID.
uv run scripts/europepmc_api.py download_pdf PMC8371605 --output alphafold.pdf
Arguments:
pmcid(str, required) — PubMed Central ID (e.g.,PMC8371605)--output(str, required) — filepath to save the PDF
Output: Saves the PDF to the specified file. Exits with an error if the PMCID is not found or the response is not a valid PDF. Whenever you download a PDF, check the pdf downloaded is not empty or corrupted.
3. Get Full Text (get_fulltext)
Retrieve the full text of an open-access article and save to a file. Returns plain text (XML tags stripped) by default, or raw XML with --format xml.
# Get plain text (default)
uv run scripts/europepmc_api.py get_fulltext PMC8371605 --output fulltext.txt
# Get raw XML
uv run scripts/europepmc_api.py get_fulltext PMC8371605 --format xml --output fulltext.xml
Arguments:
pmcid(str, required) — PubMed Central ID--output(str, required) — output file path--format(str, defaulttext) —text(plain text) orxml(raw JATS
XML)
Output: Full text written to the specified file. Exits with an error if the article is not in the Europe PMC open-access subset.
Important: Only articles in the PMC Open Access Subset have full text available. If retrieval fails, use
searchto check theisOpenAccessfield and fall back to the abstract.
4. Get Citations (get_citations)
Retrieve articles that cite a given paper.
# Get citations for the AlphaFold paper (PMID 34265844)
uv run scripts/europepmc_api.py get_citations MED 34265844 \
--page_size 25 --output citations.json
Arguments:
source(str, required) — source database:MED(PubMed),PMC,PPR
(preprints), PAT (patents)
article_id(str, required) — article ID in the source database--output(str, required) — output JSON file path--page(int, default 1) — page number--page_size(int, default 25) — results per page
Output: JSON file with hitCount and citations array.
5. Get References (get_references)
Retrieve the reference list (bibliography) of a given paper.
# Get references from the AlphaFold paper
uv run scripts/europepmc_api.py get_references MED 34265844 \
--page_size 100 --output references.json
Arguments:
source(str, required) — source database:MED,PMC,PPR,PATarticle_id(str, required) — article ID in the source database--output(str, required) — output JSON file path--page(int, default 1) — page number--page_size(int, default 25) — results per page
Output: JSON file with hitCount and references array.
Common Workflows
DOI to PDF
# Step 1: Search for the PMCID
uv run scripts/europepmc_api.py search "DOI:10.1038/s41586-021-03819-2" --output result.json
PMCID=$(jq -r '.results[0].pmcid // empty' result.json)
# Step 2: Download the PDF
uv run scripts/europepmc_api.py download_pdf "$PMCID" --output paper.pdf
PMID to Full Text
# Step 1: Find the PMCID from a PMID
uv run scripts/europepmc_api.py search "EXT_ID:34265844 AND SRC:MED" --output result.json
PMCID=$(jq -r '.results[0].pmcid // empty' result.json)
# Step 2: Get the full text
uv run scripts/europepmc_api.py get_fulltext "$PMCID" --output fulltext.txt
Citation Graph Traversal
# Find what papers cite a landmark study, then check their references
uv run scripts/europepmc_api.py get_citations MED 34265844 --page_size 50 --output citing.json
# Parse a cited paper's PMID and explore its references
uv run scripts/europepmc_api.py get_references MED <CITING_PMID> --output refs.json
Search with Pagination
# First page
uv run scripts/europepmc_api.py search "CRISPR" --max_results 100 --output page1.json
# Extract cursor for next page
CURSOR=$(jq -r '.nextCursorMark // empty' page1.json)
# Next page
uv run scripts/europepmc_api.py search "CRISPR" --max_results 100 --cursor "$CURSOR" --output page2.json
