Open Targets Database Skill
Overview
This skill provides access to the Open Targets Platform GraphQL API. It aggregates multi-modal evidence from genetics (GWAS/eQTL), pathways, animal models, and clinical trials to rank target-disease associations and identify druggable genes.
Prerequisites
uv: Read theuvskill and follow its Setup instructions to ensure
uv is installed and on PATH.
- User Notification: If LICENSE_NOTIFICATION.txt does not already exist in
this skill directory then (1) prominently notify the user to check the terms at https://platform-docs.opentargets.org/licence, then (2) create the file recording the notification text and timestamp.
Core Rules
- Use the Wrapper: ALWAYS execute the provided helper scripts to query the
database rather than accessing the database directly. The scripts automatically enforce fair use and implement retry logic.
- Output Flag: The
--outputflag is always required as output can be
very large. Use jq or write your own code to process this JSON file.
- Notification: If this skill is used, ensure this is mentioned in the
output.
Quick Reference
Always use the provided Python script scripts/query_opentargets.py to quickly query the database. It handles API communication, retries, formatting, and automatically truncates overly large responses. NEVER write your own curl or similar requests.
Usage:
uv run scripts/query_opentargets.py --output /tmp/opentargets_results.json [OPTIONS] COMMAND [ARGS]...
Common Options:
--output PATH: Required. Path to write the JSON output file.--limit N: Limit the number of items returned in arrays (default is 50).
Use a smaller number like 10 when doing preliminary exploration.
--page-size N: Set the API pagination size (default is 200). Increase if
you need more results (e.g., a study with many credible sets).
Available Commands:
get-gwas-studiesefo_id: Fetches all GWAS studies associated with
a specific disease ontology EFO ID (e.g. EFO_0000685).
get-study-credible-setsstudy_id: Fetches all credible sets for a
given study ID (e.g. FINNGEN_R12_RX_CROHN_2NDLINE). Returns confidence, finemapping method, variant, and p-value info.
get-qtl-credible-setsvariant_id: Retrieves QTL credible sets for
a specific variant ID (e.g. 19_44908822_C_T).
get-l2gvariant_id [--study-id ID]: Returns Locus-to-Gene (L2G)
predictions/scores for a locus to identify the most likely causal gene. Only variant_id is required; use --study-id to filter to a specific study. Accepts chr prefix (e.g. chr1_113834946_A_G).
get-target-druggabilityensembl_id: Provides tractability data
(small molecule, antibody, etc.) and clinical trial safety info for a gene/target.
get-associated-targetsefo_id: Find all target genes associated
with a specific disease EFO ID.
get-associated-diseasesensembl_id: Find all diseases associated
with a specific target Ensembl ID.
search-diseasequery_string: Search for a disease by name to find
its EFO ID and other metadata.
get-credible-sets-near-targetensembl_id [--window N]: Fetches
credible sets for a target and filters them to those within a genomic window around the target. Useful for finding variants "nearby" a gene.
custom-queryquery [--variables '{}']: Run a raw GraphQL query for
any other Open Targets data.
L2G Query Usage
The get-l2g command has two modes:
- Variant only (
get-l2g <variant_id>): Returns L2G predictions from
all credible sets across all studies where that variant is the lead variant. This can return a large number of results (e.g., hundreds). Use this when the user wants a broad view of which gene is most likely causal at a locus, or when no specific study is mentioned.
- Variant + study (
get-l2g <variant_id> --study-id <study_id>): Returns
L2G predictions only for credible sets from that specific study. Use this when the user asks about a specific GWAS study or when you need to narrow down the results.
Incomplete results warning: The variant-only mode can return hundreds of credible sets. The default
--page-sizeis 200, so if the API reports acounthigher than the number ofrowsreturned, you are seeing incomplete results. Always comparecountto the actual number of rows. If they differ, either increase--page-sizeor inform the user that only a subset was retrieved.
Querying by Region
To find studies with variants "nearby" a gene, use get-credible-sets-near-target, which improves upon the base API by performing a flexible search based on genomic position: uv run scripts/query_opentargets.py --output /tmp/results.json get-credible-sets-near-target ENSG00000156515 --window 500000
Note that the Open Targets GraphQL schema includes a regions parameter for credibleSets, however it performs an exact match against pre-computed region strings (e.g., chr10:68769984-69903496) and there is some missing data. Use get-credible-sets-near-target as it allows a genomic range overlap search.
This fetches credible sets associated with the target and filters them in Python based on the variant's genomic position.
Advanced GraphQL Queries
If you need to query endpoints or fields not exposed by the built-in subcommands, use the custom-query subcommand.
Before writing a custom query: Read the reference documentation to understand the API schema, types, and see example queries. See references/OpenTargets_GraphQL_Guide.md for full schema details, endpoints, and examples.
Example: Finding drugs for a disease
uv run scripts/query_opentargets.py custom-query \
query drugsForDisease($id: String!) {
disease(efoId: $id) {
name
drugAndClinicalCandidates {
count
rows {
maxClinicalStage
drug {
id
name
}
}
}
}
}' \
--variables '{"id": "EFO_1001006"}'
--output '/tmp/opentargets_result.json'
Confidence Star Ratings
The Open Targets Platform assigns a confidence level to each credible set based on the fine-mapping method and quality checks. These correspond to star ratings displayed in the platform UI:
| Stars | Confidence String (API value) |
|---|---|
| ★★★★ (4 stars) | SuSiE fine-mapped credible set with in-sample LD |
| ★★★ (3 stars) | SuSiE fine-mapped credible set with out-of-sample LD |
| ★★ (2 stars) | `PICS fine-mapped credible set extracted from summary |
: : statistics : | ★ (1 star) | PICS fine-mapped credible set based on reported top hit | | None | Unknown confidence` |
When users ask about "N-star confidence", match their request to the corresponding string in the confidence field of the API response.
Tips and Common Mistakes
- ID Formats:
- Disease IDs must be in EFO format (e.g.
EFO_0000685). - Target IDs must be Ensembl IDs (e.g.
ENSG00000169083), not HGNC
symbols. If you only have a gene symbol, you may need to map it first using a custom GraphQL search query.
- Variant IDs are formatted as
chromosome_position_ref_alt(e.g.,
1_154426264_C_T). A chr prefix (e.g. chr1_154426264_C_T) is automatically stripped by the tool.
- Study IDs can be GWAS Catalog IDs (e.g.
GCST90204201) or
project-specific IDs (e.g. FINNGEN_R12_RX_CROHN_2NDLINE).
- Truncation: The tool truncates arrays longer than
--limitto protect
the context window. If you see "_truncated", you can run the query again with a higher limit if you specifically need more data, but be cautious with large limit values. Always use the --output flag to save the result to a file and avoid terminal output truncation.
- Pagination and incomplete results: The
--page-sizeoption (default:
200) controls how many items are fetched from the API. Always check the count field in the response and compare it to the number of rows actually returned. If count > number of rows, you have incomplete data — either increase --page-size to fetch more, or inform the user that only a partial result set was returned. This is especially important for get-l2g without --study-id, which can return hundreds of credible sets.

