protein-sequence-similarity-search

google-deepmind/science-skills

656 installs

Installation

    npx skills add https://github.com/google-deepmind/science-skills --skill protein-sequence-similarity-search

SKILL.md

Prerequisites

uv: Read the uv skill and follow its Setup instructions to ensure uv is installed and on PATH.

User Notification: If LICENSE_NOTIFICATION.txt does not already exist in this skill directory, then (1) prominently notify the user to check the terms at https://www.ebi.ac.uk/jdispatcher/sss/ncbiblast and https://colabfold.com, then (2) create the file recording the notification text and timestamp.

.env file: Make sure the .env file exists in your home directory. Create one if it does not exist.

USER_EMAIL (optional but recommended): Recommended by the EBI for BLAST job tracking, but the skill works without it. If the variable is missing from .env, do NOT ask the user to paste it into the chat (this would leak the value into the agent's context). Instead, give the user this command, substituting ENV_FILE with the resolved literal path to the .env file:

    printf "Enter contact email: " && read email && echo "USER_EMAIL=$email" >> "ENV_FILE" && echo "Saved."

The scripts load credentials automatically via dotenv. NEVER read, print, or inspect the .env file or its variables (e.g. no cat, grep, echo, printenv, or os.environ.get on keys). Credentials must stay out of the agent's context.
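The User Notification and .env prerequisites above can be sketched in Python as follows. This is a minimal illustration, not part of the skill's own scripts: the function names and the exact notification wording are assumptions, and the code only checks for or creates the files without ever reading the contents of .env.

```python
from datetime import datetime, timezone
from pathlib import Path

# Notification wording is an assumption; the two URLs are the ones named above.
NOTICE = (
    "Please check the terms of use at "
    "https://www.ebi.ac.uk/jdispatcher/sss/ncbiblast and https://colabfold.com"
)


def ensure_license_notification(skill_dir: Path) -> bool:
    """Create LICENSE_NOTIFICATION.txt if missing.

    Returns True when the file was just created, meaning the user
    should be shown NOTICE prominently now.
    """
    marker = skill_dir / "LICENSE_NOTIFICATION.txt"
    if marker.exists():
        return False
    stamp = datetime.now(timezone.utc).isoformat()
    # Record the timestamp and the notification text, as the prerequisite asks.
    marker.write_text(f"{stamp}\n{NOTICE}\n", encoding="utf-8")
    return True


def ensure_env_file(home: Path) -> Path:
    """Create an empty .env in the given home directory if it is absent.

    The file is never opened for reading, so credentials stay out of
    the agent's context.
    """
    env_path = home / ".env"
    if not env_path.exists():
        env_path.touch()
    return env_path
```

In practice the second function would be called with `Path.home()`; taking the directory as a parameter keeps the sketch easy to try in a scratch folder.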

Goal

Take a user-provided amino acid sequence (or a path to a .fasta file), search for sequence homologues using the fastest available method, generate a Markdown-formatted table of the top hits, interpret key alignment metrics, summarize the inferred protein functions, and save results locally for future programmatic analysis.

Core Rules

Strict Validation: For BLAST, only use database codes listed in the table below.

No Hallucinations: If a script throws an error or returns no hits, inform the user clearly. Do NOT invent sequence homologues.

Do Not Parse Output Files: Do not parse the JSON, a3m, or any other raw output files. Rely on the generated .md file for your summary. The JSON and other outputs are for subsequent tool use only.

Always State the Method: Every report must clearly state whether the search used the quick MMseqs2 (ColabFold API) or the slower EBI BLAST method.

Notification: Whenever this skill is used, say so in the output. Explicitly state which program (MMseqs2 or EBI BLAST) and which sequence databases were used.

Search Method Selection

Choose the search method based on the user's request:

If the user says "quick search" or "fast search", requests no specific method (a general homologue search), or if you are unsure: run MMseqs2 (fast, default) using mmseqs2_search.py.

If MMseqs2 fails (exit code 2: RATELIMIT or API error), or the user explicitly requests "BLAST" or a specific BLAST database (e.g. uniprotkb_swissprot, pdb, uniprotkb_human): run BLAST using uniprot_blast.py.
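The selection rules above amount to a small decision function. The sketch below is one possible reading of them; `choose_method` and its parameters are hypothetical names, and the keyword match on "blast" is a simplifying assumption about how an explicit request would be detected.

```python
from typing import Optional


def choose_method(
    user_request: str,
    requested_db: Optional[str] = None,
    mmseqs2_exit_code: Optional[int] = None,
) -> str:
    """Return "mmseqs2" or "blast" following the selection rules above."""
    # Fallback: a previous MMseqs2 attempt ended with exit code 2.
    if mmseqs2_exit_code == 2:
        return "blast"
    # Explicit request for BLAST or for a specific BLAST database.
    if requested_db is not None or "blast" in user_request.lower():
        return "blast"
    # "quick search", "fast search", a general search, or any unclear case.
    return "mmseqs2"
```

For example, `choose_method("find homologues of this protein")` selects MMseqs2, while passing `requested_db="pdb"` selects BLAST.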

Instructions

Identify the query from the user. It can be a raw sequence string (e.g., "MKVLY...") or a path to a local file (e.g., "./data/sequence.fasta").

Determine the search method using the list above.
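Telling a raw sequence apart from a file path, as the first step describes, could look like the sketch below. The helper name `classify_query` and the accepted residue alphabet (the 20 standard amino acids plus the codes B, Z, X, U and O) are assumptions for illustration; the skill's scripts may apply different checks.

```python
from pathlib import Path

# Assumed alphabet: 20 standard residues plus the extra codes B, Z, X, U, O.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBZXUO")


def classify_query(query: str) -> str:
    """Return "file", "sequence", or "unknown" for a user-supplied query."""
    stripped = query.strip()
    try:
        if stripped and Path(stripped).is_file():
            return "file"
    except (OSError, ValueError):
        # A long raw sequence may not be a valid file name on some systems.
        pass
    # Ignore internal whitespace and case when checking the residues.
    residues = "".join(stripped.split()).upper()
    if residues and set(residues) <= AMINO_ACIDS:
        return "sequence"
    return "unknown"
```

An "unknown" result is a cue to ask the user for clarification rather than to guess.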

Path A: MMseqs2 Search (Default)

Generate File Names: Generate descriptive output file names based on the input (e.g., proteinA_mmseqs2.json and proteinA_mmseqs2.md).

Execute the MMseqs2 script:

Default:

    uv run scripts/mmseqs2_search.py <SEQUENCE_OR_FILE> -o <generated-filename.md> -j <generated-filename.json>

With mgnify:

    uv run scripts/mmseqs2_search.py <SEQUENCE_OR_FILE> -o <generated-filename.md> -j <generated-filename.json> --include-mgnify

The script will query the ColabFold MMseqs2 API and poll for completion. This is typically fast (under 2 minutes).

If the script exits with code 2 (API failure, rate limit), automatically fall back to BLAST (Path B below). Inform the user: "MMseqs2 search failed, falling back to BLAST."

Read the Results: Open and read the generated .md file.
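Path A can be driven from Python roughly as sketched below. The command line mirrors the one shown above; the helper names (`output_names`, `build_mmseqs2_command`, `run_mmseqs2`) and the `label` parameter are hypothetical, and only the exit code 2 behaviour is taken from the text. Any other non-zero exit code is raised as an error so it can be reported to the user.

```python
import subprocess
from typing import List, Tuple


def output_names(label: str, method: str) -> Tuple[str, str]:
    """Build descriptive .md and .json names, e.g. proteinA_mmseqs2.md."""
    return f"{label}_{method}.md", f"{label}_{method}.json"


def build_mmseqs2_command(
    query: str, label: str, include_mgnify: bool = False
) -> List[str]:
    """Assemble the uv command for scripts/mmseqs2_search.py."""
    md_name, json_name = output_names(label, "mmseqs2")
    cmd = ["uv", "run", "scripts/mmseqs2_search.py", query,
           "-o", md_name, "-j", json_name]
    if include_mgnify:
        cmd.append("--include-mgnify")
    return cmd


def run_mmseqs2(query: str, label: str, include_mgnify: bool = False) -> bool:
    """Run the search; return True when the caller should fall back to BLAST."""
    result = subprocess.run(
        build_mmseqs2_command(query, label, include_mgnify), check=False
    )
    if result.returncode == 2:
        print("MMseqs2 search failed, falling back to BLAST.")
        return True
    # Raises CalledProcessError for any other non-zero exit code.
    result.check_returncode()
    return False
```

Passing the command as a list avoids shell quoting problems when the query is a file path containing spaces.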

Path B: BLAST Search (Explicit or Fallback)

Database Selection & Validation: Determine the most appropriate database(s) based on the user's prompt.

Consult the Available BLAST Databases table below.
If the user specifies a taxonomic group (e.g., "Find homologues in microbes"), select the corresponding Database Code (e.g., uniprotkb_bacteria).
If the user explicitly requests curated hits, use uniprotkb_swissprot.
If no specific database is requested, do not specify --databases.
Validation: Ensure the database code exactly matches an entry in the table. If the user requests a database not on the list, do not proceed and provide the allowed list.

Generate File Names: Generate descriptive output file names based on the input (e.g., proteinA_ebi_blast.json and proteinA_ebi_blast.md).

The script takes the user's email address from the USER_EMAIL environment variable and includes it in the request header. As noted under Prerequisites, the EBI recommends setting it, but the skill works without it.

Execute the BLAST script:

Default (uniprotkb):

    uv run scripts/uniprot_blast.py <SEQUENCE_OR_FILE> -o <generated-filename.md> -j <generated-filename.json>

Custom database:

    uv run scripts/uniprot_blast.py <SEQUENCE_OR_FILE> -o <generated-filename.md> -j <generated-filename.json> --databases <db1,db2>

The script will query the EBI BLAST API and poll the server. Note: This can take up to 15 minutes; wait patiently.

Read the Results: Open and read the generated .md file.
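The validation rule and the two BLAST command forms above can be combined as in the following sketch. The set of codes is copied from the Available BLAST Databases section; the helper names are hypothetical, and rejecting an unknown code with a `ValueError` that lists the allowed codes is one way to implement "do not proceed and provide the allowed list".

```python
from typing import List, Sequence

# Database codes copied from the Available BLAST Databases section.
ALLOWED_DATABASES = frozenset({
    "uniprotkb", "uniprotkb_swissprot", "uniprotkb_swissprotsv",
    "uniprotkb_reference_proteomes", "uniprotkb_trembl",
    "uniprotkb_refprotswissprot", "uniprotkb_archaea",
    "uniprotkb_arthropoda", "uniprotkb_bacteria",
    "uniprotkb_complete_microbial_proteomes", "uniprotkb_eukaryota",
    "uniprotkb_fungi", "uniprotkb_human", "uniprotkb_mammals",
    "uniprotkb_nematoda", "uniprotkb_rodents", "uniprotkb_vertebrates",
    "uniprotkb_viridiplantae", "uniprotkb_viruses", "uniprotkb_enzyme",
    "uniprotkb_covid19", "uniref100", "uniref90", "uniref50", "pdb",
})


def validate_databases(requested: Sequence[str]) -> List[str]:
    """Return the codes unchanged, or raise if any is not an exact match."""
    unknown = [db for db in requested if db not in ALLOWED_DATABASES]
    if unknown:
        allowed = ", ".join(sorted(ALLOWED_DATABASES))
        raise ValueError(
            f"Unsupported database(s): {', '.join(unknown)}. Allowed: {allowed}"
        )
    return list(requested)


def build_blast_command(
    query: str, label: str, databases: Sequence[str] = ()
) -> List[str]:
    """Assemble the uv command for scripts/uniprot_blast.py."""
    cmd = ["uv", "run", "scripts/uniprot_blast.py", query,
           "-o", f"{label}_ebi_blast.md", "-j", f"{label}_ebi_blast.json"]
    # Omit --databases entirely when no specific database was requested.
    if databases:
        cmd += ["--databases", ",".join(validate_databases(databases))]
    return cmd
```

Because matching is exact, a near miss such as "swissprot" is rejected rather than silently mapped to uniprotkb_swissprot.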

Common Steps (Both Methods)

Interpret the Metrics: Summarize the top 3 to 5 sequence homologues.

Assess match quality using:

Q-Cov (Query Coverage): High percentages mean the match covers most of the query sequence.

E-value: Lower is better. Values close to zero (e.g., 1e-50) mean a match this good is extremely unlikely to arise by chance.

Seq Identity: Provides evolutionary context (highly conserved vs. distant homologue).

Perform Functional Analysis:

If the results table includes protein descriptions, analyze them directly: report specific protein names/functions of the top homologues and summarize the variety of functions, domains, or protein families found.

If the results contain only UniProt accession IDs without descriptions (common with MMseqs2), look up the protein names and functions for the top 3–5 hits using the uniprot-database skill or other appropriate methods before summarizing.

Inform the user of both newly created files (.json and .md) and their locations.
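To make the metric interpretation above concrete, the sketch below turns the three values for one hit into a short qualitative phrase. Every cut-off in it (80% coverage, E-values of 1e-50 and 1e-5, 70% and 30% identity) is an illustrative assumption rather than a threshold defined by the skill, and the inputs are meant to be values read from the generated .md table, not parsed from the raw output files.

```python
def describe_hit(q_cov: float, e_value: float, identity: float) -> str:
    """Summarize one hit from Q-Cov (%), E-value, and sequence identity (%).

    All thresholds below are illustrative assumptions.
    """
    if q_cov >= 80:
        coverage = "covers most of the query"
    else:
        coverage = "covers only part of the query"

    if e_value <= 1e-50:
        significance = "highly significant"
    elif e_value <= 1e-5:
        significance = "significant"
    else:
        significance = "weak"

    if identity >= 70:
        conservation = "highly conserved"
    elif identity >= 30:
        conservation = "moderately conserved"
    else:
        conservation = "distant homologue"

    return f"{coverage}; {significance}; {conservation}"
```

A hit with 95% coverage, an E-value of 1e-60 and 85% identity would be described as covering most of the query, highly significant and highly conserved.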

Available BLAST Databases

| Database Code | Name | Description |
| --- | --- | --- |
| uniprotkb | UniProt Knowledgebase (The UniProt Knowledgebase includes UniProtKB/Swiss-Prot and UniProtKB/TrEMBL) | The UniProt Knowledgebase (UniProtKB) is the central access point for extensive curated protein information, including function, classification, and cross-references. Search UniProtKB to retrieve "everything that is known" about a particular sequence |
| uniprotkb_swissprot | UniProtKB/Swiss-Prot (The manually annotated section of UniProtKB) | The manually curated subsection of the UniProt Knowledgebase |
| uniprotkb_swissprotsv | UniProtKB/Swiss-Prot isoforms (The manually annotated isoforms of UniProtKB/Swiss-Prot) | The isoform sequences for the manually curated subsection of the UniProt Knowledgebase |
| uniprotkb_reference_proteomes | UniProtKB Reference Proteomes | Taxonomic subset of the UniProtKB Reference Proteomes |
| uniprotkb_trembl | UniProtKB/TrEMBL (The automatically annotated section of UniProtKB) | Subsection of the UniProt Knowledgebase derived from ENA Sequence (formerly EMBL-Bank) coding sequence translations with annotation produced by an automated process |
| uniprotkb_refprotswissprot | UniProtKB Reference Proteomes plus Swiss-Prot | UniProtKB Reference Proteomes plus Swiss-Prot |
| uniprotkb_archaea | UniProtKB Archaea | Taxonomic subset of the UniProt Knowledgebase for archaea |
| uniprotkb_arthropoda | UniProtKB Arthropoda | Taxonomic subset of the UniProt Knowledgebase for arthropoda |
| uniprotkb_bacteria | UniProtKB Bacteria | Taxonomic subset of the UniProt Knowledgebase for bacteria |
| uniprotkb_complete_microbial_proteomes | UniProtKB Complete Microbial Proteomes | Taxonomic subset of the UniProt Knowledgebase for complete microbial proteomes |
| uniprotkb_eukaryota | UniProtKB Eukaryota | Taxonomic subset of the UniProt Knowledgebase for eukaryota |
| uniprotkb_fungi | UniProtKB Fungi | Taxonomic subset of the UniProt Knowledgebase for fungi |
| uniprotkb_human | UniProtKB Human | Taxonomic subset of the UniProt Knowledgebase for human |
| uniprotkb_mammals | UniProtKB Mammals | Taxonomic subset of the UniProt Knowledgebase for mammals |
| uniprotkb_nematoda | UniProtKB Nematoda | Taxonomic subset of the UniProt Knowledgebase for nematoda |
| uniprotkb_rodents | UniProtKB Rodents | Taxonomic subset of the UniProt Knowledgebase for rodents |
| uniprotkb_vertebrates | UniProtKB Vertebrates | Taxonomic subset of the UniProt Knowledgebase for vertebrates |
| uniprotkb_viridiplantae | UniProtKB Viridiplantae | Taxonomic subset of the UniProt Knowledgebase for viridiplantae |
| uniprotkb_viruses | UniProtKB Viruses | Taxonomic subset of the UniProt Knowledgebase for viruses |
| uniprotkb_enzyme | UniProtKB Enzyme | Taxonomic subset of the UniProt Knowledgebase for enzymes |
| uniprotkb_covid19 | UniProtKB COVID-19 | Taxonomic subset of the UniProt Knowledgebase for COVID-19 |
| uniref100 | UniProt Clusters 100% (UniRef100) | The UniProt Reference Clusters (UniRef) containing sequences which are 100% identical. |
| uniref90 | UniProt Clusters 90% (UniRef90) | The UniProt Reference Clusters (UniRef) containing sequences which are 90% identical. |
| uniref50 | UniProt Clusters 50% (UniRef50) | The UniProt Reference Clusters (UniRef) containing sequences which are 50% identical. |
| pdb | Protein Structure Sequences (PDBe protein structure sequences) | Protein sequences from structures described in the Brookhaven Protein Data Bank (PDB) |

