Tolerance Identification API
The Tolerance Identification API finds the top N closest N-mers (by BLOSUM62 score) in the human genome to a given protein sequence.
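To make "closest by BLOSUM62" concrete, here is a minimal sketch, assuming an ungapped, position-by-position comparison of equal-length N-mers and assuming that the normalizing maximum is the query N-mer scored against itself. The helper names (`pair_score`, `iter_nmers`, `ungapped_score`) are hypothetical, the candidate peptide is made up, and the dictionary holds only the few BLOSUM62 entries this example needs. The service's actual implementation may differ.

```python
# Illustrative only: not the service's implementation.
# A small subset of the (symmetric) BLOSUM62 matrix, enough for the peptides below.
BLOSUM62_SUBSET = {
    ("N", "N"): 6, ("L", "L"): 4, ("Y", "Y"): 7, ("I", "I"): 4,
    ("Q", "Q"): 5, ("W", "W"): 11, ("K", "K"): 5, ("D", "D"): 6,
    ("I", "V"): 3, ("K", "R"): 2,
}


def pair_score(a, b):
    """Look up a residue pair in either order."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]  # KeyError if the pair is not in the subset


def iter_nmers(sequence, n):
    """Yield (1-indexed start position, N-mer) for every window of length n."""
    for start in range(len(sequence) - n + 1):
        yield start + 1, sequence[start:start + n]


def ungapped_score(query, candidate):
    """Sum per-position substitution scores for two equal-length peptides."""
    if len(query) != len(candidate):
        raise ValueError("peptides must have the same length")
    return sum(pair_score(a, b) for a, b in zip(query, candidate))


query_seq = "NLYIQWLKDGGPSSGRPPPS"
resnum, first_9mer = next(iter_nmers(query_seq, 9))  # first window of the query
candidate = "NLYVQWLRD"  # made-up candidate, not a real proteome hit

match_score = ungapped_score(first_9mer, candidate)
max_score = ungapped_score(first_9mer, first_9mer)  # assumed meaning of a 100% match
normalized = match_score / max_score
print(resnum, first_9mer, match_score, max_score, round(normalized, 3))
```

A 20-residue query yields 12 overlapping 9-mers, so each requested N-mer size multiplies the number of windows that would be compared.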
Command Line Interface
Examples
Get the top 10 closest 9-mers:
lev engine submit tolerance-identification NLYIQWLKDGGPSSGRPPPS \
  --top-n 10
Get the top 5 closest 9- and 15-mers:
lev engine submit tolerance-identification NLYIQWLKDGGPSSGRPPPS \
  --top-n 5 \
  --nmer-sizes 9,15
Flags
--sequence
(str) (Required) - Input protein sequence to compare against
--top-n
(int) (Default: 20) - Collect the top N matches
--nmer-sizes
(str) (Default: 9) - Nmer size(s) to run this on (Comma separated string ex: 9,10,11,12)
Python Interface
Examples
Get the top 10 closest 9-mers:
from engine import EngineClient
client = EngineClient()
client.authorize()
client.submit_tolerance_identification(
    sequence="NLYIQWLKDGGPSSGRPPPS",
    top_n=10
)
Get the top 5 closest 9- and 15-mers:
client.submit_tolerance_identification(
    sequence="NLYIQWLKDGGPSSGRPPPS",
    top_n=5,
    nmer_sizes="9,15"
)
Flags
sequence
(str) (Required) - Input protein sequence to compare against
top_n
(int) (Default: 20) - Collect the top N matches
nmer_sizes
(str) (Default: 9) - Nmer size(s) to run this on (Comma separated string ex: 9,10,11,12)
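Because `nmer_sizes` is passed as a comma-separated string, not a list, a small helper can build it from integers. This is a sketch only; `format_nmer_sizes` is a hypothetical name and not part of the client.

```python
def format_nmer_sizes(sizes):
    """Join integer N-mer sizes into the comma-separated string form, e.g. "9,15"."""
    cleaned = sorted(set(int(s) for s in sizes))  # de-duplicate and order
    if not cleaned or cleaned[0] < 1:
        raise ValueError("nmer sizes must be positive integers")
    return ",".join(str(s) for s in cleaned)


nmer_sizes = format_nmer_sizes([15, 9])
print(nmer_sizes)
```

The resulting string can be passed as the `nmer_sizes` argument shown in the examples above.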
Outputs
out.csv
- CSV file containing the following columns
  nmer_size
  - size of this nmer
  resnum
  - residue number (1 indexed) of the nmer position in the query sequence
  query_seq
  - query sequence
  matchrank
  - Rank (0 = best, N = worst) out of the top-N closest (by blosum62) nmers to the query
  matchscore
  - blosum62 score of the result to the query sequence
  matchseq
  - the found human genome sequence
  matchscore/max_score
  - matchscore divided by the score of a 100% match (normalized Blosum62)
out.json
- JSON format of the out.csv
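As a sketch of how out.csv might be consumed, the snippet below keeps only the best-ranked match (matchrank 0) for each N-mer size and query position. It assumes the CSV header uses exactly the column names listed above. The sample rows are made up for illustration and are not real results, and whether query_seq holds the full query or only the N-mer window is not stated here, so the sample simply uses the window. `best_matches` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in the documented column layout; real values will differ.
SAMPLE_CSV = """nmer_size,resnum,query_seq,matchrank,matchscore,matchseq,matchscore/max_score
9,1,NLYIQWLKD,0,48,NLYVQWLRD,0.923
9,1,NLYIQWLKD,1,40,NLYAQWLRD,0.769
9,2,LYIQWLKDG,0,35,LYVQWLRDA,0.660
"""


def best_matches(handle):
    """Return {(nmer_size, resnum): row}, keeping only rank-0 (best) matches."""
    best = {}
    for row in csv.DictReader(handle):
        if int(row["matchrank"]) == 0:
            best[(int(row["nmer_size"]), int(row["resnum"]))] = row
    return best


best = best_matches(io.StringIO(SAMPLE_CSV))
for (nmer_size, resnum), row in sorted(best.items()):
    print(nmer_size, resnum, row["matchseq"], row["matchscore/max_score"])
```

To read a real result instead of the sample, open the file with `open("out.csv", newline="")` and pass the handle to `best_matches`.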
Notes
Running this protocol requires between 4 and 5 GB of memory per CPU.
Input proteome
The input proteome file was taken from https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.abinitio.fa.gz