Create a PSSM

BACKGROUND

A protein Position-Specific Scoring Matrix (PSSM) is built by aligning your protein to a set of homologs and counting the amino acids found at each position across those homologs. This is a simple way to measure which mutations evolution has accepted at each position. It can guide protein design because mutations found in homologs are likely to maintain structural integrity and function. You can create a PSSM for your protein, then use it as a list of potential mutations for design in Cyrus Bench.
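The counting idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name count_matrix and the toy alignment are made up for this example, and the sequences are assumed to be already aligned to equal length with "-" marking gaps. The PSSM that Blast+ produces goes further and converts such counts into scores, so treat this as a picture of the concept rather than a replacement for the tool.

```python
from collections import Counter


def count_matrix(aligned_sequences):
    """Count the residues seen in each alignment column; gaps ('-') are skipped."""
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*...) walks the alignment column by column
    return [
        Counter(residue for residue in column if residue != "-")
        for column in zip(*aligned_sequences)
    ]


# Toy alignment: hypothetical sequences, for illustration only
toy_alignment = ["MKV", "MRV", "MK-", "LKI"]

for position, column_counts in enumerate(count_matrix(toy_alignment), start=1):
    print(position, dict(column_counts))
```

Each printed line shows which amino acids the homologs use at that position, which is the information a design run draws on when proposing mutations.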

For more information on how to run design with your PSSM, click here.

CREATING YOUR PSSM

Warning: Creating a PSSM on your computer requires significant disk space, and downloading, formatting, and processing the data takes hours on most computers. Before beginning, make sure that you have room for 200GB of data. Once everything is unpacked, you can delete the compressed version to free up much of this space.
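If you would like to check your free space before starting, a minimal Python sketch is shown here. The 200 GB figure comes from the warning above; the function name has_enough_space and the choice to count 1 GB as 1000^3 bytes are assumptions made for this example.

```python
import shutil

REQUIRED_GB = 200  # figure taken from the warning above


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the drive holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1000**3  # assumption: 1 GB = 1000^3 bytes


if has_enough_space("."):
    print("Enough free space for the nr database.")
else:
    print("Not enough free space; free up room or choose another drive.")
```

Pass the directory where you plan to keep the database as the path argument so that the correct drive is checked.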

We can also create your PSSM for you. Just email support with the protein sequence (or sequences) that you would like run and a name for each sequence. We will happily run this on our system and email the PSSM back to you within one week, hopefully much sooner.

QUICKSTART INSTRUCTIONS FOR EXPERTS:

  1. Download NR Database
    Go to this page and click on nr.gz to download
    If you already have the database and want to update it, use this command:
    Mac: PATH/update_blastdb.pl --passive --decompress nr
    PC: perl PATH\update_blastdb.pl --passive --decompress nr
  2. Choose and download the Blast+ executable
  3. Build the Database with this command:
    Mac: PATH/ncbi-blast-2.7.1+/bin/makeblastdb -in PATH/nr -input_type fasta -title nonR -dbtype prot
    PC: PATH\ncbi-blast-2.7.1+\bin\makeblastdb -in PATH\nr -input_type fasta -title nonR -dbtype prot
  4. Execute with this command:
    Mac: PATH/blast+/bin/legacy_blast.pl blastpgp -d PATH/nr -j 4 -b 1 -a 2 -Q PATH/output.pssm -i PATH/input.fasta --path PATH/blast+/bin
    PC: perl PATH\blast+\bin\legacy_blast.pl blastpgp -d PATH\nr -j 4 -b 1 -a 2 -Q PATH\output.pssm -i PATH\input.fasta --path PATH\blast+\bin
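If you prefer to assemble the build and execute commands from steps 3 and 4 in a script rather than typing them, the sketch below shows one way to do it in Python. It is an illustration under assumptions: the function name blast_commands and the example paths are hypothetical, and it assumes makeblastdb and legacy_blast.pl sit in the same bin directory. The flags themselves are copied from the commands above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def blast_commands(blast_dir, nr_path, fasta_path, pssm_path, windows=False):
    """Return (build_command, pssm_command) as argument lists."""
    path_type = PureWindowsPath if windows else PurePosixPath
    bin_dir = path_type(blast_dir) / "bin"

    # Step 3: build the database from the nr FASTA file
    build = [
        str(bin_dir / "makeblastdb"),
        "-in", nr_path,
        "-input_type", "fasta",
        "-title", "nonR",
        "-dbtype", "prot",
    ]

    # Step 4: create the PSSM through the legacy blastpgp wrapper
    run = [
        str(bin_dir / "legacy_blast.pl"), "blastpgp",
        "-d", nr_path,
        "-j", "4",
        "-b", "1",
        "-a", "2",
        "-Q", pssm_path,
        "-i", fasta_path,
        "--path", str(bin_dir),
    ]
    if windows:
        run.insert(0, "perl")  # the PC command above calls perl explicitly
    return build, run


# Hypothetical locations; replace them with your own paths
build_cmd, pssm_cmd = blast_commands(
    "/opt/blast+", "/data/nr", "/data/input.fasta", "/data/output.pssm"
)
print(" ".join(build_cmd))
print(" ".join(pssm_cmd))
```

The returned lists can be passed to subprocess.run(command, check=True) one after the other once the paths point at real files.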

INSTRUCTIONS FOR BEGINNERS:

This process can be challenging so feel free to email support@levitate.bio any time you run into trouble. We are happy to help.

You need three things to create a PSSM: your protein sequence, a database of protein sequences that contains potential homologs, and the program that will create the PSSM.

  1. Your protein sequence – This needs to be in FASTA format and saved as name.fasta (using any name).
    Example: 5ORB.fasta
    >5ORB
    SMSVKKPKRDDSKDLALCSMILTEMETHEDAWPFLLPVNLKLVPGYKKVIKKP
    MDFSTIREKLSSGQYPNLETFALDVRLVFDNCETFNEDDSDIGRAGHNMRKYF
    EKKWTDTFKVS
  2. Download a database of non-redundant protein sequences from NCBI.
    Click here to begin the download. This is a very large file, so it can take a long time to download. Once it has been downloaded, move it to the desired location and double click to unzip. This file contains only the FASTA sequences for the database. You will still need to build the database, as described below.
  3. Download a program from the NCBI called Blast+. This is what you will pass your database and your sequence to in order to create the PSSM.
    Click here to go to NCBI’s latest version of Blast+. Select the version that is compatible with your operating system. Unless you have a specific preference, we recommend choosing the file that ends in win64.tar.gz for Windows (there isn’t currently a compatible version for 32-bit Windows). Double click the file to unpack it for use.
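If you would rather generate the FASTA file from item 1 with a script, here is a minimal Python sketch. The helper name format_fasta and the 60-character line width are assumptions made for this example; the sequence is the 5ORB example shown above.

```python
from pathlib import Path


def format_fasta(name, sequence, width=60):
    """Return FASTA text: a '>' header line followed by wrapped sequence lines."""
    cleaned = "".join(sequence.split()).upper()  # drop spaces and line breaks
    if not cleaned.isalpha():
        raise ValueError("sequence should contain only letters")
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Example sequence copied from item 1 above
EXAMPLE_5ORB = """
SMSVKKPKRDDSKDLALCSMILTEMETHEDAWPFLLPVNLKLVPGYKKVIKKP
MDFSTIREKLSSGQYPNLETFALDVRLVFDNCETFNEDDSDIGRAGHNMRKYF
EKKWTDTFKVS
"""

if __name__ == "__main__":
    # Writes 5ORB.fasta into the current directory
    Path("5ORB.fasta").write_text(format_fasta("5ORB", EXAMPLE_5ORB))
```

The header line starts with ">" immediately followed by the name, and the sequence begins on the very next line.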

RUNNING NCBI BLAST+

  1. Set up a Linux environment or a command line shell, because Blast+ only runs from the command line. For Mac, you can use the Terminal program. For Windows, we recommend installing a Linux subsystem, using PowerShell, or downloading PuTTY. Here are some basic Linux commands that you can use in Terminal or PuTTY.
    pwd - Tells you the path for the directory where you are located
    ls - Lists the files and directories in your current directory
    ls PATH - Lists the files and directories in PATH
    cd PATH - Moves you to the location defined by PATH
    cd ~ - Moves you to your home directory
  2. Make sure that you have Perl. On a Mac, it should already be installed. On Windows, use these commands:
    perl -MCPAN -e 'install List::MoreUtils'
    sudo apt-get install perl-doc
  3. Build the database with the nr fastas that you downloaded. We recommend creating a directory where you can keep Blast+ and the nr database. Go to this directory in Terminal or PuTTY and build the database with this command:
    Mac: PATH/ncbi-blast-2.7.1+/bin/makeblastdb -in PATH/nr -input_type fasta -title nonR -dbtype prot
    PC: PATH\ncbi-blast-2.7.1+\bin\makeblastdb -in PATH\nr -input_type fasta -title nonR -dbtype prot
    Replace PATH with the path to the files. Use the pwd and/or ls commands to get the path. This program takes a long time to run because it is breaking the whole database into subunits.
  4. Create the PSSM with this command:
    Mac: PATH/blast+/bin/legacy_blast.pl blastpgp -d PATH/nr -j 4 -b 1 -a 2 -Q PATH/output.pssm -i PATH/input.fasta --path PATH/blast+/bin
    PC: perl PATH\blast+\bin\legacy_blast.pl blastpgp -d PATH\nr -j 4 -b 1 -a 2 -Q PATH\output.pssm -i PATH\input.fasta --path PATH\blast+\bin
    This will also take a long time. By default, the E-value cutoff is 10^-3 (0.001), but you can change it by adding -h followed by your new cutoff to the command line, for example -h 0.0000001.
  5. Update the database if enough time has passed since you downloaded the non-redundant NCBI fastas. This is just as time consuming as the original download.
    Mac: PATH/update_blastdb.pl --passive --decompress nr
    PC: perl PATH\update_blastdb.pl --passive --decompress nr
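Step 4 mentions changing the E-value cutoff with -h. If you build the command in a script, the Python sketch below appends that option and writes the cutoff as a plain decimal, matching the 0.0000001 style used above. The function name with_evalue_cutoff and the shortened example command are assumptions made for this illustration.

```python
from decimal import Decimal


def with_evalue_cutoff(command, cutoff):
    """Return a copy of `command` with '-h <cutoff>' appended."""
    value = Decimal(str(cutoff))
    if value <= 0:
        raise ValueError("the E-value cutoff must be positive")
    # format(..., "f") avoids scientific notation such as 1e-07
    return list(command) + ["-h", format(value, "f")]


# Shortened, hypothetical command; use your full step 4 command instead
base_command = ["PATH/blast+/bin/legacy_blast.pl", "blastpgp", "-d", "PATH/nr"]
print(" ".join(with_evalue_cutoff(base_command, 1e-7)))
```

A smaller cutoff is stricter, so fewer and more closely related homologs contribute to the PSSM.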
