Sample Orientations API
The Sample Orientations API provides an interface for generating multiple relative orientations of input protein structures. It first aligns the structures on their centers of mass (or on their termini in terminal mode), applies a random rotation, picks a direction, and then moves the structures apart along that direction until they are no longer in contact, plus an additional distance equal to the pad distance. This lets you generate many different arrangements of the input structures relative to each other in 3D space.
This is primarily useful for motif scaffolding with RF diffusion or for testing arrangements in which the proteins could be linked together. For that reason, the workflow also generates a JSON file containing the contigs needed to run RF diffusion to link all the elements together; several of the input arguments described below give some control over these contigs. The PDB outputs and the contigs.json file generated by this tool can be used with the rf-diffusion tool to link everything together according to the contig definitions in the JSON file.
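The sampling steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the tool's implementation: structures are reduced to lists of (x, y, z) points, all points are weighted equally when computing the center of mass, "in contact" is taken to mean that any pair of points is closer than a hypothetical contact_cutoff, and the separation is found by stepping in fixed increments. The function and parameter names are invented for the example.

```python
import math
import random


def center_of_mass(coords):
    """Unweighted centroid of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def translate(coords, shift):
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in coords]


def random_rotation_matrix(rng):
    """Uniformly random rotation built from a random unit quaternion."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    x = a * math.sin(2 * math.pi * u2)
    y = a * math.cos(2 * math.pi * u2)
    z = b * math.sin(2 * math.pi * u3)
    w = b * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(coords, rot):
    return [tuple(sum(rot[i][j] * c[j] for j in range(3)) for i in range(3))
            for c in coords]


def random_unit_vector(rng):
    """Direction drawn uniformly from the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def min_distance(a, b):
    return min(math.dist(p, q) for p in a for q in b)


def sample_orientation(fixed, mobile, pad_distance=5.0,
                       contact_cutoff=4.0, step=0.5, seed=None):
    """Return (fixed, mobile) coordinates for one sampled orientation.

    contact_cutoff and step are assumptions made for this sketch only.
    """
    rng = random.Random(seed)
    # 1. Align both structures on their centers of mass (at the origin).
    fixed = translate(fixed, tuple(-c for c in center_of_mass(fixed)))
    mobile = translate(mobile, tuple(-c for c in center_of_mass(mobile)))
    # 2. Apply a random rotation to the mobile structure.
    mobile = rotate(mobile, random_rotation_matrix(rng))
    # 3. Pick a random direction.
    direction = random_unit_vector(rng)
    # 4. Slide the mobile structure along it until no pair is in contact.
    while min_distance(fixed, mobile) < contact_cutoff:
        mobile = translate(mobile, tuple(step * d for d in direction))
    # 5. Move a further pad_distance along the same direction.
    mobile = translate(mobile, tuple(pad_distance * d for d in direction))
    return fixed, mobile
```

Calling sample_orientation repeatedly with different seeds yields different relative placements of the same two structures, which is the role --num-structures plays in the tool.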
Command Line Interface
Examples
Generate orientations for combinations of all PDB structures in a directory
# All PDB files in the current directory using wildcard
lev engine submit sample-orientations *.pdb --num-structures 10 --pad-distance 5 --max-size 300 --n-contigs 2
Flags
--structures
(str) (Optional) - Comma-separated list of input PDB file names, in the order they should appear in the output complex. Use this if you want to fix the order in which the structures occur in the output file, e.g. when you know you want the C terminus of input1.pdb linked to the N terminus of input2.pdb, which in turn is linked to input3.pdb.
- Example:
--structures input1.pdb,input2.pdb,input3.pdb
--num-structures
(int) (Default: 1) - Number of orientations to generate.
--pad-distance
(float) (Default: 5.0) - Distance padding around the structure for orientation sampling. This determines how far apart the structures will be from each other.
--full-atom
(bool) (Default: false) - Whether to use a full-atom representation instead of backbone-only. By default, only backbone atoms are used to assess whether the structures are in contact with each other. You can override this to include all atoms at the cost of increased run time.
--terminal-mode
(bool) (Default: false) - Enable terminal mode for orientation sampling. When true, the structures are aligned N-to-C terminus after the random rotation is performed and they are pulled apart. This can improve the sampling of conformations where the connecting termini are close to each other, which is useful for linker design.
--max-size
(int) (Default: 100) - This protocol generates contigs to be used in rf-diffusion. This parameter caps the maximum size of the complex those contigs can generate. All the residues in the starting structures are included in this count. Contigs will not be generated for an orientation if there are not enough residues to close the gap between termini, so this value must be set large enough to generate reasonable final complexes.
--exclude-terminus
(bool) (Default: false) - If set to true, the contigs will not add any new residues to the N and C termini of the final complex. You probably want this to be false for motif scaffolding (contigs will be generated for the termini) and true for linker design (where you only want to link the structures together, not add anything extraneous).
--n-contigs
(int) (Default: 1) - Number of contigs to generate for each orientation.
--fixed-size
(int) (Default: 0) - Fixes the size of the output model generated by the rf-diffusion contigs. This overrides --max-size. It can be useful when you want to compare the rf-diffusion results easily because they all have the same size. When set to 0, this is turned off and the structures will be no larger than --max-size allows.
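The interaction between --max-size and --fixed-size comes down to simple residue arithmetic, sketched below. The helper name is hypothetical, and the calculation only restates the descriptions above: input residues count towards the total, and a non-zero fixed size takes precedence over the maximum size. How the tool distributes the remaining residues across linkers and termini is not covered here.

```python
def new_residue_budget(chain_lengths, max_size=100, fixed_size=0):
    """Residues left for newly generated regions once the inputs are counted.

    chain_lengths: residue counts of the input structures.
    With fixed_size > 0 the result is the exact number of new residues;
    otherwise it is an upper bound implied by max_size.
    A negative result means the inputs alone already exceed the target.
    """
    target = fixed_size if fixed_size > 0 else max_size
    return target - sum(chain_lengths)


# Two inputs of 80 and 60 residues with --max-size 300:
budget = new_residue_budget([80, 60], max_size=300)
print(budget)
```

With the default --max-size of 100, the same two inputs would leave a negative budget, which illustrates why the value has to be raised for larger input structures.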
--pdb-file
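(str) (Required) - Path to a PDB file containing the protein structures you want to sample orientations for. As the examples below show, the flag can be repeated, and it also accepts a wildcard or a .tgz archive.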
- Example:
--pdb-file input.pdb --pdb-file input2.pdb
--pdb-file *.pdb
--pdb-file input.tgz
Python Interface
Examples
Generate orientations with custom parameters
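# Assumes `client` is an already-initialized API client.
job_id = client.submit_sample_orientations(
    pdb_paths=["input1.pdb", "input2.pdb"],
    structures="input1.pdb,input2.pdb",  # order of the files in the final complex
    num_structures=10,
    pad_distance=5.0,
    full_atom=False,
    terminal_mode=False,
    max_size=200,
    exclude_terminus=False,
    n_contigs=2,
    fixed_size=0,  # 0 disables fixed sizing; max_size applies instead
)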
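The command-line example uses a *.pdb wildcard, whereas pdb_paths takes an explicit list. One way to build that list in Python is sketched below. The helper names are hypothetical, and the sketch assumes that structures expects bare file names in the desired order, as in the example above.

```python
from pathlib import Path


def collect_pdb_paths(directory="."):
    """Return the .pdb files in a directory as a sorted list of path strings."""
    return sorted(str(p) for p in Path(directory).glob("*.pdb"))


def structures_argument(paths):
    """Join the file names (without directories) into a comma-separated string."""
    return ",".join(Path(p).name for p in paths)
```

The results can then be passed as pdb_paths=collect_pdb_paths("my_inputs") and structures=structures_argument(...), reordering the list first if a specific chain order is wanted.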
Parameters
pdb_paths
(List[str]) (Required) - Paths to the PDB file(s) containing the protein structures you want to sample orientations for.
structures
(str) (Optional) - Comma-separated list giving the order in which the PDB files should be added to the complex.
num_structures
(int) (Default: 1) - Number of orientation structures to generate per input structure.
pad_distance
(float) (Default: 5.0) - Distance padding around the structure for orientation sampling. This determines how far apart the structures will be from each other.
full_atom
(bool) (Default: False) - Whether to use a full-atom representation instead of backbone-only when assessing whether the structures are in contact.
terminal_mode
(bool) (Default: False) - Enable terminal mode for orientation sampling, in which the structures are aligned N-to-C terminus.
max_size
(int) (Default: 100) - Maximum size of the complex that the generated contigs can produce. The residues of the input structures count towards this total.
exclude_terminus
(bool) (Default: False) - If True, the contigs will not add any new residues to the N and C termini of the final complex.
n_contigs
(int) (Default: 1) - Number of contigs to generate for each orientation.
fixed_size
(int) (Default: 0) - Fixes the size of the output model generated by the rf-diffusion contigs, overriding max_size. Set to 0 to use variable sizing.
Outputs
pdbs
- The PDB files containing the different orientations sampled by this tool.
contigs.json
- The JSON file that maps the generated contigs to the PDB file they correspond to.
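A downloaded contigs.json could be read along the following lines. The exact schema is not specified here, so this sketch assumes the top level is a JSON object whose keys are PDB file names and whose values are either a single contig string or a list of contig strings. The function name is hypothetical; adjust the parsing if the real file is laid out differently.

```python
import json


def load_contigs(json_path):
    """Return {pdb_file_name: [contig, ...]} from a contigs.json file.

    Assumed layout: a JSON object mapping each PDB file name to one contig
    string or to a list of contig strings.
    """
    with open(json_path) as handle:
        data = json.load(handle)
    contigs_by_pdb = {}
    for pdb_name, value in data.items():
        # Normalize a single string to a one-element list.
        contigs_by_pdb[pdb_name] = [value] if isinstance(value, str) else list(value)
    return contigs_by_pdb
```

Each (PDB file, contig) pair from the returned mapping is then one candidate input for an rf-diffusion run.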