RFDiffusion All Atom API

The RFDiffusionAllAtom API runs RFDiffusionAllAtom on an input template PDB file. Like RFDiffusion, RFDiffusionAllAtom is a method for structure generation (with or without conditional information) that is useful for protein design challenges. RFDiffusion is capable of motif scaffolding, unconditional protein generation, symmetric motif scaffolding, binder design, and more (see References). RFDiffusionAllAtom improves upon the original by extending its use to complexes containing non-protein residues, including small-molecule ligands, glycans, and nucleic acids.

Quickstart

RFDiffusionAllAtom can be run with the following command (note that the contig syntax differs slightly from that of default RFDiffusion):

lev engine submit rf-diffusion-aa input-template.pdb --n-rfdiffusion-designs 5 --rfdiffusion-contigs ['\A1-25/25-25\'] --ligand-id XYZ

(See Notes for more information on RFDiffusion contigs and workflow)

Inputs

  • --pdb-file (str)
    • Input template PDB file
  • --n-rfdiffusion-designs (str)
    • The number of backbones to generate
  • --rfdiffusion-contigs
    • Contigs for RFDiffusion runs
  • --custom-arguments (str)
    • Used to set custom arguments for RFDiffusionAllAtom, e.g. ppi.hotspot_res="[A1]"
  • --deterministic (bool)
    • If set to true, backbones will be generated deterministically (i.e. running with the same commands will generate the same results)
  • --ligand-id (str)
    • The three-letter code for the ligand in the input PDB file
  • --temperature (float)
    • The sampling temperature used in model generation. Higher temperature will result in more diverse models at the expense of model confidence. Default: 100.

Outputs

  • outputs (directory)
    • Directory containing results of RFDiffusionAllAtom runs.

Notes

RFDiffusion Contigs

The contig flags are discussed at length in the RFdiffusion repository. The exact syntax varies slightly for the AllAtom version.

README

Now, what does 'contigmap.contigs=[150-150]' mean? To those who have used RFjoint inpainting, this might look familiar, but a little bit different. Diffusion, in fact, uses the identical 'contig mapper' as inpainting, except that, because we're using Hydra, we have to give this to the model in a different way. The contig string has to be passed as a single item in a list, rather than as a string, for Hydra reasons, and the entire argument MUST be enclosed in '' so that the command line does not attempt to parse any of the special characters.
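The quoting rule above can be illustrated with a minimal Python sketch. It assumes the override is assembled in a script before being handed to a shell; the helper name hydra_contig_arg is hypothetical and is not part of RFdiffusion or the lev CLI. The standard-library function shlex.quote adds the single quotes that stop the shell from interpreting the square brackets.

```python
import shlex


def hydra_contig_arg(contig: str) -> str:
    """Build a shell-safe Hydra override for a contig string.

    Hypothetical helper for illustration only. The contig is wrapped in
    square brackets (a single-item list), then the whole argument is
    single-quoted so the shell passes it through unchanged.
    """
    override = f"contigmap.contigs=[{contig}]"
    return shlex.quote(override)


if __name__ == "__main__":
    # Prints: 'contigmap.contigs=[150-150]'
    print(hydra_contig_arg("150-150"))
```

When the command is launched without a shell (for example, as an argument list passed to subprocess.run), the extra quoting is unnecessary, because no shell parses the special characters.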

The contig string allows you to specify a length range, but here, we just want a protein of 150 aa in length, so you just specify [150-150]. This will then run 10 diffusion trajectories, saving the outputs to your specified output folder.

In more detail, if we want to scaffold a motif, the input is just like RFjoint Inpainting, except needing to navigate the Hydra config input. If we want to scaffold residues 10-25 on chain A of a PDB, this would be done with 'contigmap.contigs=[5-15/A10-25/30-40]'. This asks RFdiffusion to build 5-15 residues (randomly sampled at each inference cycle) N-terminally of A10-25 from the input PDB, followed by 30-40 residues (again, randomly sampled) to its C-terminus.
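To make the reading of a contig such as 5-15/A10-25/30-40 concrete, here is a rough Python sketch of how the segments could be interpreted. It is not RFdiffusion's own contig mapper: the names Segment, parse_contig, and sample_total_length are hypothetical, and only the simple form described above is handled (slash-separated ranges with an optional one-letter chain prefix). Features of the real syntax beyond that, and any AllAtom-specific differences, are not covered.

```python
import random
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    chain: Optional[str]  # chain ID for a motif taken from the input PDB, None for new residues
    start: int            # first residue number (motif) or minimum length (new residues)
    end: int              # last residue number (motif) or maximum length (new residues)


# One segment: optional chain letter, then "start-end".
_SEGMENT = re.compile(r"^([A-Za-z]?)(\d+)-(\d+)$")


def parse_contig(contig: str) -> List[Segment]:
    """Split a simple contig string into motif and new-residue segments."""
    segments = []
    for part in contig.strip("[]").split("/"):
        match = _SEGMENT.match(part)
        if match is None:
            raise ValueError(f"Unsupported contig segment: {part!r}")
        chain = match.group(1) or None
        start, end = int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError(f"Range start exceeds end in segment: {part!r}")
        segments.append(Segment(chain, start, end))
    return segments


def sample_total_length(segments: List[Segment], rng: random.Random) -> int:
    """Sample one total length: fixed motif lengths plus a random draw per new stretch."""
    total = 0
    for seg in segments:
        if seg.chain is None:
            total += rng.randint(seg.start, seg.end)  # inclusive on both ends
        else:
            total += seg.end - seg.start + 1
    return total


if __name__ == "__main__":
    segs = parse_contig("5-15/A10-25/30-40")
    for seg in segs:
        print(seg)
    print(sample_total_length(segs, random.Random(0)))
```

In this example the motif A10-25 always contributes 16 residues, so each sampled design is between 51 and 71 residues long.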

References

Generalized biomolecular modeling and design with RoseTTAFold All-Atom

Broadly applicable and accurate protein design by integrating prediction networks and diffusion generative models
