Boltz1 API

The Boltz1 API provides an interface to the Boltz1 structure prediction tool. Boltz1 is the first open-source, commercially available replication of AlphaFold3. The tool predicts the structure of complexes involving both protein and non-protein components, including RNA, DNA, ligands, and more. Levitate offers the latest version of Boltz (boltz-1x), which uses a steering potential to limit hallucinations. The source code and additional documentation can be found on the GitHub page, and the latest version of the preprint can be found here.

Command Line Interface

Include at least one FASTA file using one of the following flags:

--fasta-file
--dna-fasta
--rna-fasta

Examples

Predict a complex which contains protein, DNA, RNA, and a standalone ligand

lev engine submit boltz1 \
  --fasta-file input.fasta \
  --msa input.a3m \
  --dna-fasta input_dna.fasta \
  --rna-fasta input_rna.fasta \
  --smiles "O=C=O"

Predict a protein-metal ion complex

lev engine submit boltz1 \
  --fasta-file input.fasta \
  --smiles '[Zn+2]' \
  --single-seq input

Predict a complex which contains two proteins, one with a matching MSA and the other in single-seq mode

lev engine submit boltz1 \
  --fasta-file input1.fasta \
  --fasta-file input2.fasta \
  --msa input1.a3m \
  --single-seq input2 

A precalculated MSA file is required for each protein sequence that isn’t being run in single-seq mode. This can be generated using the colab-search API. The base name of the MSA file must match the base name of its matching FASTA file, for example input.fasta and input.a3m.
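
The base-name rule above can be checked before submitting a job. The following is a minimal Python sketch, not part of the lev CLI: the helper name missing_msas is hypothetical, and it assumes the base name is the file name without its extension.

```python
from pathlib import Path


def missing_msas(fasta_paths, msa_paths, single_seq_ids=()):
    """Return protein FASTA base names with neither a matching MSA nor single-seq mode."""
    # Assumption: base name = file name without extension ("input.fasta" -> "input").
    msa_stems = {Path(p).stem for p in msa_paths}
    single = set(single_seq_ids)
    return [
        Path(p).stem
        for p in fasta_paths
        if Path(p).stem not in msa_stems and Path(p).stem not in single
    ]


# input1 has an MSA and input2 is in single-seq mode, so nothing is missing.
print(missing_msas(["input1.fasta", "input2.fasta"], ["input1.a3m"], ["input2"]))  # []
# Without single-seq mode, input2 has no matching MSA.
print(missing_msas(["input1.fasta", "input2.fasta"], ["input1.a3m"]))  # ['input2']
```

An empty list means every protein FASTA is covered by either an MSA file or single-seq mode.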

Predict a complex which contains multiple proteins, double-stranded DNA, double-stranded RNA, and a small molecule ligand

lev engine submit boltz1 \
  --fasta-file input1.fasta \
  --fasta-file input2.fasta \
  --msa input1.a3m \
  --msa input2.a3m \
  --dna-fasta plus_strand.fasta \
  --dna-fasta minus_strand.fasta \
  --rna-fasta plus_strand.fasta \
  --rna-fasta minus_strand.fasta \
  --smiles "O=C=O"

Predict a complex which contains a covalently bound ligand (only works for ligands in the CCD)

lev engine submit boltz1 \
  --fasta-file input1.fasta \
  --fasta-file input2.fasta \
  --msa input1.a3m \
  --ligand-id "NAG" \
  --covalent-bonds input1,92,ND2,NAG,1,C1
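
The --covalent-bonds value in the command above is six comma-separated fields: an ID, residue number, and atom name for each of the two bonded atoms. As a minimal sketch, the hypothetical Python helper below (not part of the lev CLI) assembles that string and rejects fields that would break the comma-separated format.

```python
def covalent_bond_arg(id1, residue1, atom1, id2, residue2, atom2):
    """Join six fields into a comma-separated --covalent-bonds value."""
    parts = [str(field) for field in (id1, residue1, atom1, id2, residue2, atom2)]
    # A comma inside a field, or an empty field, would change the number of fields.
    if any(not part or "," in part for part in parts):
        raise ValueError("fields must be non-empty and must not contain commas")
    return ",".join(parts)


# Reproduces the value used in the example command.
print(covalent_bond_arg("input1", 92, "ND2", "NAG", 1, "C1"))  # input1,92,ND2,NAG,1,C1
```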

Predict a complex which contains two elements, where one should interact with specific residues of the other

lev engine submit boltz1 \
  --fasta-file input1.fasta \
  --fasta-file input2.fasta \
  --msa input1.a3m \
  --ligand-id "NAG" \
  --pocket-constraints constraints.json

Contents of constraints.json

{
  "binder": "input1",
  "contacts": [["input2", 10], ["input2", 20]]
}

Replace input1 with either a protein FASTA base name or a ligand CCD ID. Replace input2 with either another protein FASTA base name or a ligand CCD ID. The contacts are a list of lists, where the first element of each entry is the base name of a protein FASTA or a ligand CCD ID and the second element is the residue index or atom index, respectively. Specific residue or atom contacts can therefore only be defined on one side of the interaction.
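
The constraint structure can also be generated programmatically. This is a minimal Python sketch under the format described above; the helper name pocket_constraints is hypothetical and not part of the lev tooling.

```python
import json


def pocket_constraints(binder, contacts):
    """Build the pocket-constraints structure: one binder plus [id, index] contacts."""
    # Each contact is (FASTA base name or CCD ID, residue index or atom index).
    return {
        "binder": binder,
        "contacts": [[name, int(index)] for name, index in contacts],
    }


constraints = pocket_constraints("input1", [("input2", 10), ("input2", 20)])
# The JSON text can be saved as constraints.json.
print(json.dumps(constraints, indent=2))
```

Because there is a single binder key, all contacts listed belong to the other side of the interaction.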

Flags

--covalent-bonds (str) (Optional)
- A comma-separated list following the format ID1,residue_number1,atom_name1,ID2,residue_number2,atom_name2. This will create a covalent bond between the two atoms. Each ID is either the base name of the FASTA file or the CCD ID of the ligand.
--dna-fasta (str) (Optional)
- The path to the DNA FASTA file. Multiple files can be included by using the flag multiple times.
--fasta-file (str) (Required)
- The path to the protein FASTA file. Multiple files can be included by using the flag multiple times. Each protein FASTA must have a matching MSA file or be set to single-seq mode.
--gpu-type (str) (Default: t4)
- The type of GPU to use. The default is “t4”, but boltz1 frequently requires very high GPU memory, especially when large non-protein, non-nucleic-acid ligands are involved. If you see a file titled OUT_OF_MEMORY_TRY_BIGGER_GPU in the output, your GPU ran out of memory and you should try an A100 GPU. If you were already using an A100 and still ran out of memory, please contact support for assistance.
- Options:
  - t4
  - l4
  - a100
--ligand-id (str) (Optional)
- A CCD ID for a ligand. Multiple ligands can be included by using the flag multiple times.
--msa (str) (Optional)
- The path to an a3m file that matches the base name of a protein FASTA set with --fasta-file. Multiple files can be included by using the flag multiple times, and each should correspond to a matching protein FASTA.
--rna-fasta (str) (Optional)
- The path to the RNA FASTA file. Multiple files can be included by using the flag multiple times.
--single-seq (str) (Optional)
- The ID of a protein you wish to run in single-sequence mode. The ID is the base name of the FASTA file for the protein, e.g. input.fasta would be identified as “input”.
--smiles (str) (Optional)
- A SMILES string for a ligand. Multiple ligands can be included by using the flag multiple times.
--pocket-constraints (str) (Optional)
- Either a JSON file or a JSON string. The format should be: {"binder": "element1", "contacts": [["element2", 10], ["element2", 20]]}, where element1 and element2 refer to the base name of a FASTA file or a ligand ID.
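
For scripted submissions, the repeatable flags above can be assembled from Python lists. The sketch below only builds the argument list and does not run anything. The helper boltz1_command is hypothetical, and repeating --single-seq once per protein is an assumption, since the flag list does not state that it is repeatable.

```python
import shlex


def boltz1_command(fasta_files, msas=(), single_seq=(), dna_fastas=(),
                   rna_fastas=(), smiles=(), ligand_ids=(), gpu_type=None):
    """Assemble the argument list for `lev engine submit boltz1` without running it."""
    argv = ["lev", "engine", "submit", "boltz1"]
    repeated = [
        ("--fasta-file", fasta_files),
        ("--msa", msas),
        ("--single-seq", single_seq),  # assumption: one flag per single-seq protein
        ("--dna-fasta", dna_fastas),
        ("--rna-fasta", rna_fastas),
        ("--smiles", smiles),
        ("--ligand-id", ligand_ids),
    ]
    for flag, values in repeated:
        for value in values:
            argv += [flag, value]
    if gpu_type is not None:
        argv += ["--gpu-type", gpu_type]
    return argv


argv = boltz1_command(["input.fasta"], single_seq=["input"],
                      smiles=["[Zn+2]"], gpu_type="a100")
# shlex.join quotes values such as SMILES strings for safe pasting into a shell.
print(shlex.join(argv))
```

Keeping the command as a list avoids shell-quoting problems with SMILES strings if it is later passed to subprocess.run.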

Python Interface

Flags

fasta_paths (List[str]) (Required)
- A list of paths to protein FASTA files. Each protein FASTA must have a matching MSA file or be set to single-seq mode.
dna_fasta_paths (List[str]) (Optional)
- A list of paths to DNA FASTA files.
rna_fasta_paths (List[str]) (Optional)
- A list of paths to RNA FASTA files.
smiles (List[str]) (Optional)
- A list of SMILES strings, one per ligand.
gpu_type (str) (Default: t4)
- The type of GPU to use. The default is “t4”, but boltz1 frequently requires very high GPU memory, especially when large non-protein, non-nucleic-acid ligands are involved. If you see a file titled OUT_OF_MEMORY_TRY_BIGGER_GPU in the output, your GPU ran out of memory and you should try an A100 GPU. If you were already using an A100 and still ran out of memory, please contact support for assistance.
- Options:
  - t4
  - l4
  - a100
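
This section does not name the Python function that accepts these parameters, so the following sketch only assembles and validates the keyword arguments. The helper boltz1_kwargs is hypothetical; the parameter names and GPU options are the ones listed above.

```python
ALLOWED_GPU_TYPES = ("t4", "l4", "a100")


def boltz1_kwargs(fasta_paths, dna_fasta_paths=None, rna_fasta_paths=None,
                  smiles=None, gpu_type="t4"):
    """Validate the documented parameters and return them as a keyword-argument dict."""
    if not fasta_paths:
        raise ValueError("fasta_paths is required and must not be empty")
    if gpu_type not in ALLOWED_GPU_TYPES:
        raise ValueError(f"gpu_type must be one of {ALLOWED_GPU_TYPES}")
    kwargs = {"fasta_paths": list(fasta_paths), "gpu_type": gpu_type}
    optional = {
        "dna_fasta_paths": dna_fasta_paths,
        "rna_fasta_paths": rna_fasta_paths,
        "smiles": smiles,
    }
    # Only pass optional lists that were actually provided.
    for name, value in optional.items():
        if value:
            kwargs[name] = list(value)
    return kwargs


params = boltz1_kwargs(["input.fasta"], smiles=["O=C=O"], gpu_type="a100")
print(params)
```

The resulting dictionary could then be unpacked with ** into whichever submission function your installed client exposes.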

Outputs

The output is a tarball. The predicted structure is the file found at “output/boltz_results_output/predictions/output/output_model_0.pdb”. The other files are mostly intermediates and logs, which are useful for debugging.
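
The predicted structure can be read straight from the tarball with the Python standard library. This is a minimal sketch that assumes member paths inside the archive end with the path given above. The function names are hypothetical, and the tarball file name depends on how you downloaded the results.

```python
import tarfile

MODEL_PATH = "output/boltz_results_output/predictions/output/output_model_0.pdb"
OOM_MARKER = "OUT_OF_MEMORY_TRY_BIGGER_GPU"


def read_predicted_structure(tarball_path):
    """Return the predicted PDB text from the results tarball, or None if absent."""
    with tarfile.open(tarball_path) as archive:
        for member in archive.getmembers():
            # Suffix match tolerates a leading "./" or extra top-level directory.
            if member.isfile() and member.name.endswith(MODEL_PATH):
                return archive.extractfile(member).read().decode()
    return None


def ran_out_of_memory(tarball_path):
    """Return True if the tarball contains the out-of-memory marker file."""
    with tarfile.open(tarball_path) as archive:
        return any(name.split("/")[-1] == OOM_MARKER for name in archive.getnames())
```

If read_predicted_structure returns None, checking ran_out_of_memory on the same tarball indicates whether a larger GPU type is worth trying.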