Boltz API
The Boltz API provides an interface to the Boltz structure prediction tool. Boltz is the first open-source replication of AlphaFold3 that is available for commercial use. It predicts the structures of complexes that contain both protein and non-protein components, including RNA, DNA, ligands, and more. Levitate offers the latest versions of Boltz: Boltz-1x, with options for constraining hallucinations, and Boltz-2, which adds binding affinity prediction. The source code and additional documentation are available on the GitHub page, and the latest preprint can be found here.
Command Line Interface
You must provide at least one FASTA file using one of the following flags:
--fasta-file
--dna-fasta
--rna-fasta
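When submissions are scripted, the repeated-flag pattern used in the examples can be assembled programmatically. The sketch below is a hypothetical helper (build_boltz_command is not part of the Levitate CLI or its Python interface); it uses only flags documented on this page and enforces the at-least-one-FASTA rule above.

```python
import shlex
from typing import List, Optional, Sequence


def build_boltz_command(
    fasta_files: Sequence[str] = (),
    dna_fastas: Sequence[str] = (),
    rna_fastas: Sequence[str] = (),
    msas: Sequence[str] = (),
    single_seq: Sequence[str] = (),
    smiles: Sequence[str] = (),
    ligand_ids: Sequence[str] = (),
    gpu_type: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: assemble argv for `lev engine submit boltz`."""
    if not (fasta_files or dna_fastas or rna_fastas):
        raise ValueError("provide at least one of --fasta-file, --dna-fasta, --rna-fasta")
    argv = ["lev", "engine", "submit", "boltz"]
    # Each list-valued option becomes one repeated flag per value.
    for flag, values in (
        ("--fasta-file", fasta_files),
        ("--msa", msas),
        ("--dna-fasta", dna_fastas),
        ("--rna-fasta", rna_fastas),
        ("--single-seq", single_seq),
        ("--smiles", smiles),
        ("--ligand-id", ligand_ids),
    ):
        for value in values:
            argv += [flag, value]
    if gpu_type is not None:
        argv += ["--gpu-type", gpu_type]
    return argv


if __name__ == "__main__":
    # Equivalent to the protein-metal ion example.
    cmd = build_boltz_command(
        fasta_files=["input.fasta"], smiles=["[Zn+2]"], single_seq=["input"]
    )
    print(shlex.join(cmd))
```

The resulting list can be passed unchanged to subprocess.run(cmd, check=True), which sidesteps shell-quoting problems with SMILES strings such as [Zn+2].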
Examples
Predict a complex that contains protein, DNA, RNA, and a standalone ligand.
lev engine submit boltz \
--fasta-file input.fasta \
--msa input.a3m \
--dna-fasta input_dna.fasta \
--rna-fasta input_rna.fasta \
--smiles "O=C=O"
Predict a protein-metal ion complex.
lev engine submit boltz \
--fasta-file input.fasta \
--smiles '[Zn+2]' \
--single-seq input
Predict a complex of two proteins, one with a matching MSA and the other in single-sequence mode.
lev engine submit boltz \
--fasta-file input1.fasta \
--fasta-file input2.fasta \
--msa input1.a3m \
--single-seq input2
A precalculated MSA file is required for each protein sequence that is not run in single-seq mode. You can generate one using the colab-search API. The base name of each MSA file must match the base name of its FASTA file, e.g., input.fasta and input.a3m.
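Because a missing MSA is easy to overlook, a local pre-flight check can catch it before submission. This is a minimal sketch with a hypothetical function name; it assumes a file's ID is its name minus the final extension (input.fasta becomes "input"), as in the example above, and honors the “all” value accepted by --single-seq.

```python
from pathlib import Path
from typing import Iterable, List


def proteins_missing_msa(
    fasta_files: Iterable[str],
    msa_files: Iterable[str],
    single_seq: Iterable[str] = (),
) -> List[str]:
    """Return protein IDs that have neither a matching .a3m nor single-seq mode."""
    single = set(single_seq)
    # A file's ID is its base name without the extension: input1.a3m -> "input1".
    msa_ids = {Path(p).stem for p in msa_files}
    missing = []
    for fasta in fasta_files:
        pid = Path(fasta).stem
        if "all" in single or pid in single:
            continue  # single-seq proteins need no MSA
        if pid not in msa_ids:
            missing.append(pid)
    return missing


if __name__ == "__main__":
    # input2 has no MSA and is not in single-seq mode, so it is reported.
    print(proteins_missing_msa(["input1.fasta", "input2.fasta"], ["input1.a3m"]))
```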
Predict two monomers independently in batch mode.
lev engine submit boltz \
--fasta-file input1.fasta \
--fasta-file input2.fasta \
--msa input1.a3m \
--single-seq input2 \
--batch-size 2
Predict a complex that contains multiple proteins, double-stranded DNA, double-stranded RNA, and a small-molecule ligand.
lev engine submit boltz \
--fasta-file input1.fasta \
--fasta-file input2.fasta \
--msa input1.a3m \
--msa input2.a3m \
--dna-fasta dna_plus_strand.fasta \
--dna-fasta dna_minus_strand.fasta \
--rna-fasta rna_plus_strand.fasta \
--rna-fasta rna_minus_strand.fasta \
--smiles "O=C=O"
Predict a protein structure that contains a covalently bound ligand (supported only for ligands in the Chemical Component Dictionary, CCD).
lev engine submit boltz \
--fasta-file input1.fasta \
--msa input1.a3m \
--ligand-id "NAG" \
--covalent-bonds input1,92,ND2,NAG,1,C1
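The --covalent-bonds value packs two bond endpoints into six comma-separated fields (ID, residue number, atom name for each side). The sketch below builds and checks that string; BondEnd and both functions are hypothetical names, and the residue number 1 for the ligand simply mirrors the NAG example above.

```python
from typing import NamedTuple, Tuple


class BondEnd(NamedTuple):
    mol_id: str   # FASTA base name or ligand CCD ID
    residue: int  # residue number (the example above uses 1 for the ligand)
    atom: str     # atom name, e.g. "ND2" or "C1"


def format_covalent_bond(a: BondEnd, b: BondEnd) -> str:
    """Build ID1,residue_number1,atom_name1,ID2,residue_number2,atom_name2."""
    return ",".join(str(field) for field in (*a, *b))


def parse_covalent_bond(spec: str) -> Tuple[BondEnd, BondEnd]:
    """Split a --covalent-bonds value back into its two endpoints."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 comma-separated fields, got {len(parts)}")
    return (
        BondEnd(parts[0], int(parts[1]), parts[2]),
        BondEnd(parts[3], int(parts[4]), parts[5]),
    )


if __name__ == "__main__":
    # Asparagine side-chain nitrogen (ND2) of residue 92 bonded to C1 of NAG.
    print(format_covalent_bond(BondEnd("input1", 92, "ND2"), BondEnd("NAG", 1, "C1")))
```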
Predict a structure that contains two molecules, where one molecule should contact specific positions on the other.
lev engine submit boltz \
--fasta-file input1.fasta \
--fasta-file input2.fasta \
--msa input1.a3m \
--msa input2.a3m \
--ligand-id "NAG" \
--pocket-constraints constraints.json
Contents of constraints.json
{
"binder": "input1",
"contacts": [["input2", 10], ["input2", 20]]
}
Replace input1 with either a protein FASTA base name or a ligand CCD ID, and replace input2 with either another protein FASTA base name or a ligand CCD ID. contacts is a list of lists: the first element of each inner list is the base name of a protein FASTA or a ligand CCD ID, and the second element is a residue index (for a protein) or an atom index (for a ligand). Because the binder is specified only as a whole molecule, specific residue or atom contacts can be defined on only one side of the interaction.
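Generating the constraint with a JSON serializer avoids hand-editing mistakes such as curly quotes. This is a minimal sketch with a hypothetical function name; it reproduces the format shown above using only Python's standard json module.

```python
import json
from typing import List, Tuple


def make_pocket_constraints(binder: str, contacts: List[Tuple[str, int]]) -> str:
    """Serialize a pocket constraint: `binder` should contact each (ID, index) pair."""
    payload = {
        "binder": binder,
        # JSON has no tuples, so each contact becomes a two-element list.
        "contacts": [[mol_id, int(index)] for mol_id, index in contacts],
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(make_pocket_constraints("input1", [("input2", 10), ("input2", 20)]))
```

Since --pocket-constraints accepts either a JSON file or a JSON string, the returned text can be written to constraints.json or passed to the flag directly.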
Predict a protein structure and its binding affinity for a small-molecule ligand.
lev engine submit boltz \
--fasta-file input1.fasta \
--msa input1.a3m \
--ligand-id "NAG" \
--affinity "NAG"
Flags
--affinity
(str) (Optional)- The CCD ligand ID or corresponding ligand SMILES string for which to calculate binding affinity.
--batch-size
(int32) (Optional)- The number of sequences to process in each batch.
--covalent-bonds
(str) (Optional)- A comma-separated list in the format ID1,residue_number1,atom_name1,ID2,residue_number2,atom_name2, which creates a covalent bond between the two specified atoms. Each ID is either the base name of a FASTA file or the CCD ID of a ligand.
--dna-fasta
(str) (Optional)- The path to a DNA FASTA file. Multiple files can be included by using the flag multiple times.
--fasta-file
(str) (Required)- The path to the protein fasta file. Multiple files can be included by using the flag multiple times. Each protein fasta must have a matching MSA file or be set to single-seq mode.
--gpu-type
(str) (Default: t4)- The type of GPU to use. Boltz-1 frequently requires a large amount of GPU memory, especially when large ligands (molecules other than proteins or nucleic acids) are involved. If the output contains a file named OUT_OF_MEMORY_TRY_BIGGER_GPU, the GPU ran out of memory and you should retry on an A100 GPU. If you were already using an A100 and still ran out of memory, please contact support for assistance.
- Options:
t4
l4
a100
--ligand-id
(str) (Optional)- A CCD ID for a ligand. Multiple ligands can be included by using the flag multiple times.
--msa
(str) (Optional)- The path to an A3M file whose base name matches a protein FASTA file passed with --fasta-file. Multiple files can be included by using the flag multiple times; each should correspond to a matching protein FASTA file.
--rna-fasta
(str) (Optional)- The path to an RNA FASTA file. Multiple files can be included by using the flag multiple times.
--single-seq
(str) (Optional)- The ID of a protein to run in single-sequence mode. The ID is the base name of the protein's FASTA file, e.g., input.fasta is identified as “input”. Alternatively, pass the string “all” to run all given FASTA files in single-sequence mode.
--smiles
(str) (Optional)- A SMILES string for a ligand. Multiple ligands can be included by using the flag multiple times.
--pocket-constraints
(str) (Optional)- Either a JSON file or a JSON string in the format {"binder": "element1", "contacts": [["element2", 10], ["element2", 20]]}, where element1 and element2 are each the base name of a FASTA file or a ligand ID.
--reference-structure
(str) (Optional)- The reference structure to use as a template for the prediction. The base name of the PDB file should match one of the FASTA files, e.g., input1.fasta -> input1.pdb.
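The out-of-memory guidance under --gpu-type can be automated when results are downloaded. This sketch assumes the output tarball is readable by Python's tarfile module and matches any member whose file name starts with OUT_OF_MEMORY_TRY_BIGGER_GPU, because the marker's exact name and location inside the archive are not specified here. All function names are hypothetical.

```python
import tarfile
from typing import Iterable, Optional

OOM_MARKER = "OUT_OF_MEMORY_TRY_BIGGER_GPU"


def has_oom_marker(member_names: Iterable[str]) -> bool:
    """True if any archive member's file name starts with the OOM marker."""
    # Compare only the last path component so the marker is found in any subdirectory.
    return any(name.rsplit("/", 1)[-1].startswith(OOM_MARKER) for name in member_names)


def ran_out_of_memory(tarball_path: str) -> bool:
    """Open a downloaded output tarball (any compression) and look for the marker."""
    with tarfile.open(tarball_path) as tar:
        return has_oom_marker(tar.getnames())


def next_gpu_after_oom(current: str) -> Optional[str]:
    """Per the guidance above: retry on an A100; None means contact support."""
    return None if current == "a100" else "a100"
```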
Python Interface
Flags
fasta_paths
(List[str]) (Required)- Paths to the protein FASTA files, one list entry per file. Each protein FASTA must have a matching MSA file or be set to single-seq mode.
dna_fasta_paths
(List[str]) (Optional)- Paths to the DNA FASTA files, one list entry per file.
rna_fasta_paths
(List[str]) (Optional)- Paths to the RNA FASTA files, one list entry per file.
smiles
(List[str]) (Optional)- SMILES strings for ligands, one list entry per ligand.
gpu_type
(str) (Default: t4)- The type of GPU to use. Boltz-1 frequently requires a large amount of GPU memory, especially when large ligands (molecules other than proteins or nucleic acids) are involved. If the output contains a file named OUT_OF_MEMORY_TRY_BIGGER_GPU, the GPU ran out of memory and you should retry on an A100 GPU. If you were already using an A100 and still ran out of memory, please contact support for assistance.
- Options:
t4
l4
a100
reference_structures
(List[str]) (Optional)- List of paths to reference structure PDB files. The base name of the PDB file should match the base name of the fasta file it is paired with.
Outputs
The output is a tarball. The predicted structure is at “predictions/output/output_model_0.pdb”, “confidence_output_model_0.json” contains the model's confidence metrics for its prediction, and “affinity_output.json” contains the predicted affinity metrics.
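The files above can be read straight out of the tarball without unpacking it. This is a minimal sketch: the function names are hypothetical, and because only the PDB's full path is given (the JSON files are named but not located), it matches archive members by path suffix and tolerates a leading "./" or top-level directory.

```python
import json
import tarfile
from typing import Iterable, Optional

STRUCTURE_PATH = "predictions/output/output_model_0.pdb"
CONFIDENCE_NAME = "confidence_output_model_0.json"
AFFINITY_NAME = "affinity_output.json"


def find_member(names: Iterable[str], suffix: str) -> Optional[str]:
    """Return the first archive path equal to `suffix` or ending in "/" + suffix."""
    for name in names:
        if name == suffix or name.endswith("/" + suffix):
            return name
    return None


def read_outputs(tarball_path: str) -> dict:
    """Load the predicted PDB text and any confidence/affinity JSON that is present."""
    outputs = {}
    with tarfile.open(tarball_path) as tar:
        names = tar.getnames()
        pdb = find_member(names, STRUCTURE_PATH)
        if pdb is not None:
            outputs["structure_pdb"] = tar.extractfile(pdb).read().decode()
        for key, suffix in (("confidence", CONFIDENCE_NAME), ("affinity", AFFINITY_NAME)):
            member = find_member(names, suffix)
            if member is not None:
                # json.load accepts the binary file object returned by extractfile.
                outputs[key] = json.load(tar.extractfile(member))
    return outputs
```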
The affinity output includes two key predictions: affinity_pred_value and affinity_probability_binary, each suited to different stages of drug discovery.
- affinity_probability_binary (range: 0–1) estimates the likelihood that a ligand is a binder and is ideal for distinguishing binders from decoys in early hit discovery.
- affinity_pred_value predicts binding strength as log(IC50), derived from IC50 values in μM, and is useful for assessing affinity changes during ligand optimization (e.g., hit-to-lead, lead optimization).
They are trained on different datasets with distinct supervision and should not be used interchangeably.
The other files mostly represent intermediates and logs which are useful for debugging.
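Assuming the logarithm in affinity_pred_value is base 10 (the text above says only log(IC50)), a predicted value y converts to IC50 = 10^y μM and to pIC50 = -log10(IC50 in M) = 6 - y, so lower y means a more potent binder. The sketch also assumes both affinity values are top-level keys of affinity_output.json; the function names are hypothetical. It keeps the binder probability separate from the potency estimate, since the two are not interchangeable.

```python
from typing import Dict


def ic50_um_from_pred(affinity_pred_value: float) -> float:
    """IC50 in micromolar, assuming affinity_pred_value = log10(IC50 in uM)."""
    return 10 ** affinity_pred_value


def pic50_from_pred(affinity_pred_value: float) -> float:
    """pIC50 = -log10(IC50 in M) = 6 - log10(IC50 in uM)."""
    return 6.0 - affinity_pred_value


def summarize_affinity(affinity: Dict[str, float]) -> Dict[str, float]:
    """Collect both outputs, converting only the regression value."""
    value = affinity["affinity_pred_value"]
    return {
        # Binder-vs-decoy likelihood (0-1); not a potency measure.
        "binder_probability": affinity["affinity_probability_binary"],
        "ic50_uM": ic50_um_from_pred(value),
        "pIC50": pic50_from_pred(value),
    }
```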