Boltz1 API
The Boltz1 API provides an interface to the Boltz1 structure prediction tool. Boltz1 is the first open-source, commercially available replication of AlphaFold3. It predicts the structure of complexes that contain both protein and non-protein components, including RNA, DNA, and ligands, among others. The source code and additional documentation can be found on the GitHub page.
Examples
Predict a complex which contains protein, DNA, RNA, and a standalone ligand
lev engine submit boltz1 --fasta-file input.fasta --msa input.a3m --dna-fasta input_dna.fasta --rna-fasta input_rna.fasta --smiles "O=C=O"
Predict a complex which contains two proteins, one with a matching MSA and the other in single-seq mode
lev engine submit boltz1 --fasta-file input1.fasta --fasta-file input2.fasta --msa input1.a3m --single-seq input2
*Please note that a precalculated MSA file is required for each protein sequence that is not run in single-seq mode. This can be generated using the colab-search API. The base name of the MSA file must match the base name of its matching FASTA file, e.g. input.fasta and input.a3m.
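The pairing rule in the note above can be checked before submitting a job. The following is a minimal sketch, not part of the lev tooling: the function name unmatched_proteins is hypothetical, and it assumes the protein ID is simply the file name without its extension.

```python
from pathlib import Path


def unmatched_proteins(fasta_files, msa_files, single_seq_ids=()):
    """Return protein IDs that have neither a matching MSA nor single-seq mode."""
    # Assumption: the ID of a protein is the base name of its FASTA file
    # (input.fasta -> "input"), and an MSA matches when its base name is equal.
    msa_ids = {Path(p).stem for p in msa_files}
    single = set(single_seq_ids)
    return [
        Path(f).stem
        for f in fasta_files
        if Path(f).stem not in msa_ids and Path(f).stem not in single
    ]


# An empty list means every protein is covered.
missing = unmatched_proteins(
    ["input1.fasta", "input2.fasta"], ["input1.a3m"], ["input2"]
)
print(missing)
```

Any ID returned by the function needs either an a3m file with the same base name or a --single-seq entry.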
Predict a complex which contains multiple protein, DNA, RNA, and ligand elements
lev engine submit boltz1 --fasta-file input1.fasta --fasta-file input2.fasta --msa input1.a3m --msa input2.a3m --dna-fasta input_dna1.fasta --dna-fasta input_dna2.fasta --rna-fasta input_rna1.fasta --rna-fasta input_rna2.fasta --smiles "O=C=O" --smiles "O=C=O"
Predict a complex which contains a covalently bound ligand (only works for ligands in the CCD)
lev engine submit boltz1 --fasta-file input1.fasta --fasta-file input2.fasta --msa input1.a3m --ligand-id "NAG" --covalent-bonds input1,92,ND2,NAG,1,C1
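When the inputs come from a script, the repeated flags can be generated from lists rather than typed by hand. This is a rough sketch under stated assumptions: build_boltz1_command is a hypothetical helper, it only assembles the argument list and does not call lev, and it covers only the --fasta-file, --msa, --single-seq, --dna-fasta, --rna-fasta, and --smiles flags.

```python
import shlex


def build_boltz1_command(fasta_files, msa_files=(), single_seq=(),
                         dna_fastas=(), rna_fastas=(), smiles=()):
    """Assemble a `lev engine submit boltz1` argument list (hypothetical helper)."""
    cmd = ["lev", "engine", "submit", "boltz1"]
    # Each flag is repeated once per value, mirroring the examples above.
    for flag, values in [
        ("--fasta-file", fasta_files),
        ("--msa", msa_files),
        ("--single-seq", single_seq),
        ("--dna-fasta", dna_fastas),
        ("--rna-fasta", rna_fastas),
        ("--smiles", smiles),
    ]:
        for value in values:
            cmd += [flag, value]
    return cmd


cmd = build_boltz1_command(
    fasta_files=["input1.fasta", "input2.fasta"],
    msa_files=["input1.a3m"],
    single_seq=["input2"],
    smiles=["O=C=O"],
)
# shlex.join produces a shell-safe string for logging or copy and paste.
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting issues with SMILES strings.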
Inputs
--fasta-file
(str)- The path to the protein FASTA file. Multiple files can be included by using the flag multiple times. Each protein FASTA file must have a matching MSA file or be set to single-seq mode.
--msa
(str)- The path to an a3m file whose base name matches that of a protein FASTA file set with --fasta-file. Multiple files can be included by using the flag multiple times, and each should correspond to a matching protein FASTA file.
--dna-fasta
(str)- The path to the DNA FASTA file. Multiple files can be included by using the flag multiple times.
--rna-fasta
(str)- The path to the RNA FASTA file. Multiple files can be included by using the flag multiple times.
--single-seq
(str)- The ID of a protein you wish to run in single-seq mode. The ID is the base name of the FASTA file for the protein. For example, input.fasta would be identified as “input”.
--smiles
(str)- A SMILES string for a ligand. Multiple ligands can be included by using the flag multiple times.
--ligand-cd
(str)- A CCD ID for a ligand. Multiple ligands can be included by using the flag multiple times.
--covalent-bonds
(str)- A comma-separated list in the format ID1,residue_number1,atom_name1,ID2,residue_number2,atom_name2. This creates a covalent bond between the two atoms. Each ID is either the base name of a protein FASTA file or the CCD ID of a ligand.
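The six-field --covalent-bonds value is easy to mistype, so it can help to validate it before submitting. The sketch below is illustrative only: BondEnd and parse_covalent_bond are hypothetical names, and the check covers the field count and integer residue numbers, not whether the atoms exist.

```python
from typing import NamedTuple


class BondEnd(NamedTuple):
    chain_id: str  # FASTA base name or ligand CCD ID
    residue: int   # residue number
    atom: str      # atom name


def parse_covalent_bond(spec):
    """Split 'ID1,res1,atom1,ID2,res2,atom2' into two BondEnd tuples."""
    fields = [f.strip() for f in spec.split(",")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 comma-separated fields, got {len(fields)}")
    # int() raises ValueError if a residue number is not an integer.
    first = BondEnd(fields[0], int(fields[1]), fields[2])
    second = BondEnd(fields[3], int(fields[4]), fields[5])
    return first, second


# The value used in the covalently bound ligand example.
first, second = parse_covalent_bond("input1,92,ND2,NAG,1,C1")
print(first, second)
```

Here the first end is atom ND2 of residue 92 in the protein “input1”, and the second is atom C1 of residue 1 in the ligand NAG.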
Options
--gpu-type
- The type of GPU to use. The default is “t4”, but boltz1 frequently requires very high GPU memory, especially when large ligands (non-protein, non-nucleic-acid components) are involved. If you see a file titled OUT_OF_MEMORY_TRY_BIGGER_GPU in the output, your GPU ran out of memory and you should try an A100 GPU. If you were already using an A100 and still ran out of memory, please contact support for assistance.
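A script that processes many jobs can look for the out-of-memory marker automatically. This is a minimal sketch that assumes the job output has already been extracted to a local directory. Where the marker file sits inside the output is not documented here, so the hypothetical helper ran_out_of_memory searches recursively and tolerates a file extension.

```python
from pathlib import Path

OOM_MARKER = "OUT_OF_MEMORY_TRY_BIGGER_GPU"


def ran_out_of_memory(output_dir):
    """Return True if a file named after the OOM marker exists under output_dir."""
    # Assumption: the marker may sit at any depth and may carry an extension.
    return any(
        p.is_file() and OOM_MARKER in p.name
        for p in Path(output_dir).rglob("*")
    )
```

If the function returns True, resubmit the job with a larger --gpu-type.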
Outputs
The output is a tarball. The predicted structure is the file at “output/boltz_results_output/predictions/output/output_model_0.pdb”. The other files are mostly intermediates and logs, which are useful for debugging.
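The predicted structure can be read straight from the tarball without unpacking everything. The sketch below uses Python's standard tarfile module. The helper read_predicted_structure is hypothetical, and it assumes the path quoted above is the member name inside the archive, possibly with a leading prefix.

```python
import tarfile

# Path quoted in the Outputs section; assumed to be the member name in the tarball.
MODEL_PATH = "output/boltz_results_output/predictions/output/output_model_0.pdb"


def read_predicted_structure(tarball_path, member=MODEL_PATH):
    """Return the text of the predicted PDB file stored in the output tarball."""
    with tarfile.open(tarball_path) as tar:  # compression is auto-detected
        for info in tar.getmembers():
            # endswith tolerates a leading "./" or an extra top-level directory.
            if info.isfile() and info.name.endswith(member):
                return tar.extractfile(info).read().decode()
    raise FileNotFoundError(f"{member} not found in {tarball_path}")
```

The returned string can be written to disk or handed to any PDB parser.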