Post-Translational Modification Prediction API

The post-translational modification (PTM) prediction API applies industry-standard methods for in silico liability prediction in therapeutic development. Maintaining therapeutic protein stability and potency during development remains a costly and significant challenge because of degradation by PTMs. Early screening for liabilities is critical for reducing development costs and enabling downstream success. The ptm-prediction API provides both sequence-based and structure-based predictions.

ptm-prediction implements state-of-the-art machine learning models and standard industry methods to predict labile residues for PTMs of concern, including:

  • Asparagine Deamidation
  • Aspartic Acid Isomerization
  • Methionine Oxidation
  • Hyper-reactive Cysteines
  • N-Linked Glycosylation
  • Lysine Glycation
  • Pyroglutamylation
  • N-Term Cyclization
  • C-Term Lysine Processing

These predictions combine structural and sequence features extracted from the provided inputs. ptm-prediction is currently in BETA; future versions aim to further optimize the models and improve reporting methods.

NOTE: While the reported findings come from models trained on and informed by experimental data and observations from the literature, the recommendations provided should supplement, not replace, the domain expertise and insights of the user.

For more sophisticated N-linked glycosylation prediction, use our Glycosylation Prediction API.

Examples

Flag known PTM motifs for a given sequence

lev engine submit ptm-prediction --fasta-file input.fasta

Predict liabilities for a given structure and return results with residue numbers offset by 12

lev engine submit ptm-prediction --pdb-file input.pdb --offset 12

Predict liabilities for more than one input PDB

lev engine submit ptm-prediction --pdb-file input1.pdb --pdb-file input2.pdb

Inputs

Sequence PTM Prediction

  • --fasta-file (str)
    • FASTA file containing sequence(s) of interest

Structural PTM Prediction

  • --pdb-file (str)
    • Input PDB file for predictions
    • Ideally, the PDB is cleaned using the Clean PDB API and does not have regions of missing density, as these would reduce the quality of results

Options

  • --offset (int32)
    • For Structural PTM prediction only
    • Shifts output residue numbering by the provided offset to restore the original numbering scheme
    • The API will automatically convert a given input PDB to sequential numbering (first residue starts at 1) internally to extract necessary features for predictions.
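
The relationship between sequential numbering and the offset output can be sketched as follows. This is a minimal illustration, assuming the offset is simply added to each 1-based sequential residue number; the function name is hypothetical, and the API's handling of multiple chains or insertion codes is not described here.

```python
def to_original_numbering(sequential_positions, offset):
    """Shift 1-based sequential residue numbers by a constant offset.

    Assumption: the offset is applied by simple addition.
    """
    return [position + offset for position in sequential_positions]


# Hypothetical example: three flagged residues in sequential numbering,
# reported with the same offset used in the Examples section (12).
renumbered = to_original_numbering([1, 5, 30], 12)
print(renumbered)
```

Under this assumption, a structure whose first residue is numbered 13 in the original file would be submitted with --offset 12.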

Outputs

  • ptm_report.csv
    • CSV file containing residue numbers and corresponding predicted PTM
    • Residue numbers will be offset by value provided if --offset is used
  • ptm_report.pml
    • Only for jobs run with PDB file(s)
    • PyMOL script generated to highlight liable residues predicted to have PTMs
    • Output will be prefixed with structure ID if given more than one PDB input.
    • See Notes for more details and how to use this script to visualize your results
  • ptm_report.md
    • Formalized report on PTMs of concern with associated mitigation strategies
    • Formatted as a markdown file
    • A separate report will be generated for every sequence/structure provided
    • Reports for multiple inputs will be suffixed with corresponding sequence/structure ID
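
For downstream processing, the CSV report can be loaded with a standard CSV reader. The sketch below is a minimal example that deliberately assumes nothing about the column names, since the report schema is not listed here; the function names are hypothetical.

```python
import csv


def load_report(path):
    """Read a CSV report into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def summarize(rows):
    """Return the column names and row count without assuming a schema."""
    columns = list(rows[0].keys()) if rows else []
    return columns, len(rows)


# Usage (assumes a finished job wrote ptm_report.csv to the working directory):
# columns, count = summarize(load_report("ptm_report.csv"))
```

Inspecting the returned column names first is a reasonable way to confirm the actual schema before filtering by residue number or PTM type.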

Notes

Visualizing Liable Residues in PyMOL

Requires a valid, up-to-date PyMOL license to run

To visualize the predicted PTM residues, run the following command:

pymol structure_ptm.pml

  • Ensure that line 1 in the script correctly sets the path to the input PDB file so that it may be correctly loaded into the PyMOL session.
  • Residues will be color coded and shown as sticks in separate scenes:
    • Upon starting the session, the current view of the structure shows all predictions (scene = all_predictions)
    • Clicking on a specific PTM scene changes the view to show only the residues predicted for that PTM

Asn Deamidation and Asp Isomerization

  • Asparagine deamidation occurs in three potential pathways:
    • Nucleophilic attack by backbone carbonyl group
    • Nucleophilic attack by the backbone nitrogen of the N+1 residue
    • Direct hydrolysis
  • Canonical motifs for deamidation are NG, NS, NN, and NH, in order of decreasing deamidation rate.
  • Aspartic acid isomerization occurs through a pathway related to Asn deamidation, but typically at higher rates at low pH.
  • Canonical motifs for isomerization are DG, DS, DD, DT, and DH, in order of decreasing isomerization rate.
  • Deamidation and isomerization share the same structural attributes for prediction, including the N+1 residue, solvent accessibility, dihedral angles, and nucleophilic attack distance.

Detecting Asp isomerization by mass spectrometry is challenging because IsoAsp and Asp have the same molecular mass, so available experimental data are limited. Isomerization prediction in the API is therefore currently limited to sequence-based flagging.
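
Sequence-based flagging of the canonical motifs listed above can be illustrated with a simple sliding-window scan. This is a minimal sketch, not the API's implementation: the function names and the example sequence are made up, and positions are reported as the 1-based index of the Asn or Asp residue.

```python
# Motifs copied from the lists above, in the stated order of decreasing rate.
DEAMIDATION_MOTIFS = ["NG", "NS", "NN", "NH"]
ISOMERIZATION_MOTIFS = ["DG", "DS", "DD", "DT", "DH"]


def flag_motifs(sequence, motifs, label):
    """Return (position, motif, label) for every two-residue motif hit.

    The window advances one residue at a time, so overlapping hits
    (for example DD followed by DS in "DDS") are all reported.
    """
    hits = []
    seq = sequence.upper()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in motifs:
            hits.append((i + 1, pair, label))
    return hits


def flag_sequence(sequence):
    """Flag both deamidation and isomerization motifs, sorted by position."""
    hits = flag_motifs(sequence, DEAMIDATION_MOTIFS, "Asn deamidation")
    hits += flag_motifs(sequence, ISOMERIZATION_MOTIFS, "Asp isomerization")
    return sorted(hits)


# Made-up example sequence, for illustration only.
for position, motif, label in flag_sequence("ACNGTDDSKNH"):
    print(position, motif, label)
```

A scan like this only reports motif presence; it does not account for the structural attributes (solvent accessibility, dihedral angles, attack distance) described above.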
