RosettaCon: A Glimpse into the Future


Observations and thoughts from Summer RosettaCon 2024.

A photo of the river below the Suncadia Resort, with some driftwood in the foreground and the mountains in the background

Summer RosettaCon is the major annual gathering for the Rosetta community, alongside the winter session in Boston, the European edition, and the newly introduced South American conference. This year’s event brought together over 300 participants, a mix of PIs, postdocs, graduate students, and even a few undergraduates. Industry representation was notable, with about 30 professionals, mainly people who had graduated from or done postdocs in Rosetta Labs and who came to keep their skills up to date. Some were also scouting for potential hires, eager to tap into the talent within the Rosetta community.

RosettaCon 2024 Highlights, Part One of Three

This is the first of three articles summarizing the innovations in protein engineering methods that we saw at Summer RosettaCon. I will post the other two in the next couple of weeks.

This year, we saw over 150 posters, with most presented by graduate students and postdocs from various Rosetta Labs and a few from outside collaborators. The diversity of research was astounding, ranging from early-phase studies to more mature projects, hinting at the future directions of the field.

A significant portion of the posters, about 75%, focused on the de novo design of proteins, using tools like RFDiffusion and ProteinMPNN for a wide array of targets. This prevalence underscores how well validated these methods have become and how heavily the community now relies on them. The remaining posters showcased protocols for docking, structure refinement, protein dynamics, protein language models, and some exciting small molecule work.

The biggest changes that I see in the community lie in the different flavors of ProteinMPNN and RFDiffusion. Both are being retrained or enhanced to broaden their use for different types of biomolecules or different goals. Some of these are easy to implement at Levitate because we can just swap the weights in our current tools, but others will require their own implementation even though they share a lot of code with our current tools.

1. The Evolution of ProteinMPNN

ProteinMPNN, initially developed at the Institute for Protein Design at the University of Washington, has become a cornerstone for generating protein sequences. Since it was published in Science in 2022, the tool has seen numerous adaptations and extensions, each tailored for specific applications.

Our implementation of ProteinMPNN can be found here.

Here are eight flavors that we saw discussed (Levitate Bio provides access to the first two):

  • Original ProteinMPNN
    Function: Generates sequences for protein backbones that are stable and soluble, retaining any binding interactions if other proteins are included in the input models.
    Impact: Within this community alone, it has been used to design hundreds of proteins that were expressed and found to be soluble. Many of these proteins exhibit the desired structure and function, making ProteinMPNN a well-established tool in the Rosetta community and beyond.
  • ThermoMPNN
    Development: Enhanced by the Kuhlman Lab (ref), this variant performs single-point mutation scans to predict stabilizing or destabilizing effects.
    Usage: Widely used in workflows to stabilize proteins. ThermoMPNN-D, a new version supporting double mutations, is currently being trained in the Kuhlman Lab.
  • LigandMPNN
    Development: Introduced in a 2023 preprint (ref) from the Baker Lab, this variant includes additional representations for non-protein atoms.
    Function: Works with or without non-protein molecules in the model, potentially replacing the original ProteinMPNN.
    Note: This is available as a beta feature in Engine.
  • SolubleMPNN
    Function: Designed to fully redesign membrane proteins to make them water-soluble.
    Advantage: Overcomes the original ProteinMPNN’s expectation of hydrophobic residues in regions that it predicts to be membrane.
  • MultistateMPNN
    Function: Generates sequences for proteins that can adopt two states.
  • HyperMPNN
    Development: Trained by the Meiler Lab for thermostable proteins. Not yet published.
    Function: Generates sequences for protein backbones that remain stable even when heated well above boiling.
  • TCR/AntibodyMPNN
    Development: Currently being developed by the Pierce Lab with masking and decoding schemes for antigen-antibody and TCR-pMHC complexes.
    Impact: The Pierce Lab regularly produces groundbreaking tools for TCR modeling, so this variant looks promising for designing TCR interfaces and may also be an improvement for antibodies.
  • Mini-MPNN
    Function: An adaptation used in the Huang Lab for simultaneous AI-based structure and sequence design.
    Relation: Integral to the Protpardelle protocol which is described below (well worth reading).

These variants highlight the continuous evolution and specialization of ProteinMPNN to meet diverse needs in protein design, each offering unique capabilities and improvements over the original tool.
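To make the "single-point mutation scan" mentioned for ThermoMPNN concrete, here is a minimal sketch of how the candidate list for such a scan can be enumerated. It only builds the mutants; scoring each one with a stability model is not shown, and nothing here uses ThermoMPNN's actual code or API. The function name `single_point_mutants` and the three-residue example sequence are illustrative assumptions.

```python
# The 20 canonical amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(sequence):
    """Yield (label, mutant_sequence) for every single substitution.

    Labels use the common wild-type/position/mutant form, e.g. "M1A",
    with 1-based positions. This is a hypothetical helper for illustration.
    """
    for pos, wild_type in enumerate(sequence):
        for new_aa in AMINO_ACIDS:
            if new_aa == wild_type:
                continue  # skip the unmutated residue
            label = f"{wild_type}{pos + 1}{new_aa}"
            yield label, sequence[:pos] + new_aa + sequence[pos + 1:]


# Toy example: 3 positions x 19 substitutions each.
mutants = list(single_point_mutants("MKT"))
print(len(mutants))   # 57
print(mutants[0])     # ('M1A', 'AKT')
```

A scan over a real protein grows linearly with length (19 mutants per position), which is part of why a fast learned predictor is attractive for this step.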

2. RFDiffusion: Rapid Evolution and Expanding Applications

The RFDiffusion paper came out just a year ago, in 2023. RFDiffusion was also created at the Institute for Protein Design at the University of Washington, generates protein structures from scratch, and has become commonly used throughout the Rosetta community. Our description of it can be found here. The Rosetta community has successfully used RFDiffusion for a wide range of novel protein engineering tasks and has extended the code for applications beyond its original scope. The original paper demonstrated the creation of monomers, binders, binders with a motif, multimers, and enzymes.

Workflow
The typical workflow with RFDiffusion involves generating a large number of structures. For simpler design challenges, researchers might generate around 1000 structures, while more complex projects can involve the creation of hundreds of thousands of structures, each with varying RFDiffusion parameters. Usually the top 5-10% are kept because the rest will lack globular packing. Sequences are then generated for these structures using ProteinMPNN, followed by structural predictions using AlphaFold2 in single-sequence mode. Only those designs predicted to refold are retained. It is common for only a handful of sequences from a large set to be expressed and tested, ultimately yielding stable proteins for further experimentation.
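The funnel described above (generate many backbones, keep the best-packed fraction, design sequences, keep only designs predicted to refold) can be sketched as a small filtering pipeline. This is a toy illustration, not anyone's production code: the three callables are hypothetical stand-ins for a packing metric, ProteinMPNN, and an AlphaFold2 single-sequence check, and the random "packing" values and the refold rule are made up for the demo.

```python
import random


def run_funnel(backbones, score_backbone, design_sequence, predicts_refold,
               keep_fraction=0.1):
    """Filter backbones, design sequences, keep designs predicted to refold.

    All three callables are placeholders supplied by the caller.
    """
    # Step 1: rank backbones (higher score = better packed) and keep the top fraction.
    ranked = sorted(backbones, key=score_backbone, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:n_keep]

    # Step 2: generate one sequence per kept backbone (stand-in for ProteinMPNN).
    designs = [(backbone, design_sequence(backbone)) for backbone in kept]

    # Step 3: keep only designs whose predicted structure matches the backbone
    # (stand-in for an AlphaFold2 single-sequence prediction and comparison).
    return [(b, seq) for b, seq in designs if predicts_refold(b, seq)]


# Toy stand-ins so the sketch runs end to end.
rng = random.Random(0)
backbones = [{"id": i, "packing": rng.random()} for i in range(1000)]

survivors = run_funnel(
    backbones,
    score_backbone=lambda b: b["packing"],
    design_sequence=lambda b: "A" * 50,               # placeholder sequence
    predicts_refold=lambda b, seq: b["id"] % 10 == 0,  # arbitrary toy rule
)
print(f"{len(survivors)} of {len(backbones)} backbones survive the funnel")
```

In practice each stage is a separate, expensive job, so the early packing filter matters mostly as a way to avoid spending sequence design and structure prediction time on backbones that were never going to fold.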

Broad Applications
RFDiffusion’s flexibility has enabled its use in a wide array of applications within the Rosetta community. Researchers have reported success in creating multistate proteins, dual-target binders, biosensors, multimeric nanoparticles for vaccine epitopes, metal-binding proteins, peptide binders, coiled coils, motif designs, pMHC binders, novel enzymes, motor proteins, and antibodies. These efforts are being carried out by dozens of graduate students and postdocs across multiple Rosetta Labs, promising a wealth of published research in the coming years. A handful of these examples are created using novel variants of RFDiffusion, but most are using default RFDiffusion or RFDiffusion All Atom.

All Atom RFDiffusion and Innovations
RoseTTAFold (RF) All Atom (AA) and RFDiffusion AA (RFDiff AA) support the inclusion of non-protein atoms during protein generation, facilitating the creation of binding proteins targeting metals, nucleotides, non-canonical residues, and small molecules. In a preprint released not long after RFDiffusion, the Baker Lab describes RF AA for structure prediction incorporating proteins, nucleic acids, small molecules, metals, and covalent modifications. They also described RFDiff AA and demonstrated it by generating designs that bind the therapeutic digoxigenin, the enzymatic cofactor heme, and optically active bilin molecules.

The tool has since been used throughout the Rosetta community, and further modifications were highlighted at this conference. For example, the Meiler Lab has modified RFDiffusion to design cyclic peptides. They borrowed an adaptation from an AlphaFold2 variant that allows C-to-N linkage in structure prediction, so that RFDiffusion could generate cyclic peptides that bind a target protein. This approach can be used for other chemistries involved in peptide cyclization and will likely be widely used before long.

Partial diffusion is a technique that was possible from the start, but we are finally seeing it used for some interesting experiments. The partial diffusion process begins with an initial protein structure, which is partially noised by introducing random perturbations to its coordinates. The RFDiffusion model then denoises the structure iteratively, refining it into a favorable structure that looks similar to the input fold. By controlling the extent and nature of the initial noise, researchers can explore different structural variations and optimize the protein’s performance.
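The noising half of this idea is easy to illustrate with a toy example. The sketch below perturbs a set of coordinates with Gaussian noise whose magnitude depends on how many steps of the schedule are applied, and shows that more steps move the structure further from the input. The linear noise schedule, the `max_sigma` value, and the function names are assumptions made for illustration and are not RFDiffusion's actual schedule or code; the denoising half requires the trained model and is not shown.

```python
import math
import random


def partially_noise(coords, partial_steps, total_steps, max_sigma=10.0, seed=0):
    """Add Gaussian noise scaled by how far into the schedule we go.

    Assumption: a simple linear sigma schedule (in angstroms), chosen only
    to illustrate the idea of "partial" noising.
    """
    rng = random.Random(seed)
    sigma = max_sigma * partial_steps / total_steps
    return [tuple(c + rng.gauss(0.0, sigma) for c in atom) for atom in coords]


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists."""
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))


# Toy "backbone": 50 points along a line, 3.8 angstroms apart.
start = [(3.8 * i, 0.0, 0.0) for i in range(50)]

light = partially_noise(start, partial_steps=5, total_steps=50)
heavy = partially_noise(start, partial_steps=25, total_steps=50)

# More noising steps give a larger departure from the input structure.
print(rmsd(start, light) < rmsd(start, heavy))  # prints True
```

The number of noising steps is the main dial: a light perturbation yields close variants of the input fold after denoising, while a heavy one gives the model more freedom to change it.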

Part One Conclusions

The widespread adoption of ProteinMPNN and RFDiffusion underscores their emergence as the new “standard” for designing highly stable, soluble, and functional proteins, replacing the traditional Rosetta tools that were dominant just a few years ago. The current focus within the Rosetta community is on broadening the application of these tools to tackle new and complex protein challenges while developing new variations to overcome existing limitations.

Strides are being made in areas like de novo design of peptides, RNA/DNA-binding proteins, complex protein-protein interactions (including antibody-antigen), and proteins bound to intricate small molecules, both covalent and non-covalent. While these remain challenging problems, the progress achieved so far highlights the capabilities of these tools and sets the stage for further advancements in the coming years.
