Thursday, December 19, 2024

It takes two to tango: What a protein’s “dance” reveals about human health

Highly recommended!

Excerpt: "“I eat and breathe proteins,” she jokes."

"... we developed an algorithm to predict a protein’s different conformations. We have shown over and over that these conformational substates are essential for biological function. That means that learning how proteins “dance” is key to understanding the difference between health and disease. ...

... why certain protein language models work. The models we investigated are fed a single protein sequence and then predict what the corresponding 3-D structure looks like. Yet how they arrive at these conclusions was a “black box” of sorts.

In this paper, we were determined to figure out how these models learn and predict, if we are to use them reliably as a field. ... The third one—which turned out to be true—is that it learned to find paired interacting protein fragments, including whole segments.

Big picture, we shed light on the question of how these protein language models learn. On the more technical side, we determined how long the protein segments must be for the model to identify the correct 3-D structure. Machine learning language models, including the Nobel Prize-winning AlphaFold, are so hot right now because solving protein structures from a single sequence has been a monumental breakthrough. ...

Protein language models like AlphaFold are limited because they only come up with one structure. The “signal” for the other protein structure—the one it “dances” between—gets diluted. ...

... using the protein KaiB as a benchmark. KaiB is essential for regulating circadian rhythm in certain bacteria ... KaiB only has two protein conformations, and if you put it in a test tube with its partner KaiC, they create a 24-hour oscillation—the underlying mechanism we revealed in another Nature paper in early 2023. In that Nature paper, however, we had only predicted KaiB’s two end states—not the actual dynamics, or pathway, of how it went from one conformation to the other.

In our new PNAS study, we looked at exactly how KaiB “travels over the mountain,” so to speak. In other words, how it climbs over the free energy landscape. ...

First of all, we figured out that KaiB’s conversion to its alternate state takes hours. ... The speed of any protein conformational transition is tuned to the biological function needed. The KaiB protein controls the organism’s 24-hour clock, meaning it needs to move super, super slow. In our paper, we saw what evolution had to do to slow this process down.

We studied these changes using nuclear magnetic resonance (NMR)—an amazing method where you can measure protein dynamics in solution at atomic resolution—and we found that KaiB’s conversion takes three hours to complete. We then determined the atomistic pathway, which was very complicated. At a high level, we figured out how evolution tunes protein kinetics to align with its overall function. ..."
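A quick way to see what an hours-long switch implies energetically: transition-state (Eyring) theory relates a rate k to an effective free-energy barrier through k = (kB·T/h)·exp(−ΔG‡/RT). The short Python sketch below is my own back-of-the-envelope calculation, not something from the paper, inverting that relation for a roughly three-hour conversion.

    import math

    # Back-of-the-envelope (not from the paper): what effective free-energy
    # barrier would make a conformational switch take about 3 hours?
    # Eyring relation: k = (kB*T/h) * exp(-dG / (R*T))
    KB = 1.380649e-23    # Boltzmann constant, J/K
    H = 6.62607015e-34   # Planck constant, J*s
    R = 1.98720425e-3    # gas constant, kcal/(mol*K)
    T = 298.0            # room temperature, K

    rate = 1.0 / (3 * 3600)   # ~3-hour conversion -> ~9.3e-5 per second
    prefactor = KB * T / H    # kB*T/h, about 6.2e12 per second

    # Invert the Eyring relation for the barrier height:
    dG = R * T * math.log(prefactor / rate)
    print(f"effective barrier ~ {dG:.0f} kcal/mol")  # prints ~23 kcal/mol

By the same arithmetic, a microsecond-scale transition corresponds to a barrier of only about 9 kcal/mol, which makes concrete what “slowing the process down” means: evolution has to raise the effective barrier by roughly 14 kcal/mol.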

From the significance and abstract:
"Significance
Protein language models (pLMs) have exhibited remarkable capabilities in protein structure prediction and design. However, the extent to which they comprehend the intrinsic biophysics of protein structures remains uncertain. We present a suite of analyses that dissect how the flagship pLM ESM-2 predicts structure. Motivated by a consistent error of protein isoforms predicted as structured fragments, we developed a completely unsupervised method to uniformly evaluate any pLM, allowing us to compare coevolutionary statistics to linear models. We further identified that ESM-2 does not require full context for predicting interresidue contacts. Our study highlights the current limitations of pLMs and contributes to a deeper understanding of their underlying mechanisms, paving the way for more reliable protein structure predictions.
Abstract
Protein language models (pLMs) have emerged as potent tools for predicting and designing protein structure and function, and the degree to which these models fundamentally understand the inherent biophysics of protein structure stands as an open question. Motivated by a finding that pLM-based structure predictors erroneously predict nonphysical structures for protein isoforms, we investigated the nature of sequence context needed for contact predictions in the pLM Evolutionary Scale Modeling (ESM-2). We demonstrate by use of a “categorical Jacobian” calculation that ESM-2 stores statistics of coevolving residues, analogously to simpler modeling approaches like Markov Random Fields and Multivariate Gaussian models. We further investigated how ESM-2 “stores” information needed to predict contacts by comparing sequence masking strategies, and found that providing local windows of sequence information allowed ESM-2 to best recover predicted contacts. This suggests that pLMs predict contacts by storing motifs of pairwise contacts. Our investigation highlights the limitations of current pLMs and underscores the importance of understanding the underlying mechanisms of these models."
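The “categorical Jacobian” named in the abstract has a simple operational reading: substitute each position of the sequence with each of the 20 amino acids, record how every output logit shifts, and collapse the resulting four-dimensional tensor into a residue-residue coupling map. The Python sketch below is my reconstruction of that general idea, not the authors' code: logits_fn stands in for a single forward pass through any pLM such as ESM-2, mask_token for its mask symbol, and the reduction uses the standard average-product correction (APC) from coevolution analysis.

    import numpy as np

    A = 20  # amino-acid alphabet size (sequences assumed integer-encoded)

    def categorical_jacobian(logits_fn, seq):
        # J[i, a, j, b]: shift in the logit for amino acid b at position j
        # when position i of seq is substituted with amino acid a.
        # logits_fn(seq) -> (L, A) array; stands in for one pLM forward pass.
        L = len(seq)
        base = logits_fn(seq)                       # wild-type logits, (L, A)
        J = np.zeros((L, A, L, A))
        for i in range(L):
            for a in range(A):
                mutant = list(seq)
                mutant[i] = a                       # point substitution
                J[i, a] = logits_fn(mutant) - base  # (L, A) logit shift
        return J

    def coupling_map(J):
        # Collapse the 4-D Jacobian to an L x L coupling matrix: Frobenius
        # norm over both amino-acid axes, symmetrize, then apply the standard
        # average-product correction (APC) from coevolution analysis.
        C = np.linalg.norm(J, axis=(1, 3))          # (L, L)
        C = 0.5 * (C + C.T)
        apc = C.mean(0, keepdims=True) * C.mean(1, keepdims=True) / C.mean()
        return C - apc                              # large entries ~ likely contacts

    def window_logits_fn(logits_fn, seq, center, radius, mask_token):
        # Masking-strategy probe (assumed interface): keep only a local window
        # of real sequence around `center`, mask everything else, run the model.
        masked = [mask_token] * len(seq)
        lo, hi = max(0, center - radius), min(len(seq), center + radius + 1)
        masked[lo:hi] = list(seq[lo:hi])
        return logits_fn(masked)

Two practical notes: the Jacobian costs 20·L forward passes for a length-L sequence, and the window helper mirrors the paper's masking comparison in spirit: feed the model only a local stretch of real sequence and measure how much of the coupling map it still recovers.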

Source: It takes two to tango: What a protein’s “dance” reveals about human health - Scripps Research Magazine


A graphic depicting how KaiB converts to its alternate state and “climbs” over the free energy landscape.


Fig. 1 Three hypotheses of how language models predict protein structures.

Fig. 2 Deep learning structure-based methods predict isoforms as fragments of full-length structures with exposed aggregation-prone residues. 