Sunday, September 18, 2022

Biologists glean insight into repetitive protein sequences

Amazing stuff!

"About 70 percent of all human proteins include at least one sequence consisting of a single amino acid repeated many times, with a few other amino acids sprinkled in. These “low-complexity regions” are also found in most other organisms. ...
Using their technique, the researchers have analyzed all of the proteins found in eight different species, from bacteria to humans. They found that while LCRs can vary between proteins and species, they often share a similar role — helping the protein in which they’re found to join a larger-scale assembly such as the nucleolus, an organelle found in nearly all human cells. ...
The researchers also found some differences between LCRs of different species and showed that these species-specific LCR sequences correspond to species-specific functions, such as forming plant cell walls. ...
To do that, the researchers used a technique called dotplot matrix, which is a way to visually represent amino acid sequences, to generate images of each protein under study. They then used computational image processing methods to compare thousands of these matrices at the same time. ...
In a comparison of the proteins found in eight different species, the researchers found that some LCR types are highly conserved between species, meaning that the sequences have changed very little over evolutionary timescales. These sequences tend to be found in proteins and cell structures that are also highly conserved, such as the nucleolus. ...
The researchers also found many similarities between LCRs that are involved in forming larger-scale assemblies such as the extracellular matrix, a network of molecules that provides structural support to cells in plants and animals. ..."

From the abstract:
"Low complexity regions (LCRs) play a role in a variety of important biological processes ... Here, we use dotplots and dimensionality reduction to systematically define LCR type/copy relationships and create a map of LCR sequence space capable of integrating LCR features and functions. By defining LCR relationships across the proteome, we provide insight into how LCR type and copy number contribute to higher order assemblies ... With LCR maps, we reveal the underlying structure of LCR sequence space, and relate differential occupancy in this space to the conservation and emergence of higher order assemblies, including the metazoan extracellular matrix and plant cell wall. Together, LCR relationships and maps uncover and identify scaffold-client relationships among E-rich LCR-containing proteins in the nucleolus, and revealed previously undescribed regions of LCR sequence space with signatures of higher order assemblies, including a teleost-specific T/H-rich sequence space. Thus, this unified view of LCRs enables discovery of how LCRs encode higher order assemblies of organisms."
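
To make the dotplot idea from the quotes above concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the toy sequence, the function names, and the `block_density` helper are all hypothetical, and the paper's image processing and dimensionality reduction steps are not shown. It only illustrates the basic construction, in which a protein is compared against itself and every pair of matching residues is marked. A run dominated by one amino acid then shows up as a dense square along the diagonal.

```python
def dotplot(seq_a, seq_b):
    """Return a matrix with 1 where seq_a[i] == seq_b[j], else 0."""
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw the matrix as text: '#' for a match, '.' for a mismatch."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


def block_density(matrix, start, end):
    """Fraction of matching cells in the square block [start, end) on the diagonal."""
    cells = [matrix[i][j] for i in range(start, end) for j in range(start, end)]
    return sum(cells) / len(cells)


# Hypothetical toy sequence (an assumption, not real data):
# a glutamate (E)-rich stretch flanked by mixed residues.
seq = "MKTEEEEDEEELAG"
matrix = dotplot(seq, seq)  # self-comparison dotplot

print(render(matrix))
# The E-rich stretch (positions 3 to 10) is far denser than the whole matrix.
print(block_density(matrix, 3, 11), block_density(matrix, 0, len(seq)))
```

Comparing the density of a diagonal block with that of the whole matrix is one simple way to see why such regions stand out visually. The study itself compares thousands of these matrices with image processing methods.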

Biologists glean insight into repetitive protein sequences | MIT News | Massachusetts Institute of Technology
A computational analysis reveals that many repetitive sequences are shared across proteins and are similar in species from bacteria to humans.


A systematic dotplot approach to reveal the relationships between low complexity regions (LCRs) in proteins.

