Thursday, November 03, 2022

AlphaFold’s new rival? Meta AI predicts shape of 600 million proteins

Good news! Competition is good! Very impressive work! A game changer!

Go to the website of the ESM Metagenomic Atlas and explore to your heart's delight all the proteins of the heart and much more! 😊 (Unfortunately, this atlas does not yet seem to have a search function that would let you view only the proteins of the heart.)

"... Meta’s network, called ESMFold, isn’t quite as accurate as AlphaFold, Rives’ team reported earlier this year, but it is about 60 times faster at predicting structures for short sequences, he says. “What this means is that we can scale structure prediction to much larger databases.” ...
As a test, the researchers unleashed their model on a database of bulk-sequenced ‘metagenomic’ DNA from environmental sources such as soil, seawater and the human gut and skin. The vast majority of the entries — which encode potential proteins — come from single-cell organisms that have never been isolated or cultured and are unknown to science. ...
Of the 617 million predictions, the model deemed more than one-third to be high quality, such that researchers can have confidence that the overall protein shape is correct and, in some cases, can discern atomic-level details. Millions of these structures are entirely unlike anything in the databases of protein structures determined experimentally, or any of AlphaFold’s predictions from known organisms. ..."

From the abstract:
"Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Characterizing the structures of the exponentially growing billions of protein sequences revealed by large scale gene sequencing experiments would necessitate a breakthrough in the speed of folding. Here we show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high resolution structure prediction. Leveraging the insight that language models learn evolutionary patterns across millions of sequences, we train models up to 15B parameters, the largest language model of proteins to date. As the language models are scaled they learn information that enables prediction of the three-dimensional structure of a protein at the resolution of individual atoms. This results in prediction that is up to 60x faster than state-of-the-art while maintaining resolution and accuracy. Building on this, we present the ESM Metagenomic Atlas. This is the first large-scale structural characterization of metagenomic proteins, with more than 617 million structures. The atlas reveals more than 225 million high confidence predictions, including millions whose structures are novel in comparison with experimentally determined structures, giving an unprecedented view into the vast breadth and diversity of the structures of some of the least understood proteins on earth."
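
A quick sanity check on the numbers quoted above: the article says "more than one-third" of the 617 million predictions are high quality, and the abstract reports more than 225 million high confidence predictions. The short sketch below just divides one by the other. Both figures are the rounded lower bounds quoted in the text, so the result is only approximate, and the variable names are my own.

```python
# Rounded figures quoted in the article and the abstract.
# Both are stated as "more than", so treat them as lower bounds.
total_predictions = 617_000_000
high_confidence = 225_000_000

# Share of predictions the model itself rates as high confidence.
fraction = high_confidence / total_predictions

print(f"High-confidence share: {fraction:.1%}")
print(f"More than one-third? {fraction > 1 / 3}")
```

So the two quoted figures are consistent with each other: roughly 36 percent of the atlas is rated high confidence.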

AlphaFold’s new rival? Meta AI predicts shape of 600 million proteins. Microbial molecules from soil, seawater and human bodies are among the planet’s least understood.

