Monday, March 31, 2025

AI takes step towards cracking biology’s toughest problem – protein sequencing

Good news! Way to go!

"A new AI system, dubbed InstaNovo, could revolutionise protein sequencing just as AlphaFold transformed protein structure prediction, its developers claim.

While DNA sequencing is routine, determining protein sequences remains one of biology’s toughest challenges ... InstaNovo aims to change that by directly reading protein sequences from raw experimental data, unlocking vast areas of previously inaccessible biology. ..."

"... Another use case was conducted on small pieces of protein, called peptides, displayed on the surface of cells. These help the immune system recognize infections and diseases such as cancer. The InstaNovo models identified thousands of new peptides that were not found using traditional methods. In personalized cancer treatments empowering the immune system—immunotherapy for short—these peptides are all potential attack points. ..."

"What Are InstaNovo and InstaNovo+?
InstaNovo is a transformer-based model designed for de novo peptide sequencing. ... it translates fragment ion peaks from mass spectrometry data into peptide sequences with unprecedented precision.

Unlike traditional methods that rely on pre-existing databases, InstaNovo identifies peptides that have never been documented before—expanding the landscape of proteomic discovery.

A key innovation of the InstaNovo models is InstaNovo+, a diffusion-based iterative refinement model that enhances sequence accuracy by mimicking how researchers manually refine peptide predictions. InstaNovo+ begins with an initial sequence—either derived from InstaNovo or generated at random—and improves it, step by step.

When paired with InstaNovo, InstaNovo+ significantly reduces false discovery rates (FDR) and improves sequence accuracy, not just by refining predictions, but by exploring a broader range of potential peptide sequences.

Unlike autoregressive models such as InstaNovo and others, which predict peptide sequences one amino acid at a time, InstaNovo+ processes entire sequences holistically, enabling greater accuracy and higher detection rates. ..."
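To make the idea of step-by-step refinement concrete, here is a toy sketch in Python. It is not InstaNovo+ and not a diffusion model: it swaps in plain greedy hill climbing, and the function names (`prefix_masses`, `score`, `refine`), the tiny amino acid table, and the noise-free "spectrum" are all assumptions made for illustration. It only shows the general pattern of starting from a poor initial sequence and improving it one substitution at a time until the candidate explains the observed masses.

```python
# Monoisotopic residue masses in daltons (standard values; small subset only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "P": 97.05276, "V": 99.06841, "T": 101.04768,
    "L": 113.08406, "K": 128.09496, "E": 129.04259,
}


def prefix_masses(peptide):
    """Cumulative residue masses of every N-terminal prefix of the peptide."""
    total, out = 0.0, []
    for aa in peptide:
        total += RESIDUE_MASS[aa]
        out.append(total)
    return out


def score(peptide, observed, tol=0.01):
    """Count the observed masses that some prefix of the candidate explains."""
    prefixes = prefix_masses(peptide)
    return sum(any(abs(p - m) <= tol for p in prefixes) for m in observed)


def refine(initial, observed, max_steps=50):
    """Greedy refinement: apply the best single-residue substitution per step."""
    current, best = initial, score(initial, observed)
    for _ in range(max_steps):
        candidates = (
            current[:i] + aa + current[i + 1:]
            for i in range(len(current))
            for aa in RESIDUE_MASS
        )
        top = max(candidates, key=lambda c: score(c, observed))
        if score(top, observed) <= best:
            break  # no single substitution improves the match, so stop
        current, best = top, score(top, observed)
    return current


# Hypothetical, noise-free stand-in for a measured spectrum.
observed = prefix_masses("PEKTAL")
print(refine("GGGGGG", observed))  # prints PEKTAL
```

Real spectra are noisy and incomplete, which is why a simple greedy search like this one would get stuck and a learned model is needed.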

From the abstract:
"Mass spectrometry-based proteomics focuses on identifying the peptide that generates a tandem mass spectrum. Traditional methods rely on protein databases but are often limited or inapplicable in certain contexts.
De novo peptide sequencing, which assigns peptide sequences to spectra without prior information, is valuable for diverse biological applications; however, owing to a lack of accuracy, it remains challenging to apply.
Here we introduce InstaNovo, a transformer model that translates fragment ion peaks into peptide sequences. We demonstrate that InstaNovo outperforms state-of-the-art methods and showcase its utility in several applications.
We also introduce InstaNovo+, a diffusion model that improves performance through iterative refinement of predicted sequences.
Using these models, we achieve improved therapeutic sequencing coverage, discover novel peptides and detect unreported organisms in diverse datasets, thereby expanding the scope and detection rate of proteomics searches.
Our models unlock opportunities across domains such as direct protein sequencing, immunopeptidomics and exploration of the dark proteome."
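For readers wondering what "translates fragment ion peaks into peptide sequences" means in practice, the sketch below shows the underlying physical idea in Python, under simplifying assumptions: only singly charged b-ions, a small table of residue masses, and a perfectly clean spectrum. The helper names `b_ions` and `read_ladder` are made up for this post and are not part of InstaNovo. The point is that the gap between neighbouring fragment peaks equals the mass of one amino acid residue, which is the signal any de novo method has to decode.

```python
# Monoisotopic residue masses in daltons (standard values; small subset only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "P": 97.05276, "V": 99.06841, "T": 101.04768,
    "L": 113.08406, "K": 128.09496, "E": 129.04259,
}
PROTON = 1.007276  # mass of the charge-carrying proton, in daltons


def b_ions(peptide):
    """Singly charged b-ion m/z values (N-terminal prefixes, last residue excluded)."""
    total, peaks = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        peaks.append(total + PROTON)
    return peaks


def read_ladder(peaks, tol=0.01):
    """Read one residue per gap between consecutive peaks; '?' if nothing fits."""
    residues = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        residues.append(matches[0] if matches else "?")
    return "".join(residues)


peaks = b_ions("PEKTAL")
print(read_ladder(peaks))  # prints EKTA
```

Only the middle of the peptide comes back here, because the first b-ion has no lower neighbour and the last residue produces no b-ion in this simplified model. Missing peaks, noise, and residues with near-identical masses are what make the real problem hard.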

AI takes step towards cracking biology’s toughest problem – protein sequencing | Research | Chemistry World (limited public access)

New AI models possible game-changers within protein science and healthcare (original news release) "Researchers have developed new AI models that can vastly improve accuracy and discovery within protein science. Potentially, the models will assist the medical sciences in overcoming present challenges within, e.g. personalised medicine, drug discovery, and diagnostics."

Enhancing Peptide Sequencing with AI (a brief summary from the InstaNovo website)



Fig. 1: InstaNovo pipeline overview.


Figure 1: Illustration of how InstaNovo+ iteratively refines InstaNovo’s output.



Figure 2: Illustration of how InstaNovo interprets a mass spectrum, mapping fragment ion peaks to peptide sequences.

