Common Sense: New AI model predicts which genetic mutations truly drive disease

Monday, September 01, 2025

Good news!

"When genetic testing reveals a rare DNA mutation, doctors and patients are frequently left in the dark about what it actually means. Now, researchers ... have developed a powerful new way to determine whether a patient with a mutation is likely to actually develop disease, a concept known in genetics as penetrance. ...

Using more than 1 million electronic health records, the researchers built AI models for 10 common diseases. They then applied these models to people known to have rare genetic variants, generating a score between 0 and 1 that reflects the likelihood of developing the disease.

A higher score, closer to 1, suggests a variant may be more likely to contribute to disease, while a lower score indicates minimal or no risk. The team calculated “ML penetrance” scores for more than 1,600 genetic variants. ..."
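To make the scoring idea concrete, here is a minimal sketch of how per-person disease scores between 0 and 1 might be rolled up into a single number per variant. The quoted text does not spell out the aggregation, so taking the mean score across carriers is an assumption made purely for illustration. The function name `ml_penetrance`, the variant labels, and the scores are hypothetical, not the researchers' code or data.

```python
from collections import defaultdict
from statistics import mean


def ml_penetrance(carriers):
    """Average model disease scores per variant.

    `carriers` is an iterable of (variant_id, disease_score) pairs,
    one pair per person carrying that variant. Averaging is an
    illustrative assumption, not the published method.
    """
    by_variant = defaultdict(list)
    for variant_id, score in carriers:
        # Scores are described as lying between 0 and 1.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {variant_id}: {score}")
        by_variant[variant_id].append(score)
    return {variant: mean(scores) for variant, scores in by_variant.items()}


# Made-up carriers: two people with variant_A, three with variant_B.
example_carriers = [
    ("variant_A", 0.9),
    ("variant_A", 0.7),
    ("variant_B", 0.1),
    ("variant_B", 0.2),
    ("variant_B", 0.0),
]

penetrance_by_variant = ml_penetrance(example_carriers)
for variant, value in sorted(penetrance_by_variant.items()):
    print(f"{variant}: {value:.2f}")
```

In this toy data, variant_A ends up near the top of the scale and variant_B near the bottom, which is the kind of contrast the quoted passage describes.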

From the editor's summary and abstract:

"Editor’s summary

When patients are tested for genetic predisposition to clinical conditions, they sometimes are found to have rare gene variants of unknown significance, leaving clinicians to guess at the implications. In addition, mutations may not directly cause the clinical condition but could increase the risk of developing it. To help interpret such scenarios, Forrest et al. used machine learning to evaluate several large cohorts of patients (see the Perspective by Raiken and Stein). The authors used clinical data to calculate disease scores for 10 conditions and then linked those scores with genetic data, including rare mutations with previously unknown roles. In addition to helping to interpret the contributions of rare mutations for the diseases analyzed here, this study provides a model for examining the contribution of unknown genetic variants to other diseases. ...

Structured Abstract

INTRODUCTION

Accurately estimating the penetrance of genetic variants—the probability that an individual with a variant develops disease—is essential for risk assessment and clinical decision-making. Traditional approaches rely on disease-enriched families or cohorts, which are limited by small sample sizes and ascertainment bias. Moreover, case-versus-control classifications oversimplify diseases that exist on a spectrum, further reducing the accuracy of risk estimates.

Machine learning (ML) offers a scalable solution by integrating large-scale electronic health record (EHR) and genetic data to assess penetrance in a data-driven, quantitative, and precise manner.

RATIONALE

Sequencing advances have facilitated the discovery of rare variants in disease-associated genes, many of which are submitted to repositories such as ClinVar and classified based on predicted pathogenicity. However, reliance on laboratory and expert review as well as the absence of large variant datasets measuring real-world disease risk can lead to discordant variant classifications.

Emerging population-based analyses have revealed that some variants previously classified as pathogenic (P) exhibit low or variable penetrance, whereas variants of uncertain significance (VUS) remain challenging to interpret clinically. To address these challenges, we developed an ML-based approach to estimate penetrance by leveraging routine clinical laboratory tests, which are widely available in health systems, and intersecting them with genetic data.

RESULTS

We constructed ML models for 10 genetic conditions—arrhythmogenic right ventricular cardiomyopathy, familial breast cancer, familial hypercholesterolemia (FH), hypertrophic cardiomyopathy (HCM), adult hypophosphatasia, long QT syndrome, Lynch syndrome, monogenic diabetes, polycystic kidney disease (PKD), and von Willebrand disease—using 1,347,298 participants with EHR data and applied them to an independent exome-sequenced cohort. Using disease probability scores from these models, we computed ML penetrance for 1648 rare variants across 31 autosomal dominant disease-predisposition genes, spanning P, benign (B), VUS, and previously unknown loss-of-function (LoF) variants. ML penetrance was highest for P and LoF variants, followed by VUS, and lowest for B variants, providing refined quantitative estimates compared with traditional case-versus-control methods.

Notably, ML penetrance correlated with disease-relevant clinical outcomes, such as risk of end-stage renal disease for PKD variants and heart failure for HCM variants. ML penetrance also aligned with experimentally derived measures of variant function, reinforcing its biological relevance.

Importantly, ML penetrance aided in the evaluation of VUS and previously unknown LoF variants by delineating clinical trajectories—individuals with highly penetrant variants showed perturbed vital signs, electrocardiogram measures, and disease biomarkers over time.

For example, individuals with highly ML penetrant FH variants exhibited 119 mg/dl higher low-density lipoprotein cholesterol and those with highly ML penetrant PKD variants had a 40 ml/min lower glomerular filtration rate.

CONCLUSION

This study presents an ML-based blueprint to systematically evaluate penetrance at scale, integrating genomic and clinical phenotype data. By providing refined, individualized disease risk estimates, ML penetrance has the potential to improve variant assessment, guide clinical decision-making, and enhance precision medicine approaches."
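The abstract's introduction says traditional case-versus-control penetrance estimates are limited by small sample sizes. The sketch below illustrates that one point, and it is not taken from the paper. It estimates penetrance as the fraction of carriers diagnosed with the disease and attaches a 95% Wilson score interval, a standard interval for a proportion. The function name `wilson_interval` and the carrier counts are hypothetical.

```python
import math


def wilson_interval(affected, carriers, z=1.959963984540054):
    """Wilson score interval for a proportion (default z gives about 95%)."""
    if carriers <= 0 or not 0 <= affected <= carriers:
        raise ValueError("need 0 <= affected <= carriers and carriers > 0")
    p = affected / carriers
    denom = 1.0 + z * z / carriers
    center = (p + z * z / (2 * carriers)) / denom
    half = z * math.sqrt(
        p * (1 - p) / carriers + z * z / (4 * carriers * carriers)
    ) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Same observed penetrance (30%), very different certainty.
small_low, small_high = wilson_interval(affected=3, carriers=10)
large_low, large_high = wilson_interval(affected=300, carriers=1000)

print(f"10 carriers:   {small_low:.2f} to {small_high:.2f}")
print(f"1000 carriers: {large_low:.2f} to {large_high:.2f}")
```

With only a handful of carriers, the interval around a binary penetrance estimate is wide. That is one reason a continuous score drawn from a large health-record cohort could be attractive for rare variants.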

New AI model predicts which genetic mutations truly drive disease | ScienceDaily

Mount Sinai Researchers Use AI and Lab Tests to Predict Genetic Disease Risk (original news release) "AI model uses common data to gauge risk from rare genetic variants"

Machine learning–based penetrance of genetic variants (no public access)

ML-based penetrance estimation of genetic variants.