Tuesday, May 20, 2025

Meta releases huge chemistry research data set and model

Good news! This could be a huge boost for chemistry research!

"Meta released a new data set called Open Molecules 2025 (OMol25), created through 6 billion compute hours and 100 million quantum mechanical calculations. The company also introduced UMA (Universal Frontier model for Atoms), an AI model that performs molecular calculations 10,000 times faster than traditional methods. Meta developed these tools with Lawrence Berkeley National Laboratory, Princeton University, Genentech, Stanford, and other research institutions. The data covers four areas: small molecules, biomolecules, metal complexes, and electrolytes, with potential applications in drug development and battery technology. ..."

From the abstract:
"Machine learning (ML) models hold the promise of transforming atomic simulations by delivering quantum chemical accuracy at a fraction of the computational cost. Realization of this potential would enable high-throughout, high-accuracy molecular screening campaigns to explore vast regions of chemical space and facilitate ab initio simulations at sizes and time scales that were previously inaccessible. However, a fundamental challenge to creating ML models that perform well across molecular chemistry is the lack of comprehensive data for training.
Despite substantial efforts in data generation, no large-scale molecular dataset exists that combines broad chemical diversity with a high level of accuracy.
To address this gap, Meta FAIR introduces Open Molecules 2025 (OMol25), a large-scale dataset composed of more than 100 million density functional theory (DFT) calculations at the B97M-V/def2-TZVPD level of theory, representing billions of CPU core-hours of compute.
OMol25 uniquely blends elemental, chemical, and structural diversity including: 83 elements, a wide-range of intra- and intermolecular interactions, explicit solvation, variable charge/spin, conformers, and reactive structures.
There are ~83M unique molecular systems in OMol25 covering small molecules, biomolecules, metal complexes, and electrolytes, including structures obtained from existing datasets. OMol25 also greatly expands on the size of systems typically included in DFT datasets, with systems of up to 350 atoms.
In addition to the public release of the data, we provide baseline models and a comprehensive set of model evaluations to encourage community engagement in developing the next-generation ML models for molecular chemistry."

How Mayo Clinic radiologists have embraced artificial intelligence


Computational Chemistry Unlocked: A Record-Breaking Dataset to Train AI Models has Launched "Accurate simulations of complex chemistry are finally within reach"



A visual overview of OMol25, including chemical scope, sampling strategies used to construct structures, chemical phenomena we seek to capture, properties available for each datapoint, and envisioned application areas. 



No comments: