Saturday, January 20, 2024

Google: Solving olympiad geometry without human demonstrations

This appears to be an impressive paper! Mathematical proofs via AI! Exciting!

This paper also presents a great combination of models (neuro-symbolic & large language model) and large amounts of generated synthetic data in lieu of real data.

We have come a long way:
"... The last words attributed to Archimedes [287-212 BC) are "Do not disturb my circles" ... a reference to the mathematical drawing that he was supposedly studying when disturbed by the Roman soldier [who would then go on to kill Archimedes]. ..." (Source)

"... AlphaGeometry solving a simple problem: Given the problem diagram and its theorem premises ..., AlphaGeometry ... first uses its symbolic engine to deduce new statements about the diagram until the solution is found or new statements are exhausted. If no solution is found, AlphaGeometry’s language model adds one potentially useful construct (blue), opening new paths of deduction for the symbolic engine. This loop continues until a solution is found ... In this example, just one construct is required. ...
Generating 100 million synthetic data examples
... Our synthetic data generation approach emulates this knowledge-building process at scale, allowing us to train AlphaGeometry from scratch, without any human demonstrations.
Using highly parallelized computing, the system started by generating one billion random diagrams of geometric objects and exhaustively derived all the relationships between the points and lines in each diagram. AlphaGeometry found all the proofs contained in each diagram, then worked backwards to find out what additional constructs, if any, were needed to arrive at those proofs. We call this process “symbolic deduction and traceback”.
That huge data pool was filtered to exclude similar examples, resulting in a final training dataset of 100 million unique examples of varying difficulty, of which nine million featured added constructs. ..."

From the abstract:
"Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004."

AlphaGeometry: An Olympiad-level AI system for geometry Our AI system surpasses the state-of-the-art approach for geometry problems, advancing AI reasoning in mathematics

Solving olympiad geometry without human demonstrations | Nature (open access)

Fig. 1: Overview of our neuro-symbolic AlphaGeometry and how it solves both a simple problem and the IMO 2015 Problem 3.


No comments: