Sunday, June 23, 2024

Comments on: AI-Assisted Generation of Difficult Math Questions

Food for thought! Would it not be great to use machine learning and AI both to develop probing questions about difficult problems and to solve those problems? Each direction would call for a different approach.

Could burning a candle (a very long one) from both ends become a new paradigm in AI? 😊 Or think of it as digging a tunnel through a mountain from both sides.

Caveat: I have not read the paper yet.

From the abstract:
"Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging mathematics questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. Initially, leveraging LLM metacognition skills [Didolkar et al., 2024], a strong LLM is used to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills that must be utilized in the question. This "out of distribution" task is challenging for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multi-turn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced through further LLM interactions. Applying this pipeline on skills extracted from MATH dataset [Hendrycks et al., 2021] resulted in a dataset of complex math questions, while improving expert productivity. Despite using skills from the MATH dataset, our approach of combining random skill pairs in questions resulted in noticeably higher quality, as evidenced by:
(a) Lower performance of all models on our questions than on MATH (with open models being the most affected).
(b) Higher performance on MATH when using our questions as in-context examples.
Although focused on mathematics, our methodology seems applicable to other domains requiring structured reasoning. It can be seen as a method for scalable oversight, where human experts evaluate highly capable AI models by also using AI-assistance."

AI-Assisted Generation of Difficult Math Questions | OpenReview (open access; among the authors are Yoshua Bengio and Sanjeev Arora)
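
To make the skill-pairing idea from the abstract a bit more concrete, here is a minimal Python sketch. It is only my reading of the abstract, not the authors' pipeline: the skill names, the functions sample_skill_pairs and build_prompt, and the prompt wording are all made up for illustration, and the actual LLM call and the human review step are left out.

```python
import itertools
import random


def sample_skill_pairs(skills, k, seed=None):
    """Return k distinct unordered pairs of distinct skills, chosen at random."""
    rng = random.Random(seed)
    # Deduplicate and sort so the result depends only on the seed and the skill set.
    unique_skills = sorted(set(skills))
    all_pairs = list(itertools.combinations(unique_skills, 2))
    if k > len(all_pairs):
        raise ValueError("not enough distinct skill pairs to sample from")
    return rng.sample(all_pairs, k)


def build_prompt(skill_a, skill_b):
    """Build a question-generation prompt (hypothetical wording)."""
    return (
        "Write one challenging math question whose solution requires "
        f"both of these skills: {skill_a}; {skill_b}. "
        "Then give a complete step-by-step solution."
    )


# Hypothetical skill names; in the paper the skills are extracted by an LLM.
skills = [
    "modular arithmetic",
    "area of triangles",
    "geometric series",
    "counting with permutations",
]

for skill_a, skill_b in sample_skill_pairs(skills, k=3, seed=0):
    print(build_prompt(skill_a, skill_b))
```

Each printed prompt would then go to a strong LLM, with the generated question and solution refined over several turns and finally checked by a human annotator, as the abstract describes.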
