Could be self-defeating! What if an AI learns that this curated and/or synthetic data was created by AI and exploits it?
Caveat: I did not read the entire, long blog post by Meta.
"We introduce Autodata, a method that enables AI agents to act as data scientists who iteratively build high quality training and evaluation data. We show how to train (meta-optimize) such a data scientist agent, so that it can create even stronger data.
Our initial study with a specific practical implementation, Agentic Self-Instruct, shows strong gains on scientific reasoning problems compared to classical synthetic dataset creation methods. Further, meta-optimizing the data scientist agent itself delivers an even larger performance uplift.
Agentic data creation provides a way to convert increased inference compute into higher quality model training. ..."
Figure: Autodata pipeline. The framework employs an autonomous agent that emulates the role of a data scientist, iteratively generating data, conducting qualitative inspection and quantitative performance evaluation, synthesizing insights, and updating the data-generation recipe. The agent itself can be trained to be better at the data scientist task using the same criteria used in the inner loop. This cyclical process aims to progressively enhance data quality; the diagram depicts the general workflow underlying possible instantiations.
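To make the inner loop in that caption concrete, here is a toy sketch in Python. It is not Meta's implementation: the `Recipe` class, its `difficulty` knob, the addition-problem "data", and the diversity score are all hypothetical stand-ins I made up for illustration. It only shows the generate, evaluate, update-the-recipe cycle; the outer step of training the agent itself is not shown.

```python
import random
from dataclasses import dataclass


@dataclass
class Recipe:
    # Hypothetical knob standing in for a full data-generation recipe.
    difficulty: int = 1


def generate(recipe, n, rng):
    """Generate n toy examples (a, b, a + b); operand range grows with difficulty."""
    hi = 10 ** recipe.difficulty
    batch = []
    for _ in range(n):
        a, b = rng.randrange(hi), rng.randrange(hi)
        batch.append((a, b, a + b))
    return batch


def evaluate(batch):
    """Toy quantitative check: fraction of distinct examples in the batch.

    A real system would measure downstream model performance instead.
    """
    return len(set(batch)) / len(batch)


def autodata_loop(rounds=5, n=50, seed=0):
    """Iteratively propose a new recipe and keep it only if the score improves."""
    rng = random.Random(seed)
    recipe = Recipe()
    best_score = evaluate(generate(recipe, n, rng))
    history = [(recipe.difficulty, best_score)]
    for _ in range(rounds):
        # "Update the recipe": here, simply try a harder setting.
        candidate = Recipe(difficulty=recipe.difficulty + 1)
        score = evaluate(generate(candidate, n, rng))
        if score > best_score:
            recipe, best_score = candidate, score
        history.append((recipe.difficulty, best_score))
    return recipe, history
```

Calling `autodata_loop()` returns the last accepted recipe plus a per-round history of `(difficulty, best_score)` pairs. The accept-only-if-better rule is the simplest possible update policy; the caption suggests the real agent also inspects samples qualitatively and synthesizes insights before changing the recipe, which this sketch leaves out.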