Friday, June 23, 2023

The people paid to train AI are outsourcing their work… to AI

When recursion becomes a serious conundrum! There is a distinct risk that future training runs of large AI models will include data that was itself generated by AI.

"Garbage in, garbage out" takes on a whole new meaning!

Just blogged here about the curse of recursion!
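
To make the recursion worry concrete, here is a toy sketch of my own (not from either paper): a "model" that simply fits a Gaussian to its training data, where each generation trains only on samples produced by the previous one. With small samples, estimation error compounds, so the learned distribution drifts away from the original human data over generations.

```python
import random
import statistics

# Toy illustration (my own sketch, not from the cited work): each
# "generation" fits a Gaussian to its training data, then the next
# generation trains only on samples drawn from that fit. Watch the
# fitted mean and standard deviation drift as generations pass.

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(20)]  # the original "human" data

for generation in range(15):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    print(f"gen {generation:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
    # Train the next generation purely on the last model's output.
    data = [random.gauss(mu, sigma) for _ in range(20)]
```

With real LLMs the mechanism is far more subtle, but the qualitative point is the same: once synthetic data feeds back into training, errors stop washing out and start accumulating.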

"A significant proportion of people paid to train AI models may be themselves outsourcing that work to AI, a new study has found. "

From the abstract:
"Large language models (LLMs) are remarkable data annotators. They can be used to generate high-fidelity supervised training data, as well as survey and experimental data. With the widespread adoption of LLMs, human gold--standard annotations are key to understanding the capabilities of LLMs and the validity of their results. However, crowdsourcing, an important, inexpensive way to obtain human annotations, may itself be impacted by LLMs, as crowd workers have financial incentives to use LLMs to increase their productivity and income. To investigate this concern, we conducted a case study on the prevalence of LLM usage by crowd workers. We reran an abstract summarization task from the literature on Amazon Mechanical Turk and, through a combination of keystroke detection and synthetic text classification, estimate that 33-46% of crowd workers used LLMs when completing the task. Although generalization to other, less LLM-friendly tasks is unclear, our results call for platforms, researchers, and crowd workers to find new ways to ensure that human data remain human, perhaps using the methodology proposed here as a stepping stone. ..."

The people paid to train AI are outsourcing their work… to AI | MIT Technology Review
"It's a practice that could introduce further errors into already error-prone models."
