Large models ingest gigantic amounts of data for training. As a result, quality issues affecting that data have been a serious concern for at least the past ten years; the expression "stochastic parrot", for example, captures some of the dilemmas.
Caveat: I did not have time to read the paper.
"... This strongly indicates that, for many LLMs, there exists task contamination on zero-shot and few-shot evaluation for datasets released prior to the LLMs' training data creation date. Additionally, we utilize training data inspection, task example extraction, and a membership inference attack, which reveal further evidence of task contamination. ..."
Credits: Last Week in AI