I have not read this study, but I have some hunches! None of the authors is familiar to me. They hail from the University of Aberdeen (Scotland) and the University of Tübingen (Germany). I might be very wrong, but neither university seems to be especially well known for major research in machine learning. The senior author, Anson Ho, has a total lifetime citation count of 25.
- There is probably some simplistic trend extrapolation involved
- The authors invoke a distinction between "high-quality language data" and other data. Well, such a distinction is usually riddled with ambiguity!
- Who said that ever-larger models need progressively larger datasets as well? Perhaps better future algorithms will make the need for larger datasets and larger models less relevant
- Human ingenuity can handle this, not least with synthetic data, which can be produced in any amount and quality
Credits: The Batch by Andrew Ng