Very interesting paper by Google!
Do language models, as they get larger, reveal "unpredictable phenomena of emergent abilities" that smaller models lack? If this observation is confirmed, it could well mean that larger models will exceed human capabilities in the foreseeable future.
However, the authors also caution that smaller "models trained on higher-quality data" could achieve similar performance. In particular, "deduplication" of data is mentioned, which is odd: why would such models have been trained on duplicate data over the last two decades or so in the first place? Was deduplication of data neglected in the past? That would be really strange and awkward.
The other question, of course, is whether different learning methods, yet to be discovered, would allow smaller models to perform even better. In terms of model parameters, some of the current large models have several times as many parameters as the human brain has neurons (roughly 86 billion).
" ... This qualitative change is also known as a phase transition—a dramatic change in overall behavior that would not have been foreseen by examining smaller-scale systems ..."