Just finished studying this great paper by DeepMind. It introduces the now famous Chinchilla model: a fairly small model at 70 billion parameters, but trained on 1.4 trillion tokens. It beats a number of much larger models, such as Gopher (280 billion parameters) and Megatron-Turing NLG (530 billion parameters), on various benchmarks.
The paper's central finding is that many current large language models are significantly undertrained: for their compute budgets, they have too many parameters relative to the number of training tokens they see.
The paper seems to confirm that bigger is not always better.
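As a rough illustration, here is a back-of-the-envelope sketch in Python. The ~20 tokens-per-parameter ratio is simply what Chinchilla's own numbers imply (1.4T tokens / 70B parameters); the other model figures are approximate publicly reported values, not taken from the paper itself.

```python
# Back-of-the-envelope check of the rule of thumb implied by Chinchilla:
# roughly 20 training tokens per parameter at compute-optimal scale.
# Parameter/token counts below are approximate publicly reported figures.

CHINCHILLA_TOKENS_PER_PARAM = 20  # ratio implied by 1.4e12 tokens / 70e9 params

models = {
    # name: (parameters, training tokens)
    "Chinchilla": (70e9, 1.4e12),
    "Gopher": (280e9, 300e9),
    "GPT-3": (175e9, 300e9),
}

for name, (params, tokens) in models.items():
    tokens_per_param = tokens / params
    optimal_tokens = params * CHINCHILLA_TOKENS_PER_PARAM
    print(f"{name}: {tokens_per_param:.1f} tokens/param "
          f"(compute-optimal training would need ~{optimal_tokens / 1e12:.1f}T tokens)")
```

By this crude yardstick, the larger models were trained on only a fraction of the tokens a compute-optimal run of their size would call for, which is exactly the sense in which they are undertrained.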