Recommendable! This is a gigantic model! It can handle 101 languages!
The two key improvements are:
"... In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model -- with outrageous numbers of parameters -- but a constant computational cost. However, despite several notable successes of MoE, widespread adoption has been hindered by complexity, communication costs and training instability -- we address these with the Switch Transformer. We simplify the MoE routing algorithm and design intuitive improved models with reduced communication and computational costs. ..."
Of course, the feel-good and silly author of this article could not resist indoctrinating the reader with the ideologies of the day:
"... Unfortunately, the researchers’ work didn’t take into account the impact of these large language models in the real world. Models often amplify the biases encoded in this public data; a portion of the training data is not uncommonly sourced from communities with pervasive gender, race, and religious prejudices. [totally exaggerated fears] ... This bias could be leveraged by malicious actors to foment discord by spreading misinformation, disinformation, and outright lies that “radicalize individuals into violent far-right extremist [what about leftist extremists?] ideologies and behaviors,” according to the Middlebury Institute of International Studies. ..."
Here is the link to the respective research paper: