The race is on to develop AI models with over 100 trillion parameters!
"In a new paper, researchers from Tsinghua University, Alibaba Group, Zhejiang Lab and Beijing Academy of Artificial Intelligence present BaGuaLu, a framework that enables the training of large AI models using the Mixture-of-Experts (MoE) architecture.
Like OpenAI’s GPT-3, it relies on Transformer models, but during training it forms individual expert networks that handle specific inputs while sparing the resources of the rest of the network. These huge MoE models only ever activate the part of the network that is currently needed, rather than the entire network, as many other AI architectures do.
In an initial test, the researchers trained a 1.93-trillion-parameter model with their framework, outperforming Google’s Switch Transformer. They also demonstrate that their framework can scale to models with 14.5 trillion and a full 174 trillion parameters. ..."
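To make the expert-routing idea in the quoted description concrete, here is a minimal top-1 MoE layer written in plain PyTorch. This is only a sketch of the "activate just the experts that are needed" idea, not BaGuaLu's actual implementation, which targets the Sunway supercomputer with its own intra-node optimizations and parallel strategies; the class name Top1MoELayer and all the sizes below are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a learned gate routes each token to a
    single expert, so only that expert's parameters do work for that token."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)   # routing probabilities
        expert_idx = scores.argmax(dim=-1)         # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():                         # only run experts that received tokens
                out[mask] = expert(x[mask]) * scores[mask, i].unsqueeze(-1)
        return out

moe = Top1MoELayer(d_model=64, d_hidden=256, num_experts=8)
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64])
```

Because each token touches only one expert, the total parameter count can grow with the number of experts while the compute per token stays roughly constant; at trillion-parameter scale the experts are then sharded across many nodes, which is the part BaGuaLu handles on the Sunway machine.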
From the abstract:
"... As the size of pretrained AI models grows dramatically each year in an effort to achieve higher accuracy, training such models requires massive computing and memory capabilities, which accelerates the convergence of AI and HPC [high performance computing]. However, there are still gaps in deploying AI applications on HPC systems, which need application and system co-design based on specific hardware features.
To this end, this paper proposes BaGuaLu, the first work targeting training brain scale models on an entire exascale supercomputer, the New Generation Sunway Supercomputer. By combining hardware-specific intra-node optimization and hybrid parallel strategies, BaGuaLu enables decent performance and scalability on unprecedentedly large models. The evaluation shows that BaGuaLu can train 14.5-trillion-parameter models with a performance of over 1 EFLOPS using mixed-precision and has the capability to train 174-trillion-parameter models, which rivals the number of synapses in a human brain."
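To get a feel for what those parameter counts mean, here is a rough back-of-envelope estimate of my own (not from the paper): just storing the raw weights of such models takes tens to hundreds of terabytes, before gradients, optimizer state, or activations are even counted, which is why training them requires an entire exascale machine and hybrid parallelism.

```python
def param_memory_tb(num_params: float, bytes_per_param: int) -> float:
    """Raw storage for the model weights alone, in terabytes (1 TB = 1e12 bytes).
    Gradients, optimizer state and activations would add a large multiple on top."""
    return num_params * bytes_per_param / 1e12

for name, n in [("14.5-trillion-parameter model", 14.5e12),
                ("174-trillion-parameter model", 174e12)]:
    print(f"{name}: {param_memory_tb(n, 2):,.0f} TB at 2 bytes/param (fp16/bf16), "
          f"{param_memory_tb(n, 4):,.0f} TB at 4 bytes/param (fp32)")
# 14.5-trillion-parameter model: 29 TB at 2 bytes/param (fp16/bf16), 58 TB at 4 bytes/param (fp32)
# 174-trillion-parameter model: 348 TB at 2 bytes/param (fp16/bf16), 696 TB at 4 bytes/param (fp32)
```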