Tuesday, February 18, 2020

ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters

Bigger is better! Microsoft raises the bar with a state-of-the-art, record-setting large language model! I am sure Google, OpenAI, or others will pick up the challenge very soon!

ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters - Microsoft Research: The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of cost, time, and ease of code integration. Microsoft is releasing an open-source library called DeepSpeed, which vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to …
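To give a flavor of how this is used in practice, here is a minimal sketch of turning on ZeRO through DeepSpeed's `deepspeed.initialize` API. The model, batch size, and learning rate below are placeholders of my own, and the exact configuration keys can vary across DeepSpeed versions, so treat this as an illustration rather than the official recipe:

```python
import torch
import deepspeed

class ToyModel(torch.nn.Module):
    """Placeholder network standing in for a large Transformer."""
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(1024, 1024)

    def forward(self, x):
        return self.layer(x)

# DeepSpeed is configured through a JSON-style dict; the zero_optimization
# section enables ZeRO's partitioning of training state across data-parallel ranks.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},                      # mixed precision to reduce memory
    "zero_optimization": {"stage": 1},              # ZeRO stage 1: partition optimizer states
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = ToyModel()

# initialize() wraps the model in an engine that handles ZeRO partitioning,
# mixed precision, and gradient accumulation behind the usual training calls.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Typical training step with the engine:
#   loss = model_engine(batch)
#   model_engine.backward(loss)
#   model_engine.step()
```

A script like this is normally launched with DeepSpeed's own launcher (the `deepspeed` command) so that the library can set up the distributed data-parallel ranks that ZeRO partitions state across.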

Here is the corresponding arXiv preprint:
ZeRO: Memory Optimization Towards Training A Trillion Parameter Models
