Thursday, February 06, 2025

Scaling the Tülu 3 post-training recipes to surpass the performance of DeepSeek V3

Competition is good, and more competition is better! One caveat: this latest model from AI2 is much larger in active parameters than the DeepSeek model (a dense 405B, roughly 10x DeepSeek V3's ~37B parameters activated per token in its MoE architecture).

"Following the success of our Tülu 3 release in November, we are thrilled to announce the launch of Tülu 3 405B—The first application of fully open post-training recipes to the largest open-weight models. With this release, we demonstrate the scalability and effectiveness of our post-training recipe applied at 405B parameter scale.

As outlined below, Tülu 3 405B achieves competitive or superior performance to both DeepSeek V3 and GPT-4o, while surpassing prior open-weight post-trained models of the same size, including Llama 3.1 405B Instruct and Nous Hermes 3 405B, on many standard benchmarks. Interestingly, we found that our Reinforcement Learning from Verifiable Rewards (RLVR) framework improved MATH performance more significantly at a larger scale, i.e., 405B compared to 70B and 8B, similar to the findings in the DeepSeek-R1 report. Overall, our results show a consistent edge over DeepSeek V3, especially with the inclusion of safety benchmarks. ..."
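For readers unfamiliar with RLVR: instead of scoring completions with a learned reward model, the policy is rewarded only when its output can be programmatically checked against a known answer. Below is a minimal sketch of that reward signal for a MATH-style task; the function names and the \boxed{} answer convention are illustrative assumptions, not AI2's actual implementation.

```python
import re


def extract_final_answer(completion: str) -> str | None:
    """Pull the last \\boxed{...} expression out of a model completion.

    Assumes the prompt instructs the model to put its final answer in
    \\boxed{...}, a common convention on MATH-style benchmarks.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference,
    else 0.0. No learned reward model is involved; the training signal
    comes from programmatic verification, which is the core idea of RLVR.
    """
    predicted = extract_final_answer(completion)
    return 1.0 if predicted == gold_answer.strip() else 0.0


# The reward would then drive a standard RL update (e.g. PPO) on the policy.
print(verifiable_reward("... so the area is \\boxed{42}.", "42"))  # 1.0
print(verifiable_reward("... so the area is \\boxed{41}.", "42"))  # 0.0
```

A reward like this is sparse, but because it is computed by exact verification rather than a learned model, it cannot be gamed the way a reward model can, which plausibly matters more as model scale grows.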

Scaling the Tülu 3 post-training recipes to surpass the performance of DeepSeek V3 | Ai2
