Saturday, April 25, 2026

DeepSeek AI Releases DeepSeek-V4

Recommended reading! The latest DeepSeek model seems to come with several interesting innovations.

"Key Takeaways
  • Hybrid CSA and HCA attention cuts KV cache to 10% of DeepSeek-V3.2 at 1M tokens.
  • Manifold-Constrained Hyper-Connections (mHC) replace residual connections for more stable deep layer training.
  • The Muon optimizer replaces AdamW for most parameters, delivering faster convergence and training stability.
  • Post-training uses On-Policy Distillation from 10+ domain experts instead of traditional mixed RL.
  • DeepSeek-V4-Flash-Base outperforms DeepSeek-V3.2-Base despite having 3x fewer activated parameters.
..."
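
For readers who have not met Muon yet: the basic idea is to keep a momentum buffer per 2D weight matrix and replace the raw update with an approximately orthogonalized version of it. The following is only a rough sketch of that idea, not DeepSeek's implementation. It uses the classical cubic Newton-Schulz iteration as a stand-in for the tuned iteration that real Muon implementations use, and the names `orthogonalize` and `muon_step` as well as the values of `lr`, `beta`, and `steps` are placeholders chosen for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def orthogonalize(m, steps=30):
    """Approximate the nearest semi-orthogonal matrix to m.

    Assumption: m has at most as many rows as columns and full row rank.
    Uses the classical cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X,
    a stand-in for the tuned polynomial used by actual Muon implementations.
    """
    # Dividing by the Frobenius norm puts all singular values in (0, 1],
    # which is inside the convergence region of the iteration.
    norm = math.sqrt(sum(v * v for row in m for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in m]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x


def muon_step(weight, momentum, grad, lr=0.02, beta=0.95):
    """One hypothetical Muon-style update for a single weight matrix."""
    # Accumulate momentum, then orthogonalize it before applying the step.
    momentum = [[beta * mv + g for mv, g in zip(rm, rg)] for rm, rg in zip(momentum, grad)]
    update = orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(rw, ru)] for rw, ru in zip(weight, update)]
    return weight, momentum
```

After orthogonalization every direction in the update has roughly the same magnitude, which is the intuition usually given for Muon's stability. Parameters that are not 2D matrices would be handled by a different optimizer, which fits the "most parameters" wording in the takeaways.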
