Recommended reading! The latest DeepSeek model seems to come with several interesting innovations!
"Key Takeaways
- Hybrid CSA and HCA attention cuts KV cache to 10% of DeepSeek-V3.2 at 1M tokens.
- Manifold-Constrained Hyper-Connections (mHC) replace residual connections for more stable deep layer training.
- The Muon optimizer replaces AdamW for most parameters, delivering faster convergence and training stability.
- Post-training uses On-Policy Distillation from 10+ domain experts instead of traditional mixed RL.
- DeepSeek-V4-Flash-Base outperforms DeepSeek-V3.2-Base despite having 3x fewer activated parameters.
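To make the first takeaway concrete, here is a rough back-of-the-envelope sizing of a 1M-token KV cache. The layer count, head count, head dimension, and dtype below are illustrative assumptions, not published DeepSeek-V4 numbers; only the 10% ratio comes from the claim above.

```python
# Back-of-the-envelope KV-cache sizing. All model dimensions here are
# illustrative assumptions, not published DeepSeek numbers.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # K and V each hold seq_len * n_kv_heads * head_dim values per layer.
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical dense baseline: 60 layers, 8 KV heads of dim 128, fp16.
dense = kv_cache_bytes(seq_len=1_000_000, n_layers=60, n_kv_heads=8, head_dim=128)
sparse = 0.10 * dense  # the claimed 10% footprint

print(f"dense : {dense / 2**30:.1f} GiB")   # ~228.9 GiB
print(f"sparse: {sparse / 2**30:.1f} GiB")  # ~22.9 GiB
```

Even under these made-up dimensions, the difference between ~229 GiB and ~23 GiB is what makes 1M-token contexts practical on a single node.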
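On the second point: hyper-connections generalize the single residual stream `x + f(x)` to n parallel streams with learnable mixing weights. The sketch below follows the general recipe from the Hyper-Connections literature; the "manifold constraint" of mHC is not specified in the post, so the normalization used here (a softmax over the stream-mixing weights) is purely a placeholder assumption.

```python
import torch
import torch.nn as nn

class HyperConnection(nn.Module):
    """Minimal sketch of a hyper-connection wrapper: n residual streams
    instead of one, with learnable mixing. The softmax on `beta` stands
    in for mHC's manifold constraint, whose details are not public here.
    With n_streams=1, beta=1, alpha=1, mix=I this reduces to x + layer(x)."""

    def __init__(self, n_streams=4):
        super().__init__()
        # beta mixes the n streams into the wrapped layer's input.
        self.beta = nn.Parameter(torch.ones(n_streams) / n_streams)
        # alpha scales the layer output added back onto each stream.
        self.alpha = nn.Parameter(torch.ones(n_streams))
        # mix lets streams exchange information (width connection).
        self.mix = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams, layer):
        # streams: (n, batch, seq, d_model); layer: the attention/FFN block.
        x = torch.einsum("n,nbsd->bsd", self.beta.softmax(0), streams)
        y = layer(x)
        streams = torch.einsum("nm,mbsd->nbsd", self.mix, streams)
        return streams + self.alpha[:, None, None, None] * y.unsqueeze(0)
```

The extra streams cost activation memory but no extra parameters in the wrapped blocks, which is why they are pitched as a drop-in replacement for plain residuals.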
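On the Muon point: publicly available Muon implementations keep a momentum buffer per 2D weight matrix and orthogonalize the update with a quintic Newton-Schulz iteration before applying it. A minimal sketch follows; the hyperparameters are illustrative, and the post does not state DeepSeek's actual settings.

```python
import torch

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D matrix via the quintic
    Newton-Schulz iteration used in public Muon implementations."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)  # scale so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(param, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon update for a 2D weight matrix:
    accumulate momentum, orthogonalize it, take the step."""
    momentum.mul_(beta).add_(grad)
    param.add_(newton_schulz(momentum), alpha=-lr)
    return momentum
```

Because the orthogonalized update has roughly uniform singular values, Muon is typically used for hidden weight matrices only, with AdamW kept for embeddings and scalars, which matches the "most parameters" wording above.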
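On the post-training point: in generic on-policy distillation, the student generates rollouts and a frozen domain-expert teacher scores the same tokens, with the loss typically a per-token reverse KL on those student-generated sequences. Below is a sketch of that generic loss, not DeepSeek's exact objective, which the post does not detail.

```python
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student_logits, teacher_logits, mask):
    """Reverse KL(student || teacher) per token, averaged over valid
    positions. Both logit tensors score the *student's own* rollouts
    (on-policy), shapes (batch, seq, vocab); mask is (batch, seq)."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # Full-vocab reverse KL at each position: sum_v p_s * (log p_s - log p_t).
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)  # (batch, seq)
    mask = mask.to(kl.dtype)
    return (kl * mask).sum() / mask.sum()
```

Training on the student's own samples avoids the exposure-bias problem of off-policy distillation, and swapping in a different expert teacher per domain is what replaces the mixed RL stage described above.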