This could be an interesting, but narrowly focused, new paper by Tomaso Poggio and his team!
Caveat: I have not read the paper (40 pages total) yet.
From the abstract:
"Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found.
Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability.
We demonstrate that SGD with momentum exhibits an Edge of Stochastic Stability (EoSS)-like regime with batch-size-dependent behavior that cannot be explained by a single momentum-adjusted stability threshold.
Batch Sharpness (the expected directional mini-batch curvature) stabilizes in two distinct regimes:
at small batch sizes it converges to a lower plateau, reflecting amplification of stochastic fluctuations by momentum and favoring flatter regions than vanilla SGD; at large batch sizes it converges to a higher plateau, where momentum recovers its classical stabilizing effect and favors sharper regions consistent with full-batch dynamics.
We further show that this aligns with linear stability thresholds and discuss the implications for hyperparameter tuning and coupling."
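In case it helps parse the abstract: my reading of "Batch Sharpness (the expected directional mini-batch curvature)" is the expectation over mini-batches of g_B^T H_B g_B / ||g_B||^2, i.e. the curvature of the mini-batch loss along its own gradient direction. The paper's exact definition may differ, so treat the sketch below as an illustration of that reading, not as the authors' code; `model`, `loss_fn`, and `loader` are placeholders.

```python
# Sketch: estimate "Batch Sharpness" as E_B[ g_B^T H_B g_B / ||g_B||^2 ],
# using Hessian-vector products so the mini-batch Hessian is never formed.
# This is one plausible reading of "expected directional mini-batch curvature".
import torch


def batch_sharpness(model, loss_fn, loader, device="cpu", num_batches=32):
    model.to(device)
    params = [p for p in model.parameters() if p.requires_grad]
    vals = []
    for i, (x, y) in enumerate(loader):
        if i >= num_batches:
            break
        x, y = x.to(device), y.to(device)
        loss = loss_fn(model(x), y)
        # mini-batch gradient g_B, kept in the graph for the Hessian-vector product
        g = torch.autograd.grad(loss, params, create_graph=True)
        g_flat = torch.cat([gi.reshape(-1) for gi in g])
        g_const = g_flat.detach()
        # Hessian-vector product H_B g_B via a second backward pass on g_B . g_B_const
        hv = torch.autograd.grad(g_flat @ g_const, params)
        hv_flat = torch.cat([h.reshape(-1) for h in hv])
        # directional curvature along the normalized mini-batch gradient
        vals.append((g_const @ hv_flat) / (g_const @ g_const))
    return torch.stack(vals).mean().item()
```

For context on the "momentum-adjusted stability threshold" the abstract says is not enough on its own: on a quadratic with curvature lambda, the heavy-ball recursion with learning rate eta and momentum beta is stable only when eta * lambda < 2(1 + beta), so 2(1 + beta)/eta is the natural single threshold to compare the measured plateaus against.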