Are we making progress in the theoretical analysis of deep learning?
From the paper's significance statement and abstract:
"Significance
The practice of deep learning has long been shrouded in mystery, leading many to believe that the inner workings of these black-box models are chaotic during training. In this paper, we challenge this belief by presenting a simple and approximate law that deep neural networks follow when processing data in the intermediate layers. This empirical law is observed in a class of modern network architectures for vision tasks, and its emergence is shown to bring important benefits for the trained models. The significance of this law is that it offers a perspective that yields useful insights into the practice of deep learning.
Abstract
While deep learning has enabled significant advances in many areas of science, its black-box nature hinders architecture design for future artificial intelligence applications and interpretation for high-stakes decision-making. We address this issue by studying the fundamental question of how deep neural networks process data in the intermediate layers. Our finding is a simple and quantitative law that governs how deep neural networks separate data according to class membership throughout all layers for classification. This law shows that each layer improves data separation at a constant geometric rate, and its emergence is observed across a collection of network architectures and datasets during training. This law offers practical guidelines for designing architectures, improving model robustness and out-of-sample performance, and interpreting predictions."
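Read operationally, the law says that some per-layer measure of "separation fuzziness" D_l shrinks by a roughly constant factor with depth, D_{l+1} ≈ rho * D_l, so log D_l is approximately linear in the layer index l. Below is a minimal sketch of how one could test that on a trained model's intermediate activations. This is my illustration, not the authors' code: the particular fuzziness measure Tr(pinv(S_b) S_w) (within-class scatter measured against between-class scatter) and the function names are assumptions.

import numpy as np

def separation_fuzziness(X, y):
    # Assumed measure (not taken from the abstract): within-class scatter
    # weighed against between-class scatter. Smaller = better separated.
    # X: (n_samples, dim) activations of one layer; y: integer class labels.
    n, d = X.shape
    mu = X.mean(axis=0)
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        diff = Xc - mu_c
        S_w += diff.T @ diff / n
        gap = (mu_c - mu).reshape(-1, 1)
        S_b += (len(Xc) / n) * (gap @ gap.T)
    # Pseudo-inverse keeps this well defined when S_b is rank deficient.
    return np.trace(np.linalg.pinv(S_b) @ S_w)

def check_geometric_decay(features, y):
    # features: list of per-layer activation arrays, ordered by depth.
    # A near-constant ratio rho = D_{l+1}/D_l (i.e., a good linear fit of
    # log D_l against l) would be evidence for the claimed law.
    D = np.array([separation_fuzziness(X, y) for X in features])
    layers = np.arange(len(D))
    slope, intercept = np.polyfit(layers, np.log(D), 1)
    rho = np.exp(slope)  # estimated per-layer improvement ratio
    return rho, D

With activations extracted layer by layer (e.g., via forward hooks in a vision model), check_geometric_decay returns the estimated per-layer ratio; the abstract's claim is that this ratio stays roughly constant across depth once the network is trained.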