Common Sense: Hypernetwork to predict Parameters for Unseen Deep Architectures

Sunday, March 20, 2022

Hypernetwork to predict Parameters for Unseen Deep Architectures

This seems to be an interesting approach, although I have not yet had time to read the paper. One possible use is better weight initialization for deep neural networks.

"[Researchers] at Facebook developed Graph Hyper Network (GHN-2), a graph neural network that computed weights that enabled arbitrary neural network architectures to perform image recognition tasks. (A neural network that finds weights for another neural network is known as a hypernetwork.) GHN-2 improves on a similar hypernetwork, GHN-1, proposed by a different team. ..."

From the abstract:

"... By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50 achieving a 60% accuracy on CIFAR-10. On ImageNet, top-5 accuracy of some of our networks approaches 50%. Our task along with the model and results can potentially lead to a new, more computationally efficient paradigm of training networks. ..."
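To make the term "hypernetwork" concrete, here is a toy sketch in plain Python. It is not GHN-2 and not taken from the paper: the names `predict_weights` and `target_forward`, the fixed-length "architecture embedding", and the single linear map are all assumptions chosen for illustration. It only shows the basic idea that one model outputs the weights another model then uses.

```python
import random

def predict_weights(arch_embedding, hyper_w, in_dim, out_dim):
    """Toy hypernetwork (hypothetical): a single linear map from an
    architecture embedding to the weights of a target linear layer."""
    # flat[j] = sum_i arch_embedding[i] * hyper_w[i][j]
    flat = [
        sum(e * row[j] for e, row in zip(arch_embedding, hyper_w))
        for j in range(in_dim * out_dim)
    ]
    # Reshape the flat vector into an in_dim x out_dim weight matrix.
    return [flat[r * out_dim:(r + 1) * out_dim] for r in range(in_dim)]

def target_forward(x, w):
    """Target network: one linear layer that uses the predicted weights."""
    out_dim = len(w[0])
    return [sum(xi * w[i][k] for i, xi in enumerate(x)) for k in range(out_dim)]

# Illustrative sizes only; a real hypernetwork would be trained, not random.
random.seed(0)
embed_dim, in_dim, out_dim = 4, 8, 3
hyper_w = [[random.gauss(0.0, 0.1) for _ in range(in_dim * out_dim)]
           for _ in range(embed_dim)]
arch_embedding = [random.gauss(0.0, 1.0) for _ in range(embed_dim)]

w = predict_weights(arch_embedding, hyper_w, in_dim, out_dim)
x = [random.gauss(0.0, 1.0) for _ in range(in_dim)]
logits = target_forward(x, w)  # one forward pass of the target layer
```

As I understand the abstract, the real model works on a graph representation of the target architecture and predicts all of its parameters in one forward pass; the fixed-length embedding above is only a stand-in for that.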

Andrew Ng's The Batch

[2110.13100] Parameter Prediction for Unseen Deep Architectures
