Amazing stuff!
"With the emergence of deep self-supervised generative models that learn to “disentangle” high-dimensional sensory data into meaningful factors of variation, recent breakthroughs in machine learning have provided an implementational blueprint for this theory. The beta-variational autoencoder (β-VAE) is one such model: it learns to faithfully reconstruct sensory data from a low-dimensional embedding while being regularised in a way that encourages individual network units to code for semantically meaningful variables such as object color, face gender, and scene arrangement. ...
The findings support previous evidence that the monkey IT’s face identification code is low-dimensional, with single neurons encoding independent axes of variance. Unlike earlier research, however, our findings show that such a code can be meaningfully interpreted at the level of a single neuron. The study shows that single IT neurons’ axes of variation align with single “disentangled” latent units that appear to be semantically meaningful and are discovered by the β-VAE ..."
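In the standard β-VAE formulation, the training loss is a reconstruction error plus β times the KL divergence between the encoder's Gaussian posterior and a unit-Gaussian prior. Setting β > 1 strengthens the pressure toward independent, disentangled latent units. The sketch below is a minimal NumPy version, assuming a diagonal-Gaussian encoder and a unit-variance Gaussian decoder. The function names and the default β = 4 are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims, per example."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    """Negative beta-VAE objective averaged over a batch (illustrative sketch).

    Assumes a unit-variance Gaussian decoder, so the reconstruction term is
    half the summed squared error (constants dropped).
    """
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=-1)
    kl = gaussian_kl(mu, log_var)
    # beta > 1 weights the KL term more heavily than in a plain VAE (beta = 1)
    return float(np.mean(recon + beta * kl))

x = np.ones((2, 3))
print(beta_vae_loss(x, x, np.ones((2, 4)), np.zeros((2, 4))))  # perfect reconstruction, KL = 2 per example
```

With perfect reconstruction the loss comes entirely from the KL term, so raising β makes any deviation of the latent code from the prior more costly.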
From the abstract:
"... we model neural responses to faces in the macaque inferotemporal (IT) cortex with a deep self-supervised generative model, β-VAE, which disentangles sensory data into interpretable latent factors, such as gender or age. Our results demonstrate a strong correspondence between the generative factors discovered by β-VAE and those coded by single IT neurons, beyond that found for the baselines, including the handcrafted state-of-the-art model of face perception, the Active Appearance Model, and deep classifiers. Moreover, β-VAE is able to reconstruct novel face images using signals from just a handful of cells. Together our results imply that optimising the disentangling objective leads to representations that closely resemble those in the IT at the single unit level. This points at disentangling as a plausible learning objective for the visual brain."
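One simple way to picture a "correspondence between generative factors and single neurons" is to correlate each neuron's responses across a set of face images with each latent unit's activation for the same images, then ask which latent unit each neuron tracks best. The sketch below does exactly that with Pearson correlations. It is a generic, hypothetical illustration (the function name and array layout are assumptions), not the alignment metric used in the paper.

```python
import numpy as np

def best_matching_latents(responses, latents):
    """For each neuron, find the latent unit whose activation correlates most strongly.

    responses: (n_images, n_neurons) firing rates (hypothetical layout)
    latents:   (n_images, n_latents) latent means for the same images
    Returns (indices, abs_correlations), each of length n_neurons.
    Columns with zero variance would produce NaNs and should be removed first.
    """
    r = (responses - responses.mean(axis=0)) / responses.std(axis=0)
    z = (latents - latents.mean(axis=0)) / latents.std(axis=0)
    corr = r.T @ z / responses.shape[0]  # (n_neurons, n_latents) Pearson correlations
    abs_corr = np.abs(corr)
    idx = abs_corr.argmax(axis=1)        # best-matching latent unit per neuron
    return idx, abs_corr[np.arange(len(idx)), idx]

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 5))
resp = np.column_stack([3 * z[:, 2] + 1, -2 * z[:, 0]])  # toy neurons tied to units 2 and 0
print(best_matching_latents(resp, z))
```

Taking the absolute correlation treats a neuron that decreases its firing along a latent axis as just as aligned as one that increases it.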
Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons (open access; previously posted as a preprint in 2020: https://arxiv.org/abs/2006.14304; a collaboration between Google, the Howard Hughes Medical Institute, and the Chinese Academy of Sciences)