Monday, December 06, 2021

On Masked Autoencoders Are Scalable Vision Learners

Recommendable! A new paper on self-supervised learning from Facebook, with well-known authors such as Kaiming He (first author), Piotr Dollar, and Ross Girshick (senior author).

Masking 75% of an image is a very high ratio, whether the masking is random or not! The results are impressive, but something still seems to be missing, as the reconstructed images are still somewhat blurry.

From the abstract:
"... Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. ..."

From the main text:
"... It has an asymmetric encoder-decoder design. Our encoder operates only on the visible subset of patches (without mask tokens), and our decoder is lightweight and reconstructs the input from the latent representation along with mask tokens ...
Random sampling with a high masking ratio (i.e., the ratio of removed patches) largely eliminates redundancy, thus creating a task that cannot be easily solved by extrapolation from visible neighboring patches ... The uniform distribution prevents a potential center bias ..."

[2111.06377] Masked Autoencoders Are Scalable Vision Learners
