This is very impressive work on vision transformers by Facebook! Although it was only published at the end of April 2021, it has already garnered 67 citations as of 8/25/2021, according to Microsoft Academic.
From the abstract:
"... Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3% top-1 on ImageNet with a small ViT. Our study also underlines the importance of momentum encoder, multi-crop training, and the use of small patches with ViTs. ..."
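To make the "excellent k-NN classifiers" claim concrete, here is a minimal sketch of weighted-free k-NN classification on frozen features: each test embedding gets the majority label among its k most cosine-similar training embeddings. The synthetic features below are purely illustrative, not the paper's actual ViT embeddings or evaluation protocol.

```python
import numpy as np

def knn_classify(train_feats, train_labels, test_feats, k=5):
    # L2-normalize so dot products are cosine similarities
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                      # (n_test, n_train)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]  # k nearest training examples
    preds = []
    for row in nn_idx:
        votes = np.bincount(train_labels[row])  # majority vote over neighbors
        preds.append(int(np.argmax(votes)))
    return np.array(preds)

# Toy example: two well-separated synthetic clusters standing in for features
rng = np.random.default_rng(0)
train_feats = np.concatenate([rng.normal(0, 0.1, (20, 8)) + 1,
                              rng.normal(0, 0.1, (20, 8)) - 1])
train_labels = np.array([0] * 20 + [1] * 20)
test_feats = np.array([[1.0] * 8, [-1.0] * 8])
print(knn_classify(train_feats, train_labels, test_feats, k=5))  # → [0 1]
```

The appeal of this evaluation is that it needs no fine-tuning or linear probe: if the frozen features already cluster by class, a simple nearest-neighbor vote performs well.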