Common Sense: On ViViT: A Video Vision Transformer

Wednesday, January 12, 2022

Highly recommended! It is perhaps the first pure transformer-based model for video classification to achieve state-of-the-art (SOTA) results.