Recommended research by Google! Lots of historical background included.
It presents a single multi-modal architecture based on the Transformer that covers natural photos, videos, point clouds, and audio. The depth of the model is independent of the input size, which means that it can, for example, process larger photos or longer videos without needing more layers.
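To make the input-size point a bit more concrete, here is a minimal sketch of the idea as I understand it from the paper: a small, fixed-size latent array cross-attends to the input, so everything after that step works on the same shape no matter how many pixels or frames go in. This is not the authors' code. The function name `cross_attend`, the sizes, and the random weights are all my own assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(latents, inputs, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper, illustration only).

    latents: (N, D) fixed-size latent array, supplies the queries
    inputs:  (M, C) flattened input elements, supply keys and values
    returns: (N, D), the shape does not depend on M
    """
    q = latents @ w_q                          # (N, D)
    k = inputs @ w_k                           # (M, D)
    v = inputs @ w_v                           # (M, D)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M)
    return softmax(scores, axis=-1) @ v        # (N, D)


rng = np.random.default_rng(0)
num_latents, channels, dim = 8, 3, 16          # illustrative sizes, not from the paper

latents = rng.normal(size=(num_latents, dim))
w_q = rng.normal(size=(dim, dim))
w_k = rng.normal(size=(channels, dim))
w_v = rng.normal(size=(channels, dim))

# Two "photos" of different resolution, flattened to (pixels, channels).
small = rng.normal(size=(32 * 32, channels))
large = rng.normal(size=(64 * 64, channels))

out_small = cross_attend(latents, small, w_q, w_k, w_v)
out_large = cross_attend(latents, large, w_q, w_k, w_v)
print(out_small.shape, out_large.shape)
```

Both outputs have the shape of the latent array, so a stack of layers on top of them can stay the same whether the photo is small or large. Only the cost of this one attention step grows with the input.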
The authors even cited Immanuel Kant.
It was accepted at the International Conference on Machine Learning (ICML) 2021.
Perceiver: General Perception with Iterative Attention (open access)