This seems to be an interesting new challenge by Kaggle!
Can we reverse the text-to-image task?
We already know, for example, that the diffusion process itself can be reversed: denoising an image that was subjected to a well-defined, multi-step process of gradually added noise.
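To make that idea concrete, here is a minimal sketch of the forward (noising) half of a standard diffusion setup; the schedule values are illustrative, not the ones Stable Diffusion actually uses.

```python
import torch

num_steps = 1000
betas = torch.linspace(1e-4, 0.02, num_steps)        # illustrative noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta_t)

def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Jump straight to step t of the forward (noising) process."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# A denoising model is trained to predict the noise added at each step,
# which is what lets the process be run in reverse, from pure noise back
# to a clean image.
```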
"Goal of the Competition
The goal of this competition is to reverse the typical direction of a generative text-to-image model: instead of generating an image from a text prompt, can you create a model which can predict the text prompt given a generated image? You will make predictions on a dataset containing a wide variety of (prompt, image) pairs generated by Stable Diffusion 2.0, in order to understand how reversible the latent relationship is."