Friday, April 15, 2022

Researchers Propose 'CoordGAN': a Novel Disentangled GAN Model That Produces Dense Correspondence Maps Represented by a Novel Coordinate Space

Recommended! I have not yet read the research paper itself, but the new approach seems promising.

"... a dense correlation is created between semantically equivalent local regions but with differing appearances (e.g., patches of two different eyes). Because identifying large-scale, pixel-level annotations is exceedingly laborious, learning extensive correspondence across images of one category remains difficult. ...
a paper that looks into learning dense correspondence from GANs. Specifically, learning an explicit correspondence map is often a pixel-level semantic label map. This job is important for disentangling structure and texture in GANs since correspondence indicates structure (e.g., shapes of facial components) and is independent of texture (e.g., global appearances like skin tone and texture).
According to studies, disentangling semantic attributes can be accomplished by looking for latent directions acquired by GANs. ...
The central aim of this study is to propose a new coordinate space from which pixel-level correspondence for all synthesized images in a category may be retrieved explicitly. In this work, researchers express the dense correspondence map of a generated image as a warped coordinate frame translated from a canonical 2D coordinate map, inspired by UV maps of 3D meshes, where shapes of one category are represented as deformations of one canonical template. ...
This allows a unique structure to be represented as a transformation between the warped and canonical frames. The team creates a Coordinate GAN (CoordGAN) with two independently sampled noise vectors controlling structure and texture. Researchers train an MLP as the aforementioned transformation in the structure branch, while the texture branch uses Adaptive Instance Normalization (AdaIN) to regulate the global appearance. This converts a sampled noise vector to a warped coordinate frame, which is modified further in the generator to control the hierarchical structure of the synthesized image. ..."
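
To make the AdaIN step in the texture branch more concrete, here is a minimal sketch of my own, not code from the paper. It assumes the usual AdaIN formulation: each feature channel is normalized to zero mean and unit variance, then rescaled and shifted by parameters derived from the texture code. The function name `adain`, the flattened single-channel input, and the `scale` and `bias` values are all hypothetical and chosen only for illustration.

```python
import math

def adain(channel, scale, bias, eps=1e-5):
    """Normalize one feature channel, then apply a style-derived scale and bias."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    std = math.sqrt(var + eps)  # eps guards against division by zero
    return [scale * (v - mean) / std + bias for v in channel]

# Hypothetical flattened feature channel and texture-derived parameters.
features = [1.0, 2.0, 3.0, 4.0]
styled = adain(features, scale=2.0, bias=0.5)
```

After this step the channel's mean equals `bias` and its standard deviation is approximately `scale`, which is how a texture code can set global appearance statistics without touching the spatial layout.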

From the abstract:
"Recent advances show that Generative Adversarial Networks (GANs) can synthesize images with smooth variations along semantically meaningful latent directions, such as pose, expression, layout, etc. While this indicates that GANs implicitly learn pixel-level correspondences across images, few studies explored how to extract them explicitly. In this work, we introduce Coordinate GAN (CoordGAN), a structure-texture disentangled GAN that learns a dense correspondence map for each generated image. We represent the correspondence maps of different images as warped coordinate frames transformed from a canonical coordinate frame, i.e., the correspondence map, which describes the structure (e.g., the shape of a face), is controlled via a transformation. Hence, finding correspondences boils down to locating the same coordinate in different correspondence maps. In CoordGAN, we sample a transformation to represent the structure of a synthesized instance, while an independent texture branch is responsible for rendering appearance details orthogonal to the structure. Our approach can also extract dense correspondence maps for real images by adding an encoder on top of the generator. We quantitatively demonstrate the quality of the learned dense correspondences through segmentation mask transfer on multiple datasets. We also show that the proposed generator achieves better structure and texture disentanglement compared to existing approaches. ..."

UCSD and NVIDIA AI Researchers Propose 'CoordGAN': a Novel Disentangled GAN Model That Produces Dense Correspondence Maps Represented by a Novel Coordinate Space - MarkTechPost
