CVPR 2020: Image-to-Image Translation(2)

(CoCosNet) Cross-domain Correspondence Learning for Exemplar-based Image Translation

Author: Pan Zhang, Bo Zhang, Dong Chen, Lu Yuan, Fang Wen
Arxiv: 2004.05571
Project Site

Problem

exemplar-based image translation

Assumption in prior work

Previous exemplar-based method only use style code globally. The style code only characterizes the global style of the exemplar, regardless of spatial relevant information. Thus, it causes some local style “wash away” in the ultimate image.

Deep Image Analogy is not cross-domain, may fail to handle a more challenging mapping from mask (or edge, keypoints) to photo since the pretrained network does not recognize such images.

Insight

With the cross-domain correspondence, we present a general solution to exemplar-based image translation, that for the first time, outputs images resembling the fine structures of the exemplar at instance level.

Technical overview

The network architecture comprises two sub-networks: 1) Cross-domain correspondence Network transforms the inputs from distinct domains to an intermediate feature domain where reliable dense correspondence can be established; 2) Translation network, employs a set of spatially-variant de-normalization blocks [38] to progressively synthesizes the output, using the style details from a warped exemplar which is semantically aligned to the mask (or edge, keypoints map) according to the estimated correspondence.

Proof

Datasets: ADE20k, CelebA-HQ, Deepfashion
Baselines: Pix2pixHD, SPADE, MUNIT, SIMS, EGSC-IT
Metrics: 1)FID, SWD; 2)semantic consistency: high-level feature distance from ImageNet VGG; 3)style relevance: low-level feature distance from ImageNet VGG 4)User Study ranking

Impact

Applications: Image editing, Makeup transfer

StarGAN v2: Diverse Image Synthesis for Multiple Domains

Author: Yunjey Choi, Youngjung Uh, JaeJun Yoo, Jungwoo Ha
Arxiv: 1912.01865
GitHub-PyTorch, GitHub-TensorFlow

Problem

multiple domain image translation

Assumption in prior work

Existing Image-to-image translation methods have only considered a mapping between two domains, they are not scalable to the increasing number of domains. For example, having K domains, these methods require to train K(K-1) generators to handle translations between each and every domain, limiting their practical usage.

StarGAN still learns a deterministic mapping per each domain, which does not capture the multi-modal nature of the data distribution.

Insight

generate diverse images across multiple domains.

Technical overview

(a) The generator translates an input image into an output image reflecting the domain-specific style code. (b) The mapping network transforms a latent code into style codes for multiple domains, one of which is randomly selected during training. (c) The style encoder extracts the style code of an image, allowing the generator to perform reference-guided image synthesis. (d) The discriminator distinguishes between real and fake images from multiple domains

In particular, we start from StarGAN and replace its domain label with our proposed domain-specific style code that can represent diverse styles of a specific domain. To this end, we introduce two modules, a mapping network and a style encoder. The mapping network learns to transform random Gaussian noise into a style code, while the encoder learns to extract the style code from a given reference image. Considering multiple domains, both modules have multiple output branches, each of which provides style codes for a specific domain. Finally, utilizing these style codes, our generator learns to successfully synthesize diverse images over multiple domains

Proof

Datasets: CelebA-HQ, AFHQ(new)
Baselines: MUNIT, DRIT, MSGAN, StarGAN
Metrics: FID, LPIPS

Impact

reference-guided synthesis

Panoptic-based Image Synthesis

Author: Aysegul Dundar, Karan Sapra, Guilin Liu, Andrew Tao, Bryan Catanzaro
Arxiv: 2004.10289

Problem

from panoptic map to image

Assumption in prior work

Previous conditional image synthesis algorithms mostly rely on semantic maps, and often fail in complex environments where multiple instances occlude each other. This is the result of conventional convolution and upsampling algorithms being independent of class and instance boundaries.

Insight

We are interested in panoptic maps because semantic maps do not provide sufficient information to synthesize “things” (instances) especially in complex environments with multiple of them interacting with each other.

We propose Panoptic aware upsampling that addresses the misalignment between the upsampled low resolution features and high resolution panoptic maps. This ensures that the semantic and instance details are not lost, and that we also maintain higher accuracy alignment between the generated images and the panoptic maps.

Technical overview

Panoptic Aware Convolution Layer

Panoptic aware partial convolution layer takes a panoptic map (colorized for visualization) and based on the center of each sliding window it generates a binary mask, M. The pixels that share the same identity with the center of the window are assigned 1 and the others 0.

Panoptic Aware Upsampling Layer

As shown in Figure (top), first we correct misalignment by replicating a feature vector from a neighboring pixel that belongs to the same panoptic instance. This operation is different from nearest neighbor upsampling which would always replicate the top-left feature. Second, as shown in Figure (bottom), we resolve pixels where new semantic or instance classes have just appeared by encoding new features from semantic maps with Panoptic aware convolution layer.

Proof

Datasets: COCO-Stuff, Cityscapes
Baselines: CRN, SIMS, SPADE
Metrics: detAP(for instance detection), mIoU, pixel Acc, FID

Impact

SketchyCOCO: Image Generation from Freehand Scene Sketches

Author: Chengying Gao, Qi Liu, Qi Xu, Jianzhuang Liu, Li-Jae Wang, Changqing Zou
Arxiv: 2003.02683

Problem

controllably generating realistic images with many objects and relationships from a freehand scene-level sketch

Assumption in prior work

The author argue that this is a new problem.

Insight

Using freehand sketches as conditional signal is hard.

Technical overview

Two sequential stages, foreground and background generation, based on the characteristics of scene-level sketching. The first stage focuses on foreground generation where the generated image content is supposed to exactly meet the user’s specific requirement. The second stage is responsible for background generation where the generated image content may be loosely aligned with the sketches.

Dataset: SketchyCOCO

Illustration of five-tuple ground truth data of SketchyCOCO, i.e., (a) {foreground image, foreground sketch, foreground edge maps} (training: 18,869, test: 1,329), (b) {background image, background sketch} (training: 11,265, test: 2,816), (c) {scene image, foreground image & background sketch} (training: 11,265, test: 2,816), (d) {scene image, scene sketch} (training: 11,265, test: 2,816), and (e) sketch segmentation (training: 11,265, test: 2,816)

EdgeGAN

It contains four sub-networks: two generators $G_I$ and $G_E$ , three discriminators $D_I$ , $D_E$ , and $D_J$ , an edge encoder $E$ and an image classifier $C$. EdgeGAN learns a joint embedding for an image and various-style edge maps depicting this image into a shared latent space where vectors can encode high-level attribute information from cross-modality data.

Proof

Datasets: SketchyCOCO
Baselines: ContextualGAN, SketchyGAN, pix2pix; SPADE, Ashual ICCv19
Metrics: FID, Shape Similarity, SSIM

Impact

A weird task setting.

PREVIOUSCVPR 2020: Image-to-Image Translation(1)

(CoCosNet) Cross-domain Correspondence Learning for Exemplar-based Image Translation

Problem

Assumption in prior work

Insight

Technical overview

Proof

Impact

StarGAN v2: Diverse Image Synthesis for Multiple Domains

Problem

Assumption in prior work

Insight

Technical overview

Proof

Impact

Panoptic-based Image Synthesis

Problem

Assumption in prior work

Insight

Technical overview

Proof

Impact

SketchyCOCO: Image Generation from Freehand Scene Sketches

Problem

Assumption in prior work

Insight

Technical overview

Proof

Impact

Related