clovaai/stargan-v2

StarGAN v2 - Official PyTorch Implementation (CVPR 2020)

Top Related Projects

  • StyleGAN2 - Official TensorFlow Implementation (10,934 stars)
  • Official PyTorch implementation of StyleGAN3
  • Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more. (12,306 stars)
  • Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions" (3,105 stars)
  • PyTorch implementation of SwAV (https://arxiv.org/abs/2006.09882) (1,988 stars)

Quick Overview

StarGAN v2 is an image-to-image translation model that handles multiple domains with a single generator. It improves upon the original StarGAN by replacing the one-hot domain label with learned style codes, produced by a mapping network or a style encoder and injected into the generator through adaptive instance normalization (AdaIN), which enables diverse, high-quality image synthesis across multiple domains.
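
To make the style-code mechanism concrete, the sketch below shows roughly how a style code modulates feature maps through AdaIN. This is a simplified illustration of the idea, not the repository's exact module.

import torch.nn as nn

class AdaIN(nn.Module):
    """Adaptive instance normalization: a style code predicts the per-channel
    scale and shift applied after instance normalization (simplified sketch)."""
    def __init__(self, style_dim, num_features):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.fc = nn.Linear(style_dim, num_features * 2)

    def forward(self, x, s):
        h = self.fc(s)                             # (N, 2C) style-dependent parameters
        gamma, beta = h.chunk(2, dim=1)            # split into scale and shift
        gamma = gamma.view(gamma.size(0), -1, 1, 1)
        beta = beta.view(beta.size(0), -1, 1, 1)
        return (1 + gamma) * self.norm(x) + beta   # style-modulated features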

Pros

  • Produces high-quality, diverse image translations across multiple domains
  • Utilizes a single generator for all domains, reducing model complexity
  • Introduces style codes for fine-grained control over output images
  • Achieves state-of-the-art performance in various image translation tasks

Cons

  • Requires significant computational resources for training
  • May struggle with extreme pose changes or complex backgrounds
  • Limited to the domains it was trained on, requiring retraining for new domains
  • Can sometimes produce artifacts or unrealistic results in challenging scenarios

Code Examples

  1. Loading pre-trained networks (assumes a checkpoint that stores per-module state dicts, e.g. under a 'generator' key):
import torch
from core.model import Generator, MappingNetwork

# The generator is conditioned on a style code; the domain label goes into the
# mapping network (and style encoder), so num_domains is not a Generator argument.
generator = Generator(img_size=256, style_dim=64, max_conv_dim=512, w_hpf=1)
mapping_network = MappingNetwork(latent_dim=16, style_dim=64, num_domains=3)

checkpoint = torch.load('pretrained_model.ckpt', map_location='cpu')
generator.load_state_dict(checkpoint['generator'])
mapping_network.load_state_dict(checkpoint['mapping_network'])
  2. Generating a translated image (latent-guided):
import torchvision.transforms as T

transform = T.Compose([
    T.Resize([256, 256]),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

x = transform(image).unsqueeze(0)    # image is a PIL source image
z = torch.randn(1, 16)               # latent code for the mapping network
y = torch.tensor([target_domain])    # integer label of the target domain

with torch.no_grad():
    s = mapping_network(z, y)        # style code for the target domain
    x_fake = generator(x, s)         # the generator takes the image and the style code
  3. Interpolating between styles:
s1 = torch.randn(1, 64)                  # two style codes (e.g. from the mapping network with different latent codes)
s2 = torch.randn(1, 64)
alphas = torch.linspace(0, 1, steps=5)

interpolated_images = []
for alpha in alphas:
    s = alpha * s1 + (1 - alpha) * s2    # linear interpolation between the two style codes
    with torch.no_grad():
        x_fake = generator(x, s)
    interpolated_images.append(x_fake)
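
A fourth, reference-guided variant: instead of sampling a latent code, the style code can be extracted from a reference image with the style encoder (interface as shown in the comparisons below). This is a minimal sketch reusing checkpoint, transform, generator, and target_domain from the examples above; the 'style_encoder' checkpoint key is an assumption.

from core.model import StyleEncoder

style_encoder = StyleEncoder(img_size=256, style_dim=64, num_domains=3)
style_encoder.load_state_dict(checkpoint['style_encoder'])   # assumed checkpoint key

x_ref = transform(reference_image).unsqueeze(0)   # reference_image is a PIL image supplying the style
y_ref = torch.tensor([target_domain])             # domain of the reference image

with torch.no_grad():
    s_ref = style_encoder(x_ref, y_ref)   # style code extracted from the reference
    x_fake = generator(x, s_ref)          # source content rendered in the reference style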

Getting Started

  1. Clone the repository:

    git clone https://github.com/clovaai/stargan-v2.git
    cd stargan-v2
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the pre-trained networks (the wing checkpoint provides the face-alignment network needed when --w_hpf 1 is used, as in the CelebA-HQ command below):

    bash download.sh pretrained-network-celeba-hq
    bash download.sh pretrained-network-afhq
    bash download.sh wing
    
  4. Run inference (CelebA-HQ has two domains, so --num_domains 2):

    python main.py --mode sample --num_domains 2 --resume_iter 100000 --w_hpf 1 \
                   --checkpoint_dir expr/checkpoints/celeba_hq \
                   --result_dir expr/results/celeba_hq \
                   --src_dir assets/representative/celeba_hq/src \
                   --ref_dir assets/representative/celeba_hq/ref
    

Competitor Comparisons

StyleGAN2 - Official TensorFlow Implementation

Pros of StyleGAN2

  • Higher image quality and resolution (up to 1024x1024)
  • Better stability and convergence during training
  • More refined style-based architecture (weight demodulation in place of AdaIN)

Cons of StyleGAN2

  • Limited to generating single-domain images
  • Lacks built-in image-to-image translation capabilities
  • Requires more computational resources for training

Code Comparison

StyleGAN2:

G = Generator(z_dim, w_dim, num_layers)
D = Discriminator(num_layers)
loss = StyleGAN2Loss(G, D, r1_gamma)

StarGAN v2:

generator = Generator(img_size, style_dim, max_conv_dim, w_hpf)
mapping_network = MappingNetwork(latent_dim, style_dim, num_domains)
style_encoder = StyleEncoder(img_size, style_dim, num_domains)
discriminator = Discriminator(img_size, num_domains)

StyleGAN2 focuses on generating high-quality images from random noise, while StarGAN v2 is designed for multi-domain image-to-image translation. StyleGAN2 uses a simpler setup with a single generator and discriminator, whereas StarGAN v2 adds a mapping network and a style encoder to handle multiple domains and styles.

Official PyTorch implementation of StyleGAN3

Pros of StyleGAN3

  • Improved image quality and reduced artifacts compared to previous GAN models
  • Alias-free architecture for better high-frequency detail generation
  • More stable training process and improved convergence

Cons of StyleGAN3

  • Higher computational requirements for training and inference
  • Less flexibility in terms of multi-domain image translation
  • Potentially more complex implementation for beginners

Code Comparison

StyleGAN3:

import torch
from torch_utils import misc
from training import networks

G = networks.Generator(z_dim=512, c_dim=0, w_dim=512, img_resolution=1024, img_channels=3)
z = torch.randn([1, G.z_dim])
img = G(z, None)

StarGAN v2:

import torch
from core.model import Generator

G = Generator(img_size=256, style_dim=64, max_conv_dim=512, w_hpf=1)
x = torch.randn(1, 3, 256, 256)   # source image batch
s = torch.randn(1, 64)            # style code (normally produced by the mapping network or style encoder)
out = G(x, s)                     # the generator is conditioned on the style code, not on a domain label

The code snippets demonstrate the basic usage of generators in both models. StyleGAN3 focuses on generating images from latent vectors, while StarGAN v2 allows for style transfer and domain translation.

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Pros of CycleGAN

  • Simpler architecture, easier to understand and implement
  • Works well with unpaired datasets, which are often more readily available
  • Effective for style transfer and domain adaptation tasks

Cons of CycleGAN

  • Limited to one-to-one mappings between domains
  • May struggle with preserving fine details in complex transformations
  • Can produce less diverse outputs compared to StarGAN v2

Code Comparison

CycleGAN:

def forward(self, real_A, real_B):
    fake_B = self.G_A(real_A)
    rec_A = self.G_B(fake_B)
    fake_A = self.G_B(real_B)
    rec_B = self.G_A(fake_A)

StarGAN v2:

def forward(self, x, y, z_trg=None, y_trg=None):
    s_trg = self.mapping_network(z_trg, y_trg)
    x_fake = self.generator(x, s_trg)
    x_rec = self.generator(x_fake, self.style_encoder(x, y))

StarGAN v2 introduces a mapping network and style encoder, allowing for more flexible and diverse style transfers across multiple domains. CycleGAN's simpler approach focuses on bidirectional mapping between two specific domains.

Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"

Pros of Glow

  • Focuses on generative flow models, offering a different approach to image generation
  • Provides a more flexible architecture for various image manipulation tasks
  • Includes implementations for both image and audio generation

Cons of Glow

  • May require more computational resources due to its complex architecture
  • Less specialized for style transfer tasks compared to StarGAN-v2
  • Potentially more challenging to fine-tune for specific use cases

Code Comparison

StarGAN-v2:

def style_mix(self, x_src, y_src, y_ref, z_trg):
    s_ref = self.mapping_network(z_trg, y_ref)
    s_src = self.style_encoder(x_src, y_src)
    x_fake = self.generator(x_src, s_ref)
    return x_fake

Glow:

def forward(self, x, logdet=0., reverse=False):
    if not reverse:
        for flow in self.flows:
            x, logdet = flow(x, logdet, reverse=False)
    else:
        for flow in reversed(self.flows):
            x = flow(x, reverse=True)
    return x, logdet

The code snippets highlight the different approaches: StarGAN-v2 focuses on style mixing and transfer, while Glow emphasizes reversible transformations for generative modeling.

PyTorch implementation of SwAV (https://arxiv.org/abs/2006.09882)

Pros of SwAV

  • Focuses on self-supervised learning for computer vision tasks
  • Designed for large-scale training on unlabeled datasets
  • Achieves state-of-the-art results on various downstream tasks

Cons of SwAV

  • More complex implementation compared to StarGAN-v2
  • Requires significant computational resources for training
  • Limited to image classification and representation learning

Code Comparison

SwAV (main training loop):

for epoch in range(args.epochs):
    for batch in data_loader:
        images = batch[0].cuda(non_blocking=True)
        # forward pass must track gradients (no torch.no_grad here, or backward() would fail)
        w = model.projection_head(model.backbone(images))
        loss = swav_loss(w)   # swapped-prediction loss over cluster assignments
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

StarGAN-v2 (main training loop):

for i in range(args.num_iters):
    # fetch real images and their original domain labels
    x_real, y_org = next(train_iter)
    x_real, y_org = x_real.to(device), y_org.to(device)
    # sample a target domain and a latent code for the target style
    y_trg = torch.randint(0, args.num_domains, (x_real.size(0),), device=device)
    z_trg = torch.randn(x_real.size(0), args.latent_dim, device=device)
    # train the discriminator, then the generator
    d_loss, d_losses_latent = train_discriminator(args, nets, optimizer, x_real, y_org, z_trg, y_trg)
    g_loss, g_losses_latent = train_generator(args, nets, optimizer, x_real, y_org, z_trg, y_trg)

README

StarGAN v2 - Official PyTorch Implementation

StarGAN v2: Diverse Image Synthesis for Multiple Domains
Yunjey Choi*, Youngjung Uh*, Jaejun Yoo*, Jung-Woo Ha
In CVPR 2020. (* indicates equal contribution)

Paper: https://arxiv.org/abs/1912.01865
Video: https://youtu.be/0EVh5Ki4dIY

Abstract: A good image-to-image translation model should learn a mapping between different visual domains while satisfying the following properties: 1) diversity of generated images and 2) scalability over multiple domains. Existing methods address either of the issues, having limited diversity or multiple models for all domains. We propose StarGAN v2, a single framework that tackles both and shows significantly improved results over the baselines. Experiments on CelebA-HQ and a new animal faces dataset (AFHQ) validate our superiority in terms of visual quality, diversity, and scalability. To better assess image-to-image translation models, we release AFHQ, high-quality animal faces with large inter- and intra-domain variations. The code, pre-trained models, and dataset are available at clovaai/stargan-v2.

Teaser video

Click the figure to watch the teaser video.

[Teaser video thumbnail: https://youtu.be/0EVh5Ki4dIY]

TensorFlow implementation

The TensorFlow implementation of StarGAN v2 by our team member junho can be found at clovaai/stargan-v2-tensorflow.

Software installation

Clone this repository:

git clone https://github.com/clovaai/stargan-v2.git
cd stargan-v2/

Install the dependencies:

conda create -n stargan-v2 python=3.6.7
conda activate stargan-v2
conda install -y pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.0 -c pytorch
conda install x264=='1!152.20180717' ffmpeg=4.0.2 -c conda-forge
pip install opencv-python==4.1.2.30 ffmpeg-python==0.2.0 scikit-image==0.16.2
pip install pillow==7.0.0 scipy==1.2.1 tqdm==4.43.0 munch==2.5.0
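
To quickly verify the environment (an optional sanity check, not part of the official instructions):

# Optional sanity check: the imports confirm the pinned packages are installed.
import torch, torchvision, cv2, skimage

print(torch.__version__)           # expected: 1.4.0
print(torchvision.__version__)     # expected: 0.5.0
print(torch.cuda.is_available())   # True if the CUDA 10.0 toolkit is visible to PyTorch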

Datasets and pre-trained networks

We provide a script to download datasets used in StarGAN v2 and the corresponding pre-trained networks. The datasets and network checkpoints will be downloaded and stored in the data and expr/checkpoints directories, respectively.

CelebA-HQ. To download the CelebA-HQ dataset and the pre-trained network, run the following commands:

bash download.sh celeba-hq-dataset
bash download.sh pretrained-network-celeba-hq
bash download.sh wing

AFHQ. To download the AFHQ dataset and the pre-trained network, run the following commands:

bash download.sh afhq-dataset
bash download.sh pretrained-network-afhq

Generating interpolation videos

After downloading the pre-trained networks, you can synthesize output images reflecting diverse styles (e.g., hairstyle) of reference images. The following commands will save generated images and interpolation videos to the expr/results directory.

CelebA-HQ. To generate images and interpolation videos, run the following command:

python main.py --mode sample --num_domains 2 --resume_iter 100000 --w_hpf 1 \
               --checkpoint_dir expr/checkpoints/celeba_hq \
               --result_dir expr/results/celeba_hq \
               --src_dir assets/representative/celeba_hq/src \
               --ref_dir assets/representative/celeba_hq/ref

To translate a custom image, first crop the image manually so that the proportion of the image occupied by the face is similar to that of CelebA-HQ. Then run the following command for additional fine rotation and cropping. All custom images in the inp_dir directory will be aligned and stored in the out_dir directory.

python main.py --mode align \
               --inp_dir assets/representative/custom/female \
               --out_dir assets/representative/celeba_hq/src/female

AFHQ. To generate images and interpolation videos, run the following command:

python main.py --mode sample --num_domains 3 --resume_iter 100000 --w_hpf 0 \
               --checkpoint_dir expr/checkpoints/afhq \
               --result_dir expr/results/afhq \
               --src_dir assets/representative/afhq/src \
               --ref_dir assets/representative/afhq/ref

Evaluation metrics

To evaluate StarGAN v2 using Fréchet Inception Distance (FID) and Learned Perceptual Image Patch Similarity (LPIPS), run the following commands:

# celeba-hq
python main.py --mode eval --num_domains 2 --w_hpf 1 \
               --resume_iter 100000 \
               --train_img_dir data/celeba_hq/train \
               --val_img_dir data/celeba_hq/val \
               --checkpoint_dir expr/checkpoints/celeba_hq \
               --eval_dir expr/eval/celeba_hq

# afhq
python main.py --mode eval --num_domains 3 --w_hpf 0 \
               --resume_iter 100000 \
               --train_img_dir data/afhq/train \
               --val_img_dir data/afhq/val \
               --checkpoint_dir expr/checkpoints/afhq \
               --eval_dir expr/eval/afhq

Note that the evaluation metrics are calculated using random latent vectors or reference images, both of which are selected by the seed number. In the paper, we reported the average of values from 10 measurements using different seed numbers. The following table shows the calculated values for both latent-guided and reference-guided synthesis.

Dataset      FID (latent)    LPIPS (latent)     FID (reference)    LPIPS (reference)    Elapsed time
celeba-hq    13.73 ± 0.06    0.4515 ± 0.0006    23.84 ± 0.03       0.3880 ± 0.0001      49min 51s
afhq         16.18 ± 0.15    0.4501 ± 0.0007    19.78 ± 0.01       0.4315 ± 0.0002      64min 49s
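
For intuition, the LPIPS columns report the average perceptual distance between outputs generated from the same input with different latent codes or reference images, so higher values indicate more diverse synthesis. Below is a rough sketch of a single pairwise measurement using the standalone lpips package (an assumption; the evaluation commands above use the repository's own metrics code):

import torch
import lpips  # pip install lpips; a stand-in for the repository's bundled LPIPS implementation

loss_fn = lpips.LPIPS(net='alex')

# x_fake1, x_fake2: two outputs for the same source image generated with different
# style codes, as (1, 3, 256, 256) tensors scaled to [-1, 1]
with torch.no_grad():
    d = loss_fn(x_fake1, x_fake2)
print(d.item())   # one pairwise distance; the table averages many such pairs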

Training networks

To train StarGAN v2 from scratch, run the following commands. Generated images and network checkpoints will be stored in the expr/samples and expr/checkpoints directories, respectively. Training takes about three days on a single Tesla V100 GPU. Please see here for training arguments and a description of them.

# celeba-hq
python main.py --mode train --num_domains 2 --w_hpf 1 \
               --lambda_reg 1 --lambda_sty 1 --lambda_ds 1 --lambda_cyc 1 \
               --train_img_dir data/celeba_hq/train \
               --val_img_dir data/celeba_hq/val

# afhq
python main.py --mode train --num_domains 3 --w_hpf 0 \
               --lambda_reg 1 --lambda_sty 1 --lambda_ds 2 --lambda_cyc 1 \
               --train_img_dir data/afhq/train \
               --val_img_dir data/afhq/val
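
The lambda_* flags weight the terms of the StarGAN v2 objective: R1 regularization for the discriminator, and style reconstruction, diversity-sensitive, and cycle-consistency losses for the generator. As a rough sketch of how they combine (following the paper's formulation; the variable names are illustrative, not the repository's exact code):

# Discriminator objective: adversarial loss plus R1 regularization on real images.
d_loss = d_adv_loss + args.lambda_reg * r1_reg

# Generator objective: adversarial loss, style reconstruction, diversity-sensitive
# loss (encouraged, hence the minus sign), and cycle-consistency loss.
g_loss = (g_adv_loss
          + args.lambda_sty * style_recon_loss
          - args.lambda_ds * diversity_loss
          + args.lambda_cyc * cycle_loss)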

Animal Faces-HQ dataset (AFHQ)

We release Animal Faces-HQ (AFHQ), a new dataset of animal faces consisting of 15,000 high-quality images at 512×512 resolution. The figure above shows example images from the AFHQ dataset. The dataset covers three domains (cat, dog, and wildlife), each providing about 5,000 images. With three domains and diverse images of various breeds within each domain, AFHQ poses a challenging image-to-image translation problem. For each domain, we select 500 images as a test set and provide all remaining images as a training set. To download the dataset, run the following command:

bash download.sh afhq-dataset
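
Once downloaded, the training split can be inspected with a standard ImageFolder pipeline. A minimal sketch, assuming the archive unpacks one subfolder per domain (e.g. cat/, dog/, wild/) under data/afhq/train:

import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

transform = T.Compose([
    T.Resize([256, 256]),
    T.ToTensor(),
    T.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

dataset = ImageFolder('data/afhq/train', transform=transform)   # one class per domain folder
loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4)

x, y = next(iter(loader))    # image batch and integer domain labels
print(x.shape, y.tolist())   # e.g. torch.Size([8, 3, 256, 256]) and labels in {0, 1, 2}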

[Update: 2021.07.01] We have rebuilt the original AFHQ dataset using high-quality resize filtering (i.e., Lanczos resampling). Please see the clean FID paper, which brings attention to the unfortunate software-library situation around image downsampling. We thank the Alias-Free GAN authors for their suggestion and contribution to the updated AFHQ dataset. If you use the updated dataset, we recommend citing not only our paper but also theirs.

The differences from the original dataset are as follows:

  • We resize the images using Lanczos resampling instead of nearest-neighbor downsampling.
  • About 2% of the original images were removed, so the updated set has 15,803 images, whereas the original had 16,130.
  • Images are saved in PNG format to avoid compression artifacts. This makes the files larger than the originals, but it is worth it.
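
For reference, the difference between the two resampling filters can be reproduced with Pillow (a minimal sketch of the filters involved, not the actual script used to rebuild the dataset; file names are placeholders):

from PIL import Image

img = Image.open('source_image.png')   # placeholder input path

nearest = img.resize((512, 512), resample=Image.NEAREST)   # filter used for the original AFHQ release
lanczos = img.resize((512, 512), resample=Image.LANCZOS)   # filter used for the updated AFHQ (v2) release

nearest.save('afhq_nearest.png')
lanczos.save('afhq_lanczos.png')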

To download the updated dataset, run the following command:

bash download.sh afhq-v2-dataset

License

The source code, pre-trained models, and dataset are available under the Creative Commons BY-NC 4.0 license by NAVER Corporation. You can use, copy, transform, and build upon the material for non-commercial purposes as long as you give appropriate credit by citing our paper and indicate if changes were made.

For business inquiries, please contact clova-jobs@navercorp.com.
For technical and other inquiries, please contact yunjey.choi@navercorp.com.

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{choi2020starganv2,
  title={StarGAN v2: Diverse Image Synthesis for Multiple Domains},
  author={Yunjey Choi and Youngjung Uh and Jaejun Yoo and Jung-Woo Ha},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Acknowledgements

We would like to thank the full-time and visiting Clova AI Research (now NAVER AI Lab) members for their valuable feedback and an early review: especially Seongjoon Oh, Junsuk Choe, Muhammad Ferjad Naeem, and Kyungjune Baek. We also thank Alias-Free GAN authors for their contribution to the updated AFHQ dataset.