stargan

StarGAN - Official PyTorch Implementation (CVPR 2018)

5,270

971


Quick Overview

StarGAN is a deep learning project that implements a novel and scalable image-to-image translation model. It can perform multi-domain image translations using a single generator and discriminator, allowing for flexible and efficient transformations across multiple attributes such as hair color, gender, and age in facial images.
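
One way to picture how a single generator can serve several domains is that the target label is turned into image-sized constant planes and stacked with the RGB channels before the first convolution. The sketch below uses plain nested lists rather than tensors; the helper name `concat_label_channels` is made up for illustration and is not part of the repository.

```python
def concat_label_channels(image, label):
    """Append one constant plane per label entry to a C x H x W image.

    `image` is a nested list indexed as image[channel][row][col];
    `label` is a flat list of 0/1 domain flags.
    """
    height = len(image[0])
    width = len(image[0][0])
    # One H x W plane per label entry, filled with that entry's value.
    label_planes = [[[float(flag)] * width for _ in range(height)] for flag in label]
    return image + label_planes


# A 3-channel 2x2 "image" and a 5-attribute target label.
rgb = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
conditioned = concat_label_channels(rgb, [0, 1, 0, 1, 1])
print(len(conditioned))  # 3 image channels + 5 label channels
```

With five attributes the generator's first layer therefore sees 3 + 5 input channels, and changing the label changes the requested translation without changing the network.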

Pros

Supports multi-domain translations with a single model
Produces high-quality, realistic image transformations
Efficient and scalable architecture
Well-documented codebase with clear instructions

Cons

Requires significant computational resources for training
Limited to the specific domains it was trained on
May struggle with extreme transformations or unusual input images
Potential ethical concerns regarding facial attribute manipulation

Code Examples

Loading a pre-trained StarGAN model:

import torch
from model import Generator  # model.py from the StarGAN repository

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Generator takes (conv_dim, c_dim, repeat_num); c_dim=5 matches the five CelebA attributes
G = Generator(conv_dim=64, c_dim=5, repeat_num=6)
G.load_state_dict(torch.load('stargan_celeba_256/models/200000-G.ckpt', map_location=device))
G.to(device).eval()

Performing image translation:

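import torchvision.transforms as transforms
from PIL import Image

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),  # Resize(256) keeps the aspect ratio, so crop to a square
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])

image = Image.open('input.jpg').convert('RGB')  # force three channels
x_real = transform(image).unsqueeze(0).to(device)
# Target domain as 0/1 flags, one per attribute the model was trained on
# (here Black_Hair, Blond_Hair, Brown_Hair, Male, Young)
c_trg = torch.tensor([[0., 1., 0., 1., 1.]], device=device)
with torch.no_grad():
    x_fake = G(x_real, c_trg)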
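
The target label is easier to get right when it is derived from a source label instead of typed by hand. The following is a minimal sketch, assuming the five CelebA attributes used in this page's commands and assuming that hair colors are mutually exclusive while the other attributes are simple toggles. `make_target_label` is a hypothetical helper, not a function from the repository.

```python
SELECTED_ATTRS = ["Black_Hair", "Blond_Hair", "Brown_Hair", "Male", "Young"]
HAIR_ATTRS = {"Black_Hair", "Blond_Hair", "Brown_Hair"}


def make_target_label(source_label, change, selected_attrs=SELECTED_ATTRS):
    """Return a copy of source_label (0/1 floats) with one attribute changed."""
    target = list(source_label)
    idx = selected_attrs.index(change)
    if change in HAIR_ATTRS:
        # Assumption: only one hair color can be active at a time.
        for i, name in enumerate(selected_attrs):
            if name in HAIR_ATTRS:
                target[i] = 0.0
        target[idx] = 1.0
    else:
        # Non-hair attributes are flipped relative to the source.
        target[idx] = 1.0 - target[idx]
    return target


source = [1.0, 0.0, 0.0, 1.0, 1.0]  # black hair, male, young
to_blond = make_target_label(source, "Blond_Hair")
to_female = make_target_label(source, "Male")
print(to_blond, to_female)
```

Each returned list can be wrapped in a batch dimension and passed to the generator as the target label.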

Saving the translated image:

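from torchvision.utils import save_image

# The generator ends in tanh, so map its output from [-1, 1] back to [0, 1] before saving
save_image(((x_fake + 1) / 2).clamp(0, 1).cpu(), 'output.jpg', nrow=1, padding=0)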
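
The preprocessing above normalizes each channel with mean 0.5 and standard deviation 0.5, which maps pixel values from [0, 1] to [-1, 1]. Saving an image needs the inverse mapping. The scalar sketch below shows both directions; the function names are illustrative only.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel in [0, 1] to [-1, 1], as Normalize(0.5, 0.5) does per channel."""
    return (pixel - mean) / std


def denormalize(value):
    """Map a generator output in [-1, 1] back to [0, 1], clamping overshoots."""
    out = (value + 1.0) / 2.0
    return min(1.0, max(0.0, out))


print(normalize(0.0), normalize(1.0))
print(denormalize(-1.0), denormalize(0.0), denormalize(1.2))
```

A fixed inverse like this keeps brightness consistent across images, whereas per-image min-max rescaling stretches each output to its own range.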

Getting Started

Clone the repository:

git clone https://github.com/yunjey/stargan.git
cd stargan

Install dependencies (Python 3.5+ and PyTorch 0.4.0+; TensorFlow 1.3+ is optional, for TensorBoard logging):
```
pip install torch torchvision
```

Download pre-trained models:

bash download.sh pretrained-celeba-256x256

Run the demo:

python main.py --mode test --dataset CelebA --image_size 256 --c_dim 5 \
               --selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young \
               --model_save_dir='stargan_celeba_256/models' \
               --result_dir='stargan_celeba_256/results' \
               --test_iters 200000
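
The `--test_iters` value and `--model_save_dir` together determine which checkpoint file is loaded: the loading example earlier on this page reads `stargan_celeba_256/models/200000-G.ckpt`. A small sketch of that naming pattern follows. The matching `-D.ckpt` name for the discriminator is an assumption by analogy, and `checkpoint_paths` is a hypothetical helper.

```python
import os


def checkpoint_paths(model_save_dir, iters):
    """Build generator and discriminator checkpoint paths for a given iteration."""
    g_path = os.path.join(model_save_dir, "{}-G.ckpt".format(iters))
    # Assumption: the discriminator checkpoint follows the same pattern.
    d_path = os.path.join(model_save_dir, "{}-D.ckpt".format(iters))
    return g_path, d_path


g_path, d_path = checkpoint_paths("stargan_celeba_256/models", 200000)
print(g_path)
print(os.path.exists(g_path))  # False until the pre-trained model is downloaded
```

Checking that the path exists before running the demo gives a clearer error than a failed load inside the script.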

Competitor Comparisons

zi2zi

2,644

Learning Chinese Character style with conditional GAN

Pros of zi2zi

Focuses specifically on Chinese character generation, offering specialized functionality for this domain
Provides a more targeted approach for font style transfer in Chinese characters
Includes a comprehensive dataset of Chinese characters for training

Cons of zi2zi

Limited to Chinese character generation, lacking the versatility of StarGAN for multi-domain image-to-image translation
May require more domain-specific knowledge to use effectively
Less active development and community support compared to StarGAN

Code Comparison

zi2zi:

def discriminate(self, source, real_target, fake_target, reuse=False):
    with tf.variable_scope("discriminator", reuse=reuse):
        # Discriminator implementation

StarGAN:

def forward(self, x):
    h = self.main(x)
    out_src = self.conv1(h)  # real/fake score map
    out_cls = self.conv2(h)  # domain classification logits
    return out_src, out_cls.view(out_cls.size(0), out_cls.size(1))

zi2zi is written in TensorFlow and StarGAN in PyTorch. StarGAN's discriminator returns two outputs from a shared convolutional trunk: a real/fake score map (out_src) and domain classification logits (out_cls). The classification head is what lets one discriminator cover every domain.

zi2zi is tailored for Chinese character generation, making it more specialized but less versatile than StarGAN. StarGAN offers a broader range of image-to-image translation tasks across multiple domains, while zi2zi excels in its specific niche of Chinese font style transfer.

CycleGAN

12,676

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Pros of CycleGAN

Supports unpaired image-to-image translation, allowing for more flexible dataset requirements
Implements cycle consistency loss, which helps preserve content during translation
Provides a more general-purpose approach for various image translation tasks

Cons of CycleGAN

Limited to pairwise domain translations, requiring separate models for multi-domain tasks
May struggle with preserving fine details in complex translations
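Training cost grows quickly with the number of domains: covering every ordered pair among k domains takes k(k-1) generators (12 for 4 domains), versus a single generator for StarGAN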

Code Comparison

CycleGAN:

def forward(self, real_A, real_B):
    fake_B = self.netG_A(real_A)
    rec_A = self.netG_B(fake_B)
    fake_A = self.netG_B(real_B)
    rec_B = self.netG_A(fake_A)

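StarGAN:

# Generator step inside the training loop (solver.py)
x_fake = self.G(x_real, c_trg)     # original -> target domain
out_src, out_cls = self.D(x_fake)  # real/fake score and domain logits
x_reconst = self.G(x_fake, c_org)  # target -> original domain

CycleGAN enforces cycle consistency with two generators, one per direction between a pair of domains. StarGAN applies the same idea as a reconstruction loss but reuses a single generator, steering it with the target and original domain labels.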
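
For reference, the reconstruction term and the generator objective can be written as follows. The notation is taken from the StarGAN paper rather than from this page: c is the target domain label, c' the original label, and the f superscript marks the classification loss computed on fake images.

```latex
\mathcal{L}_{rec} = \mathbb{E}_{x, c, c'}\left[ \lVert x - G(G(x, c), c') \rVert_1 \right]

\mathcal{L}_{G} = \mathcal{L}_{adv} + \lambda_{cls}\,\mathcal{L}_{cls}^{f} + \lambda_{rec}\,\mathcal{L}_{rec}
```

The same generator G appears twice in the reconstruction term, once with each label, which is where StarGAN differs from CycleGAN's pair of direction-specific generators.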

pytorch-CycleGAN-and-pix2pix

24,306

Image-to-Image Translation in PyTorch

Pros of pytorch-CycleGAN-and-pix2pix

Implements multiple image-to-image translation models (CycleGAN, pix2pix, etc.)
Provides extensive documentation and tutorials
Supports both paired and unpaired image translation tasks

Cons of pytorch-CycleGAN-and-pix2pix

Focuses primarily on two-domain translation, less suitable for multi-domain tasks
May require more computational resources due to multiple model implementations
Learning curve can be steeper for beginners due to the variety of models

Code Comparison

pytorch-CycleGAN-and-pix2pix

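from models import create_model
model = create_model(opt)    # opt comes from TrainOptions().parse()
model.setup(opt)
model.set_input(data)        # data: one batch from the dataset loader
model.optimize_parameters()  # forward pass, losses, and optimizer step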

StarGAN

from model import Generator, Discriminator
G = Generator(g_conv_dim, c_dim, g_repeat_num)
D = Discriminator(image_size, d_conv_dim, c_dim, d_repeat_num)

The pytorch-CycleGAN-and-pix2pix repository offers a more modular approach with a unified model creation function, while StarGAN provides direct access to generator and discriminator classes. StarGAN's implementation is more focused on multi-domain image translation, making it potentially simpler for specific use cases.

SPADE

7,662

Semantic Image Synthesis with SPADE

Pros of SPADE

More advanced semantic image synthesis with better spatial control
Supports higher resolution outputs (up to 1024x512)
Utilizes spatially-adaptive normalization for improved detail preservation

Cons of SPADE

More complex architecture, potentially harder to implement and train
Requires segmentation masks as input, which may not always be available
Higher computational requirements due to its more sophisticated design

Code Comparison

StarGAN:

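def forward(self, x, c):
    # Replicate the label spatially and concatenate it with the image channels
    c = c.view(c.size(0), c.size(1), 1, 1)
    c = c.repeat(1, 1, x.size(2), x.size(3))
    x = torch.cat([x, c], dim=1)
    return self.main(x)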

SPADE:

def forward(self, input, segmap):
    x = self.fc(input)
    x = self.head_0(x, segmap)
    x = self.up(x)
    x = self.body_up_1(x, segmap)
    return self.conv_img(F.leaky_relu(x, 2e-1))

StarGAN focuses on multi-domain image-to-image translation using a single generator, while SPADE specializes in generating photorealistic images from semantic segmentation maps. SPADE offers more precise control over spatial features but requires segmentation masks. StarGAN is simpler and more versatile for various domain transfer tasks but may produce less detailed results compared to SPADE.

stargan-v2

3,576

StarGAN v2 - Official PyTorch Implementation (CVPR 2020)

Pros of StarGAN-v2

Improved image quality and diversity compared to the original StarGAN
Supports multi-domain translation with a single generator
Introduces style code injection for fine-grained control over generated images

Cons of StarGAN-v2

More complex architecture, potentially requiring more computational resources
May be more challenging to implement and fine-tune for specific use cases

Code Comparison

StarGAN:

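def forward(self, x, c):
    # Replicate the label spatially and concatenate it with the image channels
    c = c.view(c.size(0), c.size(1), 1, 1)
    c = c.repeat(1, 1, x.size(2), x.size(3))
    x = torch.cat([x, c], dim=1)
    return self.main(x)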

StarGAN-v2:

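def forward(self, x, s, masks=None):
    # Abridged: the optional mask handling is omitted here
    x = self.from_rgb(x)
    for block in self.encode:
        x = block(x)     # downsampling residual blocks
    for block in self.decode:
        x = block(x, s)  # blocks conditioned on the style code s
    return self.to_rgb(x)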

The StarGAN-v2 code shows a more sophisticated approach, incorporating style codes (s) and potentially masks, allowing for more fine-grained control over the generated images. This reflects the improved capabilities of StarGAN-v2 in terms of image quality and multi-domain translation.

README

StarGAN - Official PyTorch Implementation

***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 *****

This repository provides the official PyTorch implementation of the following paper:

StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Yunjey Choi¹,², Minje Choi¹,², Munyoung Kim²,³, Jung-Woo Ha², Sung Kim²,⁴, Jaegul Choo¹,²
¹Korea University, ²Clova AI Research, NAVER Corp.
³The College of New Jersey, ⁴Hong Kong University of Science and Technology
https://arxiv.org/abs/1711.09020

Abstract: Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on a facial attribute transfer and a facial expression synthesis tasks.
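
The scalability argument in the abstract comes down to simple counting: if every ordered pair of domains needs its own generator, the number of generators grows quadratically, while a unified model needs one. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def generators_needed(num_domains, unified=False):
    """Count generators required to translate between all ordered domain pairs."""
    if num_domains < 2:
        raise ValueError("translation needs at least two domains")
    if unified:
        return 1  # one label-conditioned generator covers every pair
    return num_domains * (num_domains - 1)  # one generator per ordered pair


for k in (2, 4, 8):
    print(k, generators_needed(k), generators_needed(k, unified=True))
```

For four domains the pairwise approach already needs twelve generators, and each of them only sees the data for its own pair.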

Dependencies

Python 3.5+
PyTorch 0.4.0+
TensorFlow 1.3+ (optional for tensorboard)

Downloading datasets

To download the CelebA dataset:

git clone https://github.com/yunjey/StarGAN.git
cd StarGAN/
bash download.sh celeba

To download the RaFD dataset, you must request access to the dataset from the Radboud Faces Database website. Then, you need to create a folder structure as described here.

Training networks

To train StarGAN on CelebA, run the training script below. See here for a list of selectable attributes in the CelebA dataset. If you change the selected_attrs argument, you should also change the c_dim argument accordingly.

# Train StarGAN using the CelebA dataset
python main.py --mode train --dataset CelebA --image_size 128 --c_dim 5 \
               --sample_dir stargan_celeba/samples --log_dir stargan_celeba/logs \
               --model_save_dir stargan_celeba/models --result_dir stargan_celeba/results \
               --selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young

# Test StarGAN using the CelebA dataset
python main.py --mode test --dataset CelebA --image_size 128 --c_dim 5 \
               --sample_dir stargan_celeba/samples --log_dir stargan_celeba/logs \
               --model_save_dir stargan_celeba/models --result_dir stargan_celeba/results \
               --selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young
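
Because `c_dim` has to match the number of entries in `selected_attrs`, it can help to derive one from the other rather than type both. The sketch below assembles the CelebA training command from an attribute list using only flags that appear in the commands above. `build_train_args` and its `run_dir` parameter are hypothetical conveniences, not part of the repository.

```python
def build_train_args(selected_attrs, image_size=128, run_dir="stargan_celeba"):
    """Assemble the CelebA training command with c_dim derived from the attributes."""
    if not selected_attrs:
        raise ValueError("at least one attribute is required")
    if len(set(selected_attrs)) != len(selected_attrs):
        raise ValueError("attributes must be unique")
    return [
        "python", "main.py", "--mode", "train", "--dataset", "CelebA",
        "--image_size", str(image_size),
        "--c_dim", str(len(selected_attrs)),  # always equals the attribute count
        "--sample_dir", run_dir + "/samples",
        "--log_dir", run_dir + "/logs",
        "--model_save_dir", run_dir + "/models",
        "--result_dir", run_dir + "/results",
        "--selected_attrs",
    ] + list(selected_attrs)


attrs = ["Black_Hair", "Blond_Hair", "Brown_Hair", "Male", "Young"]
args = build_train_args(attrs)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args)` from the repository root.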

To train StarGAN on RaFD:

# Train StarGAN using the RaFD dataset
python main.py --mode train --dataset RaFD --image_size 128 \
               --c_dim 8 --rafd_image_dir data/RaFD/train \
               --sample_dir stargan_rafd/samples --log_dir stargan_rafd/logs \
               --model_save_dir stargan_rafd/models --result_dir stargan_rafd/results

# Test StarGAN using the RaFD dataset
python main.py --mode test --dataset RaFD --image_size 128 \
               --c_dim 8 --rafd_image_dir data/RaFD/test \
               --sample_dir stargan_rafd/samples --log_dir stargan_rafd/logs \
               --model_save_dir stargan_rafd/models --result_dir stargan_rafd/results

To train StarGAN on both CelebA and RaFD:

# Train StarGAN using both CelebA and RaFD datasets
python main.py --mode=train --dataset Both --image_size 256 --c_dim 5 --c2_dim 8 \
               --sample_dir stargan_both/samples --log_dir stargan_both/logs \
               --model_save_dir stargan_both/models --result_dir stargan_both/results

# Test StarGAN using both CelebA and RaFD datasets
python main.py --mode test --dataset Both --image_size 256 --c_dim 5 --c2_dim 8 \
               --sample_dir stargan_both/samples --log_dir stargan_both/logs \
               --model_save_dir stargan_both/models --result_dir stargan_both/results
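
When both datasets are trained together, each image carries labels for only one of them. The StarGAN paper handles this with a unified label: the two label vectors are concatenated and followed by a one-hot mask saying which dataset the label belongs to, with the unknown half zeroed. The sketch below illustrates that layout for `c_dim 5` and `c2_dim 8`; the function name and the ordering of the mask entries are assumptions made for illustration.

```python
def unified_label(label, dataset, c_dim=5, c2_dim=8):
    """Build [CelebA labels | RaFD labels | mask] with the unknown half zeroed."""
    label = [float(v) for v in label]
    if dataset == "CelebA":
        if len(label) != c_dim:
            raise ValueError("expected {} CelebA entries".format(c_dim))
        return label + [0.0] * c2_dim + [1.0, 0.0]
    if dataset == "RaFD":
        if len(label) != c2_dim:
            raise ValueError("expected {} RaFD entries".format(c2_dim))
        return [0.0] * c_dim + label + [0.0, 1.0]
    raise ValueError("unknown dataset: {}".format(dataset))


celeba_vec = unified_label([0, 1, 0, 1, 1], "CelebA")
rafd_vec = unified_label([0, 0, 0, 1, 0, 0, 0, 0], "RaFD")
print(len(celeba_vec), celeba_vec)
print(len(rafd_vec), rafd_vec)
```

Under this layout the generator receives 5 + 8 + 2 = 15 label channels, and the mask tells it which part of the vector to pay attention to.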

To train StarGAN on your own dataset, create a folder structure in the same format as RaFD and run the command:

# Train StarGAN on custom datasets
python main.py --mode train --dataset RaFD --rafd_crop_size CROP_SIZE --image_size IMG_SIZE \
               --c_dim LABEL_DIM --rafd_image_dir TRAIN_IMG_DIR \
               --sample_dir stargan_custom/samples --log_dir stargan_custom/logs \
               --model_save_dir stargan_custom/models --result_dir stargan_custom/results

# Test StarGAN on custom datasets
python main.py --mode test --dataset RaFD --rafd_crop_size CROP_SIZE --image_size IMG_SIZE \
               --c_dim LABEL_DIM --rafd_image_dir TEST_IMG_DIR \
               --sample_dir stargan_custom/samples --log_dir stargan_custom/logs \
               --model_save_dir stargan_custom/models --result_dir stargan_custom/results
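
The exact folder layout is described behind a link that did not survive on this page. A common convention for class-labelled image folders, and the assumption made in the sketch below, is one subdirectory per class under the train and test directories. `arrange_by_class` is a hypothetical helper that copies labelled images into that layout.

```python
import shutil
from pathlib import Path


def arrange_by_class(labeled_images, out_dir):
    """Copy (image_path, class_name) pairs into out_dir/<class_name>/.

    Assumption: the training code expects one subdirectory per class.
    Returns the sorted list of class directory names that now exist.
    """
    out_dir = Path(out_dir)
    for image_path, class_name in labeled_images:
        class_dir = out_dir / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(str(image_path), str(class_dir / Path(image_path).name))
    return sorted(p.name for p in out_dir.iterdir() if p.is_dir())


# Example call (paths are placeholders):
# classes = arrange_by_class([("raw/001.jpg", "happy"), ("raw/002.jpg", "sad")], "data/custom/train")
# LABEL_DIM would then be len(classes).
```

Under this assumption, `LABEL_DIM` in the commands above equals the number of class subdirectories, and `TRAIN_IMG_DIR` and `TEST_IMG_DIR` point at the two top-level folders.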

Using pre-trained networks

To download a pre-trained model checkpoint, run the script below. The pre-trained model checkpoint will be downloaded and saved into ./stargan_celeba_128/models directory.

$ bash download.sh pretrained-celeba-128x128

To translate images using the pre-trained model, run the evaluation script below. The translated images will be saved into ./stargan_celeba_128/results directory.

$ python main.py --mode test --dataset CelebA --image_size 128 --c_dim 5 \
                 --selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young \
                 --model_save_dir='stargan_celeba_128/models' \
                 --result_dir='stargan_celeba_128/results'

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{choi2018stargan,
author={Yunjey Choi and Minje Choi and Munyoung Kim and Jung-Woo Ha and Sunghun Kim and Jaegul Choo},
title={StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2018}
}

Acknowledgements

This work was mainly done while the first author did a research internship at Clova AI Research, NAVER. We thank all the researchers at NAVER, especially Donghyun Kwak, for insightful discussions.
