Top Related Projects
Learning Chinese Character style with conditional GAN
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
Image-to-Image Translation in PyTorch
Semantic Image Synthesis with SPADE
StarGAN v2 - Official PyTorch Implementation (CVPR 2020)
Quick Overview
StarGAN is a deep learning project that implements a novel and scalable image-to-image translation model. It can perform multi-domain image translations using a single generator and discriminator, allowing for flexible and efficient transformations across multiple attributes such as hair color, gender, and age in facial images.
Pros
- Supports multi-domain translations with a single model
- Produces high-quality, realistic image transformations
- Efficient and scalable architecture
- Well-documented codebase with clear instructions
Cons
- Requires significant computational resources for training
- Limited to the specific domains it was trained on
- May struggle with extreme transformations or unusual input images
- Potential ethical concerns regarding facial attribute manipulation
Code Examples
- Loading a pre-trained StarGAN model:
import torch
from model import Generator
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
G = Generator(conv_dim=64, c_dim=5, repeat_num=6)  # c_dim must equal the number of selected attributes
G.load_state_dict(torch.load('stargan_celeba_256/models/200000-G.ckpt', map_location=lambda storage, loc: storage))
G.to(device).eval()
- Performing image translation:
import torchvision.transforms as transforms
from PIL import Image
transform = transforms.Compose([
transforms.Resize(256),
transforms.ToTensor(),
transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
image = Image.open('input.jpg').convert('RGB')  # ensure 3 channels for Normalize
x_real = transform(image).unsqueeze(0).to(device)
c_trg = torch.FloatTensor([[0, 1, 0, 1, 1]]).to(device)  # 0/1 flags in selected_attrs order: blond hair, male, young
with torch.no_grad():
    x_fake = G(x_real, c_trg)
- Saving the translated image:
from torchvision.utils import save_image
save_image(((x_fake.cpu() + 1) / 2).clamp(0, 1), 'output.jpg', nrow=1)  # map [-1, 1] back to [0, 1]
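StarGAN's CelebA labels are binary flags, one per selected attribute, in the order passed to --selected_attrs. As a rough sketch of how a set of target labels might be built from an original label, the following flips one attribute at a time. The helper name `make_target_labels` and the rule that the three hair colors are mutually exclusive are assumptions made for illustration, not code taken from the repository.

```python
# Hypothetical helper: builds one target label per attribute by editing a copy
# of the original 0/1 label. Attribute names follow --selected_attrs.
SELECTED_ATTRS = ["Black_Hair", "Blond_Hair", "Brown_Hair", "Male", "Young"]
HAIR_COLORS = {"Black_Hair", "Blond_Hair", "Brown_Hair"}  # assumed mutually exclusive


def make_target_labels(c_org, attrs=SELECTED_ATTRS):
    """Return a list of target labels, one per attribute in `attrs`."""
    targets = []
    for i, name in enumerate(attrs):
        c_trg = list(c_org)
        if name in HAIR_COLORS:
            # Switch this hair color on and every other hair color off.
            for j, other in enumerate(attrs):
                if other in HAIR_COLORS:
                    c_trg[j] = 1 if j == i else 0
        else:
            # Flip a binary attribute such as Male or Young.
            c_trg[i] = 1 - c_trg[i]
        targets.append(c_trg)
    return targets


if __name__ == "__main__":
    # Original image: black hair, not male, young.
    for name, c_trg in zip(SELECTED_ATTRS, make_target_labels([1, 0, 0, 0, 1])):
        print(name, c_trg)
```

Each resulting list could be wrapped in a float tensor of shape (1, 5) and passed to the generator as the target label.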
Getting Started
- Clone the repository:
git clone https://github.com/yunjey/stargan.git
cd stargan
- Install dependencies:
pip install -r requirements.txt
- Download pre-trained models:
bash download.sh pretrained-celeba-256x256
- Run the demo:
python main.py --mode test --dataset CelebA --image_size 256 --c_dim 5 \
--selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young \
--model_save_dir='stargan_celeba_256/models' \
--result_dir='stargan_celeba_256/results' \
--test_iters 200000
Competitor Comparisons
Learning Chinese Character style with conditional GAN
Pros of zi2zi
- Focuses specifically on Chinese character generation, offering specialized functionality for this domain
- Provides a more targeted approach for font style transfer in Chinese characters
- Includes a comprehensive dataset of Chinese characters for training
Cons of zi2zi
- Limited to Chinese character generation, lacking the versatility of StarGAN for multi-domain image-to-image translation
- May require more domain-specific knowledge to use effectively
- Less active development and community support compared to StarGAN
Code Comparison
zi2zi:
def discriminate(self, source, real_target, fake_target, reuse=False):
with tf.variable_scope("discriminator", reuse=reuse):
# Discriminator implementation
StarGAN:
def forward(self, x):
    h = self.main(x)
    out_src = self.conv1(h)  # real/fake score map
    out_cls = self.conv2(h)  # domain classification logits
    return out_src, out_cls.view(out_cls.size(0), out_cls.size(1))
The zi2zi snippet is TensorFlow and passes source and target images to the discriminator, whereas StarGAN's PyTorch discriminator takes only an image and returns both a real/fake score and domain classification logits, reflecting its multi-domain capability.
zi2zi is tailored for Chinese character generation, making it more specialized but less versatile than StarGAN. StarGAN offers a broader range of image-to-image translation tasks across multiple domains, while zi2zi excels in its specific niche of Chinese font style transfer.
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
Pros of CycleGAN
- Supports unpaired image-to-image translation, allowing for more flexible dataset requirements
- Implements cycle consistency loss, which helps preserve content during translation
- Provides a more general-purpose approach for various image translation tasks
Cons of CycleGAN
- Limited to pairwise domain translations, requiring separate models for multi-domain tasks
- May struggle with preserving fine details in complex translations
- Generally slower in training and inference compared to StarGAN
Code Comparison
CycleGAN:
def forward(self, real_A, real_B):
fake_B = self.netG_A(real_A)
rec_A = self.netG_B(fake_B)
fake_A = self.netG_B(real_B)
rec_B = self.netG_A(fake_A)
StarGAN:
def forward(self, x, c):
c_trg = self.label2onehot(c, self.c_dim)
x_fake = self.generator(x, c_trg)
out_src, out_cls = self.discriminator(x_fake)
CycleGAN focuses on cycle consistency between two domains, while StarGAN uses a single generator for multi-domain translations with target domain labels as input.
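To make the scaling difference concrete: with k domains, a pairwise approach needs one mapping per ordered pair of domains, while a label-conditioned generator needs only one network. The sketch below just does that arithmetic; the function names are illustrative and not part of either codebase.

```python
def pairwise_mappings(k: int) -> int:
    """Directed domain-to-domain mappings when each one is a separate generator."""
    return k * (k - 1)


def two_way_models(k: int) -> int:
    """CycleGAN-style models needed if each model covers both directions of one pair."""
    return k * (k - 1) // 2


for k in (2, 5, 8):
    print(
        f"{k} domains: {pairwise_mappings(k)} directed mappings, "
        f"{two_way_models(k)} two-way models, 1 label-conditioned generator"
    )
```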
Image-to-Image Translation in PyTorch
Pros of pytorch-CycleGAN-and-pix2pix
- Implements multiple image-to-image translation models (CycleGAN, pix2pix, etc.)
- Provides extensive documentation and tutorials
- Supports both paired and unpaired image translation tasks
Cons of pytorch-CycleGAN-and-pix2pix
- Focuses primarily on two-domain translation, less suitable for multi-domain tasks
- May require more computational resources due to multiple model implementations
- Learning curve can be steeper for beginners due to the variety of models
Code Comparison
pytorch-CycleGAN-and-pix2pix:
from models import create_model
model = create_model(opt)
model.setup(opt)
model.train()
StarGAN:
from model import Generator, Discriminator
G = Generator(g_conv_dim, c_dim, g_repeat_num)
D = Discriminator(image_size, d_conv_dim, c_dim, d_repeat_num)
The pytorch-CycleGAN-and-pix2pix repository offers a more modular approach with a unified model creation function, while StarGAN provides direct access to generator and discriminator classes. StarGAN's implementation is more focused on multi-domain image translation, making it potentially simpler for specific use cases.
Semantic Image Synthesis with SPADE
Pros of SPADE
- More advanced semantic image synthesis with better spatial control
- Supports higher resolution outputs (up to 1024x512)
- Utilizes spatially-adaptive normalization for improved detail preservation
Cons of SPADE
- More complex architecture, potentially harder to implement and train
- Requires segmentation masks as input, which may not always be available
- Higher computational requirements due to its more sophisticated design
Code Comparison
StarGAN:
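def forward(self, x, c):
    # Replicate the label spatially and concatenate it with the image channels.
    c = c.view(c.size(0), c.size(1), 1, 1)
    c = c.repeat(1, 1, x.size(2), x.size(3))
    x = torch.cat([x, c], dim=1)
    return self.main(x)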
SPADE:
def forward(self, input, segmap):
x = self.fc(input)
x = self.head_0(x, segmap)
x = self.up(x)
x = self.body_up_1(x, segmap)
return self.conv_img(F.leaky_relu(x, 2e-1))
StarGAN focuses on multi-domain image-to-image translation using a single generator, while SPADE specializes in generating photorealistic images from semantic segmentation maps. SPADE offers more precise control over spatial features but requires segmentation masks. StarGAN is simpler and more versatile for various domain transfer tasks but may produce less detailed results compared to SPADE.
StarGAN v2 - Official PyTorch Implementation (CVPR 2020)
Pros of StarGAN-v2
- Improved image quality and diversity compared to the original StarGAN
- Supports multi-domain translation with a single generator
- Introduces style code injection for fine-grained control over generated images
Cons of StarGAN-v2
- More complex architecture, potentially requiring more computational resources
- May be more challenging to implement and fine-tune for specific use cases
Code Comparison
StarGAN:
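def forward(self, x, c):
    # Replicate the label spatially and concatenate it with the image channels.
    c = c.view(c.size(0), c.size(1), 1, 1)
    c = c.repeat(1, 1, x.size(2), x.size(3))
    x = torch.cat([x, c], dim=1)
    return self.main(x)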
StarGAN-v2:
def forward(self, x, s, masks=None):
x = self.from_rgb(x)
for block in self.blocks:
x = block(x, s)
return self.to_rgb(x)
The StarGAN-v2 code shows a more sophisticated approach, incorporating style codes (s) and potentially masks, allowing for more fine-grained control over the generated images. This reflects the improved capabilities of StarGAN-v2 in terms of image quality and multi-domain translation.
README
StarGAN - Official PyTorch Implementation
***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 *****
This repository provides the official PyTorch implementation of the following paper:
StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
Yunjey Choi1,2, Minje Choi1,2, Munyoung Kim2,3, Jung-Woo Ha2, Sung Kim2,4, Jaegul Choo1,2
1Korea University, 2Clova AI Research, NAVER Corp.
3The College of New Jersey, 4Hong Kong University of Science and Technology
https://arxiv.org/abs/1711.09020
Abstract: Recent studies have shown remarkable success in image-to-image translation for two domains. However, existing approaches have limited scalability and robustness in handling more than two domains, since different models should be built independently for every pair of image domains. To address this limitation, we propose StarGAN, a novel and scalable approach that can perform image-to-image translations for multiple domains using only a single model. Such a unified model architecture of StarGAN allows simultaneous training of multiple datasets with different domains within a single network. This leads to StarGAN's superior quality of translated images compared to existing models as well as the novel capability of flexibly translating an input image to any desired target domain. We empirically demonstrate the effectiveness of our approach on a facial attribute transfer and a facial expression synthesis tasks.
Dependencies
- Python 3.5+
- PyTorch 0.4.0+
- TensorFlow 1.3+ (optional for tensorboard)
Downloading datasets
To download the CelebA dataset:
git clone https://github.com/yunjey/StarGAN.git
cd StarGAN/
bash download.sh celeba
To download the RaFD dataset, you must request access to the dataset from the Radboud Faces Database website. Then, you need to create a folder structure as described here.
Training networks
To train StarGAN on CelebA, run the training script below. See here for a list of selectable attributes in the CelebA dataset. If you change the selected_attrs argument, you should also change the c_dim argument accordingly.
# Train StarGAN using the CelebA dataset
python main.py --mode train --dataset CelebA --image_size 128 --c_dim 5 \
--sample_dir stargan_celeba/samples --log_dir stargan_celeba/logs \
--model_save_dir stargan_celeba/models --result_dir stargan_celeba/results \
--selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young
# Test StarGAN using the CelebA dataset
python main.py --mode test --dataset CelebA --image_size 128 --c_dim 5 \
--sample_dir stargan_celeba/samples --log_dir stargan_celeba/logs \
--model_save_dir stargan_celeba/models --result_dir stargan_celeba/results \
--selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young
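Because c_dim has to equal the number of names passed to --selected_attrs, one way to keep the two in sync is to derive both from a single list. The following is only a sketch: the helper `build_train_args` is hypothetical, and it uses just the flags that appear in the CelebA training command above.

```python
import shlex


def build_train_args(selected_attrs, out_dir="stargan_celeba", image_size=128):
    """Return an argv list for main.py with c_dim derived from the attribute list."""
    if not selected_attrs:
        raise ValueError("selected_attrs must contain at least one attribute")
    return [
        "python", "main.py",
        "--mode", "train",
        "--dataset", "CelebA",
        "--image_size", str(image_size),
        "--c_dim", str(len(selected_attrs)),  # always matches the list length
        "--sample_dir", f"{out_dir}/samples",
        "--log_dir", f"{out_dir}/logs",
        "--model_save_dir", f"{out_dir}/models",
        "--result_dir", f"{out_dir}/results",
        "--selected_attrs", *selected_attrs,
    ]


if __name__ == "__main__":
    # Print a shell-ready command for a three-attribute run.
    print(shlex.join(build_train_args(["Black_Hair", "Blond_Hair", "Male"])))
```

The returned list can be handed to `subprocess.run` directly, which avoids shell quoting issues.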
To train StarGAN on RaFD:
# Train StarGAN using the RaFD dataset
python main.py --mode train --dataset RaFD --image_size 128 \
--c_dim 8 --rafd_image_dir data/RaFD/train \
--sample_dir stargan_rafd/samples --log_dir stargan_rafd/logs \
--model_save_dir stargan_rafd/models --result_dir stargan_rafd/results
# Test StarGAN using the RaFD dataset
python main.py --mode test --dataset RaFD --image_size 128 \
--c_dim 8 --rafd_image_dir data/RaFD/test \
--sample_dir stargan_rafd/samples --log_dir stargan_rafd/logs \
--model_save_dir stargan_rafd/models --result_dir stargan_rafd/results
To train StarGAN on both CelebA and RaFD:
# Train StarGAN using both CelebA and RaFD datasets
python main.py --mode train --dataset Both --image_size 256 --c_dim 5 --c2_dim 8 \
--sample_dir stargan_both/samples --log_dir stargan_both/logs \
--model_save_dir stargan_both/models --result_dir stargan_both/results
# Test StarGAN using both CelebA and RaFD datasets
python main.py --mode test --dataset Both --image_size 256 --c_dim 5 --c2_dim 8 \
--sample_dir stargan_both/samples --log_dir stargan_both/logs \
--model_save_dir stargan_both/models --result_dir stargan_both/results
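When both datasets are used, the label given to the generator has to indicate which dataset's attributes are meaningful. The StarGAN paper describes doing this with a mask vector appended to the concatenated labels, with the unknown labels set to zero. A minimal sketch of that layout, using plain Python lists rather than tensors; the function name `joint_label` is an assumption, and the default sizes follow --c_dim 5 and --c2_dim 8 from the command above:

```python
def joint_label(c_celeba=None, c_rafd=None, c_dim=5, c2_dim=8):
    """Build [CelebA label, RaFD label, 2-d one-hot mask]; exactly one label is given."""
    if (c_celeba is None) == (c_rafd is None):
        raise ValueError("provide exactly one of c_celeba or c_rafd")
    if c_celeba is not None:
        if len(c_celeba) != c_dim:
            raise ValueError(f"expected {c_dim} CelebA flags")
        # RaFD slots are unknown, so they are zero-filled; mask [1, 0] selects CelebA.
        return list(c_celeba) + [0] * c2_dim + [1, 0]
    if len(c_rafd) != c2_dim:
        raise ValueError(f"expected {c2_dim} RaFD flags")
    # CelebA slots are unknown, so they are zero-filled; mask [0, 1] selects RaFD.
    return [0] * c_dim + list(c_rafd) + [0, 1]


# A CelebA target (blond, male, young) and a RaFD target (one-hot expression 3).
print(joint_label(c_celeba=[0, 1, 0, 1, 1]))
print(joint_label(c_rafd=[0, 0, 0, 1, 0, 0, 0, 0]))
```

Under this layout the generator's label input has c_dim + c2_dim + 2 entries, 15 for the values used here.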
To train StarGAN on your own dataset, create a folder structure in the same format as RaFD and run the command:
# Train StarGAN on custom datasets
python main.py --mode train --dataset RaFD --rafd_crop_size CROP_SIZE --image_size IMG_SIZE \
--c_dim LABEL_DIM --rafd_image_dir TRAIN_IMG_DIR \
--sample_dir stargan_custom/samples --log_dir stargan_custom/logs \
--model_save_dir stargan_custom/models --result_dir stargan_custom/results
# Test StarGAN on custom datasets
python main.py --mode test --dataset RaFD --rafd_crop_size CROP_SIZE --image_size IMG_SIZE \
--c_dim LABEL_DIM --rafd_image_dir TEST_IMG_DIR \
--sample_dir stargan_custom/samples --log_dir stargan_custom/logs \
--model_save_dir stargan_custom/models --result_dir stargan_custom/results
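The custom-dataset route depends on the RaFD-style folder layout. Assuming that layout means one subdirectory per domain with the images inside it (the convention torchvision's ImageFolder reads), a quick check like the one below could be used to derive LABEL_DIM before training. The helper name `count_domains` and the extension list are illustrative.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def count_domains(image_dir):
    """Return {class_name: image_count} for every non-empty class subfolder."""
    root = Path(image_dir)
    counts = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        n = sum(
            1 for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
        if n:  # skip folders that contain no images
            counts[sub.name] = n
    return counts
```

For example, `len(count_domains(TRAIN_IMG_DIR))` would give the value to pass as --c_dim, and the per-class counts make empty or misnamed folders easy to spot.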
Using pre-trained networks
To download a pre-trained model checkpoint, run the script below. The pre-trained model checkpoint will be downloaded and saved into the ./stargan_celeba_128/models directory.
$ bash download.sh pretrained-celeba-128x128
To translate images using the pre-trained model, run the evaluation script below. The translated images will be saved into the ./stargan_celeba_128/results directory.
$ python main.py --mode test --dataset CelebA --image_size 128 --c_dim 5 \
--selected_attrs Black_Hair Blond_Hair Brown_Hair Male Young \
--model_save_dir='stargan_celeba_128/models' \
--result_dir='stargan_celeba_128/results'
Citation
If you find this work useful for your research, please cite our paper:
@inproceedings{choi2018stargan,
author={Yunjey Choi and Minje Choi and Munyoung Kim and Jung-Woo Ha and Sunghun Kim and Jaegul Choo},
title={StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2018}
}
Acknowledgements
This work was mainly done while the first author did a research internship at Clova AI Research, NAVER. We thank all the researchers at NAVER, especially Donghyun Kwak, for insightful discussions.