
junyanz/CycleGAN

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.


Top Related Projects

  • PyTorch-GAN: PyTorch implementations of Generative Adversarial Networks.
  • StarGAN: Official PyTorch Implementation (CVPR 2018).
  • MUNIT: Multimodal Unsupervised Image-to-Image Translation.
  • UNIT: Unsupervised Image-to-Image Translation.
  • pix2pixHD: Synthesizing and manipulating 2048x1024 images with conditional GANs.

Quick Overview

CycleGAN is an image-to-image translation technique that learns to translate an image from a source domain X to a target domain Y in the absence of paired examples. It uses cycle-consistent adversarial networks to do so, enabling applications such as style transfer, object transfiguration, season transfer, and photo enhancement.
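To make the core idea concrete, here is a minimal, self-contained sketch of the cycle-consistency term. The tiny convolutional generators G_AB and G_BA are placeholders standing in for the repository's actual ResNet-based generators:

import torch
import torch.nn as nn

# Placeholder generators: G_AB maps domain A -> B, G_BA maps B -> A.
G_AB = nn.Conv2d(3, 3, kernel_size=3, padding=1)
G_BA = nn.Conv2d(3, 3, kernel_size=3, padding=1)
criterion_cycle = nn.L1Loss()

real_A = torch.randn(1, 3, 256, 256)  # stand-in image from domain A
real_B = torch.randn(1, 3, 256, 256)  # stand-in image from domain B

fake_B = G_AB(real_A)   # translate A -> B
rec_A = G_BA(fake_B)    # translate back B -> A
fake_A = G_BA(real_B)   # translate B -> A
rec_B = G_AB(fake_A)    # translate back A -> B

# Cycle-consistency loss: a round trip should reproduce the original image,
# which is what lets the model learn without paired examples.
cycle_loss = criterion_cycle(rec_A, real_A) + criterion_cycle(rec_B, real_B)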

Pros

  • Requires no paired training data, making it applicable to a wide range of scenarios
  • Produces high-quality results for various image translation tasks
  • Versatile and can be applied to different domains (e.g., horses to zebras, summer to winter)
  • Open-source implementation with pre-trained models available

Cons

  • Can sometimes produce artifacts or unrealistic results
  • Training can be computationally expensive and time-consuming
  • May struggle with complex transformations or preserving fine details
  • Limited control over specific aspects of the translation process

Code Examples

  1. Loading a pre-trained CycleGAN model:
import torch
from options.test_options import TestOptions
from models import create_model

opt = TestOptions().parse()   # parse test-time options from the command line
model = create_model(opt)     # build the model specified by opt
model.setup(opt)              # load the saved networks for the chosen experiment
  2. Translating an image using CycleGAN:
import torch
import torchvision.transforms as transforms
from PIL import Image
from util.util import tensor2im

# Same normalization used by the repository's default data pipeline
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

input_image = Image.open('input.jpg').convert('RGB')
input_tensor = transform(input_image).unsqueeze(0)

with torch.no_grad():
    output = model.netG_A(input_tensor)  # generator for the A -> B direction

output_image = tensor2im(output)                  # tensor -> uint8 numpy image
Image.fromarray(output_image).save('output.jpg')
  3. Training a new CycleGAN model:
from options.train_options import TrainOptions
from data import create_dataset
from models import create_model

opt = TrainOptions().parse()
dataset = create_dataset(opt)
model = create_model(opt)
model.setup(opt)

# Simplified training loop (omits logging, visualization, and checkpointing)
for epoch in range(opt.epoch_count, opt.n_epochs + opt.n_epochs_decay + 1):
    for i, data in enumerate(dataset):
        model.set_input(data)          # unpack a batch from the data loader
        model.optimize_parameters()    # compute losses and update the networks
    model.update_learning_rate()       # apply the learning-rate schedule each epoch

Getting Started

To get started with CycleGAN:

  1. Clone the repository:

    git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
    cd pytorch-CycleGAN-and-pix2pix
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download a pre-trained model:

    bash ./scripts/download_cyclegan_model.sh horse2zebra
    
  4. Test the model:

    python test.py --dataroot ./datasets/horse2zebra/testA --name horse2zebra_pretrained --model test --no_dropout
    

The translated images will be saved in the results/horse2zebra_pretrained/test_latest/images directory.
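To quickly inspect the outputs from Python, a small convenience sketch (it assumes the default .png files written to the directory above):

from pathlib import Path
from PIL import Image

# Default output location for the pretrained horse2zebra test run
results_dir = Path('results/horse2zebra_pretrained/test_latest/images')

# Print the name and size of the first few saved results;
# input and translated images are saved as separate files.
for path in sorted(results_dir.glob('*.png'))[:6]:
    with Image.open(path) as img:
        print(path.name, img.size)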

Competitor Comparisons

PyTorch-GAN: PyTorch implementations of Generative Adversarial Networks.

Pros of PyTorch-GAN

  • Implements a wide variety of GAN architectures (over 30)
  • Provides a consistent interface for different GAN models
  • Includes example outputs and training scripts for each implementation

Cons of PyTorch-GAN

  • Less focused on a specific GAN application (like image-to-image translation)
  • May require more setup and configuration for specific use cases
  • Documentation is less comprehensive compared to CycleGAN

Code Comparison

CycleGAN:

def forward(self, input):
    return self.model(input)

PyTorch-GAN:

def forward(self, img):
    validity = self.model(img)
    return validity

Both repositories keep the forward pass thin. The main difference is naming and intent: the PyTorch-GAN snippet is a discriminator that returns an explicit validity score, while the CycleGAN snippet is a generator that simply applies its model to the input.

CycleGAN focuses specifically on cycle-consistent adversarial networks for image-to-image translation, while PyTorch-GAN provides a broader collection of GAN implementations. CycleGAN offers more detailed documentation and examples for its specific use case, whereas PyTorch-GAN provides a wider range of GAN architectures with a consistent interface. Users looking for a specific image-to-image translation solution might prefer CycleGAN, while those exploring various GAN architectures may find PyTorch-GAN more suitable.


StarGAN - Official PyTorch Implementation (CVPR 2018)

Pros of StarGAN

  • Supports multi-domain image-to-image translation with a single model
  • More efficient in terms of training time and model size for multiple domains
  • Produces higher quality results for facial attribute transfer tasks

Cons of StarGAN

  • Limited to specific types of transformations (e.g., facial attributes)
  • May struggle with more complex image-to-image translation tasks
  • Less flexible for arbitrary domain translations compared to CycleGAN

Code Comparison

StarGAN:

class Generator(nn.Module):
    def __init__(self, conv_dim=64, c_dim=5, repeat_num=6):
        super(Generator, self).__init__()
        self.encoder = Encoder(conv_dim, c_dim)
        self.decoder = Decoder(conv_dim, c_dim, repeat_num)

CycleGAN:

class Generator(nn.Module):
    def __init__(self, input_nc, output_nc, ngf=64, norm_layer=nn.BatchNorm2d, use_dropout=False, n_blocks=6):
        super(Generator, self).__init__()
        model = [nn.ReflectionPad2d(3),
                 nn.Conv2d(input_nc, ngf, kernel_size=7, padding=0),
                 norm_layer(ngf),
                 nn.ReLU(True)]

StarGAN uses a more compact architecture with separate encoder and decoder, while CycleGAN employs a more traditional convolutional structure. StarGAN's design is optimized for multi-domain translations, whereas CycleGAN focuses on pairwise domain translations.


MUNIT: Multimodal Unsupervised Image-to-Image Translation

Pros of MUNIT

  • Supports multi-modal image-to-image translation, allowing for diverse outputs
  • Enables style transfer between domains while preserving content
  • Provides better disentanglement of content and style features

Cons of MUNIT

  • More complex architecture, potentially requiring more computational resources
  • May be less stable during training compared to CycleGAN
  • Could be more challenging to fine-tune for specific use cases

Code Comparison

MUNIT:

content, style_fake = self.gen.encode(x_a)
images_fake = self.gen.decode(content, style_fake)
x_recon = self.gen.decode(content, style_real)

CycleGAN:

fake_B = self.netG_A(real_A)
rec_A = self.netG_B(fake_B)
fake_A = self.netG_B(real_B)
rec_B = self.netG_A(fake_A)

MUNIT's code demonstrates its content-style separation, while CycleGAN focuses on direct domain translation. MUNIT's approach allows for more flexible and diverse outputs, but at the cost of increased complexity. CycleGAN's simpler architecture may be easier to implement and train for straightforward image-to-image translation tasks.


UNIT: Unsupervised Image-to-Image Translation

Pros of UNIT

  • Utilizes a shared-latent space assumption, potentially leading to more coherent translations
  • Supports multi-modal mappings, allowing for diverse outputs from a single input
  • Incorporates both VAE and GAN architectures, potentially offering more stable training

Cons of UNIT

  • May require more computational resources due to its complex architecture
  • Could be more challenging to implement and fine-tune compared to CycleGAN
  • Might struggle with datasets that don't align well with the shared-latent space assumption

Code Comparison

UNIT:

def forward(self, x_a, x_b):
    h_a, n_a = self.gen_a.encode(x_a)
    h_b, n_b = self.gen_b.encode(x_b)
    x_ba = self.gen_a.decode(h_b + n_b)
    x_ab = self.gen_b.decode(h_a + n_a)
    return x_ab, x_ba

CycleGAN:

def forward(self, real_A, real_B):
    fake_B = self.netG_A(real_A)
    rec_A = self.netG_B(fake_B)
    fake_A = self.netG_B(real_B)
    rec_B = self.netG_A(fake_A)
    return fake_A, fake_B, rec_A, rec_B

pix2pixHD: Synthesizing and manipulating 2048x1024 images with conditional GANs

Pros of pix2pixHD

  • Higher resolution output (up to 2048x1024)
  • Improved image quality with multi-scale generator and discriminator
  • Better handling of complex scenes and fine details

Cons of pix2pixHD

  • Requires paired training data, unlike CycleGAN's unpaired approach
  • More computationally intensive, requiring more powerful hardware
  • Potentially less flexible for certain types of image-to-image translation tasks

Code Comparison

CycleGAN:

def forward(self, input):
    return self.model(input)

pix2pixHD:

def forward(self, input, z=None):
    return self.model(input)

The main difference in the forward pass is that pix2pixHD's signature accepts an optional noise input z, intended to introduce variability in the output (though it is unused in the snippet above). The basic CycleGAN generator has no such input.

Both projects use PyTorch and have similar overall structures, but pix2pixHD incorporates more advanced techniques for high-resolution image synthesis. CycleGAN focuses on unpaired image-to-image translation, while pix2pixHD excels in generating high-quality, detailed images from semantic label maps or sketches.


README




CycleGAN

PyTorch | project page | paper

Torch implementation for learning an image-to-image translation (i.e., pix2pix) without input-output pairs.

New: Please check out contrastive-unpaired-translation (CUT), our new unpaired image-to-image translation model that enables fast and memory-efficient training.

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu*, Taesung Park*, Phillip Isola, Alexei A. Efros
Berkeley AI Research Lab, UC Berkeley
In ICCV 2017. (* equal contributions)

This package includes CycleGAN, pix2pix, as well as other methods such as BiGAN/ALI and Apple's S+U learning.
The code was written by Jun-Yan Zhu and Taesung Park.
Update: Please check out the PyTorch implementation of CycleGAN and pix2pix. The PyTorch version is under active development and can produce results comparable to or better than this Torch version.

Other implementations:

[Tensorflow] (by Harry Yang), [Tensorflow] (by Archit Rathore), [Tensorflow] (by Van Huy), [Tensorflow] (by Xiaowei Hu), [Tensorflow-simple] (by Zhenliang He), [TensorLayer] (by luoxier), [Chainer] (by Yanghua Jin), [Minimal PyTorch] (by yunjey), [Mxnet] (by Ldpe2G), [lasagne/Keras] (by tjwei), [Keras] (by Simon Karlsson)

Applications

Monet Paintings to Photos

Collection Style Transfer

Object Transfiguration

Season Transfer

Photo Enhancement: Narrow depth of field

Prerequisites

  • Linux or OSX
  • NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN may work with minimal modification, but untested)
  • For MAC users, you need the Linux/GNU commands gfind and gwc, which can be installed with brew install findutils coreutils.

Getting Started

Installation

  • Install the Torch packages nngraph, class, and display:
luarocks install nngraph
luarocks install class
luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Clone this repo:
git clone https://github.com/junyanz/CycleGAN
cd CycleGAN

Apply a Pre-trained Model

  • Download the test photos (ae_photos):
bash ./datasets/download_dataset.sh ae_photos
  • Download the pre-trained model style_cezanne (for the CPU model, use style_cezanne_cpu):
bash ./pretrained_models/download_model.sh style_cezanne
  • Now, let's generate Paul Cézanne-style images:
DATA_ROOT=./datasets/ae_photos name=style_cezanne_pretrained model=one_direction_test phase=test loadSize=256 fineSize=256 resize_or_crop="scale_width" th test.lua

The test results will be saved to ./results/style_cezanne_pretrained/latest_test/index.html.
Please refer to Model Zoo for more pre-trained models. ./examples/test_vangogh_style_on_ae_photos.sh is an example script that downloads the pretrained Van Gogh style network and runs it on Efros's photos.

Train

  • Download a dataset (e.g. zebra and horse images from ImageNet):
bash ./datasets/download_dataset.sh horse2zebra
  • Train a model:
DATA_ROOT=./datasets/horse2zebra name=horse2zebra_model th train.lua
  • (CPU only) The same training command without a GPU or cuDNN; setting the environment variables gpu=0 cudnn=0 forces CPU-only training:
DATA_ROOT=./datasets/horse2zebra name=horse2zebra_model gpu=0 cudnn=0 th train.lua
  • (Optional) Start the display server to view results as the model trains (see Display UI for more details):
th -ldisplay.start 8000 0.0.0.0

Test

  • Finally, test the model:
DATA_ROOT=./datasets/horse2zebra name=horse2zebra_model phase=test th test.lua

The test results will be saved to an HTML file here: ./results/horse2zebra_model/latest_test/index.html.

Model Zoo

Download the pre-trained models with the following script. The model will be saved to ./checkpoints/model_name/latest_net_G.t7.

bash ./pretrained_models/download_model.sh model_name
  • orange2apple (orange -> apple) and apple2orange: trained on ImageNet categories apple and orange.
  • horse2zebra (horse -> zebra) and zebra2horse (zebra -> horse): trained on ImageNet categories horse and zebra.
  • style_monet (landscape photo -> Monet painting style), style_vangogh (landscape photo -> Van Gogh painting style), style_ukiyoe (landscape photo -> Ukiyo-e painting style), style_cezanne (landscape photo -> Cezanne painting style): trained on paintings and Flickr landscape photos.
  • monet2photo (Monet paintings -> real landscape): trained on paintings and Flickr landscape photographs.
  • cityscapes_photo2label (street scene -> label) and cityscapes_label2photo (label -> street scene): trained on the Cityscapes dataset.
  • map2sat (map -> aerial photo) and sat2map (aerial photo -> map): trained on Google maps.
  • iphone2dslr_flower (iPhone photos of flowers -> DSLR photos of flowers): trained on Flickr photos.

CPU models can be downloaded using:

bash pretrained_models/download_model.sh <name>_cpu

where <name> can be horse2zebra, style_monet, etc. You just need to append _cpu to the target model name.

Training and Test Details

To train a model,

DATA_ROOT=/path/to/data/ name=expt_name th train.lua

Models are saved to ./checkpoints/expt_name (can be changed by passing checkpoint_dir=your_dir in train.lua).
See opt_train in options.lua for additional training options.

To test the model,

DATA_ROOT=/path/to/data/ name=expt_name phase=test th test.lua

This will run the model named expt_name in both directions on all images in /path/to/data/testA and /path/to/data/testB.
A webpage with result images will be saved to ./results/expt_name (can be changed by passing results_dir=your_dir in test.lua).
See opt_test in options.lua for additional test options. Use model=one_direction_test if you would like to generate outputs of the trained network in only one direction, and specify which_direction=AtoB or which_direction=BtoA to set the direction.

There are other options that can be used. For example, you can specify the resize_or_crop=crop option to avoid resizing the images to squares. This is how we trained the GTA2Cityscapes model shown on the project webpage and the Cycada model: we prepared the images at 1024px resolution and used resize_or_crop=crop fineSize=360 to work with 360x360 crops. We also used lambda_identity=1.0.

Datasets

Download the datasets using the following script. Many of the datasets were collected by other researchers. Please cite their papers if you use the data.

bash ./datasets/download_dataset.sh dataset_name
  • facades: 400 images from the CMP Facades dataset. [Citation]
  • cityscapes: 2975 images from the Cityscapes training set. [Citation]. Note: Due to license issue, we do not host the dataset on our repo. Please download the dataset directly from the Cityscapes webpage. Please refer to ./datasets/prepare_cityscapes_dataset.py for more detail.
  • maps: 1096 training images scraped from Google Maps.
  • horse2zebra: 939 horse images and 1177 zebra images downloaded from ImageNet using the keywords wild horse and zebra.
  • apple2orange: 996 apple images and 1020 orange images downloaded from ImageNet using the keywords apple and navel orange.
  • summer2winter_yosemite: 1273 summer Yosemite images and 854 winter Yosemite images were downloaded using Flickr API. See more details in our paper.
  • monet2photo, vangogh2photo, ukiyoe2photo, cezanne2photo: The art images were downloaded from Wikiart. The real photos are downloaded from Flickr using the combination of the tags landscape and landscapephotography. The training set size of each class is Monet:1074, Cezanne:584, Van Gogh:401, Ukiyo-e:1433, Photographs:6853.
  • iphone2dslr_flower: both classes of images were downloaded from Flickr. The training set size of each class is iPhone:1813, DSLR:3316. See more details in our paper.

Display UI

Optionally, for displaying images during training and test, use the display package.

  • Install it with: luarocks install https://raw.githubusercontent.com/szym/display/master/display-scm-0.rockspec
  • Then start the server with: th -ldisplay.start
  • Open this URL in your browser: http://localhost:8000

By default, the server listens on localhost. Pass 0.0.0.0 to allow external connections on any interface:

th -ldisplay.start 8000 0.0.0.0

Then open http://(hostname):(port)/ in your browser to load the remote desktop.

Setup Training and Test data

To train a CycleGAN model on your own datasets, you need to create a data folder with two subdirectories, trainA and trainB, that contain images from domain A and domain B respectively. You can test your model on your training set by setting phase='train' in test.lua. You can also create subdirectories testA and testB if you have test data.
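For example, the following minimal Python sketch lays out the expected folders for a hypothetical dataset named maps2photos (the dataset name is purely illustrative):

from pathlib import Path

# Expected layout for a custom CycleGAN dataset:
#   datasets/maps2photos/trainA  - training images from domain A
#   datasets/maps2photos/trainB  - training images from domain B
#   datasets/maps2photos/testA   - optional test images from domain A
#   datasets/maps2photos/testB   - optional test images from domain B
root = Path('datasets/maps2photos')
for split in ('trainA', 'trainB', 'testA', 'testB'):
    (root / split).mkdir(parents=True, exist_ok=True)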

You should not expect our method to work on just any random combination of input and output datasets (e.g., cats<->keyboards). From our experiments, we find it works better when the two datasets share similar visual content. For example, landscape painting<->landscape photograph works much better than portrait painting<->landscape photograph; zebras<->horses achieves compelling results while cats<->dogs completely fails. See the following section for more discussion.

Failure cases

Our model does not work well when the test image is rather different from the images on which the model was trained; for example, we trained on horse and zebra images without riders, but tested on a horse with a rider. See additional typical failure cases here. On translation tasks that involve color and texture changes, like many of those reported above, the method often succeeds. We have also explored tasks that require geometric changes, with little success. For example, on the task of dog<->cat transfiguration, the learned translation degenerates into making minimal changes to the input. We also observe a lingering gap between the results achievable with paired training data and those achieved by our unpaired method. In some cases, this gap may be very hard, or even impossible, to close: for example, our method sometimes permutes the labels for tree and building in the output of the cityscapes photos->labels task.

Citation

If you use this code for your research, please cite our paper:

@inproceedings{CycleGAN2017,
  title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
  year={2017}
}

Related Projects:

contrastive-unpaired-translation (CUT)
pix2pix-Torch | pix2pixHD | BicycleGAN | vid2vid | SPADE/GauGAN
iGAN | GAN Dissection | GAN Paint

Cat Paper Collection

If you love cats, and love reading cool graphics, vision, and ML papers, please check out the Cat Paper Collection.

Acknowledgments

Code borrows from pix2pix and DCGAN. The data loader is modified from DCGAN and Context-Encoder. The generative network is adapted from neural-style with Instance Normalization.