Top Related Projects
- pytorch-CycleGAN-and-pix2pix: Image-to-Image Translation in PyTorch
- UGATIT: Official TensorFlow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)
Quick Overview
BicycleGAN is a generative adversarial network (GAN) for multimodal image-to-image translation: given a single input image, it can generate a diverse set of realistic output images. It is designed to address the mode-collapse problem of conditional GANs, where the generator tends to ignore its noise input and produce nearly identical outputs for a given input.
Pros
- Diverse Image Generation: BicycleGAN can generate a diverse set of plausible output images for a given input, addressing the mode collapse issue in traditional GANs.
- Conditional Image Generation: The model can generate images conditioned on an input image, allowing for more controlled and targeted image generation.
- Learned Latent Space: BicycleGAN encodes output variation into a low-dimensional latent code, so new outputs can be sampled at test time simply by drawing different random codes (note that training itself requires paired input-output data).
- Flexible Architecture: The model can be applied to different image-to-image translation tasks, such as edges-to-photos, night-to-day, labels-to-facades, and maps-to-aerial-photos.
Cons
- Computational Complexity: Training BicycleGAN can be computationally intensive, especially for high-resolution images, because the model combines several networks (a generator, an encoder, and discriminators) and multiple loss terms.
- Hyperparameter Tuning: The model requires careful hyperparameter tuning to achieve optimal performance, which can be time-consuming and challenging.
- Limited Output Diversity: While BicycleGAN can generate diverse outputs, the range of possible outputs may still fall short of the true diversity of the data distribution.
- Potential Bias: Like other GAN models, BicycleGAN may learn and amplify biases present in the training data, which can lead to undesirable outputs.
Code Examples
The BicycleGAN project is implemented in PyTorch. The repository is driven by command-line scripts (train.py and test.py) built on a shared options system, and the examples below follow that pattern; exact flag and helper names may differ slightly between repository versions.
- Defining the BicycleGAN Model:
from options.train_options import TrainOptions
from models import create_model

# Hyperparameters such as input_nc, output_nc, ngf, ndf, and norm are supplied as
# command-line flags and collected into a single options object.
opt = TrainOptions().parse()
model = create_model(opt)  # builds the model named by opt.model ('bicycle_gan' for this project)
model.setup(opt)           # initialize networks, load weights if continuing, set up schedulers
This mirrors how the repository's train.py constructs the model: the hyperparameters live in the parsed options object rather than being passed as keyword arguments, and the resulting instance can be used for training and inference.
- Training the Model:
from options.train_options import TrainOptions
from data import create_dataset
from models import create_model

opt = TrainOptions().parse()   # data options such as --dataroot, --load_size, and --crop_size are parsed here
dataset = create_dataset(opt)  # wraps the chosen dataset in a DataLoader
model = create_model(opt)
model.setup(opt)

n_epochs = 200                 # illustrative; train.py derives the count from its epoch/decay flags
for epoch in range(n_epochs):
    for data in dataset:
        model.set_input(data)        # unpack one batch of paired images
        model.optimize_parameters()  # forward pass, compute losses, update the networks
This follows the training loop in the repository's train.py: the data loader and the model are both built from the parsed options, and each iteration feeds a batch to the model and runs one optimization step.
- Generating Diverse Outputs:
# After model.setup(opt) and model.set_input(data) for one paired example,
# sample several random latent codes and decode each one.
z_samples = model.get_z_random(opt.n_samples, opt.nz)  # (n_samples, nz) Gaussian codes
for i in range(opt.n_samples):
    real_A, fake_B, real_B = model.test(z_samples[[i]], encode=False)
    # fake_B is one plausible output for the same input real_A
Each random latent code produces a different but plausible output image for the same input, which is the multimodal behavior BicycleGAN is designed for. This mirrors the sampling pattern used by the repository's test.py; method and flag names may differ slightly between versions.
Getting Started
To get started with the BicycleGAN project, follow these steps:
- Clone the GitHub repository:
git clone https://github.com/junyanz/BicycleGAN.git
- Install the dependencies (PyTorch from http://pytorch.org, plus the Python libraries visdom, dominate, and moviepy):
cd BicycleGAN
bash ./scripts/install_pip.sh
- Prepare your dataset:
- The project supports various image-to-image translation tasks, so you'll need to prepare your dataset accordingly.
- Refer to the project's README below for the dataset download scripts (e.g., bash ./datasets/download_dataset.sh edges2shoes) and further details.
Competitor Comparisons
pytorch-CycleGAN-and-pix2pix: Image-to-Image Translation in PyTorch
Pros of pytorch-CycleGAN-and-pix2pix
- Supports a wider range of image-to-image translation settings, including both paired (pix2pix) and unpaired (CycleGAN) translation, with example applications such as colorization and style transfer.
- Provides a more user-friendly interface with pre-defined models and training scripts.
- Includes a comprehensive set of example datasets and pre-trained models for quick experimentation.
Cons of pytorch-CycleGAN-and-pix2pix
- May be less flexible than BicycleGAN in terms of customizing the network architecture and training process.
- Requires more computational resources for training, especially for larger or more complex datasets.
- May not be as well-suited for tasks that require more diverse or multi-modal outputs, which is a strength of BicycleGAN.
Code Comparison
BicycleGAN
from options.train_options import TrainOptions
from models import create_model
opt = TrainOptions().parse()
model = create_model(opt)  # opt.model defaults to 'bicycle_gan'
model.setup(opt)
pytorch-CycleGAN-and-pix2pix
from options.train_options import TrainOptions
from models import create_model
opt = TrainOptions().parse()
model = create_model(opt)  # opt.model selects e.g. 'cycle_gan' or 'pix2pix'
model.setup(opt)
The two repositories share the same options-driven pattern, since BicycleGAN borrows its code structure from pytorch-CycleGAN-and-pix2pix (see the Acknowledgements below); exact helper names may vary across versions. The practical difference is in what create_model can build: pytorch-CycleGAN-and-pix2pix registers several models (for example cycle_gan and pix2pix) behind the same factory, while BicycleGAN is centered on the single bicycle_gan model.
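As a rough illustration of this options-driven factory (a simplified sketch of the pattern, not either repository's exact source; the function and variable names here are illustrative):
import importlib

def create_model_sketch(opt):
    # e.g. opt.model == "cycle_gan" -> module models.cycle_gan_model, class CycleGANModel
    module = importlib.import_module("models." + opt.model + "_model")
    target_name = opt.model.replace("_", "") + "model"
    model_cls = next(
        cls for name, cls in vars(module).items()
        if isinstance(cls, type) and name.lower() == target_name
    )
    return model_cls(opt)
In both codebases the --model flag therefore determines which model class is imported and instantiated, and every other hyperparameter travels inside the same opt object.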
UGATIT: Official TensorFlow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)
Pros of UGATIT
- UGATIT is a more recent model, combining attention modules with Adaptive Layer-Instance Normalization (AdaLIN) for unsupervised image-to-image translation.
- UGATIT has been shown to produce high-quality, realistic translations in challenging domains such as selfie-to-anime translation.
- UGATIT learns from unpaired data, so it can be applied to domains where aligned input-output pairs are unavailable, whereas BicycleGAN requires paired training data.
Cons of UGATIT
- The UGATIT model is more complex and may require more computational resources and training time compared to BicycleGAN.
- The UGATIT codebase is less well-documented and may be more challenging for newcomers to understand and use effectively.
- The UGATIT model has not been as widely adopted and tested as BicycleGAN, which has a larger user community and more established track record.
Code Comparison
BicycleGAN (junyanz/BicycleGAN):
def forward(self, input, z=None, c=None, mode='enc_dec'):
if mode == 'enc_dec':
return self.encode_and_decode(input, z, c)
elif mode == 'enc':
return self.encode(input)
elif mode == 'dec':
return self.decode(input, z, c)
else:
raise ValueError('Unrecognized mode {}'.format(mode))
UGATIT (taki0112/UGATIT):
def forward(self, real, fake, label):
# Generator
g_loss_init, g_loss_rec, g_loss_cam, g_loss_adv, g_loss = \
self.generator.train_on_batch([real, fake, label], [real, label, label])
# Discriminator
d_loss_real, d_loss_fake, d_loss = \
self.discriminator.train_on_batch([real, fake, label], label)
return g_loss_init, g_loss_rec, g_loss_cam, g_loss_adv, g_loss, d_loss_real, d_loss_fake, d_loss
The main difference in the code is the structure of the forward
function, which reflects the different architectures and training approaches used by BicycleGAN and UGATIT.
README
BicycleGAN
Project Page | Paper | Video
PyTorch implementation for multimodal image-to-image translation. For example, given the same night image, our model is able to synthesize possible day images with different types of lighting, sky and clouds. The training requires paired data.
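At a high level, training combines two complementary cycles. The sketch below is a conceptual outline in PyTorch-style pseudocode, not the repository's actual implementation; the generator G, encoder E, discriminator D, and the loss weights are simplified stand-ins.
import torch
import torch.nn.functional as F

def bicyclegan_generator_loss(G, E, D, real_A, real_B, nz,
                              lambda_img=10.0, lambda_z=0.5, lambda_kl=0.01):
    # cVAE-GAN cycle: encode the ground-truth output B, then reconstruct it from (A, z).
    mu, logvar = E(real_B)                                 # E predicts a Gaussian over latent codes
    z_encoded = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
    fake_B_vae = G(real_A, z_encoded)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss_vae = lambda_img * F.l1_loss(fake_B_vae, real_B) + lambda_kl * kl - D(fake_B_vae).mean()

    # cLR-GAN cycle: sample a random code, generate, then recover the code from the fake image.
    z_random = torch.randn(real_A.size(0), nz, device=real_A.device)
    fake_B_lr = G(real_A, z_random)
    mu_rec, _ = E(fake_B_lr)
    loss_lr = lambda_z * F.l1_loss(mu_rec, z_random) - D(fake_B_lr).mean()

    # The adversarial terms here are simplified; the actual code uses a proper GAN criterion.
    return loss_vae + loss_lr
Intuitively, the first cycle forces the latent code to capture the variation present in real outputs, while the second cycle forces every sampled code to map to a distinct, recoverable output, which discourages mode collapse.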
Note: The current software works well with PyTorch 0.4.1+. Check out the older branch that supports PyTorch 0.1-0.3.
Toward Multimodal Image-to-Image Translation.
Jun-Yan Zhu,
Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman.
UC Berkeley and Adobe Research
In Neural Information Processing Systems, 2017.
Example results
Other Implementations
- [Tensorflow] by Youngwoon Lee (USC CLVR Lab).
- [Tensorflow] by Kv Manohar.
Prerequisites
- Linux or macOS
- Python 3
- CPU or NVIDIA GPU + CUDA CuDNN
Getting Started
Installation
- Clone this repo:
git clone -b master --single-branch https://github.com/junyanz/BicycleGAN.git
cd BicycleGAN
- Install PyTorch and dependencies from http://pytorch.org
- Install python libraries visdom, dominate, and moviepy.
For pip users:
bash ./scripts/install_pip.sh
For conda users:
bash ./scripts/install_conda.sh
Use a Pre-trained Model
- Download some test photos (e.g., edges2shoes):
bash ./datasets/download_testset.sh edges2shoes
- Download a pre-trained model (e.g., edges2shoes):
bash ./pretrained_models/download_model.sh edges2shoes
- Generate results with the model
bash ./scripts/test_edges2shoes.sh
The test results will be saved to an html file here: ./results/edges2shoes/val/index.html.
- Generate results with synchronized latent vectors
bash ./scripts/test_edges2shoes.sh --sync
Results can be found at ./results/edges2shoes/val_sync/index.html.
Generate Morphing Videos
bash ./scripts/video_edges2shoes.sh
Results can be found at ./videos/edges2shoes/.
Model Training
- To train a model, download the training images (e.g., edges2shoes).
bash ./datasets/download_dataset.sh edges2shoes
- Train a model:
bash ./scripts/train_edges2shoes.sh
- To view training results and loss plots, run
python -m visdom.server
and click the URL http://localhost:8097. To see more intermediate results, check out ./checkpoints/edges2shoes_bicycle_gan/web/index.html
- See more training details for other datasets in ./scripts/train.sh.
Datasets (from pix2pix)
Download the datasets using the following script. Many of the datasets are collected by other researchers. Please cite their papers if you use the data.
- Download the testset.
bash ./datasets/download_testset.sh dataset_name
- Download the training and testset.
bash ./datasets/download_dataset.sh dataset_name
- facades: 400 images from CMP Facades dataset. [Citation]
- maps: 1096 training images scraped from Google Maps
- edges2shoes: 50k training images from UT Zappos50K dataset. Edges are computed by HED edge detector + post-processing. [Citation]
- edges2handbags: 137K Amazon Handbag images from iGAN project. Edges are computed by HED edge detector + post-processing. [Citation]
- night2day: around 20K natural scene images from Transient Attributes dataset. [Citation]
Models
Download the pre-trained models with the following script.
bash ./pretrained_models/download_model.sh model_name
- edges2shoes (edge -> photo): trained on UT Zappos50K dataset.
- edges2handbags (edge -> photo): trained on Amazon handbags images.
bash ./pretrained_models/download_model.sh edges2handbags
bash ./datasets/download_testset.sh edges2handbags
bash ./scripts/test_edges2handbags.sh
- night2day (nighttime scene -> daytime scene): trained on around 100 webcams.
bash ./pretrained_models/download_model.sh night2day
bash ./datasets/download_testset.sh night2day
bash ./scripts/test_night2day.sh
- facades (facade label -> facade photo): trained on the CMP Facades dataset.
bash ./pretrained_models/download_model.sh facades
bash ./datasets/download_testset.sh facades
bash ./scripts/test_facades.sh
- maps (map photo -> aerial photo): trained on 1096 training images scraped from Google Maps.
bash ./pretrained_models/download_model.sh maps
bash ./datasets/download_testset.sh maps
bash ./scripts/test_maps.sh
Metrics
Figure 6 shows realism vs diversity of our method.
- Realism: We use the Amazon Mechanical Turk (AMT) Real vs Fake test from this repository, first introduced in this work.
- Diversity: For each input image, we produce 20 translations by randomly sampling 20 z vectors. We compute LPIPS distance between consecutive pairs to get 19 paired distances. You can compute this by putting the 20 images into a directory and using this script (note that we used version 0.0 rather than the default 0.1, so use the flag -v 0.0). This is done for 100 input images, resulting in 1900 total distances (100 images X 19 paired distances each), which are averaged together. A larger number means higher diversity.
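As a rough illustration of how the diversity number is computed (a hedged sketch using the pip-installable lpips package rather than the exact script referenced above; the tensors and counts are placeholders):
import torch
import lpips

# LPIPS with version 0.0, as noted above; images are expected as (1, 3, H, W) tensors in [-1, 1].
loss_fn = lpips.LPIPS(net='alex', version='0.0')

def consecutive_lpips(samples):
    # samples: a list of 20 outputs generated from the same input image
    with torch.no_grad():
        return [loss_fn(samples[i], samples[i + 1]).item() for i in range(len(samples) - 1)]

# Repeat over 100 input images (20 samples each), collect the 19 distances per input,
# and average all 1900 values to obtain the diversity score.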
Citation
If you find this useful for your research, please use the following.
@inproceedings{zhu2017toward,
title={Toward multimodal image-to-image translation},
author={Zhu, Jun-Yan and Zhang, Richard and Pathak, Deepak and Darrell, Trevor and Efros, Alexei A and Wang, Oliver and Shechtman, Eli},
booktitle={Advances in Neural Information Processing Systems},
year={2017}
}
If you use modules from CycleGAN or pix2pix paper, please use the following:
@inproceedings{CycleGAN2017,
title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
year={2017}
}
@inproceedings{isola2017image,
title={Image-to-Image Translation with Conditional Adversarial Networks},
author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on},
year={2017}
}
Acknowledgements
This code borrows heavily from the pytorch-CycleGAN-and-pix2pix repository.