UGATIT

Official TensorFlow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)

Top Related Projects

CycleGAN

12,306

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Quick Overview

The UGATIT project is the official TensorFlow implementation of U-GAT-IT (Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization), an unsupervised image-to-image translation model published at ICLR 2020. It combines an attention module, guided by an auxiliary classifier, with a learnable normalization function (AdaLIN). The example dataset in the README is selfie2anime, which translates selfies into anime-style images.

Pros

High-Quality Translations: According to the paper's abstract, experimental results show the method outperforming existing state-of-the-art models while using a fixed network architecture and hyper-parameters.
Shape and Texture Changes: Unlike earlier attention-based methods, which cannot handle geometric changes between domains, the model can translate both images requiring holistic changes and images requiring large shape changes.
Unsupervised Learning: The UGATIT framework is unsupervised, so it trains on two unpaired image collections (trainA and trainB) and needs no paired training data.
Adaptive Layer-Instance Normalization: AdaLIN lets the model control the amount of change in shape and texture through learned parameters that adapt to the dataset.
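
To make the AdaLIN item above concrete, here is a minimal NumPy sketch of the normalization as the U-GAT-IT paper defines it: instance-normalized and layer-normalized activations are blended with a ratio rho kept in [0, 1], then scaled by gamma and shifted by beta. The function name, the (N, C, H, W) layout, and the epsilon value are assumptions made for illustration, and gamma and beta are passed in directly rather than predicted by a network. This is not the repository's TensorFlow code.

```python
import numpy as np

def adalin(x, gamma, beta, rho, eps=1e-5):
    """Adaptive Layer-Instance Normalization on an (N, C, H, W) array.

    gamma, beta: per-sample, per-channel scale and shift, shape (N, C).
    rho: per-channel blend ratio, shape (C,); clipped to [0, 1].
    """
    # Instance-norm statistics: per sample and per channel, over H and W.
    in_mean = x.mean(axis=(2, 3), keepdims=True)
    in_var = x.var(axis=(2, 3), keepdims=True)
    x_in = (x - in_mean) / np.sqrt(in_var + eps)

    # Layer-norm statistics: per sample, over C, H and W together.
    ln_mean = x.mean(axis=(1, 2, 3), keepdims=True)
    ln_var = x.var(axis=(1, 2, 3), keepdims=True)
    x_ln = (x - ln_mean) / np.sqrt(ln_var + eps)

    # Blend the two normalizations, then apply scale and shift.
    rho = np.clip(rho, 0.0, 1.0).reshape(1, -1, 1, 1)
    mixed = rho * x_in + (1.0 - rho) * x_ln
    return gamma[:, :, None, None] * mixed + beta[:, :, None, None]
```

With rho set to 1 the function reduces to instance normalization, and with rho set to 0 it reduces to layer normalization; values in between mix the two.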

Cons

Computational Cost: The full model is demanding on GPU memory; the README suggests the --light option when memory is insufficient, with the caveat that it may not perform as well.
Hyperparameter Tuning: The paper reports its results with a fixed architecture and hyper-parameters, but a new dataset may still need some experimentation.
Limited Datasets: The model's performance may be limited by the availability of suitable datasets for specific image-to-image translation tasks.
Potential Bias: Like many deep learning models, the UGATIT framework may learn and perpetuate biases present in the training data.

Code Examples

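The snippets below sketch a hypothetical Python wrapper API. The ugatit module, the UGATIT constructor arguments, and the test, train, and evaluate methods are illustrative and are not confirmed by the README, which documents only command-line use through main.py: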

import torch
from ugatit import UGATIT

# Initialize the UGATIT model
model = UGATIT(lr=0.0001, ch=64, n_res=4, n_dis=6, img_size=256)

# Load pre-trained weights
model.load_state_dict(torch.load('pretrained_model.pth'))

# Perform image-to-image translation
source_image = torch.randn(1, 3, 256, 256)
translated_image = model.test(source_image)

This code snippet demonstrates how to initialize the UGATIT model, load pre-trained weights, and perform image-to-image translation on a random input image.

# Train the UGATIT model (hypothetical API)
train_loader = ...  # placeholder: build a data loader for your training data
model.train(train_loader, num_epochs=100)

This hypothetical snippet shows how training might look with a PyTorch-style data loader, running for 100 epochs.

# Evaluate the UGATIT model (hypothetical API)
val_loader = ...  # placeholder: build a data loader for your validation data
metrics = model.evaluate(val_loader)
print(f'Validation Metrics: {metrics}')

This hypothetical snippet shows how evaluation on a validation dataset might look, printing the resulting metrics.

Getting Started

To get started with the UGATIT project, follow these steps:

Clone the repository:

git clone https://github.com/taki0112/UGATIT.git

Install the required dependencies (the README lists python == 3.6 and tensorflow == 1.14):

cd UGATIT
pip install tensorflow==1.14

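Download the pre-trained model weights (the README mentions 50-epoch and 100-epoch checkpoints). Quote the URL so the shell does not interpret the ? character:

wget "https://drive.google.com/uc?id=1rSoeq6RwMj8C4FqmDXvmh_4RvuL1OSXE" -O pretrained_model.pth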

Run the test phase to translate the images in your dataset's test folders:

python main.py --dataset selfie2anime --phase test

(Optional) Train the UGATIT model on your own dataset, placed under dataset/YOUR_DATASET_NAME:

python main.py --dataset YOUR_DATASET_NAME

Competitor Comparisons

CycleGAN

12,306

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Pros of CycleGAN

CycleGAN is a more general-purpose image-to-image translation framework, capable of handling a wider range of tasks beyond just style transfer.
The CycleGAN paper has been highly influential in the field of generative adversarial networks (GANs) and has been widely cited.
CycleGAN has a larger and more active community, with more resources and support available.

Cons of CycleGAN

CycleGAN has no attention module or adaptive normalization; the U-GAT-IT paper reports better results than existing state-of-the-art models, particularly for translations that require large shape changes.
CycleGAN can be more complex to set up and configure, especially for users new to the field of GANs.
Training either model is computationally intensive, and the relative cost depends on image size and configuration; UGATIT offers a --light option for limited GPU memory.

Code Comparison

UGATIT:

def generator(self, x, is_training=True, reuse=False):
    with tf.variable_scope("generator", reuse=reuse):
        ch = self.ch
        x = conv(x, ch, kernel=7, stride=1, pad=3, use_bias=False, scope='conv')
        x = instance_norm(x, scope='ins_norm')
        x = relu(x)

        x = conv(x, ch*2, kernel=3, stride=2, pad=1, use_bias=False, scope='conv_0')
        x = instance_norm(x, scope='ins_norm_0')
        x = relu(x)

CycleGAN:

def build_generator(self, input_nc, output_nc, ngf=64, n_downsampling=2, n_blocks=9, norm_layer=nn.BatchNorm2d, use_dropout=False):
    if type(norm_layer) == functools.partial:
        use_bias = norm_layer.func == nn.InstanceNorm2d
    else:
        use_bias = norm_layer == nn.InstanceNorm2d

    model = [nn.ReflectionPad2d(3),
             nn.Conv2d(input_nc, ngf, kernel_size=7, padding=0, bias=use_bias),
             norm_layer(ngf),
             nn.ReLU(True)]

README

U-GAT-IT — Official TensorFlow Implementation (ICLR 2020)

: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

Paper | Official Pytorch code

This repository provides the official TensorFlow implementation of the following paper:

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
Junho Kim (NCSOFT), Minjae Kim (NCSOFT), Hyeonwoo Kang (NCSOFT), Kwanghee Lee (Boeing Korea)

Abstract: We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner. The attention module guides our model to focus on more important regions distinguishing between source and target domains based on the attention map obtained by the auxiliary classifier. Unlike previous attention-based methods which cannot handle the geometric changes between domains, our model can translate both images requiring holistic changes and images requiring large shape changes. Moreover, our new AdaLIN (Adaptive Layer-Instance Normalization) function helps our attention-guided model to flexibly control the amount of change in shape and texture by learned parameters depending on datasets. Experimental results show the superiority of the proposed method compared to the existing state-of-the-art models with a fixed network architecture and hyper-parameters.
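
The attention map "obtained by the auxiliary classifier" can be illustrated with a class-activation-map style sketch: a linear classifier on globally pooled features supplies one weight per channel, and those weights re-scale the feature maps. The version below is a simplified assumption for illustration. It uses only global average pooling and a single image, and the names cam_attention, features, and weights are made up here rather than taken from the repository.

```python
import numpy as np

def cam_attention(features, weights):
    """Class-activation-map style attention for one image.

    features: encoder feature maps, shape (C, H, W).
    weights: auxiliary classifier weights, one per channel, shape (C,).
    Returns (logit, attended, heatmap).
    """
    # Global average pooling reduces each feature map to one scalar.
    pooled = features.mean(axis=(1, 2))
    # The auxiliary classifier is a single linear layer on the pooled features.
    logit = float(pooled @ weights)
    # Re-weight every feature map by its classifier weight.
    attended = features * weights[:, None, None]
    # Summing the weighted maps over channels gives a spatial heatmap.
    heatmap = attended.sum(axis=0)
    return logit, attended, heatmap
```

Channels that the classifier finds discriminative between the two domains receive larger weights, so the corresponding regions dominate the heatmap and the features passed on to the decoder.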

Requirements

python == 3.6
tensorflow == 1.14
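
Because the pinned versions are old, it can help to check an environment before training. The helper below is a small sketch that compares version strings against the two requirements listed above; the function name and message format are assumptions, and it compares only the major.minor part so that a patch release such as 1.14.0 is accepted.

```python
def check_requirements(python_version, tensorflow_version):
    """Return a list of mismatch messages; an empty list means both match."""
    required = {"python": "3.6", "tensorflow": "1.14"}
    found = {"python": python_version, "tensorflow": tensorflow_version}
    problems = []
    for name, wanted in required.items():
        # Compare only major.minor, so "1.14.0" satisfies "1.14".
        major_minor = ".".join(found[name].split(".")[:2])
        if major_minor != wanted:
            problems.append(f"{name} {found[name]} found, {wanted} required")
    return problems
```

In a real environment the two arguments could come from platform.python_version() and tensorflow.__version__.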

Pretrained model

We released 50-epoch and 100-epoch checkpoints so that people could test more widely.

Dataset

selfie2anime dataset

Web page

Selfie2Anime by Nathan Glover
Selfie2Waifu by creke

Telegram Bot

Selfie2AnimeBot by Alex Spirin

Usage

├── dataset
   └── YOUR_DATASET_NAME
       ├── trainA
           ├── xxx.jpg (name, format doesn't matter)
           ├── yyy.png
           └── ...
       ├── trainB
           ├── zzz.jpg
           ├── www.png
           └── ...
       ├── testA
           ├── aaa.jpg
           ├── bbb.png
           └── ...
       └── testB
           ├── ccc.jpg
           ├── ddd.png
           └── ...
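
The layout above can be created and checked with a few lines of Python. This is a convenience sketch, not part of the repository: the helper names make_dataset_dirs and count_images are made up here, and only the folder names (dataset, trainA, trainB, testA, testB) come from the README.

```python
from pathlib import Path

SPLITS = ("trainA", "trainB", "testA", "testB")

def make_dataset_dirs(root, name):
    """Create dataset/<name>/{trainA,trainB,testA,testB} under root."""
    base = Path(root) / "dataset" / name
    for split in SPLITS:
        (base / split).mkdir(parents=True, exist_ok=True)
    return base

def count_images(base):
    """Count files in each split folder; a missing folder counts as zero."""
    counts = {}
    for split in SPLITS:
        folder = Path(base) / split
        if folder.is_dir():
            counts[split] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[split] = 0
    return counts
```

Calling count_images before training is a quick way to catch an empty or misnamed split folder.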

Train

> python main.py --dataset selfie2anime

If GPU memory is insufficient, set --light to True.
- The light version may not perform as well.
- The paper version uses --light set to False.

Test

> python main.py --dataset selfie2anime --phase test
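
When the train and test commands are launched from a script, building the argument list in one place avoids typos. The sketch below uses only the flags shown in this README (--dataset, --phase, --light); the function name build_command is an assumption, and so is the exact spelling "--light True", which is inferred from the instruction to set --light to True.

```python
def build_command(dataset, phase="train", light=False):
    """Build the argument list for main.py from README-documented flags."""
    cmd = ["python", "main.py", "--dataset", dataset]
    if phase != "train":
        # The README passes --phase only for testing.
        cmd += ["--phase", phase]
    if light:
        # Assumed spelling, based on "set --light to True" in the README.
        cmd += ["--light", "True"]
    return cmd

print(" ".join(build_command("selfie2anime", phase="test")))
```

The returned list can be handed to subprocess.run(cmd, check=True) from the repository root.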

Architecture

Results

Ablation study

User study

Kernel Inception Distance (KID)

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{
Kim2020U-GAT-IT:,
title={U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation},
author={Junho Kim and Minjae Kim and Hyeonwoo Kang and Kwang Hee Lee},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=BJlZ5ySKPH}
}

Author

Junho Kim, Minjae Kim, Hyeonwoo Kang, Kwanghee Lee
