UGATIT
Official Tensorflow implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation (ICLR 2020)
Top Related Projects
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
Quick Overview
The UGATIT project is the official TensorFlow implementation of U-GAT-IT (Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization), an unsupervised image-to-image translation model published at ICLR 2020. It targets translations that need either holistic changes or large shape changes, such as turning selfies into anime-style images with the selfie2anime dataset.
Pros
- High-Quality Translations: The UGATIT model produces high-quality and realistic image translations, outperforming many existing image-to-image translation methods.
- Shape and Texture Changes: According to the paper's abstract, the model can translate both images requiring holistic changes and images requiring large shape changes, which previous attention-based methods could not handle.
- Unsupervised Learning: The UGATIT framework is an unsupervised learning approach, which means it can be trained without the need for paired training data.
- Adaptive Layer-Instance Normalization: The AdaLIN function uses learned parameters to control how much shape and texture change during translation, depending on the dataset.
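As a rough illustration of the AdaLIN idea, the NumPy sketch below blends instance-normalized and layer-normalized activations with a ratio rho, then applies a scale gamma and a shift beta. The function name `adalin`, the array shapes, and the hand-supplied parameters are assumptions for illustration only; in the paper rho is learned and kept in [0, 1], and gamma and beta are computed from the attention features rather than passed in. This is not the repository's code.

```python
import numpy as np

def adalin(x, gamma, beta, rho, eps=1e-5):
    """Sketch of AdaLIN. x: (N, C, H, W); gamma, beta: (N, C); rho: (C,)."""
    # Instance norm: statistics per sample and per channel, over H and W
    in_mean = x.mean(axis=(2, 3), keepdims=True)
    in_var = x.var(axis=(2, 3), keepdims=True)
    x_in = (x - in_mean) / np.sqrt(in_var + eps)

    # Layer norm: statistics per sample, over C, H and W
    ln_mean = x.mean(axis=(1, 2, 3), keepdims=True)
    ln_var = x.var(axis=(1, 2, 3), keepdims=True)
    x_ln = (x - ln_mean) / np.sqrt(ln_var + eps)

    # rho near 1 favors instance norm, rho near 0 favors layer norm
    rho = np.clip(rho, 0.0, 1.0).reshape(1, -1, 1, 1)
    mixed = rho * x_in + (1.0 - rho) * x_ln

    # Per-sample, per-channel scale and shift
    return gamma[:, :, None, None] * mixed + beta[:, :, None, None]
```

Setting every rho to 1 reduces the sketch to plain instance normalization, and setting it to 0 reduces it to layer normalization, which is the sense in which the ratio controls the mix.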
Cons
- Computational Complexity: The UGATIT model can be computationally expensive, especially for high-resolution image translations, which may limit its real-time applications.
- Hyperparameter Tuning: The model's performance can be sensitive to the choice of hyperparameters, which may require extensive experimentation and tuning.
- Limited Datasets: The model's performance may be limited by the availability of suitable datasets for specific image-to-image translation tasks.
- Potential Bias: Like many deep learning models, the UGATIT framework may learn and perpetuate biases present in the training data.
Code Examples
The README documents a command-line workflow through main.py rather than an importable Python API, so the examples below use the commands it lists.
# Train on the selfie2anime dataset
python main.py --dataset selfie2anime
This command trains the model on the images in the trainA and trainB folders of the selfie2anime dataset.
# Train the lighter model when GPU memory is insufficient
python main.py --dataset selfie2anime --light True
This variant reduces memory use but may not perform as well; the paper version uses --light False.
# Translate the test images with a trained model
python main.py --dataset selfie2anime --phase test
This command runs the test phase on the images in the testA and testB folders.
Getting Started
To get started with the UGATIT project, follow these steps:
- Clone the repository:
  git clone https://github.com/taki0112/UGATIT.git
- Install the dependencies listed under Requirements (python 3.6, tensorflow 1.14):
  cd UGATIT
  pip install tensorflow==1.14
- Download the pre-trained model weights:
  wget "https://drive.google.com/uc?id=1rSoeq6RwMj8C4FqmDXvmh_4RvuL1OSXE" -O pretrained_model.pth
- Translate the images in your dataset's testA and testB folders:
  python main.py --dataset selfie2anime --phase test
- (Optional) Train the UGATIT model on your own dataset:
  python main.py --dataset your_dataset --img_size 256 --ch 64 --n_res 4 --n_dis 6 --iteration 200
Competitor Comparisons
Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
Pros of CycleGAN
- CycleGAN is a more general-purpose image-to-image translation framework, capable of handling a wider range of tasks beyond just style transfer.
- The CycleGAN paper has been highly influential in the field of generative adversarial networks (GANs) and has been widely cited.
- CycleGAN has a larger and more active community, with more resources and support available.
Cons of CycleGAN
- UGATIT is specifically designed for high-quality style transfer, and may produce better results for this particular task.
- CycleGAN can be more complex to set up and configure, especially for users new to the field of GANs.
- The training process for CycleGAN can be more computationally intensive and time-consuming compared to UGATIT.
Code Comparison
UGATIT:
def generator(self, x, is_training=True, reuse=False):
    with tf.variable_scope("generator", reuse=reuse):
        ch = self.ch
        x = conv(x, ch, kernel=7, stride=1, pad=3, use_bias=False, scope='conv')
        x = instance_norm(x, scope='ins_norm')
        x = relu(x)
        x = conv(x, ch*2, kernel=3, stride=2, pad=1, use_bias=False, scope='conv_0')
        x = instance_norm(x, scope='ins_norm_0')
        x = relu(x)
CycleGAN:
def build_generator(self, input_nc, output_nc, ngf=64, n_downsampling=2, n_blocks=9, norm_layer=nn.BatchNorm2d, use_dropout=False):
    if type(norm_layer) == functools.partial:
        use_bias = norm_layer.func == nn.InstanceNorm2d
    else:
        use_bias = norm_layer == nn.InstanceNorm2d
    model = [nn.ReflectionPad2d(3),
             nn.Conv2d(input_nc, ngf, kernel_size=7, padding=0, bias=use_bias),
             norm_layer(ngf),
             nn.ReLU(True)]
README
U-GAT-IT — Official TensorFlow Implementation (ICLR 2020)
: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
Paper | Official Pytorch code
This repository provides the official Tensorflow implementation of the following paper:
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
Junho Kim (NCSOFT), Minjae Kim (NCSOFT), Hyeonwoo Kang (NCSOFT), Kwanghee Lee (Boeing Korea)
Abstract
We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner. The attention module guides our model to focus on more important regions distinguishing between source and target domains based on the attention map obtained by the auxiliary classifier. Unlike previous attention-based methods which cannot handle the geometric changes between domains, our model can translate both images requiring holistic changes and images requiring large shape changes. Moreover, our new AdaLIN (Adaptive Layer-Instance Normalization) function helps our attention-guided model to flexibly control the amount of change in shape and texture by learned parameters depending on datasets. Experimental results show the superiority of the proposed method compared to the existing state-of-the-art models with a fixed network architecture and hyper-parameters.
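The attention mechanism described in the abstract can be pictured with a class-activation-map style sketch: an auxiliary classifier scores globally pooled feature maps, and its per-channel weights are reused to re-weight those feature maps. The version below is a simplified illustration that uses only global average pooling; the names `cam_attention`, `features`, and `weights` are hypothetical, and this is not the repository's code.

```python
import numpy as np

def cam_attention(features, weights):
    """features: (C, H, W) feature maps; weights: (C,) auxiliary classifier weights."""
    # Auxiliary classifier logit: weighted sum of globally average-pooled channels
    pooled = features.mean(axis=(1, 2))
    logit = float(np.dot(weights, pooled))

    # Re-weight each channel by the importance the classifier assigns to it
    attended = features * weights[:, None, None]

    # Summing the re-weighted channels gives one spatial heatmap
    heatmap = attended.sum(axis=0)
    return logit, attended, heatmap
```

In this simplified setup the spatial mean of the heatmap equals the classifier logit, so regions with high heatmap values are the ones that push the domain classification hardest.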
Requirements
- python == 3.6
- tensorflow == 1.14
Pretrained model
We released 50 epoch and 100 epoch checkpoints so that people could test more widely.
Dataset
Web page
Telegram Bot
Usage
├── dataset
   └── YOUR_DATASET_NAME
       ├── trainA
           ├── xxx.jpg (name, format doesn't matter)
           ├── yyy.png
           └── ...
       ├── trainB
           ├── zzz.jpg
           ├── www.png
           └── ...
       ├── testA
           ├── aaa.jpg
           ├── bbb.png
           └── ...
       └── testB
           ├── ccc.jpg
           ├── ddd.png
           └── ...
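The layout above uses four folders (trainA, trainB, testA, testB) under dataset/YOUR_DATASET_NAME. A small helper like the following could create that skeleton and report how many files each folder holds before training starts. The function names `make_dataset_dirs` and `count_images` are hypothetical and are not part of the repository.

```python
from pathlib import Path

# Folder names taken from the README's dataset layout
SPLITS = ("trainA", "trainB", "testA", "testB")

def make_dataset_dirs(root, dataset_name):
    """Create dataset/<dataset_name>/{trainA,trainB,testA,testB} under root."""
    base = Path(root) / "dataset" / dataset_name
    for split in SPLITS:
        (base / split).mkdir(parents=True, exist_ok=True)
    return base

def count_images(base):
    """Return a dict mapping each split folder to the number of files in it."""
    counts = {}
    for split in SPLITS:
        folder = Path(base) / split
        if folder.is_dir():
            counts[split] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[split] = 0
    return counts
```

Calling `count_images(make_dataset_dirs(".", "my_dataset"))` from the repository root would create the empty folders and return a count of zero for each, which makes an empty or misplaced split easy to spot.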
Train
> python main.py --dataset selfie2anime
- If GPU memory is not sufficient, set --light to True
  - But it may not perform well
- The paper version uses --light False
Test
> python main.py --dataset selfie2anime --phase test
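When several datasets need to be trained or tested in sequence, the train and test commands can be assembled programmatically. The sketch below only builds the argument list from the flags shown in this README (--dataset, --phase test, --light True); the helper name `build_command` is hypothetical and is not provided by the repository.

```python
import sys

def build_command(dataset, phase="train", light=False):
    """Return the argv list for running main.py with the README's flags."""
    if phase not in ("train", "test"):
        raise ValueError("phase must be 'train' or 'test'")
    cmd = [sys.executable, "main.py", "--dataset", dataset]
    # The README passes --phase only for testing; training is the plain command
    if phase == "test":
        cmd += ["--phase", "test"]
    # Optional lighter model for GPUs with limited memory
    if light:
        cmd += ["--light", "True"]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting problems with unusual dataset names.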
Architecture
Results
Ablation study
User study
Kernel Inception Distance (KID)
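For readers unfamiliar with the metric, KID is commonly computed as an unbiased estimate of the squared maximum mean discrepancy between Inception features of real and generated images, using a cubic polynomial kernel. The sketch below assumes the features have already been extracted into arrays of shape (n, d); the function names are hypothetical, and practical implementations usually average the estimate over several random subsets. It is not the evaluation code used for the paper's numbers.

```python
import numpy as np

def polynomial_kernel(a, b):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3 over feature rows."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid_score(real_feats, fake_feats):
    """Unbiased squared-MMD estimate between feature sets of shape (n, d) and (m, d)."""
    n, m = len(real_feats), len(fake_feats)
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)

    # Drop the diagonal (each sample compared with itself) for the unbiased estimate
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (n * (n - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (m * (m - 1))
    term_rf = k_rf.mean()
    return term_rr + term_ff - 2.0 * term_rf
```

Lower values indicate that the two feature distributions are closer; because the estimate is unbiased, it can come out slightly negative when the distributions match.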
Citation
If you find this code useful for your research, please cite our paper:
@inproceedings{
Kim2020U-GAT-IT:,
title={U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation},
author={Junho Kim and Minjae Kim and Hyeonwoo Kang and Kwang Hee Lee},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=BJlZ5ySKPH}
}
Author
Junho Kim, Minjae Kim, Hyeonwoo Kang, Kwanghee Lee