Top Related Projects
- CycleGAN: Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
- PyTorch-CycleGAN-and-pix2pix: Image-to-Image Translation in PyTorch.
- MUNIT: Multimodal Unsupervised Image-to-Image Translation.
- contrastive-unpaired-translation: Contrastive unpaired image-to-image translation, with faster and lighter training than CycleGAN (ECCV 2020, in PyTorch).
- StarGAN: Official PyTorch Implementation (CVPR 2018).
- StarGAN v2: Official PyTorch Implementation (CVPR 2020).
Quick Overview
The UNIT project is a PyTorch implementation of the UNIT (UNsupervised Image-to-image Translation) framework, a generative adversarial network (GAN) approach to unsupervised image-to-image translation. It translates images from one domain to another without requiring paired training data.
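At a high level, UNIT rests on a shared latent space assumption: images from both domains are encoded into a common latent code, so translating from one domain to the other amounts to encoding with one domain's encoder and decoding with the other domain's decoder. The snippet below is only a minimal sketch of that idea; the class, method, and layer choices are invented for illustration and are not the repository's actual API.
import torch
import torch.nn as nn

class ToyUNIT(nn.Module):
    """Illustrative sketch of the shared-latent-space idea (not the repo's API)."""
    def __init__(self, channels=3, latent_dim=64):
        super().__init__()
        # One encoder and one decoder per domain; both encoders map into the
        # same latent space, which is what makes cross-domain decoding possible.
        self.enc_a = nn.Sequential(nn.Conv2d(channels, latent_dim, 4, 2, 1), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Conv2d(channels, latent_dim, 4, 2, 1), nn.ReLU())
        self.dec_a = nn.Sequential(nn.ConvTranspose2d(latent_dim, channels, 4, 2, 1), nn.Tanh())
        self.dec_b = nn.Sequential(nn.ConvTranspose2d(latent_dim, channels, 4, 2, 1), nn.Tanh())

    def translate_a2b(self, x_a):
        z = self.enc_a(x_a)   # encode a domain-A image into the shared latent code
        return self.dec_b(z)  # decode that code with the domain-B decoder

x_a = torch.randn(1, 3, 64, 64)
fake_b = ToyUNIT().translate_a2b(x_a)
print(fake_b.shape)  # torch.Size([1, 3, 64, 64])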
Pros
- Unsupervised Learning: The UNIT framework can perform image-to-image translation without requiring paired training data, which is a significant advantage over traditional supervised learning approaches.
- Versatile Applications: The UNIT model can be applied to a wide range of image-to-image translation tasks, such as style transfer, domain adaptation, and cross-modal translation.
- High-Quality Results: The UNIT model has been shown to produce high-quality, realistic-looking translated images in various applications.
- Modular Design: The UNIT implementation in this repository is designed in a modular way, making it easy to customize and extend the model for specific use cases.
Cons
- Computational Complexity: Training the UNIT model can be computationally intensive, especially for high-resolution images, which may limit its practical applicability in some scenarios.
- Hyperparameter Tuning: The UNIT model has several hyperparameters that need to be carefully tuned to achieve optimal performance, which can be a time-consuming process.
- Limited Documentation: The project's documentation could be more comprehensive, making it potentially challenging for newcomers to get started with the UNIT framework.
- Lack of Pre-trained Models: The repository does not provide any pre-trained UNIT models, which means users need to train the model from scratch for their specific use cases.
Code Examples
Here are a few illustrative code examples for the UNIT project (module and class names are simplified and may not match the repository's actual layout):
- Training the UNIT Model:
from unit.trainer import Trainer
from unit.config import get_config
config = get_config()
trainer = Trainer(config)
trainer.train()
This code sets up the UNIT trainer and starts the training process based on the configuration specified in the config.py file.
- Performing Image-to-Image Translation:
from unit.tester import Tester
from unit.config import get_config
config = get_config()
tester = Tester(config)
tester.test()
This code sets up the UNIT tester and performs image-to-image translation on the test data, based on the configuration specified in the config.py file.
- Visualizing Results:
from unit.visualizer import Visualizer
from unit.config import get_config
config = get_config()
visualizer = Visualizer(config)
visualizer.visualize()
This code sets up the UNIT visualizer and generates visualizations of the translated images, based on the configuration specified in the config.py file.
Getting Started
To get started with the UNIT project, follow these steps:
- Clone the repository:
git clone https://github.com/mingyuliutw/UNIT.git
- Install the required dependencies:
cd UNIT
pip install -r requirements.txt
- Prepare your dataset:
- The UNIT framework requires unpaired training data, which means you need to have two sets of images representing the source and target domains.
- Organize your dataset in the appropriate directory structure, as specified in the project's documentation.
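As a purely illustrative sketch (the dataset path and the folder names trainA/trainB/testA/testB are assumptions, not the layout mandated by the project's documentation), the unpaired image folders could be staged like this:
from pathlib import Path

# Hypothetical layout: one folder of images per domain and split.
# Check the project's documentation for the folder names it actually expects.
root = Path("datasets/day2night")
for split in ("trainA", "trainB", "testA", "testB"):
    (root / split).mkdir(parents=True, exist_ok=True)

print(sorted(p.name for p in root.iterdir()))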
- Configure the UNIT model:
- Modify the config.py file to set the appropriate hyperparameters and paths to your dataset.
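The option names below are hypothetical placeholders, not the project's actual configuration keys; they only illustrate the kinds of settings typically involved:
# Hypothetical config.py fragment -- key names are illustrative only.
config = {
    "data_root": "datasets/day2night",  # location of the unpaired image folders
    "image_size": 256,                  # training resolution
    "batch_size": 1,
    "lr": 1e-4,                         # learning rate
    "max_iters": 100000,
    "gpu_ids": [0],
}

def get_config():
    return config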
- Train the UNIT model:
from unit.trainer import Trainer
from unit.config import get_config
config = get_config()
trainer = Trainer(config)
trainer.train()
- Evaluate the model:
from unit.tester import Tester
from unit.config import get_config
config = get_config()
tester = Tester(config)
tester.test()
- Visualize the results:
from unit.visualizer import Visualizer
from unit.config import get_config
config = get_config()
visualizer = Visualizer(config)
visualizer.visualize()
Competitor Comparisons
CycleGAN: Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
Pros of CycleGAN
- CycleGAN is a well-established and widely-used framework for unpaired image-to-image translation, with a large and active community.
- The project has extensive documentation, tutorials, and pre-trained models available, making it easier to get started and apply to various tasks.
- CycleGAN has been successfully applied to a wide range of applications, from style transfer to domain adaptation, demonstrating its versatility.
Cons of CycleGAN
- CycleGAN can be computationally expensive, especially for high-resolution images, due to its use of two generator-discriminator pairs.
- The training process of CycleGAN can be sensitive to hyperparameter tuning and may require more effort to achieve optimal performance.
- CycleGAN is primarily focused on image-to-image translation and may not be as well-suited for other types of unsupervised learning tasks.
Code Comparison
CycleGAN (junyanz/CycleGAN):
# Define the generator network
def define_G(input_shape, output_shape, ngf=64):
    def conv2d(layer_input, filters, f_size=4, normalize=True):
        """Layers used during downsampling"""
        d = Conv2D(filters, kernel_size=f_size, strides=2, padding='same')(layer_input)
        if normalize:
            d = InstanceNormalization()(d)
        d = Activation('relu')(d)
        return d
UNIT (mingyuliutw/UNIT):
# Define the encoder network
def encoder(input_shape, z_dim, ngf=64):
    def conv2d(layer_input, filters, f_size=4, stride=2, padding='same', normalize=True):
        """Layers used during downsampling"""
        x = Conv2D(filters, kernel_size=f_size, strides=stride, padding=padding)(layer_input)
        if normalize:
            x = InstanceNormalization()(x)
        x = LeakyReLU(alpha=0.2)(x)
        return x
The main difference between the two snippets is how the convolutional building block is defined: CycleGAN's conv2d helper hard-codes the stride and padding, while UNIT's conv2d exposes them as parameters, allowing finer control over each layer.
PyTorch-CycleGAN-and-pix2pix: Image-to-Image Translation in PyTorch.
Pros of PyTorch-CycleGAN-and-pix2pix
- Provides a well-documented and easy-to-use implementation of the CycleGAN and pix2pix models, which are popular for image-to-image translation tasks.
- Supports a wide range of datasets and applications, including photo-to-painting, summer-to-winter, and many others.
- Includes pre-trained models for several common tasks, allowing users to quickly apply the models to their own data.
Cons of PyTorch-CycleGAN-and-pix2pix
- The codebase is primarily focused on image-to-image translation, and may not be as flexible for other types of generative tasks.
- The training process can be computationally intensive, especially for larger datasets or higher-resolution images.
- The model architecture and hyperparameters may not be optimal for all types of data and tasks, requiring some experimentation and tuning.
Code Comparison
PyTorch-CycleGAN-and-pix2pix:
# Define the generator network
netG = define_G(opt.input_nc, opt.output_nc, opt.ngf, opt.netG, opt.norm,
                not opt.no_dropout, opt.init_type, opt.init_gain, self.gpu_ids)

# Define the discriminator network
netD = define_D(opt.input_nc + opt.output_nc, opt.ndf, opt.netD,
                opt.n_layers_D, opt.norm, opt.init_type, opt.init_gain, self.gpu_ids)
UNIT:
# Define the encoder network
self.enc = Encoder(self.input_dim, self.enc_dim, self.n_layers, self.norm, self.activ, self.pad_type)
# Define the decoder network
self.dec = Decoder(self.enc_dim, self.output_dim, self.n_layers, self.norm, self.activ, self.pad_type)
The key differences are that UNIT uses separate encoder and decoder networks, while PyTorch-CycleGAN-and-pix2pix uses a single generator network. Additionally, UNIT allows for more customization of the network architecture and hyperparameters.
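As a toy contrast of the two structures (the modules below are invented for illustration and taken from neither codebase):
import torch
import torch.nn as nn

# Stand-in "encoder" and "decoder" modules for the sketch.
enc = nn.Conv2d(3, 8, 3, padding=1)
dec = nn.Conv2d(8, 3, 3, padding=1)

# pix2pix / CycleGAN style: encoder and decoder fused into one generator module.
generator = nn.Sequential(enc, dec)

# UNIT style: encoder and decoder kept separate, so the latent code produced by
# one domain's encoder can be fed to another domain's decoder.
class EncoderDecoder(nn.Module):
    def __init__(self, enc, dec):
        super().__init__()
        self.enc, self.dec = enc, dec

    def forward(self, x):
        z = self.enc(x)      # reusable latent code
        return self.dec(z)

x = torch.randn(1, 3, 32, 32)
print(generator(x).shape, EncoderDecoder(enc, dec)(x).shape)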
MUNIT: Multimodal Unsupervised Image-to-Image Translation.
Pros of MUNIT
- MUNIT (Multi-Modal Unsupervised Image-to-image Translation) is a more advanced and flexible framework for image-to-image translation tasks, supporting multiple modalities and diverse applications.
- MUNIT leverages a modular architecture, allowing for easier customization and integration with other components.
- MUNIT has been shown to produce higher-quality and more diverse translated images compared to UNIT in certain tasks.
Cons of MUNIT
- MUNIT is a more complex and computationally intensive framework, which may require more resources and expertise to set up and train effectively.
- The MUNIT codebase is less well-documented and has a steeper learning curve compared to the relatively simpler UNIT implementation.
Code Comparison
UNIT:
def encode(self, x):
    h = self.fc(x.view(x.size(0), -1))
    mu = self.mu(h)
    logvar = self.logvar(h)
    z = self.reparameterize(mu, logvar)
    return z, mu, logvar
MUNIT:
def encode(self, x, get_latent=False):
    h_content, h_style = self.content_encoder(x), self.style_encoder(x)
    if get_latent:
        return h_content, h_style
    else:
        return self.reparameterize(h_content, h_style)
contrastive-unpaired-translation: Contrastive unpaired image-to-image translation, with faster and lighter training than CycleGAN (ECCV 2020, in PyTorch).
Pros of contrastive-unpaired-translation
- Utilizes a contrastive learning approach, which can potentially lead to better performance in unpaired image-to-image translation tasks.
- Provides a modular and extensible codebase, allowing for easy customization and experimentation.
- Includes pre-trained models for several translation tasks, making it easier to get started.
Cons of contrastive-unpaired-translation
- The documentation and setup instructions may not be as comprehensive as UNIT, potentially making it more challenging for new users to get started.
- The project is relatively newer and may not have the same level of community support and active development as UNIT.
- The performance of the contrastive learning approach may be dependent on the specific task and dataset, and may not always outperform other methods.
Code Comparison
UNIT:
class UNIT(nn.Module):
    def __init__(self, opt):
        super(UNIT, self).__init__()
        self.opt = opt
        self.encoder = Encoder(opt)
        self.decoder = Decoder(opt)
        self.discriminator = Discriminator(opt)
        self.criterionGAN = GANLoss(use_lsgan=not opt.no_lsgan)
        self.criterionCycle = nn.L1Loss()
        self.criterionIdt = nn.L1Loss()
contrastive-unpaired-translation:
class ContrastiveTranslator(nn.Module):
    def __init__(self, opt):
        super(ContrastiveTranslator, self).__init__()
        self.opt = opt
        self.encoder = Encoder(opt)
        self.decoder = Decoder(opt)
        self.discriminator = Discriminator(opt)
        self.contrastive_loss = ContrastiveLoss(opt)
        self.cycle_loss = CycleLoss(opt)
        self.identity_loss = IdentityLoss(opt)
The main difference lies in the loss functions: contrastive-unpaired-translation uses a contrastive loss alongside cycle and identity losses (ContrastiveLoss, CycleLoss, IdentityLoss), whereas UNIT relies on adversarial and L1 reconstruction losses (GANLoss, L1Loss).
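As a rough illustration of the contrastive idea only (not the project's actual loss implementation; the feature shapes and temperature value are assumptions), an InfoNCE-style patch loss can be written as follows:
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_src, feat_out, temperature=0.07):
    """Illustrative InfoNCE loss over corresponding patch features.

    feat_src, feat_out: (num_patches, dim) features from the input image and the
    translated output at the same spatial locations. Each output patch should
    match its own input patch (positive) and not the other patches (negatives).
    """
    feat_src = F.normalize(feat_src, dim=1)
    feat_out = F.normalize(feat_out, dim=1)
    logits = feat_out @ feat_src.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)                 # diagonal entries are positives

# Toy usage with random "patch features".
src = torch.randn(16, 256)
out = src + 0.1 * torch.randn(16, 256)   # output patches close to their inputs
print(patch_nce_loss(src, out).item())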
StarGAN - Official PyTorch Implementation (CVPR 2018)
Pros of StarGAN
- StarGAN supports multi-domain image-to-image translation, allowing for more versatile and flexible image manipulation.
- The code is well-documented and easy to understand, making it more accessible for researchers and developers.
- The project has a larger community and more active development, with regular updates and bug fixes.
Cons of StarGAN
- Compared with UNIT, StarGAN's feature set is narrower; UNIT additionally covers unsupervised image-to-image translation and domain adaptation.
- The training process for StarGAN can be more complex and time-consuming, especially for larger datasets.
- The performance of StarGAN may be slightly lower than UNIT in certain tasks, depending on the specific use case.
Code Comparison
UNIT (mingyuliutw/UNIT):
def forward(self, x, y=None, mode='forward'):
    if mode == 'forward':
        z_a = self.encode_a(x)
        z_b = self.encode_b(y)
        x_recon = self.decode_a(z_a)
        y_recon = self.decode_b(z_b)
        x_fake = self.decode_b(z_a)
        y_fake = self.decode_a(z_b)
        return x_recon, y_recon, x_fake, y_fake
    elif mode == 'encode':
        z_a = self.encode_a(x)
        z_b = self.encode_b(y)
        return z_a, z_b
    else:
        raise Exception('Unrecognized mode: %s' % mode)
StarGAN (yunjey/stargan):
def forward(self, x, c_trg, alpha=1.0):
    """Forward pass.

    x: source image
    c_trg: target domain labels
    alpha: strength of domain translation
    """
    c_org = self.encode(x)
    c_mix = torch.clamp(c_org + alpha * (c_trg - c_org), 0, 1)
    x_recon = self.decode(x, c_org)
    x_fake = self.decode(x, c_mix)
    return x_recon, x_fake
StarGAN v2 - Official PyTorch Implementation (CVPR 2020)
Pros of StarGAN-v2
- Supports multi-domain image-to-image translation, allowing for more diverse and flexible transformations.
- Employs a more advanced architecture with improved performance and visual quality compared to UNIT.
- Provides a comprehensive implementation with detailed documentation and pre-trained models.
Cons of StarGAN-v2
- Requires more computational resources and training time due to the increased complexity of the model.
- May be more challenging to customize or adapt to specific use cases compared to the simpler UNIT architecture.
- Lacks the unsupervised learning capabilities of UNIT, which can be beneficial in certain scenarios.
Code Comparison
UNIT (mingyuliutw/UNIT):
def get_encoder(self, image, reuse=False, name='encoder'):
    with tf.variable_scope(name, reuse=reuse):
        net = image
        net = conv(net, 64, 4, 2, name='conv1')
        net = conv(net, 128, 4, 2, name='conv2')
        net = conv(net, 256, 4, 2, name='conv3')
        net = conv(net, 512, 4, 2, name='conv4')
        net = flatten(net)
        net = dense(net, 1024, name='fc')
        return net
StarGAN-v2 (clovaai/stargan-v2):
def build_generator(self, x, c, mode='train'):
    """Builds the generator network."""
    channels = self.img_channels
    assert c.shape[1] == self.c_dim
    net = x
    net = conv(net, 64, 4, 2, use_bias=False, sn=True, name='conv1')
    net = lrelu(net, 0.01)
    net = conv(net, 128, 4, 2, use_bias=False, sn=True, name='conv2')
    net = lrelu(net, 0.01)
    net = conv(net, 256, 4, 2, use_bias=False, sn=True, name='conv3')
    net = lrelu(net, 0.01)
    net = conv(net, 512, 4, 2, use_bias=False, sn=True, name='conv4')
    net = lrelu(net, 0.01)
    net = conv(net, 1024, 4, 2, use_bias=False, sn=True, name='conv5')
    net = lrelu(net, 0.01)
    net = conv(net, 1024, 4, 1, use_bias=False, sn=True, name='conv6')
    net = lrelu(net, 0.01)
    net = conv(net, channels, 3, 1, use_bias=False, sn=True, name='conv7')
    net = tanh(net)
    return net
README
UNIT: UNsupervised Image-to-image Translation Networks
New implementation available at imaginaire repository
We have a reimplementation of the UNIT method that is more performant. It is available at Imaginaire.
License
Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
Code usage
- Please check out our tutorial.
- For multimodal (or many-to-many) image translation, please check out our new work on MUNIT.
What's new.
- 05-02-2018: We now adopt the MUNIT code structure. For reproducing the experimental results in the NIPS paper, please check out the version_02 branch.
- 12-21-2017: Released a pre-trained synthia-to-cityscape image translation model. See USAGE.md for usage examples.
- 12-14-2017: Added the multi-scale discriminators described in the pix2pixHD paper. To use them, simply set the discriminator name to COCOMsDis (a generic sketch of the idea follows below).
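A multi-scale discriminator applies the same discriminator architecture to the image at several progressively downsampled resolutions and collects the per-scale outputs. The following is a minimal, generic sketch of that idea, not the repository's COCOMsDis implementation; the layer sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDiscriminator(nn.Module):
    """Generic sketch: one PatchGAN-style discriminator per image scale."""
    def __init__(self, in_channels=3, num_scales=3):
        super().__init__()
        def make_dis():
            return nn.Sequential(
                nn.Conv2d(in_channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(128, 1, 4, 1, 1),   # patch-level real/fake logits
            )
        self.discriminators = nn.ModuleList(make_dis() for _ in range(num_scales))

    def forward(self, x):
        outputs = []
        for dis in self.discriminators:
            outputs.append(dis(x))
            # Downsample the image before feeding the next-scale discriminator.
            x = F.avg_pool2d(x, kernel_size=3, stride=2, padding=1)
        return outputs

outs = MultiScaleDiscriminator()(torch.randn(1, 3, 256, 256))
print([o.shape for o in outs])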
Paper
Two Minute Papers Summary
(We thank the Two Minute Papers channel for summarizing our work.)
The Shared Latent Space Assumption
Result Videos
More image results are available in the Google Photo Album.
Left: input. Right: neural network generated. Resolution: 640x480
- Snowy2Summery-01
- Snowy2Summery-02
- Day2Night-01
- Day2Night-02
- Translation Between 5 dog breeds
- Translation Between 6 cat species
Street Scene Image Translation
From the first row to the fourth row, we show example results on day to night, sunny to rainy, summery to snowy, and real to synthetic image translation (two directions).
For each image pair, left is the input image; right is the machine generated image.
Dog Breed Image Translation
Cat Species Image Translation
Attribute-based Face Image Translation