
mingyuliutw/UNIT

Unsupervised Image-to-Image Translation


Top Related Projects

  • CycleGAN: Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
  • PyTorch-CycleGAN-and-pix2pix: Image-to-Image Translation in PyTorch.
  • MUNIT: Multimodal Unsupervised Image-to-Image Translation.
  • contrastive-unpaired-translation: Contrastive unpaired image-to-image translation, faster and lighter training than CycleGAN (ECCV 2020, in PyTorch).
  • StarGAN: Official PyTorch Implementation (CVPR 2018).
  • StarGAN v2: Official PyTorch Implementation (CVPR 2020).

Quick Overview

The UNIT project is the official PyTorch implementation of the Unsupervised Image-to-Image Translation (UNIT) framework, a generative adversarial model built around a shared latent space assumption. It translates images from one domain to another without requiring paired training data.
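At a high level, the model pairs an encoder and a generator per domain and translates an image by decoding its latent code with the other domain's generator. The snippet below is a minimal illustrative sketch of that idea, not code from the repository (the toy modules are hypothetical stand-ins):

import torch
import torch.nn as nn

# Illustrative sketch only: each domain gets an encoder and a generator,
# and translation decodes one domain's latent code with the other generator.
enc_a = nn.Conv2d(3, 8, 3, padding=1)   # stand-in encoder for domain A (e.g., day)
gen_a = nn.Conv2d(8, 3, 3, padding=1)   # stand-in generator for domain A
enc_b = nn.Conv2d(3, 8, 3, padding=1)   # stand-in encoder for domain B (e.g., night)
gen_b = nn.Conv2d(8, 3, 3, padding=1)   # stand-in generator for domain B

x_a = torch.randn(1, 3, 64, 64)         # unpaired image from domain A
z = enc_a(x_a)                          # code in the shared latent space
x_ab = gen_b(z)                         # A -> B translation
x_aa = gen_a(z)                         # within-domain reconstruction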

Pros

  • Unsupervised Learning: The UNIT framework can perform image-to-image translation without requiring paired training data, which is a significant advantage over traditional supervised learning approaches.
  • Versatile Applications: The UNIT model can be applied to a wide range of image-to-image translation tasks, such as style transfer, domain adaptation, and cross-modal translation.
  • High-Quality Results: The UNIT model has been shown to produce high-quality, realistic-looking translated images in various applications.
  • Modular Design: The UNIT implementation in this repository is designed in a modular way, making it easy to customize and extend the model for specific use cases.

Cons

  • Computational Complexity: Training the UNIT model can be computationally intensive, especially for high-resolution images, which may limit its practical applicability in some scenarios.
  • Hyperparameter Tuning: The UNIT model has several hyperparameters that need to be carefully tuned to achieve optimal performance, which can be a time-consuming process.
  • Limited Documentation: The project's documentation could be more comprehensive, making it potentially challenging for newcomers to get started with the UNIT framework.
  • Few Pre-trained Models: Aside from a pre-trained synthia-to-cityscape translation model, the repository offers little in the way of ready-to-use checkpoints, so most use cases require training the model from scratch.

Code Examples

Here are a few simplified code examples illustrating a typical UNIT workflow (the module and function names are illustrative rather than the repository's exact entry points):

  1. Training the UNIT Model:
from unit.trainer import Trainer
from unit.config import get_config

config = get_config()
trainer = Trainer(config)
trainer.train()

This code sets up the UNIT trainer and starts the training process based on the configuration specified in the config.py file.

  2. Performing Image-to-Image Translation:
from unit.tester import Tester
from unit.config import get_config

config = get_config()
tester = Tester(config)
tester.test()

This code sets up the UNIT tester and performs image-to-image translation on the test data, based on the configuration specified in the config.py file.

  3. Visualizing Results:
from unit.visualizer import Visualizer
from unit.config import get_config

config = get_config()
visualizer = Visualizer(config)
visualizer.visualize()

This code sets up the UNIT visualizer and generates visualizations of the translated images, based on the configuration specified in the config.py file.

Getting Started

To get started with the UNIT project, follow these steps:

  1. Clone the repository:
git clone https://github.com/mingyuliutw/UNIT.git
  2. Install the required dependencies:
cd UNIT
pip install -r requirements.txt
  3. Prepare your dataset:

    • The UNIT framework requires unpaired training data, meaning two sets of images representing the source and target domains.
    • Organize your dataset in the directory structure specified in the project's documentation; an illustrative layout is sketched after this list.
  4. Configure the UNIT model:

    • Modify the config.py file to set the appropriate hyperparameters and paths to your dataset.
  5. Train the UNIT model:

from unit.trainer import Trainer
from unit.config import get_config

config = get_config()
trainer = Trainer(config)
trainer.train()
  6. Evaluate the model:
from unit.tester import Tester
from unit.config import get_config

config = get_config()
tester = Tester(config)
tester.test()
  7. Visualize the results:
from unit.visualizer import Visualizer
from unit.config import get_config

config = get_config()
visualizer = Visualizer(config)
visualizer.visualize()
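As referenced in step 3, a typical unpaired dataset keeps the two domains in separate folders. The layout below is purely hypothetical (the folder names are not mandated by the repository; follow the project's documentation for the exact structure):

datasets/day2night/
  trainA/   # training images from the source domain (e.g., day)
  trainB/   # training images from the target domain (e.g., night)
  testA/    # held-out source-domain images
  testB/    # held-out target-domain images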

Competitor Comparisons

CycleGAN

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Pros of CycleGAN

  • CycleGAN is a well-established and widely-used framework for unpaired image-to-image translation, with a large and active community.
  • The project has extensive documentation, tutorials, and pre-trained models available, making it easier to get started and apply to various tasks.
  • CycleGAN has been successfully applied to a wide range of applications, from style transfer to domain adaptation, demonstrating its versatility.

Cons of CycleGAN

  • CycleGAN can be computationally expensive, especially for high-resolution images, due to its use of two generator-discriminator pairs.
  • The training process of CycleGAN can be sensitive to hyperparameter tuning and may require more effort to achieve optimal performance.
  • CycleGAN is primarily focused on image-to-image translation and may not be as well-suited for other types of unsupervised learning tasks.

Code Comparison

CycleGAN (junyanz/CycleGAN):

# Define the generator network
def define_G(input_shape, output_shape, ngf=64):
    def conv2d(layer_input, filters, f_size=4, normalize=True):
        """Layers used during downsampling"""
        d = Conv2D(filters, kernel_size=f_size, strides=2, padding='same')(layer_input)
        if normalize:
            d = InstanceNormalization()(d)
        d = Activation('relu')(d)
        return d

UNIT (mingyuliutw/UNIT):

# Define the encoder network
def encoder(input_shape, z_dim, ngf=64):
    def conv2d(layer_input, filters, f_size=4, stride=2, padding='same', normalize=True):
        """Layers used during downsampling"""
        x = Conv2D(filters, kernel_size=f_size, strides=stride, padding=padding)(layer_input)
        if normalize:
            x = InstanceNormalization()(x)
        x = LeakyReLU(alpha=0.2)(x)
        return x

The main difference between the two snippets is how the downsampling helper is defined: the CycleGAN version hard-codes a stride of 2 and uses a ReLU activation, while the UNIT version exposes stride and padding as parameters and uses a LeakyReLU activation, giving finer control over the layer configuration.

PyTorch-CycleGAN-and-pix2pix

Image-to-Image Translation in PyTorch

Pros of PyTorch-CycleGAN-and-pix2pix

  • Provides a well-documented and easy-to-use implementation of the CycleGAN and pix2pix models, which are popular for image-to-image translation tasks.
  • Supports a wide range of datasets and applications, including photo-to-painting, summer-to-winter, and many others.
  • Includes pre-trained models for several common tasks, allowing users to quickly apply the models to their own data.

Cons of PyTorch-CycleGAN-and-pix2pix

  • The codebase is primarily focused on image-to-image translation, and may not be as flexible for other types of generative tasks.
  • The training process can be computationally intensive, especially for larger datasets or higher-resolution images.
  • The model architecture and hyperparameters may not be optimal for all types of data and tasks, requiring some experimentation and tuning.

Code Comparison

PyTorch-CycleGAN-and-pix2pix:

# Define the generator network
netG = define_G(opt.input_nc, opt.output_nc, opt.ngf, opt.netG, opt.norm,
                not opt.no_dropout, opt.init_type, opt.init_gain, self.gpu_ids)

# Define the discriminator network
netD = define_D(opt.input_nc + opt.output_nc, opt.ndf, opt.netD,
                opt.n_layers_D, opt.norm, opt.init_type, opt.init_gain, self.gpu_ids)

UNIT:

# Define the encoder network
self.enc = Encoder(self.input_dim, self.enc_dim, self.n_layers, self.norm, self.activ, self.pad_type)

# Define the decoder network
self.dec = Decoder(self.enc_dim, self.output_dim, self.n_layers, self.norm, self.activ, self.pad_type)

The key difference is that the UNIT snippet builds separate encoder and decoder modules, whereas PyTorch-CycleGAN-and-pix2pix constructs a single generator (plus a discriminator) through factory functions driven by an options object. The UNIT style exposes the architecture components directly, which can make per-module customization more straightforward.

MUNIT

Multimodal Unsupervised Image-to-Image Translation

Pros of MUNIT

  • MUNIT (Multimodal Unsupervised Image-to-Image Translation) is a more advanced and flexible framework that supports many-to-many (multimodal) mappings, producing diverse outputs for a single input image.
  • MUNIT leverages a modular architecture, allowing for easier customization and integration with other components.
  • MUNIT has been shown to produce higher-quality and more diverse translated images compared to UNIT in certain tasks.

Cons of MUNIT

  • MUNIT is a more complex and computationally intensive framework, which may require more resources and expertise to set up and train effectively.
  • The MUNIT codebase is less well-documented and has a steeper learning curve compared to the relatively simpler UNIT implementation.

Code Comparison

UNIT:

def encode(self, x):
    h = self.fc(x.view(x.size(0), -1))
    mu = self.mu(h)
    logvar = self.logvar(h)
    z = self.reparameterize(mu, logvar)
    return z, mu, logvar

MUNIT:

def encode(self, x, get_latent=False):
    h_content, h_style = self.content_encoder(x), self.style_encoder(x)
    if get_latent:
        return h_content, h_style
    else:
        return self.reparameterize(h_content, h_style)

contrastive-unpaired-translation

Contrastive unpaired image-to-image translation, faster and lighter training than CycleGAN (ECCV 2020, in PyTorch)

Pros of contrastive-unpaired-translation

  • Utilizes a contrastive learning approach, which can potentially lead to better performance in unpaired image-to-image translation tasks.
  • Provides a modular and extensible codebase, allowing for easy customization and experimentation.
  • Includes pre-trained models for several translation tasks, making it easier to get started.

Cons of contrastive-unpaired-translation

  • The documentation and setup instructions may not be as comprehensive as UNIT, potentially making it more challenging for new users to get started.
  • The project is newer and may not have the same level of community support and active development as UNIT.
  • The performance of the contrastive learning approach may be dependent on the specific task and dataset, and may not always outperform other methods.

Code Comparison

UNIT:

class UNIT(nn.Module):
    def __init__(self, opt):
        super(UNIT, self).__init__()
        self.opt = opt
        self.encoder = Encoder(opt)
        self.decoder = Decoder(opt)
        self.discriminator = Discriminator(opt)
        self.criterionGAN = GANLoss(use_lsgan=not opt.no_lsgan)
        self.criterionCycle = nn.L1Loss()
        self.criterionIdt = nn.L1Loss()

contrastive-unpaired-translation:

class ContrastiveTranslator(nn.Module):
    def __init__(self, opt):
        super(ContrastiveTranslator, self).__init__()
        self.opt = opt
        self.encoder = Encoder(opt)
        self.decoder = Decoder(opt)
        self.discriminator = Discriminator(opt)
        self.contrastive_loss = ContrastiveLoss(opt)
        self.cycle_loss = CycleLoss(opt)
        self.identity_loss = IdentityLoss(opt)

The main difference lies in the loss functions: contrastive-unpaired-translation replaces UNIT's GANLoss and L1-based cycle and identity losses with dedicated ContrastiveLoss, CycleLoss, and IdentityLoss modules.
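For intuition, a contrastive loss of this kind (the PatchNCE loss in the CUT paper) pulls the features of a translated patch toward the features of the corresponding input patch while pushing them away from other patches. The sketch below is a generic InfoNCE-style loss, not the repository's actual implementation:

import torch
import torch.nn.functional as F

def info_nce_loss(feat_q, feat_k, temperature=0.07):
    # feat_q: (N, D) features of patches from the translated image
    # feat_k: (N, D) features of the corresponding input patches;
    # row i of feat_k is the positive for row i of feat_q, all other rows are negatives.
    feat_q = F.normalize(feat_q, dim=1)
    feat_k = F.normalize(feat_k, dim=1)
    logits = feat_q @ feat_k.t() / temperature         # (N, N) cosine-similarity matrix
    targets = torch.arange(feat_q.size(0), device=feat_q.device)
    return F.cross_entropy(logits, targets)            # diagonal entries are the positives

# Example with random features standing in for encoder outputs
loss = info_nce_loss(torch.randn(8, 256), torch.randn(8, 256))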


StarGAN - Official PyTorch Implementation (CVPR 2018)

Pros of StarGAN

  • StarGAN supports multi-domain image-to-image translation, allowing for more versatile and flexible image manipulation.
  • The code is well-documented and easy to understand, making it more accessible for researchers and developers.
  • The project has a larger community and more active development, with regular updates and bug fixes.

Cons of StarGAN

  • Compared to UNIT, StarGAN offers a narrower feature set; it does not cover fully unsupervised image-to-image translation or domain adaptation.
  • The training process for StarGAN can be more complex and time-consuming, especially for larger datasets.
  • The performance of StarGAN may be slightly lower than UNIT in certain tasks, depending on the specific use case.

Code Comparison

UNIT (mingyuliutw/UNIT):

def forward(self, x, y=None, mode='forward'):
    if mode == 'forward':
        z_a = self.encode_a(x)
        z_b = self.encode_b(y)
        x_recon = self.decode_a(z_a)
        y_recon = self.decode_b(z_b)
        x_fake = self.decode_b(z_a)
        y_fake = self.decode_a(z_b)
        return x_recon, y_recon, x_fake, y_fake
    elif mode == 'encode':
        z_a = self.encode_a(x)
        z_b = self.encode_b(y)
        return z_a, z_b
    else:
        raise Exception('Unrecognized mode: %s' % mode)

StarGAN (yunjey/stargan):

def forward(self, x, c_trg, alpha=1.0):
    """Forward pass.
        x: source image
        c_trg: target domain labels
        alpha: strength of domain translation
    """
    c_org = self.encode(x)
    c_mix = torch.clamp(c_org + alpha * (c_trg - c_org), 0, 1)
    x_recon = self.decode(x, c_org)
    x_fake = self.decode(x, c_mix)
    return x_recon, x_fake

StarGAN v2 - Official PyTorch Implementation (CVPR 2020)

Pros of StarGAN-v2

  • Supports multi-domain image-to-image translation, allowing for more diverse and flexible transformations.
  • Employs a more advanced architecture with improved performance and visual quality compared to UNIT.
  • Provides a comprehensive implementation with detailed documentation and pre-trained models.

Cons of StarGAN-v2

  • Requires more computational resources and training time due to the increased complexity of the model.
  • May be more challenging to customize or adapt to specific use cases compared to the simpler UNIT architecture.
  • Lacks the unsupervised learning capabilities of UNIT, which can be beneficial in certain scenarios.

Code Comparison

UNIT (mingyuliutw/UNIT):

def get_encoder(self, image, reuse=False, name='encoder'):
    with tf.variable_scope(name, reuse=reuse):
        net = image
        net = conv(net, 64, 4, 2, name='conv1')
        net = conv(net, 128, 4, 2, name='conv2')
        net = conv(net, 256, 4, 2, name='conv3')
        net = conv(net, 512, 4, 2, name='conv4')
        net = flatten(net)
        net = dense(net, 1024, name='fc')
        return net

StarGAN-v2 (clovaai/stargan-v2):

def build_generator(self, x, c, mode='train'):
    """Builds the generator network."""
    channels = self.img_channels
    assert c.shape[1] == self.c_dim

    net = x
    net = conv(net, 64, 4, 2, use_bias=False, sn=True, name='conv1')
    net = lrelu(net, 0.01)
    net = conv(net, 128, 4, 2, use_bias=False, sn=True, name='conv2')
    net = lrelu(net, 0.01)
    net = conv(net, 256, 4, 2, use_bias=False, sn=True, name='conv3')
    net = lrelu(net, 0.01)
    net = conv(net, 512, 4, 2, use_bias=False, sn=True, name='conv4')
    net = lrelu(net, 0.01)
    net = conv(net, 1024, 4, 2, use_bias=False, sn=True, name='conv5')
    net = lrelu(net, 0.01)
    net = conv(net, 1024, 4, 1, use_bias=False, sn=True, name='conv6')
    net = lrelu(net, 0.01)
    net = conv(net, channels, 3, 1, use_bias=False, sn=True, name='conv7')
    net = tanh(net)
    return net


README

License: CC BY-NC-SA 4.0 | Python 2.7

UNIT: UNsupervised Image-to-image Translation Networks

New implementation available at imaginaire repository

We have a reimplementation of the UNIT method that is more performant. It is available in the Imaginaire repository.

License

Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).

Code usage

  • Please check out our tutorial.

  • For multimodal (or many-to-many) image translation, please check out our new work on MUNIT.

What's new.

  • 05-02-2018: We have adopted the MUNIT code structure. To reproduce the experimental results in the NIPS paper, please check out the version_02 branch.

  • 12-21-2017: Release pre-trained synthia-to-cityscape image translation model. See USAGE.md for usage examples.

  • 12-14-2017: Added the multi-scale discriminators described in the pix2pixHD paper. To use them, simply set the discriminator name to COCOMsDis.

Paper

Ming-Yu Liu, Thomas Breuel, Jan Kautz, "Unsupervised Image-to-Image Translation Networks" NIPS 2017 Spotlight, arXiv:1703.00848 2017

Two Minute Paper Summary

(We thank the Two Minute Papers channel for summarizing our work.)

The Shared Latent Space Assumption
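In the paper, the shared latent space assumption states that a pair of corresponding images from the two domains can be mapped to a single code z in a shared latent space: z = E1(x1) = E2(x2), with x1 = G1(z) and x2 = G2(z). Translating a domain-1 image into domain 2 then amounts to computing G2(E1(x1)); the assumption is enforced through weight sharing between the high-level layers of the encoders and generators, together with VAE, adversarial, and cycle-consistency losses.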

Result Videos

More image results are available in the Google Photo Album.

Each result video shows the input on the left and the neural-network-generated output on the right (resolution: 640x480).

Street Scene Image Translation

From the first row to the fourth row, we show example results on day to night, sunny to rainy, summery to snowy, and real to synthetic image translation (two directions).

For each image pair, left is the input image; right is the machine generated image.

Dog Breed Image Translation

Cat Species Image Translation

Attribute-based Face Image Translation