Top Related Projects
BicycleGAN - Toward Multimodal Image-to-Image Translation
UNIT - Unsupervised Image-to-Image Translation
FUNIT - Translate images to unseen domains at test time with a few example images
StarGAN v2 - Official PyTorch Implementation (CVPR 2020)
contrastive-unpaired-translation (CUT) - Contrastive unpaired image-to-image translation, with faster and lighter training than CycleGAN (ECCV 2020, in PyTorch)
StarGAN - Official PyTorch Implementation (CVPR 2018)
Quick Overview
MUNIT (Multimodal UNsupervised Image-to-image Translation) is a framework for unsupervised image-to-image translation. It generates diverse outputs without paired training data by decomposing images into a domain-invariant content code and a domain-specific style code, so translation between image domains can preserve content while varying style.
Pros
- Enables unsupervised learning without paired training data
- Supports diverse and multimodal outputs for a single input image
- Provides fine-grained control over the generated images by manipulating content and style separately
- Achieves high-quality results across various image translation tasks
Cons
- Requires significant computational resources for training
- May struggle with complex scenes or highly diverse image domains
- Can sometimes produce unrealistic results or visible artifacts in the generated images
- Limited by the quality and diversity of the training dataset
Code Examples
- Loading a pre-trained MUNIT model (illustrative interface; the NVlabs/MUNIT repository itself exposes a MUNIT_Trainer class and test scripts rather than this exact API):
from MUNIT import MUNIT
model = MUNIT.load_from_checkpoint('path/to/checkpoint.ckpt')
- Performing image translation with a randomly sampled style code:
import torch
from PIL import Image
from torchvision import transforms

# Resize and normalize the input image to a tensor in [-1, 1] before encoding.
to_tensor = transforms.Compose([transforms.Resize((256, 256)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.5,) * 3, (0.5,) * 3)])
input_tensor = to_tensor(Image.open('input.jpg').convert('RGB')).unsqueeze(0)
content, _ = model.encode(input_tensor)
style = torch.randn(1, model.style_dim, 1, 1)  # random style code
output = model.decode(content, style)
# Map the output from [-1, 1] back to [0, 1] and save it.
transforms.ToPILImage()(output.squeeze(0).clamp(-1, 1) * 0.5 + 0.5).save('output.jpg')
- Extracting content from one image and style from another, then combining them:
content_tensor = to_tensor(Image.open('content.jpg').convert('RGB')).unsqueeze(0)
style_tensor = to_tensor(Image.open('style.jpg').convert('RGB')).unsqueeze(0)
content, _ = model.encode(content_tensor)  # keep the content code of the first image
_, style = model.encode(style_tensor)      # keep the style code of the second image
combined = model.decode(content, style)
transforms.ToPILImage()(combined.squeeze(0).clamp(-1, 1) * 0.5 + 0.5).save('combined.jpg')
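- Sampling several diverse translations of one input. This is a minimal sketch that reuses the illustrative model interface, input_tensor, and to_tensor preprocessing from the examples above:
content, _ = model.encode(input_tensor)
for i in range(5):
    # Each freshly sampled style code yields a different plausible translation of the same content.
    style = torch.randn(1, model.style_dim, 1, 1)
    sample = model.decode(content, style)
    transforms.ToPILImage()(sample.squeeze(0).clamp(-1, 1) * 0.5 + 0.5).save(f'output_{i}.jpg')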
Getting Started
- Clone the repository:
git clone https://github.com/NVlabs/MUNIT.git
cd MUNIT
- Install dependencies:
pip install -r requirements.txt
- Download a pre-trained model or train your own:
python train.py --config configs/edges2shoes_folder.yaml
- Use the model for image translation:
import torch
from PIL import Image
from torchvision import transforms
from MUNIT import MUNIT  # illustrative interface, as in the examples above

model = MUNIT.load_from_checkpoint('checkpoints/edges2shoes.ckpt')
to_tensor = transforms.Compose([transforms.Resize((256, 256)),
                                transforms.ToTensor(),
                                transforms.Normalize((0.5,) * 3, (0.5,) * 3)])
content, _ = model.encode(to_tensor(Image.open('input.jpg').convert('RGB')).unsqueeze(0))
style = torch.randn(1, model.style_dim, 1, 1)
output = model.decode(content, style)
transforms.ToPILImage()(output.squeeze(0).clamp(-1, 1) * 0.5 + 0.5).save('output.jpg')
Competitor Comparisons
Toward Multimodal Image-to-Image Translation
Pros of BicycleGAN
- Supports diverse image-to-image translation with explicit style control
- Combines conditional VAE-GAN and conditional Latent Regressor GAN
- Provides better mode coverage and sample diversity
Cons of BicycleGAN
- Limited to paired image-to-image translation tasks
- May struggle with complex, high-resolution images
- Requires paired training data, which can be challenging to obtain
Code Comparison
MUNIT:
def forward(self, x_a, x_b):
    # Each encoder returns a (content, style) pair.
    c_a, s_a = self.gen_a.encode(x_a)
    c_b, s_b = self.gen_b.encode(x_b)
    # Cross-domain translation: content from one domain, style from the other.
    x_ba = self.gen_a.decode(c_b, s_a)
    x_ab = self.gen_b.decode(c_a, s_b)
    return x_ab, x_ba
BicycleGAN:
def forward(self, input):
    # Encode the ground-truth output to get one latent code, and sample another at random.
    z_encoded = self.netE(self.real_B)
    z_random = self.get_z_random(input.size(0), self.nz)
    # The cVAE-GAN branch uses the encoded code; the cLR-GAN branch uses the random one.
    fake_B = self.netG(input, z_encoded)
    fake_B_random = self.netG(input, z_random)
    return fake_B, fake_B_random
Unsupervised Image-to-Image Translation
Pros of UNIT
- Pioneered the concept of unsupervised image-to-image translation
- Simpler architecture, potentially easier to understand and implement
- Effective for tasks with similar domain structures
Cons of UNIT
- Limited flexibility in handling multi-modal translations
- May struggle with more complex domain mappings
- Less control over style transfer compared to MUNIT
Code Comparison
UNIT (VAE-GAN architecture):
def forward(self, x_a, x_b):
h_a, n_a = self.gen_a.encode(x_a)
h_b, n_b = self.gen_b.encode(x_b)
x_ba = self.gen_a.decode(h_b + n_b)
x_ab = self.gen_b.decode(h_a + n_a)
return x_ab, x_ba
MUNIT (AdaIN-based architecture):
def forward(self, x_a, x_b):
c_a = self.enc_c_a(x_a)
s_a = self.enc_s_a(x_a)
c_b = self.enc_c_b(x_b)
s_b = self.enc_s_b(x_b)
x_ba = self.gen_b(c_a, s_b)
x_ab = self.gen_a(c_b, s_a)
return x_ab, x_ba
The key difference is MUNIT's separate content and style encoders, allowing for more flexible style transfer and multi-modal translations.
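This separation also makes example-guided translation straightforward. A minimal sketch, reusing the module names from the MUNIT snippet above (enc_c_a, enc_s_b, gen_b) and omitting preprocessing and batching:
def example_guided_translation(self, x_a, x_b_ref):
    c_a = self.enc_c_a(x_a)        # content code of the source image (domain A)
    s_b = self.enc_s_b(x_b_ref)    # style code of a reference image (domain B)
    return self.gen_b(c_a, s_b)    # render the source content in the reference style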
Translate images to unseen domains at test time with a few example images.
Pros of FUNIT
- Supports few-shot unsupervised image-to-image translation
- Can generalize to unseen target classes with just a few examples
- Achieves higher quality results for novel classes compared to MUNIT
Cons of FUNIT
- Requires class-labeled images for training, unlike MUNIT
- May struggle with fine-grained details in some cases
- More complex architecture, potentially harder to implement and train
Code Comparison
MUNIT:
def forward(self, x_a, x_b):
c_a = self.enc_c_a(x_a)
s_a = self.enc_s_a(x_a)
c_b = self.enc_c_b(x_b)
s_b = self.enc_s_b(x_b)
return c_a, s_a, c_b, s_b
FUNIT:
def forward(self, x_s, x_c):
c_s = self.content_encoder(x_s)
s_c = self.class_encoder(x_c)
x_f = self.decoder(c_s, s_c)
return x_f
Both MUNIT and FUNIT are image-to-image translation frameworks developed by NVIDIA Labs. MUNIT focuses on multimodal unsupervised translation between two domains, while FUNIT extends this concept to few-shot unsupervised translation across multiple classes. FUNIT's ability to generalize to unseen classes with limited examples makes it more versatile for certain applications, but it requires class-labeled data for training. MUNIT, on the other hand, offers a simpler approach for two-domain translation without the need for class labels.
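At test time, FUNIT's few-shot behavior amounts to conditioning the decoder on a class code computed from the handful of target-class examples. A minimal sketch, reusing the module names from the FUNIT snippet above; averaging the class codes of the K example images is a simplification of the paper's approach:
import torch

def few_shot_translate(self, x_content, x_class_examples):
    c = self.content_encoder(x_content)
    # Encode each of the K target-class example images and average their codes.
    s = torch.stack([self.class_encoder(x) for x in x_class_examples], dim=0).mean(dim=0)
    return self.decoder(c, s)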
StarGAN v2 - Official PyTorch Implementation (CVPR 2020)
Pros of StarGAN v2
- Supports multi-domain image-to-image translation with a single generator
- Produces higher quality and more diverse outputs than MUNIT
- Better preserves content details while changing style
Cons of StarGAN v2
- More complex architecture, potentially harder to implement and train
- May require more computational resources due to its larger model size
- Less flexibility in controlling specific attributes independently
Code Comparison
StarGAN v2:
def compute_d_loss(self, x_real, y_org, y_trg, z_trg=None, x_ref=None):
assert (z_trg is None) != (x_ref is None)
# ... (implementation details)
return d_loss, d_losses_latent
MUNIT:
def forward(self, x_a, x_b):
c_a, s_a = self.gen_a.encode(x_a)
c_b, s_b = self.gen_b.encode(x_b)
x_ba = self.gen_a.decode(c_b, s_a)
x_ab = self.gen_b.decode(c_a, s_b)
return x_ab, x_ba
The snippets highlight different parts of each pipeline: StarGAN v2's discriminator loss, which handles both latent-guided (z_trg) and reference-guided (x_ref) style codes, and MUNIT's encoder-decoder forward pass for style transfer between two domains.
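A rough sketch of how those two sources of style feed the generator at translation time; the module names mapping_network, style_encoder, and generator are assumptions for illustration, not necessarily the repository's exact attribute names:
def translate(nets, x_real, y_trg, z_trg=None, x_ref=None):
    # Style comes either from a random latent code via the mapping network
    # or from a reference image via the style encoder.
    if z_trg is not None:
        s_trg = nets.mapping_network(z_trg, y_trg)
    else:
        s_trg = nets.style_encoder(x_ref, y_trg)
    return nets.generator(x_real, s_trg)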
Contrastive unpaired image-to-image translation, with faster and lighter training than CycleGAN (ECCV 2020, in PyTorch)
Pros of contrastive-unpaired-translation
- Improved image quality and diversity in translations
- Better preservation of content and style during translation
- More stable training process with contrastive learning
Cons of contrastive-unpaired-translation
- Potentially higher computational requirements
- May require more fine-tuning for specific datasets
- Slightly more complex implementation
Code Comparison
MUNIT:
def forward(self, x_a, x_b):
    c_a, s_a = self.gen_a.encode(x_a)
    c_b, s_b = self.gen_b.encode(x_b)
    x_ba = self.gen_a.decode(c_b, s_a)
    x_ab = self.gen_b.decode(c_a, s_b)
    return x_ab, x_ba
contrastive-unpaired-translation:
def forward(self):
    # One-sided translation: a single generator maps A to B; the same generator's
    # output on real B serves as the identity pass used by the contrastive loss.
    self.real = torch.cat((self.real_A, self.real_B), dim=0)
    self.fake = self.netG(self.real)
    self.fake_B = self.fake[:self.real_A.size(0)]
    self.idt_B = self.fake[self.real_A.size(0):]
Both repositories target unpaired image-to-image translation, but contrastive-unpaired-translation (CUT) replaces cycle consistency with patch-wise contrastive learning, which makes training faster and lighter. MUNIT takes a multimodal approach with separate content and style codes, while CUT is one-sided and focuses on preserving content between corresponding patches of the input and output. The code comparison reflects these distinct forward-pass designs.
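The contrastive part of CUT is a patch-wise InfoNCE loss that pulls features of a translated patch toward the input patch at the same location and pushes them away from other patches. A simplified sketch of such a loss, with shapes and patch sampling reduced to the essentials (not the repository's exact implementation):
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_src, feat_tgt, temperature=0.07):
    # feat_src, feat_tgt: (num_patches, dim) features from corresponding
    # spatial locations of the input and the translated image.
    feat_src = F.normalize(feat_src, dim=1)
    feat_tgt = F.normalize(feat_tgt, dim=1)
    logits = feat_tgt @ feat_src.t() / temperature  # similarity of every patch pair
    # Each translated patch should match the input patch at the same location.
    labels = torch.arange(feat_src.size(0), device=feat_src.device)
    return F.cross_entropy(logits, labels)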
StarGAN - Official PyTorch Implementation (CVPR 2018)
Pros of StarGAN
- Simpler architecture, making it easier to implement and train
- Capable of performing multi-domain image-to-image translation with a single model
- Generally faster inference time due to its unified structure
Cons of StarGAN
- Limited flexibility in handling diverse and complex transformations
- May struggle with preserving fine details in some translation tasks
- Less control over specific style attributes compared to MUNIT
Code Comparison
MUNIT uses separate content and style encoders:
content = self.content_encoder(x_a)
style = self.style_encoder(x_b)
images = self.decoder(content, style)
StarGAN uses a single generator with domain labels:
c_trg = self.label2onehot(target_domain, self.c_dim)
x_fake = self.G(x_real, c_trg)
Summary
StarGAN offers a more straightforward approach to multi-domain image translation, while MUNIT provides greater flexibility and control over style attributes. StarGAN is generally faster and easier to implement, but may struggle with complex transformations. MUNIT's separate content and style encoders allow for more nuanced manipulations but at the cost of increased complexity.
README
The code base is no longer maintained.
Please check here for an improved implementation of MUNIT: https://github.com/NVlabs/imaginaire/tree/master/projects/munit
MUNIT: Multimodal UNsupervised Image-to-image Translation
License
Copyright (C) 2018 NVIDIA Corporation. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
For commercial use, please consult NVIDIA Research Inquiries.
Code usage
Please check out the user manual page.
Paper
Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz, "Multimodal Unsupervised Image-to-Image Translation", ECCV 2018
Results Video
Edges to Shoes/handbags Translation
Animal Image Translation
Street Scene Translation
Yosemite Summer to Winter Translation (HD)
Example-guided Image Translation
Other Implementations
Citation
If you find this code useful for your research, please cite our paper:
@inproceedings{huang2018munit,
title={Multimodal Unsupervised Image-to-image Translation},
author={Huang, Xun and Liu, Ming-Yu and Belongie, Serge and Kautz, Jan},
booktitle={ECCV},
year={2018}
}