NVlabs/FUNIT

Translate images to unseen domains at test time using only a few example images.

Top Related Projects

  • CycleGAN: Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
  • BicycleGAN: Toward Multimodal Image-to-Image Translation.
  • PyTorch-CycleGAN-and-pix2pix: Image-to-Image Translation in PyTorch.

Quick Overview

FUNIT (Few-shot Unsupervised Image-to-Image Translation) is a deep learning-based framework for performing unsupervised image-to-image translation tasks using only a few reference images. It can be used to transfer the style or appearance of one image to another, without requiring paired training data.

Pros

  • Unsupervised Learning: FUNIT can perform image-to-image translation without the need for paired training data, which is often difficult and expensive to obtain.
  • Few-shot Learning: The model can adapt to new domains using only a few reference images, making it more flexible and practical for real-world applications.
  • Diverse Outputs: The framework can generate diverse and high-quality translated images, capturing the style and appearance of the reference images.
  • Versatile Applications: FUNIT can be applied to a wide range of image-to-image translation tasks, such as style transfer, domain adaptation, and image editing.

Cons

  • Computational Complexity: The training and inference of FUNIT can be computationally intensive, especially for high-resolution images.
  • Limited Controllability: While the model can generate diverse outputs, the user may have limited control over the specific aspects of the translation process.
  • Potential Bias: The quality and diversity of the translated images may be influenced by the distribution of the training data, which could introduce biases.
  • Lack of Interpretability: The inner workings of the deep learning model can be difficult to interpret, making it challenging to understand the reasoning behind the generated outputs.

Getting Started

To get started with FUNIT, you can follow the instructions provided in the project's README file on GitHub. Here's a brief overview:

  1. Clone the repository:

git clone https://github.com/NVlabs/FUNIT.git
cd FUNIT

  2. Install the required dependencies (the Installation section later in this README lists the conda-based setup used by the official repo):

pip install -r requirements.txt

  3. Prepare your dataset:

    • FUNIT expects images organized into one folder per class (see the Animal Face dataset instructions later in this README).
    • Ensure that your dataset follows the directory structure described in the README.

  4. Train the FUNIT model:

python train.py --config configs/your_config.yaml

    • Customize the configuration file (your_config.yaml) to specify the dataset, training parameters, and other settings.

  5. Perform image-to-image translation:

from funit.inference import FUNITInference
funit = FUNITInference(config_path='configs/your_config.yaml')
funit.translate(source_image, reference_images)

    • Provide the source image and reference images to the translate() method to generate the translated output; a short end-to-end sketch follows below.

  6. Evaluate the results:

    • The project includes evaluation metrics and visualization tools to assess the quality and diversity of the translated images.
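
As a quick illustration, here is a minimal end-to-end sketch built around the FUNITInference wrapper shown in step 5. Note that FUNITInference, its translate() signature, and the file paths are illustrative placeholders from the overview above and are not verified against the repo; the official command-line entry point is test_k_shot.py, shown later in this README.

# Minimal sketch, assuming a FUNITInference wrapper with the interface described above.
# All paths and the return type of translate() are illustrative assumptions.
from PIL import Image

from funit.inference import FUNITInference

# One source image plus a handful of reference images from the target class.
source_image = Image.open('images/input_content.jpg').convert('RGB')
reference_images = [
    Image.open(path).convert('RGB')
    for path in ['refs/ref_0.jpg', 'refs/ref_1.jpg', 'refs/ref_2.jpg']
]

funit = FUNITInference(config_path='configs/your_config.yaml')
translated = funit.translate(source_image, reference_images)

# Assuming translate() returns a PIL image, save it alongside the other outputs.
translated.save('images/output.jpg')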

Competitor Comparisons

CycleGAN (junyanz/CycleGAN)

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Pros of CycleGAN

  • CycleGAN is a more general-purpose image-to-image translation model, capable of learning mappings between various domains (e.g., horses to zebras, summer to winter).
  • CycleGAN has been widely adopted and has a large community, with many pre-trained models and applications available.
  • CycleGAN is well-documented and has a user-friendly codebase, making it easier for researchers and developers to work with.

Cons of CycleGAN

  • CycleGAN learns a mapping between just two domains at a time, so a separate model must be trained for each domain pair and it cannot generalize to unseen classes at test time the way FUNIT can.
  • CycleGAN may struggle with more complex or high-resolution image translations compared to specialized models like FUNIT.
  • CycleGAN can be computationally expensive, especially for large-scale or high-resolution tasks.

Code Comparison

CycleGAN (junyanz/CycleGAN):

# Define the generator network
gen_A = build_generator(input_shape=(256, 256, 3))
gen_B = build_generator(input_shape=(256, 256, 3))

# Define the discriminator networks
disc_A = build_discriminator(input_shape=(256, 256, 3))
disc_B = build_discriminator(input_shape=(256, 256, 3))

FUNIT (NVlabs/FUNIT):

# Define the generator network
generator = Generator(input_shape=(256, 256, 3), num_classes=10)

# Define the discriminator network
discriminator = Discriminator(input_shape=(256, 256, 3), num_classes=10)

BicycleGAN (junyanz/BicycleGAN)

Toward Multimodal Image-to-Image Translation

Pros of BicycleGAN

  • BicycleGAN is capable of generating diverse and realistic images, which can be useful for applications such as image-to-image translation and data augmentation.
  • It combines cVAE-GAN and cLR-GAN objectives, which discourages mode collapse and lets a single input map to many plausible outputs.
  • The code is well-documented and easy to use, with clear instructions for training and inference.

Cons of BicycleGAN

  • BicycleGAN is not designed for few-shot learning: it is trained on paired data for a fixed source/target domain and cannot translate to unseen classes at test time as FUNIT does.
  • The model can be computationally expensive to train, especially on large datasets, due to the complexity of the network architecture.
  • The quality of the generated images may not be as high as some other state-of-the-art image generation models, particularly for certain types of images.

Code Comparison

Here's a brief comparison of the code structure between BicycleGAN and FUNIT:

BicycleGAN:

class BicycleGANModel(BaseModel):
    def name(self):
        return 'BicycleGANModel'

    def initialize(self, opt):
        BaseModel.initialize(self, opt)
        self.isTrain = opt.isTrain
        # define tensors
        self.input_A = self.Tensor(opt.batchSize, opt.input_nc,
                                  opt.fineSize, opt.fineSize)
        self.input_B = self.Tensor(opt.batchSize, opt.output_nc,
                                  opt.fineSize, opt.fineSize)

FUNIT:

class FUNIT(nn.Module):
    def __init__(self, args):
        super(FUNIT, self).__init__()
        self.args = args
        self.content_encoder = ContentEncoder(args)
        self.style_encoder = StyleEncoder(args)
        self.generator = Generator(args)
        self.discriminator = Discriminator(args)
        self.vgg = VGGFeatureExtractor(args)

The key difference is that FUNIT has a more modular design, with separate encoder and generator components, while BicycleGAN has a more monolithic architecture. This allows FUNIT to be more flexible and adaptable to different few-shot learning tasks.
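
To make the modular design concrete, here is a minimal sketch of how such components typically fit together at translation time. The class and method names mirror the illustrative snippet above rather than the exact classes in the NVlabs repo; the one FUNIT-specific detail it encodes is that the class (style) code is obtained by averaging the style codes of the K reference images.

# Illustrative sketch of a FUNIT-style forward pass (not the official implementation).
import torch
import torch.nn as nn

class FUNITSketch(nn.Module):
    def __init__(self, content_encoder, style_encoder, generator):
        super().__init__()
        self.content_encoder = content_encoder
        self.style_encoder = style_encoder
        self.generator = generator

    def translate(self, content_image, reference_images):
        # Content code: captures the structure/pose of the source image.
        content_code = self.content_encoder(content_image)
        # Class code: average of the style codes of the K few-shot reference images.
        style_codes = torch.stack([self.style_encoder(ref) for ref in reference_images])
        class_code = style_codes.mean(dim=0)
        # The generator decodes the content code conditioned on the class code.
        return self.generator(content_code, class_code)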

PyTorch-CycleGAN-and-pix2pix (junyanz/pytorch-CycleGAN-and-pix2pix)

Image-to-Image Translation in PyTorch

Pros of PyTorch-CycleGAN-and-pix2pix

  • Supports a wider range of image-to-image translation tasks, including photo-to-painting, horse-to-zebra, and more.
  • Provides pre-trained models for several common tasks, making it easier to get started.
  • Includes detailed documentation and examples, making it more accessible for beginners.

Cons of PyTorch-CycleGAN-and-pix2pix

  • May not be as performant as FUNIT for specific tasks like few-shot image translation.
  • Requires more computational resources, as it uses a more general-purpose architecture.
  • May not be as flexible or customizable as FUNIT for advanced users.

Code Comparison

PyTorch-CycleGAN-and-pix2pix:

# Train the model
train_loader = create_dataset(opt)
model = create_model(opt)
model.setup(opt)
model.train()
for epoch in range(opt.epoch_count, opt.niter + opt.niter_decay + 1):
    model.update_learning_rate()
    for i, data in enumerate(train_loader):
        model.set_input(data)
        model.optimize_parameters()

FUNIT:

# Train the model
train_loader = get_dataloader(opt)
model = FUNIT(opt)
for epoch in range(opt.num_epochs):
    for batch in train_loader:
        model.train_on_batch(batch)
    model.save_checkpoint(epoch)

README

License: CC BY-NC-SA 4.0 | Python 3.7

FUNIT: Few-Shot Unsupervised Image-to-Image Translation

(Demo GIF: few-shot animal face swap)

Project page | Paper | FUNIT Explained | GANimal Demo Video | Have fun with GANimal

Few-shot Unsupervised Image-to-Image Translation
Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, and Jan Kautz.
In ICCV 2019.

License

Copyright (C) 2019 NVIDIA Corporation.

All rights reserved. Licensed under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International)

The code is released for academic research use only. For commercial use, please contact researchinquiries@nvidia.com.

For press and other inquiries, please contact Hector Marinez.

Installation

  • Clone this repo: git clone https://github.com/NVlabs/FUNIT.git
  • Install CUDA 10.0+
  • Install cuDNN 7.5
  • Install Anaconda3
  • Install the required Python packages:
    • conda install -y pytorch torchvision cudatoolkit=10.0 -c pytorch
    • conda install -y -c anaconda pip
    • pip install pyyaml tensorboardX
    • conda install -y -c menpo opencv3

Hardware Requirement

To reproduce the experiment results reported in our ICCV paper, you would need an NVIDIA DGX1 machine with 8 V100 32GB GPUs. The training will use all 8 GPUs and almost all of their memory, and it would take about 2 weeks to finish.

Dataset Preparation

Animal Face Dataset

We are releasing the Animal Face dataset. If you use this dataset in your publication, please cite the FUNIT paper.

  • Download the ILSVRC2012 training images and untar them:

cd dataset
wget http://www.image-net.org/challenges/LSVRC/2012/nnoupb/ILSVRC2012_img_train.tar
tar xvf ILSVRC2012_img_train.tar

  • The training images should be in datasets/ILSVRC/Data/CLS-LOC/train. Now, extract the animal face images by running:

python tools/extract_animalfaces.py datasets/ILSVRC/Data/CLS-LOC/train --output_folder datasets/animals --coor_file datasets/animalface_coordinates.txt

  • The animal face images should be in datasets/animals. Note there are 149 folders; each folder contains images of one animal kind, and the dataset contains 117,484 images in total. A small sanity-check sketch is given after this list.
  • We use 119 animal kinds for training and the remaining 30 animal kinds for evaluation.
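
As a quick sanity check after extraction, a small script along these lines can confirm the expected layout. The datasets/animals path and the expected counts (149 class folders, 117,484 images) come from the instructions above; everything else is an illustrative assumption and not part of the repo.

# Illustrative sanity check for the extracted Animal Face dataset.
import os

root = 'datasets/animals'
class_dirs = sorted(
    d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
)
num_images = sum(
    len(files)
    for d in class_dirs
    for _, _, files in os.walk(os.path.join(root, d))
)

print(f'{len(class_dirs)} class folders (expected 149)')
print(f'{num_images} images (expected 117,484)')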

Training

Once the animal face dataset is prepared, you can train an animal face translation model by running:

python train.py --config configs/funit_animals.yaml --multigpus

The training results including the checkpoints and intermediate results will be stored in outputs/funit_animals.

For a custom dataset, you will need to write a new configuration file. Please create one based on the example config file.

Testing pretrained model

To test the pretrained model, first create a folder named pretrained under the root folder. Then, download the pretrained models via the link, save them in pretrained, and untar the file: tar xvf pretrained.tar.gz

Now, we can test the translation:

python test_k_shot.py --config configs/funit_animals.yaml --ckpt pretrained/animal149_gen.pt --input images/input_content.jpg --class_image_folder images/n02138411 --output images/output.jpg

The above command will translate the input image images/input_content.jpg into an output meerkat image images/output.jpg by using a set of 5 example meerkat images from images/n02138411.
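
If you want to translate several content images into the same target class, the command above can be wrapped in a small driver script. The sketch below only reuses the flags shown in the command above; the inputs/ and outputs/ directories are illustrative placeholders.

# Illustrative batch-translation driver (not part of the repo): runs test_k_shot.py
# once per content image, reusing the flags from the command above.
import os
import subprocess

input_dir, output_dir = 'inputs', 'outputs'
os.makedirs(output_dir, exist_ok=True)

for name in sorted(os.listdir(input_dir)):
    if not name.lower().endswith(('.jpg', '.png')):
        continue
    subprocess.run([
        'python', 'test_k_shot.py',
        '--config', 'configs/funit_animals.yaml',
        '--ckpt', 'pretrained/animal149_gen.pt',
        '--input', os.path.join(input_dir, name),
        '--class_image_folder', 'images/n02138411',
        '--output', os.path.join(output_dir, name),
    ], check=True)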

Citation

If you use this code for your research, please cite our paper.

@inproceedings{liu2019few,
  title={Few-shot Unsupervised Image-to-Image Translation},
  author={Ming-Yu Liu and Xun Huang and Arun Mallya and Tero Karras and Timo Aila and Jaakko Lehtinen and Jan Kautz},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2019}
}