pix2pix-tensorflow

Tensorflow port of Image-to-Image Translation with Conditional Adversarial Nets https://phillipi.github.io/pix2pix/


Quick Overview

The affinelayer/pix2pix-tensorflow repository is an implementation of the pix2pix image-to-image translation model using TensorFlow. It allows users to train and use models for various image translation tasks, such as converting sketches to photos, colorizing black and white images, or generating maps from aerial photographs.

Pros

Implements the popular pix2pix model in TensorFlow, making it accessible to a wide range of developers
Links to pre-trained models for several of the published datasets, for quick experimentation and testing
Provides comprehensive documentation and examples for training and using the models
Supports both training and testing on CPU and GPU

Cons

Requires significant computational resources for training, especially for large datasets
May produce inconsistent results for complex image translation tasks
Limited to 256x256 pixel images, which may not be suitable for all applications
Depends on Tensorflow 1.4.1 (the only prerequisite the README lists), which may cause compatibility issues with newer systems

Code Examples

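Loading a pre-trained model and generating an output image (illustrative only: the Pix2Pix class and its load/generate methods are a hypothetical wrapper, since the repository itself is driven through the pix2pix.py command-line script):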

import tensorflow as tf
from pix2pix import Pix2Pix

model = Pix2Pix()
model.load("pretrained_model")

input_image = tf.io.read_file("input.png")
input_image = tf.image.decode_png(input_image, channels=3)
input_image = tf.image.resize(input_image, [256, 256])

output_image = model.generate(input_image)
tf.io.write_file("output.png", tf.image.encode_png(output_image[0]))

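Training a new pix2pix model (again using the hypothetical Pix2Pix wrapper; the documented way to train is python pix2pix.py --mode train):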

from pix2pix import Pix2Pix

model = Pix2Pix()
model.train(input_dir="train_data", 
            output_dir="output", 
            max_epochs=200, 
            input_prefix="input", 
            output_prefix="output")

Running a batch of images through a trained model (hypothetical Pix2Pix wrapper and model name, shown for illustration only):

import tensorflow as tf
from pix2pix import Pix2Pix

model = Pix2Pix()
model.load("style_transfer_model")

input_images = tf.data.Dataset.from_tensor_slices(["image1.png", "image2.png", "image3.png"])
input_images = input_images.map(lambda x: tf.io.read_file(x))
input_images = input_images.map(lambda x: tf.image.decode_png(x, channels=3))
input_images = input_images.map(lambda x: tf.image.resize(x, [256, 256]))

output_images = model.generate(input_images.batch(3))

for i, img in enumerate(output_images):
    tf.io.write_file(f"output_{i}.png", tf.image.encode_png(img))

Getting Started

Clone the repository:

git clone https://github.com/affinelayer/pix2pix-tensorflow.git
cd pix2pix-tensorflow

Install dependencies (the README lists Tensorflow 1.4.1 as the only prerequisite):
```
pip install -r requirements.txt
```
Download a dataset for training (here, the facades dataset):
```
python tools/download-dataset.py facades
```

Train a new model:

python pix2pix.py \
  --mode train \
  --output_dir facades_train \
  --max_epochs 200 \
  --input_dir facades/train \
  --which_direction BtoA

Generate output images:

python pix2pix.py \
  --mode test \
  --output_dir facades_test \
  --input_dir facades/val \
  --checkpoint facades_train

Competitor Comparisons

pytorch-CycleGAN-and-pix2pix

24,306

Image-to-Image Translation in PyTorch

Pros of pytorch-CycleGAN-and-pix2pix

Implements both Pix2Pix and CycleGAN in a single repository
Uses PyTorch, which offers dynamic computational graphs and easier debugging
More actively maintained with recent updates and contributions

Cons of pytorch-CycleGAN-and-pix2pix

May have a steeper learning curve for those more familiar with TensorFlow
Potentially slower training speed compared to TensorFlow implementation
Requires PyTorch installation, which might not be as widely used as TensorFlow

Code Comparison

pix2pix-tensorflow:

def discriminator(self, image, y=None, reuse=False):
    with tf.variable_scope("discriminator"):
        # Layers defined here
        return output

pytorch-CycleGAN-and-pix2pix:

class Discriminator(nn.Module):
    def __init__(self, input_nc):
        super(Discriminator, self).__init__()
        # Layers defined here
    
    def forward(self, input):
        # Forward pass defined here
        return output

The main difference in code structure is that pix2pix-tensorflow uses TensorFlow's lower-level API with explicit variable scopes, while pytorch-CycleGAN-and-pix2pix uses PyTorch's higher-level nn.Module class for defining network architectures.

pix2pixHD

6,789

Synthesizing and manipulating 2048x1024 images with conditional GANs

Pros of pix2pixHD

Higher resolution output (up to 2048x1024 pixels)
Improved image quality with multi-scale generator and discriminator
Instance-level feature embedding for better object details

Cons of pix2pixHD

More complex architecture, potentially harder to understand and modify
Requires more computational resources due to higher resolution and additional features
May be overkill for simpler image-to-image translation tasks

Code Comparison

pix2pix-tensorflow:

def generator(input):
    # Simple U-Net architecture
    return unet(input)

def discriminator(input, target):
    # PatchGAN discriminator
    return patch_gan(input, target)

pix2pixHD:

def generator(input, instance_map):
    # Multi-scale generator with instance-level features
    return multiscale_generator(input, instance_map)

def discriminator(input, target):
    # Multi-scale discriminator
    return multiscale_discriminator(input, target)

Both repositories implement image-to-image translation using GANs, but pix2pixHD offers higher resolution output and improved image quality at the cost of increased complexity and computational requirements. The code comparison highlights the architectural differences, with pix2pixHD utilizing multi-scale components and instance-level features.

pix2pix

10,444

Image-to-image translation with conditional adversarial nets

Pros of pix2pix

Implemented in Torch (Lua), the framework this Tensorflow port was based on
Original implementation by the authors of the pix2pix paper
Includes pre-trained models for quick experimentation

Cons of pix2pix

Less actively maintained compared to pix2pix-tensorflow
Fewer additional features and optimizations beyond the original paper implementation
Limited documentation for advanced usage and customization

Code Comparison

pix2pix (shown here as a PyTorch-style sketch; the original repository is written in Torch/Lua):

class UnetGenerator(nn.Module):
    def __init__(self, input_nc, output_nc, num_downs, ngf=64):
        super(UnetGenerator, self).__init__()
        # Encoder and decoder implementation

pix2pix-tensorflow:

def create_generator(generator_inputs, generator_outputs_channels):
    layers = []
    # Encoder and decoder implementation, appending each layer's output to `layers`

The main structural difference is that the first snippet defines the network as an nn.Module class, while pix2pix-tensorflow builds the generator inside a plain function that collects layer outputs in a list.

Both implementations follow the original pix2pix architecture. According to its README, pix2pix-tensorflow is meant to be a faithful port that does not add anything, and its processing speed on a GPU with cuDNN was equivalent to the Torch implementation in the author's testing.

Keras-GAN

9,237

Keras implementations of Generative Adversarial Networks.

Pros of Keras-GAN

Implements multiple GAN architectures, providing a broader range of options for different tasks
Uses Keras, which offers a more user-friendly and high-level API compared to TensorFlow
Includes more recent GAN variants, such as WGAN and CGAN

Cons of Keras-GAN

Less focused on image-to-image translation tasks compared to pix2pix-tensorflow
May require more setup and configuration for specific use cases
Documentation and examples might be less comprehensive for each individual GAN architecture

Code Comparison

pix2pix-tensorflow:

def create_generator(generator_inputs, generator_outputs_channels):
    layers = []
    # ... (encoder-decoder architecture implementation, appending each layer's output to `layers`)
    return layers[-1]  # output of the final decoder layer

Keras-GAN (DCGAN example):

def build_generator():
    model = Sequential()
    # ... (generator architecture implementation)
    return Model(noise, img)

Both repositories define the generator in a single builder function. The Keras-GAN example assembles it with the Keras Sequential API, while pix2pix-tensorflow builds its encoder-decoder layer by layer in plain Tensorflow code.


README

pix2pix-tensorflow

Based on pix2pix by Isola et al.

Article about this implementation

Interactive Demo

Tensorflow implementation of pix2pix. Learns a mapping from input images to output images, like these examples from the original paper:

This port is based directly on the torch implementation, and not on an existing Tensorflow implementation. It is meant to be a faithful implementation of the original work and so does not add anything. The processing speed on a GPU with cuDNN was equivalent to the Torch implementation in testing.

Setup

Prerequisites

Tensorflow 1.4.1

Getting Started

# clone this repo
git clone https://github.com/affinelayer/pix2pix-tensorflow.git
cd pix2pix-tensorflow
# download the CMP Facades dataset (generated from http://cmp.felk.cvut.cz/~tylecr1/facade/)
python tools/download-dataset.py facades
# train the model (this may take 1-8 hours depending on GPU, on CPU you will be waiting for a bit)
python pix2pix.py \
  --mode train \
  --output_dir facades_train \
  --max_epochs 200 \
  --input_dir facades/train \
  --which_direction BtoA
# test the model
python pix2pix.py \
  --mode test \
  --output_dir facades_test \
  --input_dir facades/val \
  --checkpoint facades_train

The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets.

If you have Docker installed, you can use the provided Docker image to run pix2pix without installing the correct version of Tensorflow:

# train the model
python tools/dockrun.py python pix2pix.py \
      --mode train \
      --output_dir facades_train \
      --max_epochs 200 \
      --input_dir facades/train \
      --which_direction BtoA
# test the model
python tools/dockrun.py python pix2pix.py \
      --mode test \
      --output_dir facades_test \
      --input_dir facades/val \
      --checkpoint facades_train

Datasets and Trained Models

The data format used by this program is the same as the original pix2pix format, which consists of images of input and desired output side by side in a single image file.
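To make the side-by-side format concrete, here is a minimal NumPy sketch that splits one combined image into its two halves. It assumes image A occupies the left half and image B the right half, and that the combined width is even; the function name `split_pair` is hypothetical and not part of the repository.

```python
import numpy as np


def split_pair(combined):
    """Split a side-by-side image of shape (H, 2*W, C) into (A, B) halves.

    Assumption: A is the left half and B is the right half.
    """
    height, width, channels = combined.shape
    if width % 2 != 0:
        raise ValueError("combined image width must be even")
    half = width // 2
    return combined[:, :half, :], combined[:, half:, :]


# Synthetic 256x512 RGB image: left half black, right half white.
combined = np.concatenate(
    [
        np.zeros((256, 256, 3), dtype=np.uint8),
        np.full((256, 256, 3), 255, dtype=np.uint8),
    ],
    axis=1,
)
a_image, b_image = split_pair(combined)
print(a_image.shape, b_image.shape)
```

Slicing returns views rather than copies, so call `.copy()` on the halves if they will be modified independently.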

Some datasets have been made available by the authors of the pix2pix paper. To download those datasets, use the included script tools/download-dataset.py. There are also links to pre-trained models alongside each dataset. Note that these pre-trained models require the current version of pix2pix.py:

dataset command	contents	size	pre-trained
`python tools/download-dataset.py facades`	400 images from CMP Facades dataset	31 MB	BtoA
`python tools/download-dataset.py cityscapes`	2975 images from the Cityscapes training set	113 MB	AtoB, BtoA
`python tools/download-dataset.py maps`	1096 training images scraped from Google Maps	246 MB	AtoB, BtoA
`python tools/download-dataset.py edges2shoes`	50k training images from UT Zappos50K dataset; edges are computed by HED edge detector + post-processing	2.2 GB	AtoB
`python tools/download-dataset.py edges2handbags`	137K Amazon Handbag images from iGAN project; edges are computed by HED edge detector + post-processing	8.6 GB	AtoB

The facades dataset is the smallest and easiest to get started with.

Creating your own dataset

Example: creating images with blank centers for inpainting

# Resize source images
python tools/process.py \
  --input_dir photos/original \
  --operation resize \
  --output_dir photos/resized
# Create images with blank centers
python tools/process.py \
  --input_dir photos/resized \
  --operation blank \
  --output_dir photos/blank
# Combine resized images with blanked images
python tools/process.py \
  --input_dir photos/resized \
  --b_dir photos/blank \
  --operation combine \
  --output_dir photos/combined
# Split into train/val set
python tools/split.py \
  --dir photos/combined

The folder photos/combined will now have train and val subfolders that you can use for training and testing.
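As a rough picture of what the blank step in the inpainting example produces, the sketch below paints a centered square of an image white. The 30% square size, the white fill value, and the name `blank_center` are assumptions made for illustration; they are not taken from tools/process.py.

```python
import numpy as np


def blank_center(image, fraction=0.3):
    """Return a copy of `image` with a centered square set to white (1.0).

    `image` is a float array of shape (H, W, C) with values in [0, 1].
    `fraction` (assumed value) is the square's side relative to the
    shorter image dimension.
    """
    height, width, _ = image.shape
    size = int(min(height, width) * fraction)
    top = (height - size) // 2
    left = (width - size) // 2
    result = image.copy()  # leave the caller's array untouched
    result[top:top + size, left:left + size, :] = 1.0
    return result


photo = np.zeros((256, 256, 3), dtype=np.float32)
blanked = blank_center(photo)
print(blanked[128, 128], blanked[0, 0])
```

Pairing each blanked image with its untouched original gives the network an input with a hole and a target that shows what belongs in it.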

Creating image pairs from existing images

If you have two directories a and b, with corresponding images (same name, same dimensions, different data) you can combine them with process.py:

python tools/process.py \
  --input_dir a \
  --b_dir b \
  --operation combine \
  --output_dir c

This puts the images in a side-by-side combined image that pix2pix.py expects.
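The combine step can be sketched in a few lines of NumPy. This is an illustration of the idea, not the code in tools/process.py; the function name `combine_pair` is hypothetical, and it assumes both images are already loaded as arrays with identical shapes.

```python
import numpy as np


def combine_pair(a_image, b_image):
    """Concatenate two (H, W, C) images into one (H, 2*W, C) image, A on the left."""
    if a_image.shape != b_image.shape:
        raise ValueError("images must have the same dimensions")
    return np.concatenate([a_image, b_image], axis=1)


a_image = np.zeros((256, 256, 3), dtype=np.uint8)
b_image = np.full((256, 256, 3), 255, dtype=np.uint8)
pair = combine_pair(a_image, b_image)
print(pair.shape)
```

The shape check mirrors the requirement stated above that corresponding images share the same dimensions.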

Colorization

For colorization, your images should ideally all be the same aspect ratio. You can resize and crop them with the resize command:

python tools/process.py \
  --input_dir photos/original \
  --operation resize \
  --output_dir photos/resized

No other processing is required, because the colorization mode (see Training section below) uses single images instead of image pairs.
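One simple way to bring images to a common aspect ratio is a center crop to a square before scaling. The sketch below shows only that cropping idea; how tools/process.py actually crops and scales is not described here, and the name `center_crop_square` is hypothetical.

```python
import numpy as np


def center_crop_square(image):
    """Crop an (H, W, C) image to a centered square of side min(H, W)."""
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return image[top:top + side, left:left + side]


# 300x400 test image whose pixel values encode the column index.
image = np.tile(np.arange(400).reshape(1, 400, 1), (300, 1, 3))
cropped = center_crop_square(image)
print(cropped.shape, cropped[0, 0, 0])
```

After cropping, every image has a 1:1 aspect ratio, so a single target size can be applied to all of them.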

Training

Image Pairs

For normal training with image pairs, you need to specify which directory contains the training images, and which direction to train on. The direction options are AtoB or BtoA:

python pix2pix.py \
  --mode train \
  --output_dir facades_train \
  --max_epochs 200 \
  --input_dir facades/train \
  --which_direction BtoA
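
The direction option can be read as choosing which half of each pair is the network input and which is the target. A minimal sketch of that selection, using a hypothetical helper `select_direction` that is not taken from pix2pix.py:

```python
def select_direction(a_image, b_image, which_direction):
    """Return (inputs, targets) for the given training direction."""
    if which_direction == "AtoB":
        return a_image, b_image
    if which_direction == "BtoA":
        return b_image, a_image
    raise ValueError("which_direction must be 'AtoB' or 'BtoA'")


# Placeholder strings stand in for the two halves of an image pair.
inputs, targets = select_direction("A", "B", "BtoA")
print(inputs, targets)
```

With BtoA, as in the facades command above, the B half is fed to the generator and the A half is what it learns to produce.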

Colorization

pix2pix.py includes special code to handle colorization with single images instead of pairs. Usage looks like this:

python pix2pix.py \
  --mode train \
  --output_dir photos_train \
  --max_epochs 200 \
  --input_dir photos/train \
  --lab_colorization

In this mode, image A is the black and white image (lightness only), and image B contains the color channels of that image (no lightness information).
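The split between lightness and color can be sketched as follows, assuming the image has already been converted to a Lab-style array whose channel 0 is lightness and whose channels 1 and 2 carry color. The conversion from RGB is omitted, and `split_lab` is a hypothetical name rather than a function from pix2pix.py.

```python
import numpy as np


def split_lab(lab_image):
    """Split an (H, W, 3) Lab-style image into lightness (A) and color (B).

    Assumption: channel 0 holds lightness, channels 1-2 hold color.
    """
    lightness = lab_image[:, :, :1]  # shape (H, W, 1): the network input
    color = lab_image[:, :, 1:]      # shape (H, W, 2): the network target
    return lightness, color


lab_image = np.random.rand(256, 256, 3).astype(np.float32)
a_image, b_image = split_lab(lab_image)
print(a_image.shape, b_image.shape)
```

Because the two parts come from the same image, no paired dataset is needed: every photo supplies its own input and target.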

Tips

You can look at the loss and computation graph using tensorboard:

tensorboard --logdir=facades_train

If you wish to write in-progress pictures as the network is training, use --display_freq 50. This will update facades_train/index.html every 50 steps with the current training inputs and outputs.

Testing

Testing is done with --mode test. You should specify the checkpoint to use with --checkpoint. This should point to the output_dir that you created previously with --mode train:

python pix2pix.py \
  --mode test \
  --output_dir facades_test \
  --input_dir facades/val \
  --checkpoint facades_train

The testing mode will load some of the configuration options from the checkpoint provided, so you do not need to specify which_direction, for instance.
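One way such option restoring could work is sketched below. It assumes the training run saved its options as a JSON file inside the checkpoint directory; the file name `options.json`, the set of restored keys, and the function `restore_options` are all assumptions for illustration, not details confirmed by this README.

```python
import json
import os
import tempfile

# Assumed set of options that a test run takes over from the training run.
RESTORED_KEYS = {"which_direction", "lab_colorization"}


def restore_options(checkpoint_dir, current, filename="options.json"):
    """Overlay selected saved training options onto the current options."""
    with open(os.path.join(checkpoint_dir, filename)) as f:
        saved = json.load(f)
    merged = dict(current)
    for key, value in saved.items():
        if key in RESTORED_KEYS:
            merged[key] = value
    return merged


# Simulate a checkpoint directory written by a training run.
with tempfile.TemporaryDirectory() as checkpoint_dir:
    with open(os.path.join(checkpoint_dir, "options.json"), "w") as f:
        json.dump({"which_direction": "BtoA", "max_epochs": 200}, f)
    options = restore_options(
        checkpoint_dir, {"which_direction": "AtoB", "mode": "test"}
    )
print(options)
```

Only the keys in the allow-list are copied, so run-specific settings such as the mode or the epoch count stay under the control of the test command.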

The test run will output an HTML file at facades_test/index.html that shows input/output/target image sets.

Code Validation

Validation of the code was performed on a Linux machine with a ~1.3 TFLOPS Nvidia GTX 750 Ti GPU and an Azure NC6 instance with a K80 GPU.

git clone https://github.com/affinelayer/pix2pix-tensorflow.git
cd pix2pix-tensorflow
python tools/download-dataset.py facades
sudo nvidia-docker run \
  --volume $PWD:/prj \
  --workdir /prj \
  --env PYTHONUNBUFFERED=x \
  affinelayer/pix2pix-tensorflow \
    python pix2pix.py \
      --mode train \
      --output_dir facades_train \
      --max_epochs 200 \
      --input_dir facades/train \
      --which_direction BtoA
sudo nvidia-docker run \
  --volume $PWD:/prj \
  --workdir /prj \
  --env PYTHONUNBUFFERED=x \
  affinelayer/pix2pix-tensorflow \
    python pix2pix.py \
      --mode test \
      --output_dir facades_test \
      --input_dir facades/val \
      --checkpoint facades_train

Comparison on facades dataset (image table with columns Input, Tensorflow, Torch, Target).

Unimplemented Features

The following models have not been implemented:

defineG_encoder_decoder
defineG_unet_128
defineD_pixelGAN

Citation

If you use this code for your research, please cite the paper this code is based on: Image-to-Image Translation with Conditional Adversarial Networks:

@article{pix2pix2016,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  journal={arxiv},
  year={2016}
}

Acknowledgments

This is a port of pix2pix from Torch to Tensorflow. It also contains colorspace conversion code ported from Torch. Thanks to the Tensorflow team for making such a quality library! And special thanks to Phillip Isola for answering my questions about the pix2pix code.
