PhotographicImageSynthesis

Photographic Image Synthesis with Cascaded Refinement Networks

Top Related Projects

pytorch-CycleGAN-and-pix2pix

23,952

Image-to-Image Translation in PyTorch

pix2pixHD

6,743

Synthesizing and manipulating 2048x1024 images with conditional GANs

SPADE

7,641

Semantic Image Synthesis with SPADE

pix2pix

10,345

Image-to-image translation with conditional adversarial nets

deep-photo-styletransfer

10,004

Code and data for paper "Deep Photo Style Transfer": https://arxiv.org/abs/1703.07511

Quick Overview

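PhotographicImageSynthesis is a GitHub repository that implements cascaded refinement networks (CRN), a deep learning approach for synthesizing photographic images from semantic layouts. The project aims to generate high-quality, realistic images from semantic segmentation maps using a feed-forward convolutional network that refines its output at progressively higher resolutions. According to the accompanying ICCV 2017 paper, the network is trained with a direct regression (perceptual) loss rather than with adversarial training.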
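As a rough illustration of what a "semantic layout" input looks like, the sketch below turns an integer label map into one channel per class. This is a common convention and an assumption here; the repository's own preprocessing may differ. The function name one_hot_layout and the three toy classes are hypothetical.

```python
import numpy as np

def one_hot_layout(label_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Convert an (H, W) integer label map into an (H, W, num_classes) one-hot layout."""
    if label_map.ndim != 2:
        raise ValueError("label_map must be 2-D (H, W)")
    if label_map.min() < 0 or label_map.max() >= num_classes:
        raise ValueError("labels must lie in [0, num_classes)")
    # Indexing the identity matrix with the label map yields one-hot vectors.
    return np.eye(num_classes, dtype=np.float32)[label_map]

# Toy 2x3 layout with three hypothetical classes (0=road, 1=car, 2=sky).
toy_labels = np.array([[2, 2, 2],
                       [0, 1, 0]])
layout = one_hot_layout(toy_labels, num_classes=3)
print(layout.shape)
```

Each pixel then carries exactly one active channel, which is the form a convolutional generator can consume directly.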

Pros

Produces high-quality, photorealistic images from semantic layouts
Implements cascaded refinement networks, which synthesize the image coarse-to-fine without relying on adversarial training
Provides pre-trained models for easy experimentation and inference
Ships demo scripts for 256p, 512p, and 1024p resolutions, plus the Amazon Turk scripts in the mturk_scripts folder

Cons

Requires significant computational resources for training
Limited to specific scene types and categories present in the training data
May struggle with complex or unusual semantic layouts
Dependency on older TensorFlow versions may cause compatibility issues

Code Examples

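Loading a pre-trained model (illustrative sketch; model.GAN is a hypothetical wrapper, and the README's documented entry points are the demo_*.py scripts):

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.Saver)
from model import GAN  # hypothetical module, not named in the README

# Build the inference graph first so the Saver can find its variables
model = GAN(is_training=False)
saver = tf.train.Saver()
sess = tf.Session()
# Restore the weights from a checkpoint
saver.restore(sess, 'path/to/pretrained/model')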

Generating an image from a semantic layout (illustrative; load_semantic_layout, preprocess_layout, postprocess_image, and save_image are hypothetical helpers you would need to supply):

# Load and preprocess the semantic layout
semantic_layout = load_semantic_layout('path/to/layout.png')
semantic_layout = preprocess_layout(semantic_layout)

# Run the generator; the result is a batch, so index 0 is the single image
generated_image = sess.run(model.generated_images,
                           feed_dict={model.input_layout: semantic_layout})

# Post-process and save the generated image
output_image = postprocess_image(generated_image[0])
save_image(output_image, 'output.png')

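Training the model (illustrative; DataLoader and the model attributes are hypothetical, and the README's documented route is to set is_training=True in the demo scripts):

from data_loader import DataLoader  # hypothetical module

num_epochs = 100  # placeholder value; choose one that suits your dataset

# Initialize the data loader; the optimizer is assumed to live inside model.g_train_op
data_loader = DataLoader('path/to/dataset')

# Training loop: one optimization step per batch
for epoch in range(num_epochs):
    for batch in data_loader:
        _, g_loss = sess.run([model.g_train_op, model.g_loss],
                             feed_dict={model.input_layout: batch['layout'],
                                        model.real_images: batch['image']})
        print(f"Epoch {epoch}, G Loss: {g_loss}")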

Getting Started

Clone the repository:

git clone https://github.com/CQFIO/PhotographicImageSynthesis.git
cd PhotographicImageSynthesis

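Install dependencies (the README lists Tensorflow (>=1.0), Scipy, Numpy, and Pillow; pinning below 2.0 is an assumption based on the TensorFlow 1.x-style API such as tf.Session used in the examples above):
```
pip install "tensorflow>=1.0,<2.0" scipy numpy pillow
```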

Download the pre-trained models (the README notes that this takes several minutes):

python download_models.py

Synthesize images with a demo script (results are saved in result_512p/final):

python demo_512p.py

Competitor Comparisons

pytorch-CycleGAN-and-pix2pix

23,952

Image-to-Image Translation in PyTorch

Pros of pytorch-CycleGAN-and-pix2pix

Implements multiple image-to-image translation models (CycleGAN, pix2pix, etc.)
Provides a comprehensive PyTorch implementation with extensive documentation
Offers flexibility for various tasks like style transfer and image generation

Cons of pytorch-CycleGAN-and-pix2pix

May require more computational resources due to its complexity
Learning curve can be steeper for beginners due to multiple model implementations
Less focused on photorealistic image synthesis compared to PhotographicImageSynthesis

Code Comparison

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])

pytorch-CycleGAN-and-pix2pix:

def build_conv(dim_in, dim_out, kernel_size=3, stride=1, padding=0, norm='none', activation='relu', pad_type='zeros'):
    # nn.Conv2d's padding_mode accepts 'zeros', 'reflect', 'replicate', or 'circular'
    conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding=padding, padding_mode=pad_type)
    return conv

The code snippets show different approaches to building convolutional layers, with pytorch-CycleGAN-and-pix2pix using PyTorch's built-in modules and PhotographicImageSynthesis using TensorFlow operations directly.
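To make the TensorFlow one-liner more concrete, here is a minimal NumPy sketch of what the conv branch of build_net computes: a stride-1 convolution with zero 'SAME' padding, a bias add, and a ReLU. It assumes a single image without a batch dimension, and the name conv2d_same_relu is hypothetical; it is a reference sketch, not code from either repository.

```python
import numpy as np

def conv2d_same_relu(x, w, b):
    """x: (H, W, Cin), w: (kh, kw, Cin, Cout), b: (Cout,). Stride 1, zero 'SAME' padding."""
    kh, kw, _, cout = w.shape
    ph, pw = kh - 1, kw - 1
    # For even kernel sizes, 'SAME' padding puts the extra row/column at the bottom/right.
    xp = np.pad(x, ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, cout), dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            # Cross-correlation (no kernel flip): shift the input, mix channels.
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    # Bias add followed by ReLU, mirroring tf.nn.relu(conv + bias).
    return np.maximum(out + b, 0.0)

# A 3x3 box filter over an all-ones image, with a bias of -5.
x = np.ones((3, 3, 1))
w = np.ones((3, 3, 1, 1))
b = np.array([-5.0])
y = conv2d_same_relu(x, w, b)
print(y[:, :, 0])
```

Because of the zero padding, border pixels see fewer non-zero inputs than the center pixel, so after the negative bias the ReLU zeroes out the corners.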

pix2pixHD

6,743

Synthesizing and manipulating 2048x1024 images with conditional GANs

Pros of pix2pixHD

Higher resolution output (up to 2048x1024)
Multi-scale generator and discriminator architecture
Improved visual quality and realism in generated images

Cons of pix2pixHD

More complex model architecture, potentially requiring more computational resources
May require larger datasets for optimal performance
Limited to image-to-image translation tasks

Code Comparison

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])
    elif ntype=='pool':
        return tf.nn.avg_pool(nin,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

pix2pixHD:

def define_G(input_nc, output_nc, ngf, netG, n_downsample_global=3, n_blocks_global=9, n_local_enhancers=1, n_blocks_local=3, norm='instance', gpu_ids=[]):
    norm_layer = get_norm_layer(norm_type=norm)    
    if netG == 'global':    
        netG = GlobalGenerator(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, norm_layer)       
    elif netG == 'local':
        netG = LocalEnhancer(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)

SPADE

7,641

Semantic Image Synthesis with SPADE

Pros of SPADE

More flexible architecture for diverse image synthesis tasks
Better handling of spatial information through spatially-adaptive normalization
Supports multi-modal synthesis and style manipulation

Cons of SPADE

May require more computational resources due to complex architecture
Potentially longer training time compared to PhotographicImageSynthesis
Less focus on photorealism in favor of versatility

Code Comparison

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])
    elif ntype=='pool':
        return tf.nn.avg_pool(nin,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

SPADE:

def forward(self, input, seg):
    x = F.interpolate(input, size=self.sh)
    x = self.head_0(x, seg)
    x = self.up_0(x, seg)
    x = self.up_1(x, seg)
    # ... more layers

The code snippets show that SPADE passes the segmentation map (seg) into every block of its forward method, so its spatially-adaptive normalization layers can condition each stage on the semantic layout. The PhotographicImageSynthesis helper shown here is a plain TensorFlow convolution and pooling wrapper that takes only feature tensors and weights.

pix2pix

10,345

Image-to-image translation with conditional adversarial nets

Pros of pix2pix

More versatile, supporting various image-to-image translation tasks
Easier to train on custom datasets
Extensive documentation and community support

Cons of pix2pix

Generally produces lower resolution outputs
May struggle with fine details and photorealistic textures
Less specialized for photographic image synthesis

Code Comparison

pix2pix:

def create_model(self, opt):
    model = networks.define_G(opt.input_nc, opt.output_nc, opt.ngf, opt.netG, opt.norm,
                              not opt.no_dropout, opt.init_type, opt.init_gain, self.gpu_ids)
    return model

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])
    elif ntype=='pool':
        return tf.nn.avg_pool(nin,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

The code snippets show different approaches to model creation. pix2pix delegates to a separate network-definition module configured through an options object, while PhotographicImageSynthesis calls TensorFlow ops directly in small helper functions.

deep-photo-styletransfer

10,004

Code and data for paper "Deep Photo Style Transfer": https://arxiv.org/abs/1703.07511

Pros of deep-photo-styletransfer

Focuses specifically on photorealistic style transfer
Preserves the structure and semantics of the original image better
Includes a CUDA implementation for faster processing

Cons of deep-photo-styletransfer

Limited to photo-to-photo style transfer
Requires more manual input (segmentation masks) for optimal results
Less versatile in terms of input and output image types

Code Comparison

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])
    elif ntype=='pool':
        return tf.nn.avg_pool(nin,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

deep-photo-styletransfer:

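function build_network(net, input)
  local conv = nn.SpatialConvolution
  local relu = nn.ReLU
  -- Arguments: nInputPlane, nOutputPlane, kW, kH, dW, dH, padW, padH (illustrative values)
  net:add(conv(3, 3, 3, 3, 1, 1, 0, 0))
  net:add(relu())
  return net
end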

The code snippets show different approaches to building neural network layers, with PhotographicImageSynthesis using TensorFlow and deep-photo-styletransfer using Torch/Lua.

README

Photographic Image Synthesis with Cascaded Refinement Networks

This is a TensorFlow implementation of cascaded refinement networks to synthesize photographic images from semantic layouts.

Setup

Requirements

Required Python libraries: TensorFlow (>=1.0) + SciPy + NumPy + Pillow.

Tested on Ubuntu + Intel i7 CPU + Nvidia Titan X (Pascal) with CUDA (>=8.0) and cuDNN (>=5.0). CPU mode should also work with minor changes.
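Before running the demo scripts, it can help to confirm that the listed libraries are installed. The sketch below uses the standard library's importlib.metadata (Python 3.8+); the distribution names in REQUIRED are assumptions about how the packages were installed (for example, a GPU build may be registered as tensorflow-gpu instead of tensorflow).

```python
from importlib import metadata

# Assumed PyPI distribution names for the libraries the README lists.
REQUIRED = ["tensorflow", "scipy", "numpy", "pillow"]

def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")
```

The check only reports what is installed; it does not verify that the installed TensorFlow version is compatible with the scripts.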

Quick Start (Testing)

Clone this repository.
Download the pretrained models from Google Drive by running "python download_models.py". It takes several minutes to download all the models.
Run "python demo_512p.py" or "python demo_1024p.py" (requires large GPU memory) to synthesize images.
The synthesized images are saved in "result_512p/final" or "result_1024p/final".
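After a demo run, the output folder can be inspected with a few lines of Python. This is a small convenience sketch, not part of the repository: the helper name list_results is hypothetical, and it only assumes the result_<resolution>/final folder layout stated above.

```python
from pathlib import Path

def list_results(resolution: str, root: Path = Path(".")) -> list:
    """Return the sorted files in result_<resolution>/final, or [] if the folder is missing."""
    final_dir = root / f"result_{resolution}" / "final"
    if not final_dir.is_dir():
        return []
    return sorted(p for p in final_dir.iterdir() if p.is_file())

# Prints nothing if the 512p demo has not been run from this directory yet.
for path in list_results("512p"):
    print(path)
```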

Training

To train a model at 256p resolution, please set "is_training=True" and change the file paths for training and test sets accordingly in "demo_256p.py". Then run "demo_256p.py".

To train a model at 512p resolution, we fine-tune the pretrained model at 256p using "demo_512p.py". Also change "is_training=True" and file paths accordingly.

To train a model at 1024p resolution, we fine-tune the pretrained model at 512p using "demo_1024p.py". Also change "is_training=True" and file paths accordingly.
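The three training paragraphs describe a staged schedule in which each resolution starts from the model trained at the previous one. The sketch below only encodes that ordering as data; the Stage class and training_plan function are hypothetical names, and the actual training is still done by editing and running the demo scripts.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Stage:
    resolution: str
    script: str
    init_from: Optional[str]  # resolution of the model to fine-tune, or None

# Order taken from the README: 256p, then 512p, then 1024p.
STAGES = [
    Stage("256p", "demo_256p.py", None),
    Stage("512p", "demo_512p.py", "256p"),
    Stage("1024p", "demo_1024p.py", "512p"),
]

def training_plan(target: str) -> List[Stage]:
    """Return every stage that must be trained, in order, to reach the target resolution."""
    names = [stage.resolution for stage in STAGES]
    if target not in names:
        raise ValueError(f"unknown resolution: {target!r}")
    return STAGES[: names.index(target) + 1]

for stage in training_plan("1024p"):
    start = ("from scratch" if stage.init_from is None
             else f"fine-tuned from the {stage.init_from} model")
    print(f"{stage.resolution}: set is_training=True in {stage.script}, {start}")
```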

Video

https://youtu.be/0fhUJT21-bs

Citation

If you use our code for research, please cite our paper:

Qifeng Chen and Vladlen Koltun. Photographic Image Synthesis with Cascaded Refinement Networks. In ICCV 2017.

Amazon Turk Scripts

The scripts are in the folder "mturk_scripts".

Todo List

Add the code and models for the GTA dataset.

Questions

If you have any questions or requests about the code and data, please email me at chenqifeng22@gmail.com. If you need the pretrained model on NYU, please send me an email.

License

MIT License
