CQFIO / PhotographicImageSynthesis

Photographic Image Synthesis with Cascaded Refinement Networks

Top Related Projects

Image-to-Image Translation in PyTorch

Synthesizing and manipulating 2048x1024 images with conditional GANs

Semantic Image Synthesis with SPADE

Image-to-image translation with conditional adversarial nets

Code and data for paper "Deep Photo Style Transfer": https://arxiv.org/abs/1703.07511

Quick Overview

PhotographicImageSynthesis is a GitHub repository that implements cascaded refinement networks (CRN), a deep learning approach for synthesizing photographic images from semantic layouts. The project generates realistic images from semantic segmentation maps using a feedforward convolutional network; according to the accompanying paper (Chen and Koltun, ICCV 2017), the network is trained with a direct regression objective rather than adversarial training.
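To make the input concrete: a semantic layout is commonly presented to a synthesis network as one binary channel per class, that is, a one-hot encoding of the label map. The sketch below illustrates that encoding in plain Python. The function name `one_hot_layout`, the class meanings, and the tiny 2x3 label map are made up for illustration, and the repository's own preprocessing may differ.

```python
def one_hot_layout(label_map, num_classes):
    """Convert a 2-D grid of class ids into num_classes binary channels.

    Hypothetical helper for illustration; not taken from the repository.
    """
    channels = []
    for class_id in range(num_classes):
        # Channel c holds 1 where the pixel belongs to class c, else 0.
        channels.append(
            [[1 if label == class_id else 0 for label in row] for row in label_map]
        )
    return channels


# A made-up 2x3 layout with three classes (say 0 = road, 1 = car, 2 = sky).
layout = [
    [2, 2, 2],
    [0, 1, 0],
]
encoded = one_hot_layout(layout, num_classes=3)
print(encoded[1])  # channel for class 1: [[0, 0, 0], [0, 1, 0]]
```

Each pixel is set in exactly one channel, so the channels sum to 1 at every position.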

Pros

  • Produces high-quality, photorealistic images from semantic layouts
  • Implements cascaded refinement networks, a feedforward architecture that does not rely on adversarial training
  • Provides pre-trained models for easy experimentation and inference
  • Ships demo scripts for synthesis at 256p, 512p, and 1024p resolutions

Cons

  • Requires significant computational resources for training
  • Limited to specific scene types and categories present in the training data
  • May struggle with complex or unusual semantic layouts
  • Dependency on older TensorFlow versions may cause compatibility issues

Code Examples

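  1. Loading a pre-trained model (illustrative sketch; the `model` module and `GAN` class are hypothetical names that the README does not document):
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.Saver)
from model import GAN  # hypothetical module and class

# Build the graph in inference mode, then restore the pre-trained weights
model = GAN(is_training=False)
saver = tf.train.Saver()
sess = tf.Session()
saver.restore(sess, 'path/to/pretrained/model')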
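  2. Generating an image from a semantic layout (reuses `sess` and `model` from the previous example; the helper functions are hypothetical placeholders):
# Load and preprocess the semantic layout
# (load_semantic_layout and preprocess_layout are hypothetical helpers)
semantic_layout = load_semantic_layout('path/to/layout.png')
semantic_layout = preprocess_layout(semantic_layout)

# Run the network on the layout
generated_image = sess.run(model.generated_images,
                           feed_dict={model.input_layout: semantic_layout})

# Post-process the first image in the batch and save it
# (postprocess_image and save_image are hypothetical helpers)
output_image = postprocess_image(generated_image[0])
save_image(output_image, 'output.png')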
  3. Training the model (illustrative sketch; `DataLoader` and the `model` attributes are hypothetical names, and the optimizer is assumed to be built into `model.g_train_op`):
from data_loader import DataLoader  # hypothetical module

num_epochs = 100  # placeholder value; choose to suit the dataset

# Initialize the data loader
data_loader = DataLoader('path/to/dataset')

# Training loop: one optimization step per batch
for epoch in range(num_epochs):
    for batch in data_loader:
        _, g_loss = sess.run([model.g_train_op, model.g_loss],
                             feed_dict={model.input_layout: batch['layout'],
                                        model.real_images: batch['image']})
        print(f"Epoch {epoch}, G Loss: {g_loss}")

Getting Started

  1. Clone the repository:

    git clone https://github.com/CQFIO/PhotographicImageSynthesis.git
    cd PhotographicImageSynthesis
    
  2. Install dependencies (the README lists TensorFlow >=1.0, SciPy, NumPy, and Pillow):

    pip install "tensorflow>=1.0" scipy numpy pillow
    
  3. Download the pre-trained models:

    python download_models.py
    
  4. Synthesize images with a demo script (results are saved in "result_512p/final"):

    python demo_512p.py
    

Competitor Comparisons

Image-to-Image Translation in PyTorch

Pros of pytorch-CycleGAN-and-pix2pix

  • Implements multiple image-to-image translation models (CycleGAN, pix2pix, etc.)
  • Provides a comprehensive PyTorch implementation with extensive documentation
  • Offers flexibility for various tasks like style transfer and image generation

Cons of pytorch-CycleGAN-and-pix2pix

  • May require more computational resources due to its complexity
  • Learning curve can be steeper for beginners due to multiple model implementations
  • Less focused on photorealistic image synthesis compared to PhotographicImageSynthesis

Code Comparison

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])

pytorch-CycleGAN-and-pix2pix:

def build_conv(dim_in, dim_out, kernel_size=3, stride=1, padding=0, norm='none', activation='relu', pad_type='zero'):
    conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding=padding, padding_mode=pad_type)
    return conv

The code snippets show different approaches to building convolutional layers, with pytorch-CycleGAN-and-pix2pix using PyTorch's built-in modules and PhotographicImageSynthesis using TensorFlow operations directly.

Synthesizing and manipulating 2048x1024 images with conditional GANs

Pros of pix2pixHD

  • Higher resolution output (up to 2048x1024)
  • Multi-scale generator and discriminator architecture
  • Improved visual quality and realism in generated images

Cons of pix2pixHD

  • More complex model architecture, potentially requiring more computational resources
  • May require larger datasets for optimal performance
  • Limited to image-to-image translation tasks

Code Comparison

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])
    elif ntype=='pool':
        return tf.nn.avg_pool(nin,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

pix2pixHD:

def define_G(input_nc, output_nc, ngf, netG, n_downsample_global=3, n_blocks_global=9, n_local_enhancers=1, n_blocks_local=3, norm='instance', gpu_ids=[]):
    norm_layer = get_norm_layer(norm_type=norm)    
    if netG == 'global':    
        netG = GlobalGenerator(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, norm_layer)       
    elif netG == 'local':
        netG = LocalEnhancer(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)

Semantic Image Synthesis with SPADE

Pros of SPADE

  • More flexible architecture for diverse image synthesis tasks
  • Better handling of spatial information through spatially-adaptive normalization
  • Supports multi-modal synthesis and style manipulation

Cons of SPADE

  • May require more computational resources due to complex architecture
  • Potentially longer training time compared to PhotographicImageSynthesis
  • Less focus on photorealism in favor of versatility

Code Comparison

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])
    elif ntype=='pool':
        return tf.nn.avg_pool(nin,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

SPADE:

def forward(self, input, seg):
    x = F.interpolate(input, size=self.sh)
    x = self.head_0(x, seg)
    x = self.up_0(x, seg)
    x = self.up_1(x, seg)
    # ... more layers

The code snippets show that SPADE uses a more complex architecture with specialized normalization layers, while PhotographicImageSynthesis follows a more traditional convolutional approach. SPADE's forward method takes both input and segmentation map, allowing for more controlled synthesis based on semantic information.
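To illustrate what spatially-adaptive normalization means in practice, the sketch below normalizes a single feature channel and then applies a per-pixel scale and shift, following the formulation in the SPADE paper. In SPADE the scale (`gamma`) and shift (`beta`) maps are predicted from the segmentation map by small convolutional layers; here they are simply passed in as arguments. The function name and the toy values are illustrative and are not taken from the SPADE code base.

```python
import math


def spade_modulate(channel, gamma, beta, eps=1e-5):
    """Normalize one channel (a 2-D list), then scale and shift it per pixel.

    gamma and beta have the same shape as channel. In SPADE they would be
    predicted from the segmentation map; here they are given directly.
    """
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance + eps)
    return [
        [gamma[y][x] * (v - mean) / std + beta[y][x] for x, v in enumerate(row)]
        for y, row in enumerate(channel)
    ]


features = [[1.0, 3.0]]
# With gamma = 1 and beta = 0 this reduces to plain normalization.
plain = spade_modulate(features, gamma=[[1.0, 1.0]], beta=[[0.0, 0.0]])
# With gamma = 0 the output is determined entirely by beta.
shifted = spade_modulate(features, gamma=[[0.0, 0.0]], beta=[[5.0, 7.0]])
print(plain, shifted)
```

Because `gamma` and `beta` vary per pixel, regions with different semantic labels can receive different modulation, which is the property the comparison above refers to.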

Image-to-image translation with conditional adversarial nets

Pros of pix2pix

  • More versatile, supporting various image-to-image translation tasks
  • Easier to train on custom datasets
  • Extensive documentation and community support

Cons of pix2pix

  • Generally produces lower resolution outputs
  • May struggle with fine details and photorealistic textures
  • Less specialized for photographic image synthesis

Code Comparison

pix2pix:

def create_model(self, opt):
    model = networks.define_G(opt.input_nc, opt.output_nc, opt.ngf, opt.netG, opt.norm,
                              not opt.no_dropout, opt.init_type, opt.init_gain, self.gpu_ids)
    return model

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])
    elif ntype=='pool':
        return tf.nn.avg_pool(nin,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

The code snippets show different approaches to model creation. pix2pix uses a more modular approach with separate network definitions, while PhotographicImageSynthesis assembles its layers directly from TensorFlow operations.

Code and data for paper "Deep Photo Style Transfer": https://arxiv.org/abs/1703.07511

Pros of deep-photo-styletransfer

  • Focuses specifically on photorealistic style transfer
  • Preserves the structure and semantics of the original image better
  • Includes a CUDA implementation for faster processing

Cons of deep-photo-styletransfer

  • Limited to photo-to-photo style transfer
  • Requires more manual input (segmentation masks) for optimal results
  • Less versatile in terms of input and output image types

Code Comparison

PhotographicImageSynthesis:

def build_net(ntype,nin,nwb=None,name=None):
    if ntype=='conv':
        return tf.nn.relu(tf.nn.conv2d(nin,nwb[0],strides=[1,1,1,1],padding='SAME',name=name)+nwb[1])
    elif ntype=='pool':
        return tf.nn.avg_pool(nin,ksize=[1,2,2,1],strides=[1,2,2,1],padding='SAME')

deep-photo-styletransfer:

function build_network(net, input)
  local conv = nn.SpatialConvolution
  local relu = nn.ReLU
  net:add(conv(3,3,1,1,0,0))
  net:add(relu())
  return net
end

The code snippets show different approaches to building neural network layers, with PhotographicImageSynthesis using TensorFlow and deep-photo-styletransfer using Torch/Lua.
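As a side note on the TensorFlow snippet above: with `padding='SAME'`, the output size along each spatial dimension is the input size divided by the stride, rounded up. The small sketch below works through that arithmetic; `same_output_size` is a name chosen here for illustration and is not a TensorFlow function.

```python
def same_output_size(input_size, stride):
    """Output length along one dimension under 'SAME' padding: ceil(input / stride)."""
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-input_size // stride)


conv_out = same_output_size(512, 1)      # stride-1 convolution keeps the size
pool_out = same_output_size(512, 2)      # stride-2 pooling halves it
odd_pool_out = same_output_size(5, 2)    # odd sizes round up
print(conv_out, pool_out, odd_pool_out)  # 512 256 3
```

This is why the stride-1 convolution in `build_net` preserves the spatial size of its input while the 2x2 average pooling halves it.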

README

Photographic Image Synthesis with Cascaded Refinement Networks

This is a TensorFlow implementation of cascaded refinement networks to synthesize photographic images from semantic layouts.
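As a rough illustration of the cascade, the sketch below lists the resolutions a chain of refinement modules would work at. It assumes, following the description in the paper rather than anything stated in this README, that the coarsest module operates at 4x8 and that each subsequent module doubles the resolution. It also assumes that "256p", "512p", and "1024p" denote image heights with a 1:2 aspect ratio. The function name and defaults are illustrative.

```python
def cascade_resolutions(target_height, base=(4, 8)):
    """Return the (height, width) handled by each module, coarse to fine.

    Assumes each module doubles the resolution of the previous one.
    """
    height, width = base
    if target_height < height:
        raise ValueError("target_height must be at least the base height")
    resolutions = [(height, width)]
    while height < target_height:
        height, width = height * 2, width * 2
        resolutions.append((height, width))
    if height != target_height:
        raise ValueError("target_height must be the base height times a power of two")
    return resolutions


for name, target in [("256p", 256), ("512p", 512), ("1024p", 1024)]:
    stages = cascade_resolutions(target)
    print(name, len(stages), "modules, finest at", stages[-1])
```

Under these assumptions, each step up in output resolution adds one module to the end of the cascade.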

Setup

Requirements

Required Python libraries: TensorFlow (>=1.0), SciPy, NumPy, and Pillow.

Tested on Ubuntu with an Intel i7 CPU and an Nvidia Titan X (Pascal) GPU, using CUDA (>=8.0) and cuDNN (>=5.0). CPU mode should also work with minor changes.
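Before running the demos, it can help to confirm that the listed libraries are importable. The following is a minimal sketch using only the Python standard library; the helper name `find_missing` is hypothetical, and the check only tests whether each module can be located, not whether its version is compatible.

```python
import importlib.util

# Import names mapped to the library names used in the requirements above.
REQUIRED_MODULES = {
    "tensorflow": "TensorFlow",
    "scipy": "SciPy",
    "numpy": "NumPy",
    "PIL": "Pillow",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the display names of libraries that cannot be located."""
    return [
        display_name
        for module_name, display_name in modules.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = find_missing()
if missing:
    print("Missing libraries: " + ", ".join(missing))
else:
    print("All required libraries are installed.")
```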

Quick Start (Testing)

  1. Clone this repository.
  2. Download the pretrained models from Google Drive by running "python download_models.py". It takes several minutes to download all the models.
  3. Run "python demo_512p.py" or "python demo_1024p.py" (requires large GPU memory) to synthesize images.
  4. The synthesized images are saved in "result_512p/final" or "result_1024p/final".
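After a demo run, the output folder can be inspected with a few lines of Python. This is a small convenience sketch, not part of the repository: the helper name `list_results` is made up, and the default path is simply the 512p output folder named in the steps above.

```python
from pathlib import Path


def list_results(result_dir="result_512p/final"):
    """Return the sorted file names in the output folder, or [] if it does not exist."""
    directory = Path(result_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


files = list_results()
print(f"{len(files)} file(s) found in result_512p/final")
```

Pass "result_1024p/final" instead to inspect the output of the 1024p demo.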

Training

To train a model at 256p resolution, please set "is_training=True" and change the file paths for training and test sets accordingly in "demo_256p.py". Then run "demo_256p.py".

To train a model at 512p resolution, we fine-tune the pretrained 256p model using "demo_512p.py". Also set "is_training=True" and update the file paths accordingly.

To train a model at 1024p resolution, we fine-tune the pretrained 512p model using "demo_1024p.py". Also set "is_training=True" and update the file paths accordingly.
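The progressive training order described above can be summarized as a small data structure. This is only a sketch of the workflow as stated in this section: the script names come from the text, while the dictionary layout and the `describe` helper are illustrative.

```python
# Each stage names the script to run and the resolution it is fine-tuned from.
# The 256p stage has no earlier model mentioned in the text.
TRAINING_STAGES = [
    {"resolution": "256p", "script": "demo_256p.py", "starts_from": None},
    {"resolution": "512p", "script": "demo_512p.py", "starts_from": "256p"},
    {"resolution": "1024p", "script": "demo_1024p.py", "starts_from": "512p"},
]


def describe(stages):
    """Return one human-readable line per training stage, in order."""
    lines = []
    for stage in stages:
        if stage["starts_from"] is None:
            origin = "no earlier model"
        else:
            origin = "fine-tuned from the " + stage["starts_from"] + " model"
        lines.append(stage["script"] + ": " + stage["resolution"] + ", " + origin)
    return lines


for line in describe(TRAINING_STAGES):
    print(line)
```

In each script, "is_training=True" and the dataset paths still need to be set by hand, as described above.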

Video

https://youtu.be/0fhUJT21-bs

Citation

If you use our code for research, please cite our paper:

Qifeng Chen and Vladlen Koltun. Photographic Image Synthesis with Cascaded Refinement Networks. In ICCV 2017.

Amazon Turk Scripts

The scripts are in the folder "mturk_scripts".

Todo List

  1. Add the code and models for the GTA dataset.

Question

If you have any questions or requests about the code and data, please email me at chenqifeng22@gmail.com. If you need the pretrained model on NYU, please send me an email.

License

MIT License