
cysmith / neural-style-tf

TensorFlow (Python API) implementation of Neural Style


Top Related Projects

  • jcjohnson/neural-style: Torch implementation of the neural style algorithm
  • lengstrom/fast-style-transfer: TensorFlow CNN for fast style transfer ⚡🖥🎨🖼
  • luanfujun/deep-photo-styletransfer: Code and data for the paper "Deep Photo Style Transfer" (https://arxiv.org/abs/1703.07511)
  • NVIDIA/FastPhotoStyle: Style transfer, deep learning, feature transform

Quick Overview

Neural-style-tf is a TensorFlow implementation of the neural style transfer algorithm. It allows users to apply the artistic style of one image to the content of another, creating unique and visually appealing results. This project provides a flexible and customizable approach to neural style transfer.

Pros

  • Implements multiple loss functions and provides options for customization
  • Supports video stylization in addition to still images
  • Includes pre-trained VGG19 model for feature extraction
  • Offers various optimization methods and style transfer techniques

Cons

  • Requires significant computational resources for optimal performance
  • May produce inconsistent results depending on input images and parameters
  • Limited documentation and examples for advanced usage
  • Lacks a user-friendly interface for non-technical users

Code Examples

The project is driven through the neural_style.py command-line script rather than an importable Python API, so the examples below are shell invocations; image filenames are illustrative, and all flags are documented in the Arguments section of the README below.

  1. Basic style transfer:

python neural_style.py --content_img lion.jpg \
                       --style_imgs starry-night.jpg

  2. Customizing style transfer parameters:

python neural_style.py --content_img lion.jpg \
                       --style_imgs starry-night.jpg \
                       --content_weight 5e0 \
                       --style_weight 1e4 \
                       --tv_weight 1e-3 \
                       --content_layers conv4_2 \
                       --style_layers relu1_1 relu2_1 relu3_1 relu4_1 relu5_1

  3. Video stylization (applied to a directory of extracted frames):

python neural_style.py --video \
                       --video_input_dir ./video_input/my_video_frames \
                       --style_imgs starry-night.jpg \
                       --start_frame 1 \
                       --end_frame 50

Getting Started

  1. Clone the repository:

    git clone https://github.com/cysmith/neural-style-tf.git
    cd neural-style-tf
    
  2. Install the dependencies (TensorFlow and OpenCV; see the Setup section of the README below):

    pip install tensorflow opencv-python

  3. Download the pre-trained VGG-19 weights file imagenet-vgg-verydeep-19.mat (see the Setup section below) and copy it to the project directory.

  4. Run the style transfer:

    python neural_style.py --content_img lion.jpg --style_imgs starry-night.jpg


For more advanced usage and customization options, refer to the project's README and documentation.

Competitor Comparisons

neural-style (jcjohnson): Torch implementation of the neural style algorithm

Pros of neural-style

  • Implemented in Lua and Torch, which can be more efficient for certain deep learning tasks
  • Extensive documentation and examples provided in the README
  • Supports multiple GPU usage for faster processing

Cons of neural-style

  • Requires Torch installation, which may be less familiar to some users
  • Less actively maintained compared to neural-style-tf
  • Limited flexibility in terms of customizing the neural network architecture

Code Comparison

neural-style:

local cmd = torch.CmdLine()
cmd:option('-style_image', 'examples/inputs/seated-nude.jpg', 'Style target image')
cmd:option('-content_image', 'examples/inputs/tubingen.jpg', 'Content target image')
cmd:option('-output_image', 'out.png', 'Output image')

neural-style-tf:

parser.add_argument('--style_imgs', nargs='+', type=str,
                    help='Filenames of the style images (example: starry-night.jpg)', 
                    required=True)
parser.add_argument('--content_img', type=str,
                    help='Filename of the content image (example: lion.jpg)')

Both repositories implement neural style transfer, but neural-style-tf is written in Python using TensorFlow, making it more accessible to a wider range of developers. It also offers more customization options and is more actively maintained. However, neural-style may have performance advantages in certain scenarios due to its Torch implementation.

fast-style-transfer (lengstrom): TensorFlow CNN for fast style transfer ⚡🖥🎨🖼

Pros of fast-style-transfer

  • Significantly faster inference time, allowing for real-time style transfer
  • Supports video style transfer out of the box
  • Includes pre-trained models for immediate use

Cons of fast-style-transfer

  • Limited to styles it has been trained on, less flexible than neural-style-tf
  • May produce lower quality results in some cases compared to neural-style-tf
  • Requires separate training for each new style

Code Comparison

neural-style-tf (a representative command-line invocation, since the project exposes a CLI rather than a Python API):

python neural_style.py --content_img lion.jpg \
                       --style_imgs starry-night.jpg \
                       --content_weight 5e0 \
                       --style_weight 1e4

fast-style-transfer (illustrative pseudocode, not the project's literal API):

# A single feed-forward pass through a transform network
# pre-trained on one specific style
stylized_image = transform_net(content_image)

The fast-style-transfer approach uses a pre-trained transform network, resulting in simpler code and faster execution. However, neural-style-tf allows for more fine-grained control over the stylization process by adjusting weights and using arbitrary style images at runtime.

deep-photo-styletransfer (luanfujun): Code and data for the paper "Deep Photo Style Transfer" (https://arxiv.org/abs/1703.07511)

Pros of deep-photo-styletransfer

  • Focuses on photorealistic style transfer, preserving the structure of the original image
  • Implements a photorealistic regularization method for improved results
  • Includes a MATLAB implementation for certain preprocessing steps

Cons of deep-photo-styletransfer

  • Requires more complex setup with dependencies on MATLAB and Torch
  • Limited to photorealistic style transfer, less versatile for artistic styles
  • May require more computational resources due to additional processing steps

Code Comparison

deep-photo-styletransfer:

local content_image = image.load(params.content_image, 3)
local style_image = image.load(params.style_image, 3)
local content_layers = params.content_layers
local style_layers = params.style_layers

neural-style-tf (illustrative pseudocode; get_img and get_features are hypothetical stand-ins for the project's image-loading and VGG-19 feature-extraction steps):

content_img = get_img(args.content)
style_img = get_img(args.style)
content_features = get_features(content_img, model)
style_features = get_features(style_img, model)

Both repositories implement neural style transfer, but deep-photo-styletransfer focuses on photorealistic results, while neural-style-tf offers a more general approach to artistic style transfer. The code snippets show differences in implementation languages and image loading methods, reflecting their distinct approaches to the task.

FastPhotoStyle (NVIDIA): Style transfer, deep learning, feature transform

Pros of FastPhotoStyle

  • Faster processing time due to optimized algorithms and GPU acceleration
  • Produces more photorealistic results with better preservation of content details
  • Supports both photo-to-photo and sketch-to-photo style transfer

Cons of FastPhotoStyle

  • Requires more powerful hardware (NVIDIA GPU) for optimal performance
  • Less flexibility in terms of customization and parameter tuning
  • Limited to photo-realistic style transfer, may not be suitable for artistic or abstract styles

Code Comparison

FastPhotoStyle (simplified pseudocode; the project's actual demo also wires up model and smoothing modules):

# Illustrative one-call usage; argument names are simplified
from photo_style import stylization
stylization(content='input.jpg', style='style.jpg', output='output.jpg')

neural-style-tf (a representative command-line invocation, since the project exposes a CLI rather than a Python API):

python neural_style.py --content_img input.jpg \
                       --style_imgs style.jpg \
                       --max_iterations 1000 \
                       --content_weight 5 \
                       --style_weight 100

Summary

FastPhotoStyle offers faster processing and more photorealistic results, making it ideal for photo-to-photo style transfer. However, it requires more powerful hardware and has less flexibility compared to neural-style-tf. The latter provides more customization options and can handle a wider range of artistic styles, but at the cost of longer processing times and potentially less photorealistic outputs.


README

neural-style-tf

This is a TensorFlow implementation of several techniques described in the papers:

  • Image Style Transfer Using Convolutional Neural Networks, by Gatys, Ecker, and Bethge
  • Artistic Style Transfer for Videos, by Ruder, Dosovitskiy, and Brox
  • Preserving Color in Neural Artistic Style Transfer, by Gatys, Ecker, Bethge, Hertzmann, and Shechtman

Additionally, techniques are presented for semantic segmentation and multiple style transfer.

The Neural Style algorithm synthesizes a pastiche by separating and combining the content of one image with the style of another image using convolutional neural networks (CNN). Below is an example of transferring the artistic style of The Starry Night onto a photograph of an African lion:
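
To make the content/style separation concrete, here is a minimal NumPy sketch of the two losses involved; the function names are illustrative and not taken from this repository. The content loss compares raw CNN activations, while the style loss compares Gram matrices (feature correlations) of those activations.

import numpy as np

def gram_matrix(features):
    # features: (positions, channels) activations from one CNN layer,
    # where positions = height * width of the layer's feature map
    return features.T @ features

def content_loss(gen_features, content_features):
    # Squared-error distance between generated and content activations
    # at a content layer such as conv4_2
    return 0.5 * np.sum((gen_features - content_features) ** 2)

def style_loss(gen_features, style_features):
    # Squared-error distance between Gram matrices at a style layer,
    # normalized by the layer's size, following Gatys et al.
    n_positions, n_channels = gen_features.shape
    diff = gram_matrix(gen_features) - gram_matrix(style_features)
    return np.sum(diff ** 2) / (4.0 * n_channels ** 2 * n_positions ** 2)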

Transferring the style of various artworks to the same content image produces qualitatively convincing results:

Here we reproduce Figure 3 from the first paper, which renders a photograph of the Neckarfront in Tübingen, Germany in the style of five different iconic paintings: The Shipwreck of the Minotaur, The Starry Night, Composition VII, The Scream, and Seated Nude:

Content / Style Tradeoff

The relative weight of the style and content can be controlled.

Here we render with an increasing style weight applied to Red Canna:
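
As a concrete sketch (image filenames are illustrative; the flags are documented in the Arguments section below), the sweep above amounts to re-running the script with increasing --style_weight values:

python neural_style.py --content_img lion.jpg \
                       --style_imgs red-canna.jpg \
                       --content_weight 5e0 \
                       --style_weight 1e3

Repeating with --style_weight 1e4 and 1e5 shifts the balance further toward the style image.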

Multiple Style Images

More than one style image can be used to blend multiple artistic styles.

Top row (left to right): The Starry Night + The Scream, The Scream + Composition VII, Seated Nude + Composition VII
Bottom row (left to right): Seated Nude + The Starry Night, Oversoul + Freshness of Cold, David Bowie + Skull

Style Interpolation

When using multiple style images, the degree of blending between the images can be controlled.

Top row (left to right): content image, .2 The Starry Night + .8 The Scream, .8 The Starry Night + .2 The Scream
Bottom row (left to right): .2 Oversoul + .8 Freshness of Cold, .5 Oversoul + .5 Freshness of Cold, .8 Oversoul + .2 Freshness of Cold
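
For example, the .2 The Starry Night + .8 The Scream blend above corresponds to a command like the following (filenames illustrative):

python neural_style.py --content_img lion.jpg \
                       --style_imgs starry-night.jpg the-scream.jpg \
                       --style_imgs_weights 0.2 0.8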

Transfer style but not color

The color scheme of the original image can be preserved by including the flag --original_colors. Colors are transferred using either the YUV, YCrCb, CIE L*a*b*, or CIE L*u*v* color spaces.

Here we reproduce Figure 1 and Figure 2 in the third paper using luminance-only transfer:

Left to right: content image, stylized image, stylized image with the original colors of the content image
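
A minimal color-preserving invocation might look like this (filenames illustrative; both flags are documented in the Arguments section):

python neural_style.py --content_img lion.jpg \
                       --style_imgs starry-night.jpg \
                       --original_colors \
                       --color_convert_type yuv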

Textures

The algorithm is not constrained to artistic painting styles. It can also be applied to photographic textures to create pareidolic images.

Segmentation

Style can be transferred to semantic segmentations in the content image.

Multiple styles can be transferred to the foreground and background of the content image.

Left to right: content image, foreground style, background style, foreground mask, background mask, stylized image
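
Based on the --style_mask flags documented in the Arguments section, a foreground/background run might look like this (mask and image names are illustrative):

python neural_style.py --content_img lion.jpg \
                       --style_imgs starry-night.jpg the-scream.jpg \
                       --style_mask \
                       --style_mask_imgs fg_mask.png bg_mask.png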

Video

Animations can be rendered by applying the algorithm to each source frame. For the best results, the gradient descent is initialized with the previously stylized frame warped to the current frame according to the optical flow between the pair of frames. Loss functions for temporal consistency penalize deviations from the warped previous frame, excluding disoccluded regions and motion boundaries.


Top row (left to right): source frames, ground-truth optical flow visualized
Bottom row (left to right): disoccluded regions and motion boundaries, stylized frames

Big thanks to Mike Burakoff for finding a bug in the video rendering.

Gradient Descent Initialization

The initialization of the gradient descent is controlled using --init_img_type for single images and --init_frame_type or --first_frame_type for video frames. White noise allows an arbitrary number of distinct images to be generated, whereas initializing with a fixed image always converges to the same output.

Here we reproduce Figure 6 from the first paper:

Top row (left to right): Initialized with the content image, the style image, white noise (RNG seed 1)
Bottom row (left to right): Initialized with white noise (RNG seeds 2, 3, 4)
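
A sketch of one of the white-noise runs above (flags from the Arguments section; the seed selects which distinct image is produced):

python neural_style.py --content_img lion.jpg \
                       --style_imgs starry-night.jpg \
                       --init_img_type random \
                       --seed 1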

Layer Representations

The feature complexities and receptive field sizes increase down the CNN hierarchy.

Here we reproduce Figure 3 from the original paper:

(Image grid: rows correspond to the style layer subsets conv1_1 through conv5_1; columns to alpha/beta ratios of 1 x 10^-5, 1 x 10^-4, 1 x 10^-3, and 1 x 10^-2.)

Rows: increasing subsets of CNN layers; i.e., 'conv4_1' means using 'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1'.
Columns: alpha/beta ratio of the content and style reconstruction (see Content / Style Tradeoff).
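
For example, restricting the style reconstruction to the subset up to conv3_1 corresponds to a command like this (filenames illustrative):

python neural_style.py --content_img tubingen.jpg \
                       --style_imgs seated-nude.jpg \
                       --style_layers conv1_1 conv2_1 conv3_1 \
                       --style_layer_weights 0.33 0.33 0.33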

Setup

Dependencies:

  • tensorflow
  • opencv

Optional (but recommended) dependencies:

  • CUDA (for GPU mode)
  • cuDNN (for GPU mode)

After installing the dependencies:

  • Download the VGG-19 model weights from the MatConvNet pretrained-models page (see the "VGG-VD models from the Very Deep Convolutional Networks for Large-Scale Visual Recognition project" section).
  • After downloading, copy the weights file imagenet-vgg-verydeep-19.mat to the project directory.

Usage

Basic Usage

Single Image

  1. Copy 1 content image to the default image content directory ./image_input
  2. Copy 1 or more style images to the default style directory ./styles
  3. Run the command:
bash stylize_image.sh <path_to_content_image> <path_to_style_image>

Example:

bash stylize_image.sh ./image_input/lion.jpg ./styles/kandinsky.jpg

Note: Supported image formats include: .png, .jpg, .ppm, .pgm

Note: Paths to images should not contain the ~ character to represent your home directory; you should instead use a relative path or the absolute path.

Video Frames

  1. Copy 1 content video to the default video content directory ./video_input
  2. Copy 1 or more style images to the default style directory ./styles
  3. Run the command:
bash stylize_video.sh <path_to_video> <path_to_style_image>

Example:

bash stylize_video.sh ./video_input/video.mp4 ./styles/kandinsky.jpg

Note: Supported video formats include: .mp4, .mov, .mkv

Advanced Usage

Single Image or Video Frames

  1. Copy content images to the default image content directory ./image_input or copy video frames to the default video content directory ./video_input
  2. Copy 1 or more style images to the default style directory ./styles
  3. Run the command with specific arguments:
python neural_style.py <arguments>

Example (Single Image):

python neural_style.py --content_img golden_gate.jpg \
                       --style_imgs starry-night.jpg \
                       --max_size 1000 \
                       --max_iterations 100 \
                       --original_colors \
                       --device /cpu:0 \
                       --verbose;

To use multiple style images, pass a space-separated list of the image names and image weights like this:

--style_imgs starry_night.jpg the_scream.jpg --style_imgs_weights 0.5 0.5

Example (Video Frames):

python neural_style.py --video \
                       --video_input_dir ./video_input/my_video_frames \
                       --style_imgs starry-night.jpg \
                       --content_weight 5 \
                       --style_weight 1000 \
                       --temporal_weight 1000 \
                       --start_frame 1 \
                       --end_frame 50 \
                       --max_size 1024 \
                       --first_frame_iterations 3000 \
                       --verbose;

Note: When using --init_frame_type prev_warped you must have previously computed the backward and forward optical flow between the frames. See ./video_input/make-opt-flow.sh and ./video_input/run-deepflow.sh

Arguments

  • --content_img: Filename of the content image. Example: lion.jpg
  • --content_img_dir: Relative or absolute directory path to the content image. Default: ./image_input
  • --style_imgs: Filenames of the style images. To use multiple style images, pass a space-separated list. Example: --style_imgs starry-night.jpg
  • --style_imgs_weights: The blending weights for each style image. Default: 1.0 (assumes only 1 style image)
  • --style_imgs_dir: Relative or absolute directory path to the style images. Default: ./styles
  • --init_img_type: Image used to initialize the network. Choices: content, random, style. Default: content
  • --max_size: Maximum width or height of the input images. Default: 512
  • --content_weight: Weight for the content loss function. Default: 5e0
  • --style_weight: Weight for the style loss function. Default: 1e4
  • --tv_weight: Weight for the total variational loss function. Default: 1e-3
  • --temporal_weight: Weight for the temporal loss function. Default: 2e2
  • --content_layers: Space-separated VGG-19 layer names used for the content image. Default: conv4_2
  • --style_layers: Space-separated VGG-19 layer names used for the style image. Default: relu1_1 relu2_1 relu3_1 relu4_1 relu5_1
  • --content_layer_weights: Space-separated weights of each content layer to the content loss. Default: 1.0
  • --style_layer_weights: Space-separated weights of each style layer to the style loss. Default: 0.2 0.2 0.2 0.2 0.2
  • --original_colors: Boolean flag indicating if the style is transferred but not the colors.
  • --color_convert_type: Color spaces (YUV, YCrCb, CIE L*u*v*, CIE L*a*b*) for luminance-matching conversion to original colors. Choices: yuv, ycrcb, luv, lab. Default: yuv
  • --style_mask: Boolean flag indicating if style is transferred to masked regions.
  • --style_mask_imgs: Filenames of the style mask images (example: face_mask.png). To use multiple style mask images, pass a space-separated list. Example: --style_mask_imgs face_mask.png face_mask_inv.png
  • --noise_ratio: Interpolation value between the content image and noise image if network is initialized with random. Default: 1.0
  • --seed: Seed for the random number generator. Default: 0
  • --model_weights: Weights and biases of the VGG-19 network (see Setup for the download location). Default: imagenet-vgg-verydeep-19.mat
  • --pooling_type: Type of pooling in convolutional neural network. Choices: avg, max. Default: avg
  • --device: GPU or CPU device. GPU mode highly recommended but requires NVIDIA CUDA. Choices: /gpu:0 /cpu:0. Default: /gpu:0
  • --img_output_dir: Directory to write output to. Default: ./image_output
  • --img_name: Filename of the output image. Default: result
  • --verbose: Boolean flag indicating if statements should be printed to the console.

Optimization Arguments

  • --optimizer: Loss minimization optimizer. L-BFGS gives better results. Adam uses less memory. Choices: lbfgs, adam. Default: lbfgs
  • --learning_rate: Learning-rate parameter for the Adam optimizer. Default: 1e0

  • --max_iterations: Max number of iterations for the Adam or L-BFGS optimizer. Default: 1000
  • --print_iterations: Number of iterations between optimizer print statements. Default: 50
  • --content_loss_function: Different constants K in the content loss function. Choices: 1, 2, 3. Default: 1

Video Frame Arguments

  • --video: Boolean flag indicating if the user is creating a video.
  • --start_frame: First frame number. Default: 1
  • --end_frame: Last frame number. Default: 1
  • --first_frame_type: Image used to initialize the network during the rendering of the first frame. Choices: content, random, style. Default: random
  • --init_frame_type: Image used to initialize the network during every rendering after the first frame. Choices: prev_warped, prev, content, random, style. Default: prev_warped
  • --video_input_dir: Relative or absolute directory path to input frames. Default: ./video_input
  • --video_output_dir: Relative or absolute directory path to write output frames to. Default: ./video_output
  • --content_frame_frmt: Format string of input frames. Default: frame_{}.png
  • --backward_optical_flow_frmt: Format string of backward optical flow files. Default: backward_{}_{}.flo
  • --forward_optical_flow_frmt: Format string of forward optical flow files. Default: forward_{}_{}.flo
  • --content_weights_frmt: Format string of optical flow consistency files. Default: reliable_{}_{}.txt
  • --prev_frame_indices: Previous frames to consider for long-term temporal consistency. Default: 1
  • --first_frame_iterations: Maximum number of optimizer iterations of the first frame. Default: 2000
  • --frame_iterations: Maximum number of optimizer iterations for each frame after the first frame. Default: 800

Questions and Errata

Send questions or issues:

Memory

By default, neural-style-tf uses the NVIDIA cuDNN GPU backend for convolutions and L-BFGS for optimization. These produce better and faster results, but can consume a lot of memory. You can reduce memory usage with the following:

  • Use Adam: Add the flag --optimizer adam to use Adam instead of L-BFGS. This should significantly reduce memory usage, but will require tuning of other parameters for good results; in particular, experiment with different values of --learning_rate, --content_weight, and --style_weight (see the combined example after this list).
  • Reduce image size: You can reduce the size of the generated image with the --max_size argument.
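
Combining both suggestions, a lower-memory invocation might look like this (the weight and size values are illustrative starting points):

python neural_style.py --content_img lion.jpg \
                       --style_imgs starry-night.jpg \
                       --optimizer adam \
                       --learning_rate 1e0 \
                       --max_size 256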

Implementation Details

All images were rendered on a machine with:

  • CPU: Intel Core i7-6800K @ 3.40GHz × 12
  • GPU: NVIDIA GeForce GTX 1080/PCIe/SSE2
  • OS: Linux Ubuntu 16.04.1 LTS 64-bit
  • CUDA: 8.0
  • python: 2.7.12
  • tensorflow: 0.10.0rc
  • opencv: 2.4.9.1

Acknowledgements

The implementation is based on the projects:

  • Torch (Lua) implementation 'neural-style' by jcjohnson
  • Torch (Lua) implementation 'artistic-videos' by manuelruder

Source video frames were obtained from the MPI Sintel Flow Dataset.

Artistic images were created by the modern artists:

Artistic images were created by the popular historical artists:

Bash shell scripts for testing were created by my brother Sheldon Smith.

Citation

If you find this code useful for your research, please cite:

@misc{Smith2016,
  author = {Smith, Cameron},
  title = {neural-style-tf},
  year = {2016},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cysmith/neural-style-tf}},
}