SystemErrorWang / White-box-Cartoonization

Official TensorFlow implementation of the CVPR 2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”


Top Related Projects

RepVGG: Making VGG-style ConvNets Great Again

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

A latent text-to-image diffusion model

Official PyTorch repo for JoJoGAN: One Shot Face Stylization

[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

Quick Overview

White-box-Cartoonization is a GitHub repository that implements a novel white-box cartoon style transfer algorithm. This project aims to transform real-world images and videos into cartoon-style renderings using a learning-based approach. The method provides interpretable results and allows for adjustable cartoonization effects.

Pros

  • High-quality cartoonization results with preserved details and structures
  • Supports both image and video cartoonization
  • Provides a TensorFlow implementation for easy integration and experimentation
  • Includes pre-trained models for quick testing and deployment

Cons

  • Requires significant computational resources for training and inference
  • Limited customization options for fine-tuning the cartoonization style
  • Dependency on specific versions of TensorFlow and other libraries
  • Lack of extensive documentation for advanced usage and modifications

Code Examples

  1. Loading and preprocessing an image:
import cv2
import numpy as np

def load_image(path):
    # Read with OpenCV (BGR), convert to RGB, and scale to [-1, 1]
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(np.float32) / 127.5 - 1
    return img
  2. Performing cartoonization on an image:
import tensorflow as tf

def cartoonize(model, img):
    # Add a batch dimension and run the SavedModel's serving signature
    input_image = np.expand_dims(img, axis=0)
    output = model.signatures['serving_default'](tf.constant(input_image))
    cartoon = output['output_1'].numpy()
    # Map the generator output from [-1, 1] back to [0, 255]
    cartoon = (cartoon[0] + 1) * 127.5
    cartoon = cartoon.astype(np.uint8)
    return cartoon
  3. Saving the cartoonized image:
def save_image(img, path):
    # Convert RGB back to BGR before writing with OpenCV
    cv2.imwrite(path, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
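
Tying the three helpers together: a minimal sketch, assuming the pre-trained generator has been exported as a TensorFlow SavedModel (the saved_models/ directory and file names here are illustrative, not files shipped with the repo).

import tensorflow as tf

# Load the exported generator (hypothetical SavedModel directory),
# then run one image through the helpers defined above.
model = tf.saved_model.load('saved_models/')
img = load_image('input.jpg')
cartoon = cartoonize(model, img)
save_image(cartoon, 'output.jpg')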

Getting Started

  1. Clone the repository:

    git clone https://github.com/SystemErrorWang/White-box-Cartoonization.git
    cd White-box-Cartoonization
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the pre-trained models from the link provided in the repository's README.

  4. Run the cartoonization script; it reads images from test_code/test_images and writes results to test_code/cartoonized_images:

    python test_code/cartoonize.py
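
The project also advertises video cartoonization. One straightforward way is frame-by-frame processing with OpenCV; the helper below is hypothetical and reuses the cartoonize() example from the Code Examples section rather than any script shipped with the repo.

import cv2
import numpy as np

def cartoonize_video(model, in_path, out_path):
    # Hypothetical helper: decode a video, cartoonize each frame with the
    # cartoonize() example above, and re-encode at the original frame rate.
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).astype(np.float32) / 127.5 - 1
        cartoon = cartoonize(model, rgb)  # from the Code Examples section
        writer.write(cv2.cvtColor(cartoon, cv2.COLOR_RGB2BGR))
    cap.release()
    writer.release()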

Competitor Comparisons

RepVGG: Making VGG-style ConvNets Great Again

Pros of RepVGG

  • Focuses on efficient and scalable neural network architecture for image classification
  • Offers better inference speed and accuracy trade-off compared to many existing models
  • Provides a simple and flexible design that can be easily adapted to various tasks

Cons of RepVGG

  • Limited to image classification tasks, unlike White-box-Cartoonization's focus on image stylization
  • May require more computational resources for training compared to White-box-Cartoonization
  • Less visually appealing output for end-users, as it doesn't produce stylized images

Code Comparison

RepVGG:

import torch.nn as nn

class RepVGGBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False):
        super(RepVGGBlock, self).__init__()
        # ... (implementation details)

White-box-Cartoonization (TensorFlow):

def unet_generator(inputs, channel=32, num_blocks=4):
    # ... (implementation details)

RepVGG provides a PyTorch implementation of efficient convolutional blocks for classification, while White-box-Cartoonization implements its generator network in TensorFlow and targets image stylization.
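
RepVGG's central idea is structural re-parameterization: multi-branch blocks used during training are fused into plain 3x3 convolutions for deployment. The sketch below uses helper names from the RepVGG repo's README; treat the exact signatures as assumptions.

from repvgg import create_RepVGG_A0, repvgg_model_convert

# Build the training-time, multi-branch model...
train_model = create_RepVGG_A0(deploy=False)
# ...then fuse each block's branches into single 3x3 convs for inference.
deploy_model = repvgg_model_convert(train_model, save_path='RepVGG-A0-deploy.pth')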

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

Pros of GFPGAN

  • Focuses on face restoration and enhancement, providing more detailed and realistic results for facial features
  • Utilizes a pre-trained model, allowing for faster processing and easier implementation
  • Supports both image and video processing, offering more versatility in applications

Cons of GFPGAN

  • Limited to face restoration, not suitable for full-body or non-facial image stylization
  • May produce less stylized or artistic results compared to White-box-Cartoonization
  • Requires more computational resources due to its complex neural network architecture

Code Comparison

GFPGAN:

from gfpgan import GFPGANer

restorer = GFPGANer(model_path='experiments/pretrained_models/GFPGANv1.3.pth', upscale=2)
_, _, restored_img = restorer.enhance(img, has_aligned=False, only_center_face=False, paste_back=True)

White-box-Cartoonization:

import os
from cartoonize import WB_Cartoonize

cartoonizer = WB_Cartoonize(os.path.abspath("saved_models/"), gpu=1)
cartoon_image = cartoonizer.infer(img)

Both repositories provide easy-to-use interfaces for their respective tasks. GFPGAN focuses on face restoration with a pre-trained model, while White-box-Cartoonization offers a more general image stylization approach. The choice between them depends on the specific use case and desired output style.

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

Pros of Real-ESRGAN

  • Focuses on image super-resolution and enhancement, providing high-quality upscaling
  • Offers better performance in restoring details and textures in low-quality images
  • Supports both anime and real-world photo processing

Cons of Real-ESRGAN

  • Limited to image enhancement and upscaling, not designed for stylization or cartoonization
  • May require more computational resources due to its complex architecture

Code Comparison

Real-ESRGAN:

from basicsr.archs.rrdbnet_arch import RRDBNet

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
netscale = 4
model_path = 'experiments/pretrained_models/RealESRGAN_x4plus.pth'

White-box-Cartoonization (TensorFlow):

network_out = network.unet_generator(input_photo)
final_out = guided_filter.guided_filter(input_photo, network_out, r=1, eps=5e-3)

Both repositories provide pre-trained models and offer inference code for easy usage. Real-ESRGAN focuses on enhancing image quality, while White-box-Cartoonization aims to transform images into cartoon-style representations. The code snippets show the initialization of their respective models, highlighting the different approaches and architectures used by each project.

A latent text-to-image diffusion model

Pros of stable-diffusion

  • More versatile, capable of generating a wide range of image styles and content
  • Utilizes advanced machine learning techniques for high-quality image generation
  • Supports text-to-image generation, allowing for creative and customizable outputs

Cons of stable-diffusion

  • Requires more computational resources and longer processing times
  • May produce less consistent results compared to White-box-Cartoonization
  • More complex to set up and use, especially for beginners

Code Comparison

White-box-Cartoonization:

output = cartoonize(input_image)

stable-diffusion:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
prompt = "A cartoon-style image of a cat"
image = pipe(prompt).images[0]

White-box-Cartoonization focuses on a specific task (cartoonization) with a simpler API, while stable-diffusion offers more flexibility but requires more detailed input and configuration.

Both projects have their strengths: White-box-Cartoonization excels in its specific task of cartoonization, while stable-diffusion provides a more versatile platform for various image generation and manipulation tasks. The choice between them depends on the specific requirements of the project and the desired level of control over the output.

Official PyTorch repo for JoJoGAN: One Shot Face Stylization

Pros of JoJoGAN

  • Focuses on stylizing faces in the style of JoJo's Bizarre Adventure anime
  • Utilizes a GAN-based approach for more flexible and diverse outputs
  • Allows for fine-tuning on custom styles with limited data

Cons of JoJoGAN

  • Limited to face stylization, unlike White-box-Cartoonization's full-image approach
  • Requires more computational resources due to the GAN architecture
  • May produce less consistent results across different input images

Code Comparison

White-box-Cartoonization:

def cartoonize(img_path):
    input_photo = tf.io.read_file(img_path)
    input_photo = tf.image.decode_jpeg(input_photo, channels=3)
    input_photo = tf.image.resize(input_photo, [256, 256])
    input_photo = input_photo / 127.5 - 1
    input_photo = tf.expand_dims(input_photo, axis=0)  # add batch dimension
    output = network(input_photo)
    return output

JoJoGAN:

def stylize(img, model):
    img = transform(img).unsqueeze(0).to(device)
    with torch.no_grad():
        out = model(img)
    out = out.squeeze(0).permute(1, 2, 0).cpu().numpy()
    out = (out * 255).astype(np.uint8)
    return out
Both repositories provide image stylization capabilities, but they differ in their approach and focus. White-box-Cartoonization offers a more general cartoonization method for entire images, while JoJoGAN specializes in face stylization with a specific anime aesthetic. The code snippets demonstrate the different frameworks and preprocessing steps used in each project.

[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

Pros of AnimeGANv2

  • Produces higher quality anime-style images with more vibrant colors and sharper details
  • Offers multiple pre-trained models for different anime styles
  • Includes a comprehensive training pipeline for custom datasets

Cons of AnimeGANv2

  • Requires more computational resources for inference and training
  • Less flexibility in controlling the cartoonization process compared to White-box-Cartoonization
  • Limited documentation and examples for customization

Code Comparison

White-box-Cartoonization:

output = cartoonize(input_image)

AnimeGANv2:

face_painter = AnimeGANv2(pretrained_model='generator_Hayao_weight.pt')
output = face_painter.inference(input_image)

White-box-Cartoonization uses a simpler function call, while AnimeGANv2 requires initializing a model object before inference. AnimeGANv2 allows for easy switching between different pre-trained models, offering more style options.

Both repositories provide Python-based implementations and support various input formats. White-box-Cartoonization focuses on a general cartoonization approach, while AnimeGANv2 specifically targets anime-style image generation. White-box-Cartoonization offers more interpretability and control over the transformation process, making it suitable for research and experimentation. AnimeGANv2, on the other hand, excels in producing high-quality anime-style images with less user intervention.

README




[CVPR 2020] Learning to Cartoonize Using White-box Cartoon Representations

project page | paper | twitter | zhihu | bilibili | facial model

Use cases

Scenery

Food

Indoor Scenes

People

More Images Are Shown In The Supplementary Materials

Online demo

Prerequisites

  • Training code: Linux or Windows
  • NVIDIA GPU + CUDA/cuDNN recommended for performance
  • Inference code: Linux, Windows, and macOS

How To Use

Installation

  • Assumes you already have an NVIDIA GPU with CUDA and cuDNN installed
  • Install tensorflow-gpu; we tested 1.12.0 and 1.13.0rc0
  • Install scikit-image==0.14.5; other versions may cause problems

Inference with Pre-trained Model

  • Store test images in /test_code/test_images
  • Run /test_code/cartoonize.py (a sketch of the core inference graph is shown below)
  • Results will be saved in /test_code/cartoonized_images
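
For orientation, the core of the inference script builds a small TensorFlow 1.x graph in which the generator's output is post-processed by a guided filter. The sketch below approximates test_code/cartoonize.py; exact module and variable handling in the repo may differ.

import numpy as np
import tensorflow as tf
import network        # module from the repo's test_code
import guided_filter  # module from the repo's test_code

# Generator output is smoothed by a guided filter to produce the result.
input_photo = tf.placeholder(tf.float32, [1, None, None, 3])
network_out = network.unet_generator(input_photo)
final_out = guided_filter.guided_filter(input_photo, network_out, r=1, eps=5e-3)

# Restore the released checkpoint and run one preprocessed image through
# the graph. The repo restores only generator variables; Saver() here is
# simplified, and the zero batch is a stand-in for a real [-1, 1] image.
batch = np.zeros([1, 256, 256, 3], np.float32)
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('saved_models'))
    cartoon = sess.run(final_out, feed_dict={input_photo: batch})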

Train

  • Place your training data in the corresponding folders in /dataset
  • Run pretrain.py; results will be saved in the /pretrain folder
  • Run train.py; results will be saved in the /train_cartoon folder
  • The code was cleaned up from a production environment and is untested; there may be minor problems, but they should be easy to resolve
  • The pretrained VGG_19 model can be found at the following URL: https://drive.google.com/file/d/1j0jDENjdwxCDb36meP6-u5xDBzmKBOjJ/view?usp=sharing

Datasets

  • Due to copyright issues, we cannot provide the cartoon images used for training
  • However, these training datasets are easy to prepare
  • Scenery images are collected from films by Shinkai Makoto, Miyazaki Hayao, and Hosoda Mamoru
  • Split the films into frames, then randomly crop and resize to 256x256 (see the sketch after this list)
  • Portrait images are from Kyoto Animation and P.A. Works
  • We use https://github.com/nagadomi/lbpcascade_animeface to detect facial areas
  • Manual data cleaning will greatly increase the quality of both datasets
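
A minimal sketch of the frame-extraction step described above, using OpenCV; the sampling interval, crop scales, and paths are illustrative choices, not part of the repo.

import os
import random
import cv2

def extract_crops(video_path, out_dir, size=256, every_n=30):
    # Sample every Nth frame, take a random square crop, and resize it
    # to size x size. All parameters here are illustrative.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            h, w = frame.shape[:2]
            if min(h, w) > size:
                s = random.randint(size, min(h, w))
                y = random.randint(0, h - s)
                x = random.randint(0, w - s)
                crop = cv2.resize(frame[y:y + s, x:x + s], (size, size))
                cv2.imwrite(os.path.join(out_dir, '%06d.jpg' % saved), crop)
                saved += 1
        idx += 1
    cap.release()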

Acknowledgement

We are grateful for the help from Lvmin Zhang and Style2Paints Research

License

Citation

If you use this code for your research, please cite our paper:

@InProceedings{Wang_2020_CVPR,
  author = {Wang, Xinrui and Yu, Jinze},
  title = {Learning to Cartoonize Using White-Box Cartoon Representations},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}

Chinese Community

We have a group that is nominally for technical exchange, though we end up chatting about everything except technology. If you fail to join the first time, you can try again: 816096787.