White-box-Cartoonization
Official TensorFlow implementation for the CVPR 2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”
Top Related Projects
RepVGG: Making VGG-style ConvNets Great Again
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
stable-diffusion: A latent text-to-image diffusion model
Official PyTorch repo for JoJoGAN: One Shot Face Stylization
AnimeGANv2: [Open Source] The improved version of AnimeGAN. Landscape photos/videos to anime
Quick Overview
White-box-Cartoonization is a GitHub repository that implements the white-box cartoon representation method from the CVPR 2020 paper. It turns real-world photos and videos into cartoon-style renderings using a learned generator. The method is called "white-box" because it decomposes an image into separately handled surface, structure, and texture representations. This decomposition makes the results interpretable and the cartoonization effect adjustable.
Pros
- High-quality cartoonization results with preserved details and structures
- Supports both image and video cartoonization
- Provides a TensorFlow implementation for easy integration and experimentation
- Includes pre-trained models for quick testing and deployment
Cons
- Requires significant computational resources for training and inference
- Limited customization options for fine-tuning the cartoonization style
- Dependency on specific versions of TensorFlow and other libraries
- Lack of extensive documentation for advanced usage and modifications
Code Examples
- Loading and preprocessing an image:
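import cv2
import numpy as np

def load_image(path):
    img = cv2.imread(path)  # OpenCV loads images as BGR uint8, or returns None on failure
    if img is None:
        raise FileNotFoundError(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # scale pixel values from [0, 255] to [-1, 1]
    img = img.astype(np.float32) / 127.5 - 1
    return img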
- Performing cartoonization on an image:
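import numpy as np
import tensorflow as tf

def cartoonize(model, img):
    # assumes `model` is a TF2 SavedModel (tf.saved_model.load) with a 'serving_default'
    # signature whose output key is 'output_1' (hypothetical; adjust to your export)
    input_image = np.expand_dims(img, axis=0)  # add a batch dimension
    output = model.signatures['serving_default'](tf.constant(input_image))
    cartoon = output['output_1'].numpy()
    # map the output from [-1, 1] back to [0, 255]
    cartoon = (cartoon[0] + 1) * 127.5
    cartoon = np.clip(cartoon, 0, 255).astype(np.uint8)
    return cartoon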
- Saving the cartoonized image:
import cv2
def save_image(img, path):
    # convert RGB back to BGR, which cv2.imwrite expects
    cv2.imwrite(path, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
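The helpers above map 8-bit pixels into the [-1, 1] range the generator works in (x / 127.5 - 1) and map results back with (x + 1) * 127.5. The minimal sketch below isolates that arithmetic. The rounding step is our own addition: converting with astype(np.uint8) alone truncates, so a value such as 254.99998 would become 254.

```python
import numpy as np

def to_model_range(img_uint8):
    """Map uint8 pixels in [0, 255] to float32 values in [-1, 1]."""
    return img_uint8.astype(np.float32) / 127.5 - 1

def to_uint8(img_float):
    """Map values in [-1, 1] back to [0, 255], rounding and clipping out-of-range outputs."""
    pixels = (img_float + 1) * 127.5
    return np.clip(np.rint(pixels), 0, 255).astype(np.uint8)

# Usage: every 8-bit value survives the round trip unchanged.
values = np.arange(256, dtype=np.uint8)
restored = to_uint8(to_model_range(values))
```

Clipping matters for the reverse direction because a generator's raw output can drift slightly outside [-1, 1].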
Getting Started
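- Clone the repository:
git clone https://github.com/SystemErrorWang/White-box-Cartoonization.git
cd White-box-Cartoonization
- Install dependencies (the README below lists tensorflow-gpu 1.12.0 or 1.13.0rc0 and scikit-image==0.14.5):
pip install tensorflow-gpu==1.12.0 scikit-image==0.14.5
- Inference uses the pre-trained model from the test_code folder. The pretrained VGG_19 weights needed for training are linked in the repository's README.
- Run the cartoonization script. Per the README, it reads images from test_code/test_images and writes results to test_code/cartoonized_images:
cd test_code
python cartoonize.py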
Competitor Comparisons
RepVGG: Making VGG-style ConvNets Great Again
Pros of RepVGG
- Focuses on efficient and scalable neural network architecture for image classification
- Offers better inference speed and accuracy trade-off compared to many existing models
- Provides a simple and flexible design that can be easily adapted to various tasks
Cons of RepVGG
- Limited to image classification tasks, unlike White-box-Cartoonization's focus on image stylization
- May require more computational resources for training compared to White-box-Cartoonization
- Less visually appealing output for end-users, as it doesn't produce stylized images
Code Comparison
RepVGG:
class RepVGGBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False):
        super(RepVGGBlock, self).__init__()
        # ... (implementation details)
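White-box-Cartoonization (TensorFlow 1.x; the generator is a function in test_code/network.py rather than a class):
def unet_generator(inputs, channel=32, num_blocks=4, name='generator', reuse=False):
    with tf.variable_scope(name, reuse=reuse):
        # ... (implementation details)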
RepVGG is implemented in PyTorch, while White-box-Cartoonization is implemented in TensorFlow. RepVGG focuses on efficient convolutional blocks for classification, whereas White-box-Cartoonization implements a generator network for image stylization.
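RepVGG's efficiency comes from structural re-parameterization. During training, each block sums a 3x3 convolution, a 1x1 convolution, and an identity branch. For inference, the three branches are folded into a single 3x3 kernel. The NumPy sketch below checks that identity on random data. It leaves out the BatchNorm layers that real RepVGG blocks also fold in. conv2d and fuse_branches are reference helpers written for this illustration, not RepVGG code.

```python
import numpy as np

def conv2d(x, k):
    """Stride-1 'same' cross-correlation. x: (C_in, H, W), k: (C_out, C_in, kh, kw)."""
    c_out, _, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))  # zero padding keeps H and W
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for i in range(kh):
        for j in range(kw):
            # accumulate the contribution of kernel tap (i, j) across all input channels
            out += np.einsum("oc,chw->ohw", k[:, :, i, j], xp[:, i:i + h, j:j + w])
    return out

def fuse_branches(k3, k1):
    """Fold the 1x1 and identity branches into the 3x3 kernel (requires C_out == C_in)."""
    fused = k3.copy()
    fused[:, :, 1, 1] += k1[:, :, 0, 0]  # a 1x1 kernel equals the centre tap of a 3x3 one
    channels = np.arange(k3.shape[0])
    fused[channels, channels, 1, 1] += 1.0  # identity = centre tap of 1 on matching channels
    return fused

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
k3 = rng.standard_normal((4, 4, 3, 3))
k1 = rng.standard_normal((4, 4, 1, 1))

multi_branch = conv2d(x, k3) + conv2d(x, k1) + x       # training-time block
single_branch = conv2d(x, fuse_branches(k3, k1))       # inference-time block
```

The fused block does one 3x3 convolution instead of three branches, which is where the inference-speed advantage listed above comes from.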
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
Pros of GFPGAN
- Focuses on face restoration and enhancement, providing more detailed and realistic results for facial features
- Utilizes a pre-trained model, allowing for faster processing and easier implementation
- Supports both image and video processing, offering more versatility in applications
Cons of GFPGAN
- Limited to face restoration, not suitable for full-body or non-facial image stylization
- May produce less stylized or artistic results compared to White-box-Cartoonization
- Requires more computational resources due to its complex neural network architecture
Code Comparison
GFPGAN:
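from gfpgan import GFPGANer
import cv2
img = cv2.imread('input.jpg', cv2.IMREAD_COLOR)  # hypothetical input path; loaded as BGR
restorer = GFPGANer(model_path='experiments/pretrained_models/GFPGANv1.3.pth', upscale=2)
# enhance() returns (cropped_faces, restored_faces, restored_img)
cropped_faces, restored_faces, restored_img = restorer.enhance(img, has_aligned=False, only_center_face=False, paste_back=True)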
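White-box-Cartoonization (through the WB_Cartoonize wrapper from the community demo code at https://github.com/experience-ml/cartoonize, not the official repo):
import os
from cartoonize import WB_Cartoonize
cartoonizer = WB_Cartoonize(os.path.abspath("saved_models/"), gpu=1)
cartoon_image = cartoonizer.infer(img)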
Both repositories expose short inference interfaces. GFPGAN restores and upscales faces, while White-box-Cartoonization stylizes entire images. The choice between them depends on whether you need faithful restoration or a cartoon look.
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
Pros of Real-ESRGAN
- Focuses on image super-resolution and enhancement, providing high-quality upscaling
- Offers better performance in restoring details and textures in low-quality images
- Supports both anime and real-world photo processing
Cons of Real-ESRGAN
- Limited to image enhancement and upscaling, not designed for stylization or cartoonization
- May require more computational resources due to its complex architecture
Code Comparison
Real-ESRGAN:
from basicsr.archs.rrdbnet_arch import RRDBNet
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
netscale = 4
model_path = 'experiments/pretrained_models/RealESRGAN_x4plus.pth'
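White-box-Cartoonization (TensorFlow 1.x, following test_code/cartoonize.py):
import tensorflow as tf
import network, guided_filter  # modules from the repo's test_code folder
input_photo = tf.placeholder(tf.float32, [1, None, None, 3])
network_out = network.unet_generator(input_photo)
final_out = guided_filter.guided_filter(input_photo, network_out, r=1, eps=5e-3)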
Both repositories provide pre-trained models and inference code. Real-ESRGAN focuses on enhancing image quality, while White-box-Cartoonization transforms images into cartoon-style representations. The snippets show how each project builds its model. Real-ESRGAN instantiates an RRDBNet with a 4x upscaling factor. White-box-Cartoonization runs its generator and then refines the output with a guided filter.
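The White-box-Cartoonization snippet above smooths the generator output with a guided filter (r=1, eps=5e-3). As we understand the repository's test script, the input photo serves as the guide. Below is a NumPy sketch of the standard per-channel guided filter (He et al.). We assume, but have not verified line by line, that it matches the repository's TensorFlow version. Window sums use cumulative sums and are divided by the number of in-bounds pixels, so zero padding does not bias the borders.

```python
import numpy as np

def box_sum(x, r):
    """Sum over a (2r+1) x (2r+1) window around each pixel; out-of-bounds pixels count as 0.
    x: (H, W, C)."""
    h, w = x.shape[:2]
    k = 2 * r + 1
    # one extra leading zero row/column makes every window sum a difference of cumulative sums
    padded = np.pad(x, ((r + 1, r), (r + 1, r), (0, 0)))
    c = padded.cumsum(axis=0).cumsum(axis=1)
    return c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]

def guided_filter(guide, src, r=1, eps=5e-3):
    """Edge-preserving smoothing of `src` steered by `guide`; both (H, W, C) float arrays."""
    n = box_sum(np.ones_like(guide[..., :1]), r)  # in-bounds pixel count per window
    mean_i = box_sum(guide, r) / n
    mean_p = box_sum(src, r) / n
    cov_ip = box_sum(guide * src, r) / n - mean_i * mean_p
    var_i = box_sum(guide * guide, r) / n - mean_i ** 2
    a = cov_ip / (var_i + eps)  # local linear model: src ~ a * guide + b
    b = mean_p - a * mean_i
    return (box_sum(a, r) / n) * guide + box_sum(b, r) / n

rng = np.random.default_rng(0)
photo = rng.random((32, 32, 3))
noisy = np.clip(photo + rng.normal(0, 0.05, photo.shape), 0, 1)
smoothed = guided_filter(photo, noisy, r=1, eps=5e-3)
```

Where the guide varies little compared with eps, a tends to 0 and the output falls back to the local mean of src. Near strong edges, a approaches the local regression slope, so those edges survive the smoothing.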
stable-diffusion: A latent text-to-image diffusion model
Pros of stable-diffusion
- More versatile, capable of generating a wide range of image styles and content
- Utilizes advanced machine learning techniques for high-quality image generation
- Supports text-to-image generation, allowing for creative and customizable outputs
Cons of stable-diffusion
- Requires more computational resources and longer processing times
- May produce less consistent results compared to White-box-Cartoonization
- More complex to set up and use, especially for beginners
Code Comparison
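White-box-Cartoonization:
output = cartoonize(model, input_image)  # cartoonize() helper from the Code Examples section
stable-diffusion (via the Hugging Face diffusers library):
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
prompt = "A cartoon-style image of a cat"
image = pipe(prompt).images[0]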
White-box-Cartoonization focuses on a specific task (cartoonization) with a simpler API, while stable-diffusion offers more flexibility but requires more detailed input and configuration.
Both projects have their strengths: White-box-Cartoonization excels in its specific task of cartoonization, while stable-diffusion provides a more versatile platform for various image generation and manipulation tasks. The choice between them depends on the specific requirements of the project and the desired level of control over the output.
Official PyTorch repo for JoJoGAN: One Shot Face Stylization
Pros of JoJoGAN
- Focuses on face stylization, learning a new style from a single reference image (one-shot)
- Builds on a pre-trained StyleGAN2 face generator
- Allows for fine-tuning on custom styles with limited data
Cons of JoJoGAN
- Limited to face stylization, unlike White-box-Cartoonization's full-image approach
- Requires fine-tuning the generator for every new style, which adds GPU time
- May produce less consistent results across different input images
Code Comparison
White-box-Cartoonization:
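import tensorflow as tf

def cartoonize(img_path, network):
    # TF2 eager-style sketch; `network` stands for a loaded generator (the official repo uses TF 1.x sessions)
    input_photo = tf.io.read_file(img_path)
    input_photo = tf.image.decode_jpeg(input_photo, channels=3)
    input_photo = tf.image.resize(input_photo, [256, 256])
    input_photo = input_photo / 127.5 - 1  # scale to [-1, 1]
    input_photo = tf.expand_dims(input_photo, 0)  # add a batch dimension
    return network(input_photo)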
JoJoGAN:
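import numpy as np
import torch

def stylize(img, model, transform, device):
    img = transform(img).unsqueeze(0).to(device)  # transform: e.g. a torchvision pipeline returning a CHW tensor
    with torch.no_grad():
        out = model(img)
    out = out.squeeze(0).permute(1, 2, 0).cpu().numpy()  # CHW -> HWC
    out = (np.clip(out, 0, 1) * 255).astype(np.uint8)  # assumes model output in [0, 1]
    return out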
Both repositories provide image stylization, but they differ in approach and focus. White-box-Cartoonization offers a general cartoonization method for entire images. JoJoGAN specializes in face stylization, with the style learned from a single reference image. The code snippets also show the different frameworks (TensorFlow versus PyTorch) and preprocessing steps used in each project.
AnimeGANv2: [Open Source] The improved version of AnimeGAN. Landscape photos/videos to anime
Pros of AnimeGANv2
- Produces higher quality anime-style images with more vibrant colors and sharper details
- Offers multiple pre-trained models for different anime styles
- Includes a comprehensive training pipeline for custom datasets
Cons of AnimeGANv2
- Requires more computational resources for inference and training
- Less flexibility in controlling the cartoonization process compared to White-box-Cartoonization
- Limited documentation and examples for customization
Code Comparison
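White-box-Cartoonization:
output = cartoonize(model, input_image)  # cartoonize() helper from the Code Examples section
AnimeGANv2:
# illustrative wrapper class; see the AnimeGANv2 README for its actual inference entry point
face_painter = AnimeGANv2(pretrained_model='generator_Hayao_weight.pt')
output = face_painter.inference(input_image)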
In these snippets, White-box-Cartoonization is a single function call, while AnimeGANv2 first loads a style-specific model (here, the Hayao weights) before inference. Because AnimeGANv2 ships several pre-trained models, switching styles only means loading different weights.
Both repositories provide Python-based implementations and support various input formats. White-box-Cartoonization focuses on a general cartoonization approach, while AnimeGANv2 specifically targets anime-style image generation. White-box-Cartoonization offers more interpretability and control over the transformation process, making it suitable for research and experimentation. AnimeGANv2, on the other hand, excels in producing high-quality anime-style images with less user intervention.
README
[CVPR2020] Learning to Cartoonize Using White-box Cartoon Representations
- TensorFlow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”.
- An improved method for facial images is now available:
- https://github.com/SystemErrorWang/FacialCartoonization
Use cases
Example results cover scenery, food, indoor scenes, and people. More images are shown in the supplementary materials.
Online demo
- Some kind people made an online demo for this project
- Demo link: https://cartoonize-lkqov62dia-de.a.run.app/cartoonize
- Code: https://github.com/experience-ml/cartoonize
- Sample Demo: https://www.youtube.com/watch?v=GqduSLcmhto&feature=emb_title
Prerequisites
- Training code: Linux or Windows
- NVIDIA GPU + CUDA and cuDNN for performance
- Inference code: Linux, Windows, and macOS
How To Use
Installation
- This assumes you already have an NVIDIA GPU with CUDA and cuDNN installed
- Install tensorflow-gpu (we tested 1.12.0 and 1.13.0rc0)
- Install scikit-image==0.14.5; other versions may cause problems
Inference with Pre-trained Model
- Store test images in test_code/test_images
- Run cartoonize.py from inside the test_code folder
- Results will be saved in test_code/cartoonized_images
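To our understanding, test_code/cartoonize.py prepares each image in two steps. It first shrinks any image whose shorter side exceeds 720 pixels so that side becomes 720. It then crops height and width down to multiples of 8, presumably so that the generator's downsampling and upsampling stages return an output of the same size. The NumPy sketch below reproduces only the size arithmetic and the crop. The helper names are ours, and the actual resize would be done with an image library, for example cv2.resize with INTER_AREA interpolation.

```python
import numpy as np

MAX_SHORT_SIDE = 720  # cap on the shorter side, as we understand the repo's test script

def resized_shape(h, w, max_short=MAX_SHORT_SIDE):
    """Return (h, w) scaled so the shorter side is at most max_short, keeping the aspect ratio."""
    if min(h, w) <= max_short:
        return h, w
    if h > w:
        return int(max_short * h / w), max_short
    return max_short, int(max_short * w / h)

def crop_to_multiple(image, m=8):
    """Crop the bottom/right edges so height and width are multiples of m."""
    h, w = image.shape[:2]
    return image[: (h // m) * m, : (w // m) * m]

# Usage: a 1080x1920 frame would be resized to 720x1280,
# and a 723x1283 image is cropped to 720x1280.
frame = np.zeros((723, 1283, 3), dtype=np.uint8)
cropped = crop_to_multiple(frame)
```

Cropping instead of padding discards at most 7 rows and 7 columns, which is usually invisible in the result.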
Train
- Place your training data in the corresponding folders under dataset/
- Run pretrain.py; results will be saved in the pretrain folder
- Run train.py; results will be saved in the train_cartoon folder
- The code was cleaned up from a production environment and has not been tested
- There may be minor problems, but they should be easy to resolve
- The pretrained VGG_19 model can be found at the following URL: https://drive.google.com/file/d/1j0jDENjdwxCDb36meP6-u5xDBzmKBOjJ/view?usp=sharing
Datasets
- Due to copyright issues, we cannot provide cartoon images used for training
- However, these training datasets are easy to prepare
- Scenery images are collected from Shinkai Makoto, Miyazaki Hayao and Hosoda Mamoru films
- Clip the films into frames, then randomly crop and resize them to 256x256
- Portrait images are from Kyoto Animation and P.A. Works productions
- We use this repo (https://github.com/nagadomi/lbpcascade_animeface) to detect facial areas
- Manual data cleaning will greatly increase the quality of both datasets
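The dataset step "randomly crop and resize to 256x256" can be sketched as follows. The README does not say how large the crops are. Drawing a square crop side uniformly between 256 pixels and the frame's shorter side is our assumption. The nearest-neighbour resize is also our simplification, standing in for an interpolating resize such as cv2.resize.

```python
import numpy as np

def random_crop_resize(frame, size=256, rng=None):
    """Take a random square crop of `frame` (H, W, C) and resize it to size x size.

    Assumptions (not from the README): the crop side is uniform in [size, min(H, W)],
    and resizing is nearest-neighbour for simplicity.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = frame.shape[:2]
    short = min(h, w)
    if short < size:
        raise ValueError(f"frame {h}x{w} is smaller than {size}x{size}")
    side = int(rng.integers(size, short + 1))  # crop side length, inclusive of short
    top = int(rng.integers(0, h - side + 1))
    left = int(rng.integers(0, w - side + 1))
    crop = frame[top:top + side, left:left + side]
    # nearest-neighbour resize: pick evenly spaced source rows and columns
    idx = (np.arange(size) * side) // size
    return crop[idx][:, idx]

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 854, 3)).astype(np.uint8)  # stand-in for a film frame
patch = random_crop_resize(frame, rng=rng)
```

Varying the crop size exposes the model to the same content at different scales. Swap in a proper interpolating resize for real training data.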
Acknowledgement
We are grateful for the help from Lvmin Zhang and Style2Paints Research
License
- Copyright (C) Xinrui Wang. All rights reserved. Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
- Commercial application is prohibited; please retain this license if you clone this repo
Citation
If you use this code for your research, please cite our paper:
@InProceedings{Wang_2020_CVPR,
  author = {Wang, Xinrui and Yu, Jinze},
  title = {Learning to Cartoonize Using White-Box Cartoon Representations},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2020}
}
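Chinese Community
We have a group mainly for technical discussion, where we also chat about pretty much anything besides technology. If you fail to join the group on the first try, you can try several more times: 816096787.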