
minivision-ai / photo2cartoon

A portrait cartoonization exploration project (photo-to-cartoon translation)


Top Related Projects

[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

Official tensorflow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”

[CVPR 2022] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

PhotoMaker [CVPR 2024]

Bringing Old Photo Back to Life (CVPR 2020 oral)

Quick Overview

Photo2Cartoon is an AI-powered project that transforms portrait photos into cartoon-style images. It combines face detection and alignment, head segmentation, and a generative adversarial network (GAN) to create high-quality cartoon renderings while preserving the subject's key facial features and expressions.

Pros

  • Produces high-quality cartoon-style images from portrait photos
  • Preserves facial features and expressions effectively
  • Offers both a pre-trained model and the ability to train custom models
  • Can be tried instantly via the authors' WeChat mini-program and online demo

Cons

  • Limited to portrait photos; doesn't work well with full-body images or non-human subjects
  • Requires specific hardware (NVIDIA GPU) for optimal performance
  • May struggle with certain facial features or complex backgrounds
  • Documentation is primarily in Chinese, which may be challenging for non-Chinese speakers

Code Examples

The repository ships as scripts (test.py, train.py) rather than an installable package, so the snippets below are illustrative sketches of its interface; in particular, the Photo2CartoonTrainer wrapper in the third example is hypothetical.

  1. Loading the pre-trained model:

    from photo2cartoon import Photo2Cartoon

    model = Photo2Cartoon()

  2. Converting a photo to a cartoon (in the repo's test.py, inference takes an RGB image array):

    import cv2

    img = cv2.cvtColor(cv2.imread("path/to/input/photo.jpg"), cv2.COLOR_BGR2RGB)
    cartoon = model.inference(img)
    cv2.imwrite("path/to/output/cartoon.jpg", cartoon)

  3. Training a custom model (hypothetical wrapper; the repo itself is driven by python train.py --dataset photo2cartoon):

    from photo2cartoon import Photo2CartoonTrainer

    trainer = Photo2CartoonTrainer(
        photo_dir="path/to/photo/dataset",
        cartoon_dir="path/to/cartoon/dataset",
        epochs=100,
        batch_size=1
    )
    trainer.train()

Getting Started

To get started with Photo2Cartoon:

  1. Clone the repository:

    git clone https://github.com/minivision-ai/photo2cartoon.git
    cd photo2cartoon
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the pre-trained models (the README below links Google Drive and Baidu Netdisk mirrors) and place photo2cartoon_weights.pt and model_mobilefacenet.pth under models/, and seg_model_384.pb under utils/.

  4. Run a test conversion:

    python test.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png

Competitor Comparisons

[Open Source]. The improved version of AnimeGAN. Landscape photos/videos to anime

Pros of AnimeGANv2

  • Supports multiple anime styles, offering more versatility in output
  • Generally produces higher quality anime-style images with better detail preservation
  • Includes pre-trained models for easier implementation

Cons of AnimeGANv2

  • Requires more computational resources due to its complex architecture
  • Less focused on cartoon-style output, which may not suit all use cases
  • Documentation is less comprehensive, potentially making it harder for beginners

Code Comparison

AnimeGANv2:

from test import AnimeGANv2
model = AnimeGANv2()
output = model.inference('input.jpg')

photo2cartoon:

from photo2cartoon import Photo2Cartoon
p2c = Photo2Cartoon()
cartoon = p2c.inference('input.jpg')

Both snippets are illustrative rather than verbatim, but the repositories expose similarly simple inference flows; AnimeGANv2 provides more options for style selection and fine-tuning, while photo2cartoon has a simpler implementation focused specifically on cartoon-style conversion.

AnimeGANv2 excels in producing high-quality anime-style images with multiple style options, while photo2cartoon offers a more streamlined approach for cartoon-style conversion. The choice between the two depends on the specific requirements of the project, available computational resources, and the desired output style.

Official tensorflow implementation for CVPR2020 paper “Learning to Cartoonize Using White-box Cartoon Representations”

Pros of White-box-Cartoonization

  • More detailed and customizable cartoonization process
  • Provides a white-box approach, offering better interpretability
  • Supports both image and video cartoonization

Cons of White-box-Cartoonization

  • Requires more computational resources
  • Longer processing time for cartoonization
  • More complex setup and usage

Code Comparison

White-box-Cartoonization:

output = cartoonize(input_img, model)
guided_filter = GuidedFilter(r=5, eps=2e-1)
output = guided_filter(output, output)

photo2cartoon:

c2p = Photo2Cartoon()
cartoon_img = c2p.inference(img)

White-box-Cartoonization offers more control over the cartoonization process, allowing for fine-tuning of parameters and applying additional filters. photo2cartoon provides a simpler, more straightforward implementation with fewer customization options.

Both projects aim to transform photos into cartoon-style images, but White-box-Cartoonization offers a more comprehensive approach with additional features and flexibility. However, this comes at the cost of increased complexity and resource requirements. photo2cartoon, on the other hand, provides a more streamlined solution that may be easier to integrate into existing projects but with less control over the output.

[CVPR 2022] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

Pros of DualStyleGAN

  • Offers more diverse and flexible style transfer options
  • Capable of generating high-quality, high-resolution outputs
  • Provides better preservation of facial details and expressions

Cons of DualStyleGAN

  • More complex implementation and potentially higher computational requirements
  • May require more fine-tuning and parameter adjustments for optimal results
  • Limited to facial style transfer (though Photo2Cartoon is likewise portrait-focused rather than full-body)

Code Comparison

DualStyleGAN:

from models.stylegan2_generator import Generator
from models.dual_generator import DualGenerator

generator = Generator(size, style_dim, n_mlp)
dual_generator = DualGenerator(generator, n_style_layers)

Photo2Cartoon:

from models.UGATIT import UGATIT
from utils.utils import *

model = UGATIT(args)
model.build_model()

DualStyleGAN focuses on a more sophisticated generator architecture, while Photo2Cartoon utilizes a UGATIT-based model. DualStyleGAN's implementation suggests a more flexible approach to style manipulation, potentially offering greater control over the output. Photo2Cartoon's code appears simpler and more straightforward, which may make it easier to use and integrate into existing projects.

PhotoMaker [CVPR 2024]

Pros of PhotoMaker

  • More versatile, capable of generating various styles beyond cartoons
  • Supports custom style inputs for personalized results
  • Offers more advanced features like multi-subject handling

Cons of PhotoMaker

  • Potentially more complex to use due to additional features
  • May require more computational resources
  • Less specialized for cartoon-style outputs compared to Photo2Cartoon

Code Comparison

PhotoMaker:

from photomaker import PhotoMaker

pm = PhotoMaker()
result = pm.generate(
    input_image="path/to/input.jpg",
    style_image="path/to/style.jpg",
    num_outputs=1
)

Photo2Cartoon:

from photo2cartoon import Photo2Cartoon

p2c = Photo2Cartoon()
result = p2c.inference(input_image)  # RGB image array

Both snippets above are schematic rather than verbatim, but both repositories offer Python-based interfaces for image transformation. PhotoMaker provides more flexibility with style inputs and multiple output options, while Photo2Cartoon focuses specifically on cartoon-style conversions with a simpler API. PhotoMaker's approach allows for greater customization but may require more setup and configuration. Photo2Cartoon offers a more straightforward solution for users specifically interested in cartoon-style transformations.

Bringing Old Photo Back to Life (CVPR 2020 oral)

Pros of Bringing-Old-Photos-Back-to-Life

  • Focuses on restoring and enhancing old, damaged photos
  • Utilizes advanced AI techniques for face restoration and colorization
  • Provides a comprehensive solution for multiple photo restoration tasks

Cons of Bringing-Old-Photos-Back-to-Life

  • More complex setup and usage compared to Photo2Cartoon
  • Requires more computational resources due to its comprehensive approach
  • May have a steeper learning curve for users unfamiliar with deep learning frameworks

Code Comparison

Photo2Cartoon:

from photo2cartoon import Photo2Cartoon
p2c = Photo2Cartoon()
cartoon = p2c.inference(img)

Bringing-Old-Photos-Back-to-Life:

from bringing_old_photos_back_to_life import Restoration
restorer = Restoration()
restored_image = restorer.restore(image_path)

Both projects use Python and provide simple inference entry points (the snippets above are schematic; neither ships as an installable package), but Bringing-Old-Photos-Back-to-Life offers a more comprehensive restoration process, while Photo2Cartoon focuses specifically on creating cartoon-style images from photos.


README

Portrait Cartoonization (Photo to Cartoon)

Chinese Version | English Version

This is Minivision Technology's (小视科技) cartoon-portrait exploration project. You can scan the QR code below with WeChat, or search for the "AI卡通秀" (AI Cartoon Show) mini-program, to try the cartoonization effect.

You can also try it online on our AI open platform: https://ai.minivision.cn/#/coreability/cartoon

Technical discussion QQ group: 937627932


Introduction

The goal of portrait cartoon-style rendering is to convert a real photo into a cartoon-style, non-photorealistic image while preserving the identity information and texture details of the original. Our approach is to learn the photo-to-cartoon mapping from a large amount of photo/cartoon data. In general, pix2pix methods based on paired data achieve good translation results, but in this task the input and output contours do not correspond one-to-one (for example, cartoon-style eyes are larger and chins are slimmer), and paired data is difficult and costly to draw. We therefore adopt an unpaired image translation approach.

The classic unpaired image translation method is CycleGAN, but the original CycleGAN's outputs often contain noticeable artifacts and are unstable. The recent U-GAT-IT paper proposed a normalization method, AdaLIN, which automatically adjusts the balance between Instance Norm and Layer Norm; combined with an attention mechanism, it achieves high-quality portrait-to-anime style transfer.
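For reference, the AdaLIN mechanism is compact enough to sketch directly. The following is a simplified PyTorch rendering of the formulation in the U-GAT-IT paper, not this repository's exact code; gamma and beta are assumed to come from an MLP elsewhere in the generator:

import torch
import torch.nn as nn

class AdaLIN(nn.Module):
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        # rho controls the per-channel mix of Instance Norm and Layer Norm;
        # the original implementation clips it to [0, 1] after each update.
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.9))

    def forward(self, x, gamma, beta):
        # Instance Norm statistics: per sample, per channel.
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = ((x - in_mean) ** 2).mean(dim=(2, 3), keepdim=True)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        # Layer Norm statistics: per sample, over channels and space.
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = ((x - ln_mean) ** 2).mean(dim=(1, 2, 3), keepdim=True)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        x_hat = self.rho * x_in + (1 - self.rho) * x_ln
        # gamma/beta of shape (B, C) are broadcast over the spatial dims.
        return x_hat * gamma.unsqueeze(2).unsqueeze(3) + beta.unsqueeze(2).unsqueeze(3)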

Unlike the exaggerated anime style, our cartoon style is more realistic: it must have the clean, cute simplicity of a cartoon while retaining a clear sense of identity. To this end we add a Face ID Loss: a pretrained face recognition model extracts ID features from both the photo and the cartoon, and the cosine distance between them constrains the generated cartoon.
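The loss itself fits in a few lines. Below is a minimal sketch; face_encoder is a stand-in for the pretrained recognition model (the repo uses MobileFaceNet weights from InsightFace_Pytorch), and the helper name is illustrative:

import torch.nn.functional as F

def face_id_loss(face_encoder, photo, cartoon):
    # Extract ID embeddings and L2-normalize them.
    emb_photo = F.normalize(face_encoder(photo), dim=1)
    emb_cartoon = F.normalize(face_encoder(cartoon), dim=1)
    # Cosine distance: 0 when the photo and cartoon identities agree.
    return (1 - (emb_photo * emb_cartoon).sum(dim=1)).mean()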

In addition, we propose a normalization method called Soft-AdaLIN (Soft Adaptive Layer-Instance Normalization), which, during de-normalization, fuses the encoder's mean and variance (photo features) with the decoder's mean and variance (cartoon features).
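Reusing the AdaLIN sketch above, the fusion can be pictured as learned per-channel weights that blend content (encoder-derived) statistics into the style (decoder-side) statistics. This is an illustrative simplification, not the repo's exact SoftAdaLIN module:

import torch
import torch.nn as nn

class SoftAdaLIN(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.norm = AdaLIN(num_features)  # the AdaLIN sketch above
        # Learned blending weights between content and style statistics.
        self.w_gamma = nn.Parameter(torch.zeros(1, num_features))
        self.w_beta = nn.Parameter(torch.zeros(1, num_features))

    def forward(self, x, content_gamma, content_beta, style_gamma, style_beta):
        # De-normalize with a soft mix of photo (content) and cartoon (style) stats.
        soft_gamma = (1 - self.w_gamma) * style_gamma + self.w_gamma * content_gamma
        soft_beta = (1 - self.w_beta) * style_beta + self.w_beta * content_beta
        return self.norm(x, soft_gamma, soft_beta)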

Architecturally, building on U-GAT-IT, we add two hourglass modules before the encoder and two after the decoder, progressively strengthening the model's feature abstraction and reconstruction abilities.
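For intuition, an hourglass block recursively downsamples and upsamples with a skip connection at each scale. The sketch below is generic (channel counts, depth, and layer choices are illustrative, and spatial dims are assumed divisible by 2**depth), not the repo's exact module:

import torch.nn as nn

class HourGlass(nn.Module):
    def __init__(self, channels, depth=3):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.inner = (HourGlass(channels, depth - 1) if depth > 1
                      else nn.Conv2d(channels, channels, 3, padding=1))
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.skip = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        # Full-resolution skip branch plus a half-resolution inner branch.
        return self.skip(x) + self.up(self.inner(self.down(x)))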

Because experimental data is relatively scarce, we process it into a fixed pattern to reduce training difficulty: first detect the face and its landmarks, rotate and rectify the image based on the landmarks, crop it to a uniform standard, and then feed the cropped head shot into a portrait segmentation model to remove the background.
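A hedged sketch of that fixed pattern, using the face-alignment library from the dependency list (the repo's real pipeline is data_process.py; the helper name, margin, and output size here are illustrative):

import cv2
import numpy as np
import face_alignment

# LandmarksType._2D is the older name; newer face-alignment releases call it TWO_D.
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, device='cpu')

def align_and_crop(img_rgb, margin=0.3, size=256):
    faces = fa.get_landmarks(img_rgb)
    if not faces:
        return None  # no face detected
    pts = faces[0]
    # Rotate so the line between the eye centers is horizontal.
    left_eye, right_eye = pts[36:42].mean(0), pts[42:48].mean(0)
    angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1],
                                  right_eye[0] - left_eye[0]))
    h, w = img_rgb.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(img_rgb, M, (w, h))
    pts = cv2.transform(pts[None], M)[0]
    # Expand the landmark bounding box by a fixed ratio and crop.
    x0, y0 = pts.min(0)
    x1, y1 = pts.max(0)
    dx, dy = (x1 - x0) * margin, (y1 - y0) * margin
    crop = rotated[max(int(y0 - dy), 0):min(int(y1 + dy), h),
                   max(int(x0 - dx), 0):min(int(x1 + dx), w)]
    # The crop would then pass through the head segmentation model
    # (seg_model_384.pb) to whiten the background.
    return cv2.resize(crop, (size, size))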

Start

Install dependencies

The project's main dependencies are:

  • python 3.6
  • pytorch 1.4
  • tensorflow-gpu 1.14
  • face-alignment
  • dlib
  • onnxruntime

Clone:

git clone https://github.com/minivision-ai/photo2cartoon.git
cd ./photo2cartoon

Download resources

Google Drive | Baidu Netdisk (extraction code: y2ch)

  1. Pretrained photo-to-cartoon model: photo2cartoon_weights.pt (updated 2020-05-04), placed under the models directory.
  2. Head segmentation model: seg_model_384.pb, placed under the utils directory.
  3. Pretrained face recognition model: model_mobilefacenet.pth, placed under the models directory. (From: InsightFace_Pytorch)
  4. Open-source cartoon dataset: cartoon_data, containing trainB and testB.
  5. ONNX photo-to-cartoon model: photo2cartoon_weights.onnx (Google Drive), placed under the models directory.

Test

Convert a test photo (of a young Asian woman) to cartoon style:

python test.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png

Test the ONNX model

python test_onnx.py --photo_path ./images/photo_test.jpg --save_path ./images/cartoon_result.png

Training

1. Data preparation

The training data consists of real photos and cartoon portraits. To reduce training complexity, both kinds of data are preprocessed as follows:

  • Detect the face and its landmarks.
  • Rotate and rectify the face based on the landmarks.
  • Expand the landmark bounding box by a fixed ratio and crop out the face region.
  • Use the portrait segmentation model to whiten out the background.

We have open-sourced 204 processed cartoon images. You will also need about 1,000 portrait photos (to match the cartoon data, use photos of young Asian women where possible, ideally with faces larger than 200x200 pixels). Preprocess them with:

python data_process.py --data_path YourPhotoFolderPath --save_path YourSaveFolderPath

Organize the processed data in the following hierarchy: trainA and testA hold photo head shots, while trainB and testB hold cartoon head shots.

├── dataset
    └── photo2cartoon
        ├── trainA
            ├── xxx.jpg
            ├── yyy.png
            └── ...
        ├── trainB
            ├── zzz.jpg
            ├── www.png
            └── ...
        ├── testA
            ├── aaa.jpg 
            ├── bbb.png
            └── ...
        └── testB
            ├── ccc.jpg 
            ├── ddd.png
            └── ...

2. Training

Train from scratch:

python train.py --dataset photo2cartoon

Resume from pretrained weights:

python train.py --dataset photo2cartoon --pretrained_weights models/photo2cartoon_weights.pt

Multi-GPU training (batch_size=1 on a single GPU is still recommended):

python train.py --dataset photo2cartoon --batch_size 4 --gpu_ids 0 1 2 3

Q&A

Q: Why do the open-source model's results differ from those in the mini-program?

A: The open-source model's training data was collected from the internet. To get more refined results, the mini-program's model was trained on custom cartoon data (200+ images) at a higher input resolution. In addition, the mini-program's face feature extractor uses our in-house recognition model, which performs better than the open-source recognition model used in this project.

Q: How do you select the best model?

A: First train the model for 200k iterations, then use the FID metric to pick the best checkpoint; the model finally selected was the one at 90k iterations.
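One way to script that selection, assuming the third-party pytorch-fid package (not part of this repo; the paths below are illustrative): generate test-set translations with each checkpoint, then compute FID against the cartoon domain and keep the lowest-scoring checkpoint:

python -m pytorch_fid ./dataset/photo2cartoon/testB ./results/checkpoint_90k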

Q: About the face feature extraction model.

A: In our experiments, computing the Face ID Loss with our in-house recognition model trains far better than with the open-source one. If training becomes unstable, try setting the Face ID Loss weight to zero.

Q: Can the portrait segmentation model be used to segment half-body portraits?

A: No. It is a special-purpose model trained for this project; you must crop out the face region before feeding an image in.

Tips

The open-source model was trained on young Asian women and does not cover other demographics well; you can collect data for your target demographic and train accordingly. Our open platform offers a cartoonization service covering all demographics, which you are welcome to try. For custom cartoon styles, contact business development: 18852075216.

References

U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation [Paper][Code]

InsightFace_Pytorch