mchong6/JoJoGAN

Official PyTorch repo for JoJoGAN: One Shot Face Stylization

Top Related Projects

  • PhotoMaker [CVPR 2024]
  • Bringing Old Photo Back to Life (CVPR 2020 oral)
  • PyTorch implementation of AnimeGANv2
  • Official PyTorch implementation of StyleGAN3
  • A latent text-to-image diffusion model

Quick Overview

JoJoGAN implements a novel GAN-based method for one-shot face stylization: given a single reference image, it transforms facial photographs into artistic styles such as anime or cartoon characters while preserving the identity and key features of the original face.

Pros

  • Achieves high-quality face stylization with just one reference image
  • Preserves the identity and key features of the original face
  • Supports various artistic styles, including anime and cartoons
  • Provides pre-trained models for easy experimentation

Cons

  • Requires significant computational resources for training and inference
  • Limited to facial images and may not work well with other types of content
  • Potential ethical concerns regarding the manipulation of personal images
  • May produce inconsistent results with certain facial features or expressions

Code Examples

  1. Loading a pre-trained StyleGAN2 generator:
import torch
from models.stylegan2.model import Generator

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the FFHQ checkpoint and restore the EMA generator weights
ckpt = torch.load('pretrained_models/stylegan2-ffhq-config-f.pt', map_location=lambda storage, loc: storage)
generator = Generator(1024, 512, 8, channel_multiplier=2).to(device)
generator.load_state_dict(ckpt["g_ema"], strict=False)
  2. Generating a stylized image (a sketch; the helper names are illustrative):
from utils import align_face, swap_attribute

# Align and crop the input face, invert it to a latent code, then blend in
# the style image's attributes
aligned_face = align_face(input_image)
w_pivot = generator.get_latent(aligned_face)
stylized_image = swap_attribute(generator, w_pivot, style_image, alpha=1.0)
  3. Fine-tuning the model on a style image (the discriminator, optimizers, and args are assumed to be constructed as in the training script):
from train import train

train(generator, discriminator, g_optim, d_optim, g_ema, device, style_img, args)

Getting Started

  1. Clone the repository:

    git clone https://github.com/mchong6/JoJoGAN.git
    cd JoJoGAN
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download pre-trained models:

    mkdir pretrained_models
    wget https://github.com/mchong6/JoJoGAN/releases/download/v1.0/stylegan2-ffhq-config-f.pt -O pretrained_models/stylegan2-ffhq-config-f.pt
    
  4. Run the inference script:

    python inference.py --input_image path/to/input.jpg --style_image path/to/style.jpg --output_image output.jpg
    

Competitor Comparisons

PhotoMaker [CVPR 2024]

Pros of PhotoMaker

  • Supports multiple reference images for more diverse style transfer
  • Offers fine-grained control over facial features and attributes
  • Includes a user-friendly web interface for easy experimentation

Cons of PhotoMaker

  • Requires more computational resources due to its advanced features
  • Has a steeper learning curve for beginners
  • May produce less stylized results in certain scenarios

Code Comparison

PhotoMaker:

from photomaker import PhotoMaker

model = PhotoMaker(device="cuda")
output = model.generate(
    prompt="A portrait of a person",
    num_samples=1,
    guidance_scale=7.5,
    num_inference_steps=50,
    reference_images=["image1.jpg", "image2.jpg"]
)

JoJoGAN:

from jojogan import JoJoGAN

model = JoJoGAN()
output = model.transfer_style(
    content_image="input.jpg",
    style_image="style.jpg",
    num_iterations=1000
)

PhotoMaker offers more advanced features and control, while JoJoGAN provides a simpler interface for quick style transfer. PhotoMaker's code allows for multiple reference images and fine-tuning of generation parameters, whereas JoJoGAN focuses on transferring style from a single image with fewer customization options.

Bringing Old Photo Back to Life (CVPR 2020 oral)

Pros of Bringing-Old-Photos-Back-to-Life

  • Specialized in restoring old, damaged photos
  • Comprehensive pipeline including face restoration, colorization, and inpainting
  • Backed by Microsoft research, potentially more robust and well-documented

Cons of Bringing-Old-Photos-Back-to-Life

  • More complex setup and usage compared to JoJoGAN
  • Focused on photo restoration, less versatile for style transfer or modern image manipulation

Code Comparison

Bringing-Old-Photos-Back-to-Life:

from run_face_enhancement import run_face_enhancement
from run_global import run_global
enhanced_image = run_face_enhancement(image_path)
final_result = run_global(enhanced_image)

JoJoGAN:

from style_transfer import style_transfer
styled_image = style_transfer(image_path, style='jojo')

Summary

Bringing-Old-Photos-Back-to-Life is a specialized tool for restoring old photographs, offering a comprehensive pipeline for various restoration tasks. It's backed by Microsoft research, potentially providing more robustness and documentation. However, it has a more complex setup and is less versatile for modern image manipulation compared to JoJoGAN.

JoJoGAN, on the other hand, is simpler to use and more focused on style transfer, particularly for anime-style transformations. It's more suitable for creative image manipulation but lacks the advanced restoration features of Bringing-Old-Photos-Back-to-Life.

The code comparison shows that Bringing-Old-Photos-Back-to-Life involves multiple steps for enhancement, while JoJoGAN offers a more straightforward style transfer function.

PyTorch implementation of AnimeGANv2

Pros of AnimeGAN2-PyTorch

  • More comprehensive documentation and usage instructions
  • Includes pre-trained models for various anime styles
  • Supports both image and video processing

Cons of AnimeGAN2-PyTorch

  • Less focus on fine-tuning for specific art styles
  • May require more setup and dependencies

Code Comparison

AnimeGAN2-PyTorch:

from model import Generator
model = Generator()
model.load_state_dict(torch.load('weights/paprika.pt'))
out = model(img)

JoJoGAN:

from models import Generator
model = Generator(1024, 512, 8, channel_multiplier=2)
model.load_state_dict(torch.load('models/stylegan2-ffhq-config-f.pt')['g_ema'])
out = model(img)

Both repositories focus on anime-style image generation, but AnimeGAN2-PyTorch offers a broader range of pre-trained models and more extensive documentation. JoJoGAN, on the other hand, emphasizes fine-tuning for specific art styles and may be more suitable for users looking to create custom anime-inspired effects. The code snippets show similar usage patterns, with AnimeGAN2-PyTorch potentially requiring less initial setup for out-of-the-box use.

Official PyTorch implementation of StyleGAN3

Pros of StyleGAN3

  • More advanced architecture with improved image quality and reduced artifacts
  • Better support for high-resolution image generation
  • Extensive documentation and official NVIDIA backing

Cons of StyleGAN3

  • Higher computational requirements and longer training times
  • More complex implementation, potentially harder for beginners to use
  • Less focused on specific style transfer tasks compared to JoJoGAN

Code Comparison

JoJoGAN:

def train(args):
    g_ema = Generator(args.size, 512, 8)
    g_ema.load_state_dict(torch.load(args.ckpt)["g_ema"], strict=False)
    g_ema.eval()
    g_ema = g_ema.to(device)

StyleGAN3:

def make_transform(translate, angle):
    m = np.eye(3)
    s = np.sin(angle/360.0*np.pi*2)
    c = np.cos(angle/360.0*np.pi*2)
    m[0][0] = c
    m[0][1] = s
    m[0][2] = translate[0]
    m[1][0] = -s
    m[1][1] = c
    m[1][2] = translate[1]
    return m
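
In StyleGAN3's official generation script, make_transform builds the 3×3 affine matrix (rotation plus translation) that is copied into the synthesis network's input transform, reflecting the model's translation- and rotation-equivariant design; the JoJoGAN excerpt, by contrast, is plain checkpoint loading. The snippets therefore illustrate the projects' different scopes rather than equivalent functionality.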

A latent text-to-image diffusion model

Pros of Stable-diffusion

  • More versatile, capable of generating a wide range of images
  • Larger community and more active development
  • Better documentation and resources for implementation

Cons of Stable-diffusion

  • Requires more computational resources
  • Longer training and inference times
  • More complex to fine-tune for specific tasks

Code Comparison

JoJoGAN:

def train(model, img_dir, num_epochs=1, batch_size=4):
    dataset = ImageDataset(img_dir)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for epoch in range(num_epochs):
        for batch in dataloader:
            # Training logic here

Stable-diffusion:

def train(model, data, num_epochs=1000, batch_size=16):
    for epoch in range(num_epochs):
        for i, (x, y) in enumerate(data):
            model_output = model(x)
            loss = F.mse_loss(model_output, y)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

JoJoGAN is more focused on style transfer and face manipulation, while Stable-diffusion is a more general-purpose text-to-image model. JoJoGAN's code is simpler and more straightforward, while Stable-diffusion's implementation is more complex but offers greater flexibility and capabilities.


README

JoJoGAN: One Shot Face Stylization

Links: arXiv · Colab demo · Replicate · Hugging Face Spaces · W&B report

This is the PyTorch implementation of JoJoGAN: One Shot Face Stylization.

Abstract:
While there have been recent advances in few-shot image stylization, these methods fail to capture stylistic details that are obvious to humans. Details such as the shape of the eyes, the boldness of the lines, are especially difficult for a model to learn, especially so under a limited data setting. In this work, we aim to perform one-shot image stylization that gets the details right. Given a reference style image, we approximate paired real data using GAN inversion and finetune a pretrained StyleGAN using that approximate paired data. We then encourage the StyleGAN to generalize so that the learned style can be applied to all other images.
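
To make this recipe concrete, the following is a minimal sketch of the finetuning loop the abstract describes, under stated assumptions: generator is the pretrained rosinality-style StyleGAN2 generator loaded earlier, e4e_projection stands in for a GAN-inversion encoder (such as e4e), lpips_loss for a perceptual loss, and style_image and device are already set up. Names and hyperparameters are illustrative, not the repository's exact code.

import torch

# 1. GAN inversion: approximate paired data (w, style_image) by projecting the
#    reference style image into StyleGAN's latent space (assumed e4e encoder).
w = e4e_projection(style_image)          # shape: (1, n_latent, 512)

# 2. Finetune the generator so latents near w reproduce the reference's details.
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-3)
for step in range(300):
    # Style-mix w with random latents so the learned style generalizes
    # to faces other than the reference.
    z = torch.randn(1, 512, device=device)
    w_rand = generator.get_latent(z).unsqueeze(1).repeat(1, generator.n_latent, 1)
    mask = torch.rand(generator.n_latent, device=device) < 0.5
    w_mix = torch.where(mask[None, :, None], w, w_rand)

    img, _ = generator([w_mix], input_is_latent=True)
    loss = lpips_loss(img, style_image)  # perceptual loss captures stylistic detail
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()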

Updates

  • 2022-02-03 Updated the paper. Improved stylization quality using discriminator perceptual loss. Added sketch model.

  • 2021-12-26 Added wandb logging. Fixed finetuning bug which begins finetuning from previously loaded checkpoint instead of the base face model. Added art model.

  • 2021-12-25 Added arcane_multi model which is trained on 4 arcane faces instead of 1 (if anyone has more clean data, let me know!). Better preserves features.

  • 2021-12-23 Paper is uploaded to arXiv.

  • 2021-12-22 Integrated into Replicate using cog. Try it out on Replicate.

  • 2021-12-22 Integrated into Huggingface Spaces 🤗 using Gradio. Try it out on Hugging Face Spaces.

  • 2021-12-22 Added pydrive authentication to avoid download limits from gdrive! Fixed running on cpu on colab.

How to use

Everything needed to get started is in the Colab notebook.
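
For a rough sense of what the notebook does at inference time, here is a hedged sketch of applying an already-finetuned generator to a new face. align_face and e4e_projection are stand-ins for the repository's preprocessing and inversion utilities; exact names and signatures may differ.

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Assumed helpers: align_face crops and aligns the face as in FFHQ
# preprocessing; e4e_projection inverts it to a W+ latent code.
aligned = align_face('input.jpg')
w = e4e_projection(aligned).to(device)

# generator is assumed to have been finetuned on the reference style image.
with torch.no_grad():
    stylized, _ = generator([w], input_is_latent=True)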

Citation

If you use this code or ideas from our paper, please cite our paper:

@article{chong2021jojogan,
  title={JoJoGAN: One Shot Face Stylization},
  author={Chong, Min Jin and Forsyth, David},
  journal={arXiv preprint arXiv:2112.11641},
  year={2021}
}

Acknowledgments

This code borrows from StyleGAN2 by rosinality and from e4e. Some snippets of Colab code are from StyleGAN-NADA.