mchong6 / JoJoGAN

Official PyTorch repo for JoJoGAN: One Shot Face Stylization

Top Related Projects

PhotoMaker [CVPR 2024]

Bringing Old Photo Back to Life (CVPR 2020 oral)

PyTorch implementation of AnimeGANv2

Official PyTorch implementation of StyleGAN3

A latent text-to-image diffusion model

Quick Overview

JoJoGAN is a project that implements a novel GAN-based method for one-shot face stylization. It allows users to transform facial images into various artistic styles, such as anime or cartoon characters, using only a single reference image. The project demonstrates impressive results in maintaining the identity of the original face while applying the target style.

Pros

  • Achieves high-quality face stylization with just one reference image
  • Preserves the identity and key features of the original face
  • Supports various artistic styles, including anime and cartoons
  • Provides pre-trained models for easy experimentation

Cons

  • Requires significant computational resources for training and inference
  • Limited to facial images and may not work well with other types of content
  • Potential ethical concerns regarding the manipulation of personal images
  • May produce inconsistent results with certain facial features or expressions

Code Examples

  1. Loading a pre-trained model:

    import torch
    from models.stylegan2.model import Generator

    # Fall back to the CPU when no GPU is available.
    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    # Load the checkpoint on the CPU first, then move the generator to the target device.
    ckpt = torch.load('pretrained_models/stylegan2-ffhq-config-f.pt', map_location='cpu')
    generator = Generator(1024, 512, 8, channel_multiplier=2).to(device)
    generator.load_state_dict(ckpt["g_ema"], strict=False)

  2. Generating a stylized image:

    from utils import align_face, swap_attribute

    # input_image and style_image are placeholders for images loaded by the caller.
    aligned_face = align_face(input_image)
    w_pivot = generator.get_latent(aligned_face)
    stylized_image = swap_attribute(generator, w_pivot, style_image, alpha=1.0)

  3. Fine-tuning the model:

    from train import train

    # The discriminator, both optimizers, g_ema, style_img, and args must be set up beforehand.
    train(generator, discriminator, g_optim, d_optim, g_ema, device, style_img, args)

Getting Started

  1. Clone the repository:

    git clone https://github.com/mchong6/JoJoGAN.git
    cd JoJoGAN
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download pre-trained models:

    mkdir pretrained_models
    wget https://github.com/mchong6/JoJoGAN/releases/download/v1.0/stylegan2-ffhq-config-f.pt -O pretrained_models/stylegan2-ffhq-config-f.pt
    
  4. Run the inference script:

    python inference.py --input_image path/to/input.jpg --style_image path/to/style.jpg --output_image output.jpg
    

Competitor Comparisons

PhotoMaker [CVPR 2024]

Pros of PhotoMaker

  • Supports multiple reference images for more diverse style transfer
  • Offers fine-grained control over facial features and attributes
  • Includes a user-friendly web interface for easy experimentation

Cons of PhotoMaker

  • Requires more computational resources due to its advanced features
  • Has a steeper learning curve for beginners
  • May produce less stylized results in certain scenarios

Code Comparison

PhotoMaker:

from photomaker import PhotoMaker

model = PhotoMaker(device="cuda")
output = model.generate(
    prompt="A portrait of a person",
    num_samples=1,
    guidance_scale=7.5,
    num_inference_steps=50,
    reference_images=["image1.jpg", "image2.jpg"]
)

JoJoGAN:

from jojogan import JoJoGAN

model = JoJoGAN()
output = model.transfer_style(
    content_image="input.jpg",
    style_image="style.jpg",
    num_iterations=1000
)

PhotoMaker offers more advanced features and control, while JoJoGAN provides a simpler interface for quick style transfer. PhotoMaker's code allows for multiple reference images and fine-tuning of generation parameters, whereas JoJoGAN focuses on transferring style from a single image with fewer customization options.

Bringing Old Photo Back to Life (CVPR 2020 oral)

Pros of Bringing-Old-Photos-Back-to-Life

  • Specialized in restoring old, damaged photos
  • Comprehensive pipeline including face restoration, colorization, and inpainting
  • Backed by Microsoft research, potentially more robust and well-documented

Cons of Bringing-Old-Photos-Back-to-Life

  • More complex setup and usage compared to JoJoGAN
  • Focused on photo restoration, less versatile for style transfer or modern image manipulation

Code Comparison

Bringing-Old-Photos-Back-to-Life:

from run_face_enhancement import run_face_enhancement
from run_global import run_global
enhanced_image = run_face_enhancement(image_path)
final_result = run_global(enhanced_image)

JoJoGAN:

from style_transfer import style_transfer
styled_image = style_transfer(image_path, style='jojo')

Summary

Bringing-Old-Photos-Back-to-Life is a specialized tool for restoring old photographs, offering a comprehensive pipeline for various restoration tasks. It's backed by Microsoft research, potentially providing more robustness and documentation. However, it has a more complex setup and is less versatile for modern image manipulation compared to JoJoGAN.

JoJoGAN, on the other hand, is simpler to use and more focused on style transfer, particularly for anime-style transformations. It's more suitable for creative image manipulation but lacks the advanced restoration features of Bringing-Old-Photos-Back-to-Life.

The code comparison shows that Bringing-Old-Photos-Back-to-Life involves multiple steps for enhancement, while JoJoGAN offers a more straightforward style transfer function.

PyTorch implementation of AnimeGANv2

Pros of AnimeGAN2-PyTorch

  • More comprehensive documentation and usage instructions
  • Includes pre-trained models for various anime styles
  • Supports both image and video processing

Cons of AnimeGAN2-PyTorch

  • Less focus on fine-tuning for specific art styles
  • May require more setup and dependencies

Code Comparison

AnimeGAN2-PyTorch:

import torch
from model import Generator

model = Generator()
model.load_state_dict(torch.load('weights/paprika.pt', map_location='cpu'))
model.eval()
out = model(img)  # img: an image batch tensor of shape (N, 3, H, W)

JoJoGAN:

import torch
from models import Generator

model = Generator(1024, 512, 8, channel_multiplier=2)
ckpt = torch.load('models/stylegan2-ffhq-config-f.pt', map_location='cpu')
model.load_state_dict(ckpt['g_ema'])
# The StyleGAN2 generator takes a list of latent codes as input, not an image.
out = model([torch.randn(1, 512)])

Both repositories can produce anime-style portraits, but they work differently. AnimeGAN2-PyTorch offers a broader range of pre-trained models and more extensive documentation, and it maps an input image directly to its stylized version, so it needs less setup for out-of-the-box use. JoJoGAN emphasizes fine-tuning a StyleGAN2 generator for a specific art style and may suit users who want custom effects. Because that generator consumes latent codes rather than images, a real photo must first be inverted to a latent code before it can be stylized.

Official PyTorch implementation of StyleGAN3

Pros of StyleGAN3

  • More advanced architecture with improved image quality and reduced artifacts
  • Better support for high-resolution image generation
  • Extensive documentation and official NVIDIA backing

Cons of StyleGAN3

  • Higher computational requirements and longer training times
  • More complex implementation, potentially harder for beginners to use
  • Less focused on specific style transfer tasks compared to JoJoGAN

Code Comparison

JoJoGAN:

def train(args):
    g_ema = Generator(args.size, 512, 8)
    g_ema.load_state_dict(torch.load(args.ckpt)["g_ema"], strict=False)
    g_ema.eval()
    g_ema = g_ema.to(device)

StyleGAN3:

import numpy as np

def make_transform(translate, angle):
    # Build a 3x3 homogeneous matrix: rotation by `angle` degrees plus a translation.
    m = np.eye(3)
    s = np.sin(angle/360.0*np.pi*2)
    c = np.cos(angle/360.0*np.pi*2)
    m[0][0] = c
    m[0][1] = s
    m[0][2] = translate[0]
    m[1][0] = -s
    m[1][1] = c
    m[1][2] = translate[1]
    return m
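
The StyleGAN3 snippet above builds a 3×3 homogeneous matrix that rotates by `angle` degrees and shifts by `translate`. A dependency-free sketch of the same arithmetic might look like the following. The `apply_transform` helper is hypothetical and is added only to show the effect on a single point.

```python
import math

def make_transform(translate, angle_degrees):
    """Return a 3x3 homogeneous matrix: rotation by angle_degrees plus a translation."""
    theta = math.radians(angle_degrees)  # same as angle / 360 * pi * 2
    s, c = math.sin(theta), math.cos(theta)
    return [
        [c, s, translate[0]],
        [-s, c, translate[1]],
        [0.0, 0.0, 1.0],
    ]

def apply_transform(m, point):
    """Hypothetical helper: apply the matrix to a 2D point (x, y)."""
    x, y = point
    return (
        m[0][0] * x + m[0][1] * y + m[0][2],
        m[1][0] * x + m[1][1] * y + m[1][2],
    )

m = make_transform((0.5, 0.0), 90.0)
moved = apply_transform(m, (1.0, 0.0))  # approximately (0.5, -1.0)
```

With this sign convention, a positive angle turns the point clockwise in a y-up coordinate system before the translation is added.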

A latent text-to-image diffusion model

Pros of Stable-diffusion

  • More versatile, capable of generating a wide range of images
  • Larger community and more active development
  • Better documentation and resources for implementation

Cons of Stable-diffusion

  • Requires more computational resources
  • Longer training and inference times
  • More complex to fine-tune for specific tasks

Code Comparison

JoJoGAN:

from torch.utils.data import DataLoader

def train(model, img_dir, num_epochs=1, batch_size=4):
    dataset = ImageDataset(img_dir)  # ImageDataset: a user-defined torch Dataset
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    for epoch in range(num_epochs):
        for batch in dataloader:
            pass  # training logic goes here

Stable-diffusion:

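import torch.nn.functional as F

def train(model, data, optimizer, num_epochs=1000, batch_size=16):
    # The optimizer is passed in explicitly; batch_size is assumed to be applied by `data`.
    for epoch in range(num_epochs):
        for i, (x, y) in enumerate(data):
            model_output = model(x)
            loss = F.mse_loss(model_output, y)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()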

JoJoGAN is more focused on style transfer and face manipulation, while Stable-diffusion is a more general-purpose text-to-image model. JoJoGAN's code is simpler and more straightforward, while Stable-diffusion's implementation is more complex but offers greater flexibility and capabilities.

README

JoJoGAN: One Shot Face Stylization

This is the PyTorch implementation of JoJoGAN: One Shot Face Stylization.

Abstract:
While there have been recent advances in few-shot image stylization, these methods fail to capture stylistic details that are obvious to humans. Details such as the shape of the eyes, the boldness of the lines, are especially difficult for a model to learn, especially so under a limited data setting. In this work, we aim to perform one-shot image stylization that gets the details right. Given a reference style image, we approximate paired real data using GAN inversion and finetune a pretrained StyleGAN using that approximate paired data. We then encourage the StyleGAN to generalize so that the learned style can be applied to all other images.
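
The recipe in the abstract can be illustrated with a deliberately tiny, dependency-free toy. This is a sketch under strong assumptions, not the authors' implementation: the "generator" just scales each latent coordinate by a learnable gain, `invert` is an exact inverse of that toy generator, and the style-mixing step used to build many training codes from one reference is an assumption that the abstract does not spell out. All names and numbers are hypothetical.

```python
import random

def generate(gains, code):
    """Toy generator: scale each latent coordinate by a learnable gain."""
    return [g * w for g, w in zip(gains, code)]

def invert(gains, image):
    """Toy GAN inversion: recover the code that reproduces `image` exactly."""
    return [x / g for x, g in zip(image, gains)]

def style_mix(ref_code, keep_mask, rng):
    """Keep the masked coordinates of the reference code and resample the rest."""
    return [w if keep else rng.gauss(0.0, 1.0) for w, keep in zip(ref_code, keep_mask)]

def mean_loss(gains, codes, target):
    """Mean squared error between every generated output and the style target."""
    total = 0.0
    for code in codes:
        total += sum((o - t) ** 2 for o, t in zip(generate(gains, code), target))
    return total / len(codes)

def finetune(gains, codes, target, lr=0.05, steps=200):
    """Full-batch gradient descent on the gains so that every code maps to `target`."""
    gains = list(gains)
    for _ in range(steps):
        grads = [0.0] * len(gains)
        for code in codes:
            out = generate(gains, code)
            for k, w in enumerate(code):
                grads[k] += 2.0 * (out[k] - target[k]) * w / len(codes)
        gains = [g - lr * d for g, d in zip(gains, grads)]
    return gains

rng = random.Random(0)
pretrained = [1.0, 1.0, 1.0, 1.0]          # stand-in for the pretrained generator weights
style_ref = [0.8, -0.5, 1.2, 0.3]          # stand-in for the single style reference image
ref_code = invert(pretrained, style_ref)   # step 1: invert the reference
keep_mask = [True, True, False, False]     # hypothetical choice of coordinates to keep
codes = [style_mix(ref_code, keep_mask, rng) for _ in range(32)]  # step 2: approximate pairs
loss_before = mean_loss(pretrained, codes, style_ref)
finetuned = finetune(pretrained, codes, style_ref)                # step 3: finetune
loss_after = mean_loss(finetuned, codes, style_ref)

new_face = [0.4, 0.9, -0.7, 0.1]           # stand-in for an unseen input face
stylized = generate(finetuned, invert(pretrained, new_face))      # step 4: apply the style
```

The toy only shows the data flow: one reference yields many (code, reference) pairs, the generator is finetuned on them, and new inputs are inverted with the original generator but rendered with the finetuned one. It says nothing about visual quality.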

Updates

  • 2021-12-22 Integrated into Replicate using cog. Try it out on Replicate.

  • 2022-02-03 Updated the paper. Improved stylization quality using discriminator perceptual loss. Added sketch model.

  • 2021-12-26 Added wandb logging. Fixed a finetuning bug that started finetuning from the previously loaded checkpoint instead of the base face model. Added art model.

  • 2021-12-25 Added arcane_multi model, which is trained on 4 Arcane faces instead of 1 (if anyone has more clean data, let me know!). It preserves features better.

  • 2021-12-23 Paper is uploaded to arXiv.

  • 2021-12-22 Integrated into Hugging Face Spaces 🤗 using Gradio. Try it out on Hugging Face Spaces.

  • 2021-12-22 Added pydrive authentication to avoid download limits from gdrive! Fixed running on CPU on Colab.

How to use

Everything to get started is in the colab notebook.

Citation

If you use this code or ideas from our paper, please cite our paper:

@article{chong2021jojogan,
  title={JoJoGAN: One Shot Face Stylization},
  author={Chong, Min Jin and Forsyth, David},
  journal={arXiv preprint arXiv:2112.11641},
  year={2021}
}

Acknowledgments

This code borrows from StyleGAN2 by rosinality and from e4e. Some snippets of the Colab code come from StyleGAN-NADA.