zi2zi

Learning Chinese Character style with conditional GAN

2,671

481

Top Related Projects

pytorch-CycleGAN-and-pix2pix

24,306

Image-to-Image Translation in PyTorch

CycleGAN

12,755

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Keras-GAN

9,237

Keras implementations of Generative Adversarial Networks.

stargan

5,282

StarGAN - Official PyTorch Implementation (CVPR 2018)

Quick Overview

zi2zi is a project that implements a generative adversarial network (GAN) for Chinese character generation and style transfer. It allows users to generate Chinese characters in various styles based on input images, making it useful for font generation and character style transfer tasks.

Pros

Enables generation of Chinese characters in different styles
Supports style transfer between different character fonts
Provides a pre-trained model for quick experimentation
Includes scripts (font2img.py, package.py) for building training data from your own fonts

Cons

Focused on CJK characters (Chinese, Japanese and Korean charsets are provided); not aimed at other writing systems
Requires significant computational resources for training
May produce artifacts or inconsistencies in generated characters
Documentation could be more detailed for easier setup and usage

Code Examples

zi2zi is a TensorFlow 1.x project (Python 2.7) with no PyTorch API; models are trained and used through its command-line scripts. The examples below use the flags documented in the README.

Transferring source characters into one target font style with a trained (or the pretrained) generator checkpoint, assuming the target font was packaged with label 0:

python infer.py --model_dir=checkpoint_dir/ \
                --batch_size=16 \
                --source_obj=binary_obj_path \
                --embedding_ids=0 \
                --save_dir=save_dir/

Interpolating between font styles, assuming the fonts were packaged with labels 0 and 1:

python infer.py --model_dir=checkpoint_dir/ \
                --batch_size=10 \
                --source_obj=obj_path \
                --embedding_ids=0,1 \
                --save_dir=frames/ \
                --output_gif=gif_path \
                --interpolate=1 \
                --steps=10

Getting Started

Clone the repository:

git clone https://github.com/kaonashi-tyc/zi2zi.git
cd zi2zi

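Install dependencies (the project targets Python 2.7 with the TensorFlow 1.x API; CUDA and cuDNN are needed for GPU training):
```
pip install "tensorflow>=1.0.1,<2.0" pillow "numpy>=1.12.1" "scipy>=0.18.1" imageio
```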

Prepare training data: the repository does not ship a dataset, so render glyph images from your own fonts with font2img.py and pickle them with package.py (see Preprocess below). A pretrained model, trained with 27 fonts and containing only the generator, is linked from the README's Pretrained Model section.

Start training once train.obj and val.obj are in experiment/data:

python train.py --experiment_dir=experiment --experiment_id=0 --batch_size=16

Competitor Comparisons

pytorch-CycleGAN-and-pix2pix

24,306

Image-to-Image Translation in PyTorch

Pros of pytorch-CycleGAN-and-pix2pix

More versatile, supporting multiple image-to-image translation tasks
Implements both CycleGAN and pix2pix architectures
Actively maintained with regular updates and improvements

Cons of pytorch-CycleGAN-and-pix2pix

Not specifically optimized for Chinese character generation
May require more computational resources due to its broader scope
Steeper learning curve for users focused solely on font generation

Code Comparison

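zi2zi (TensorFlow 1.x; simplified with stock TF layers, whereas the repository wraps them in its own helper ops):

def discriminator(self, inp, reuse=False):
    with tf.variable_scope("discriminator", reuse=reuse):
        # first layer: 4x4 convolution with stride 2, then LeakyReLU
        conv = tf.layers.conv2d(inp, 64, kernel_size=4, strides=2, padding="same", name="conv1")
        conv = tf.nn.leaky_relu(conv, alpha=0.2)
        return conv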

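pytorch-CycleGAN-and-pix2pix (PyTorch):

import torch.nn as nn

class NLayerDiscriminator(nn.Module):
    def __init__(self, input_nc, ndf=64, n_layers=3, norm_layer=nn.BatchNorm2d):
        super(NLayerDiscriminator, self).__init__()
        kw = 4    # kernel size
        padw = 1  # padding
        # first layer: 4x4 convolution with stride 2, then LeakyReLU (no normalization)
        sequence = [nn.Conv2d(input_nc, ndf, kernel_size=kw, stride=2, padding=padw), nn.LeakyReLU(0.2, True)]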

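The code snippets show differences in framework (TensorFlow vs. PyTorch) and implementation style. In the snippets shown, both discriminators start with a 4x4, stride-2 convolution followed by LeakyReLU; zi2zi writes its layers out explicitly inside a TensorFlow variable scope, while pytorch-CycleGAN-and-pix2pix assembles an nn.Sequential whose depth (n_layers) and normalization layer (norm_layer) are constructor arguments, which makes it easier to customize.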

CycleGAN

12,755

Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Pros of CycleGAN

More versatile, capable of handling various image-to-image translation tasks
Doesn't require paired training data, allowing for broader application
Implements cycle consistency loss for improved results

Cons of CycleGAN

May struggle with preserving fine details in complex transformations
Computationally more intensive, since it trains two generators and two discriminators (one pair per translation direction)
Less specialized for character-based tasks compared to zi2zi

Code Comparison

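zi2zi (character generation focus; simplified TensorFlow 1.x excerpt of how the generator is conditioned):

# look up each example's font-category embedding and append it to the
# encoder's bottleneck features along the channel axis
local_embeddings = tf.nn.embedding_lookup(embeddings, ids=embedding_ids)
local_embeddings = tf.reshape(local_embeddings, [batch_size, 1, 1, embedding_dim])
embedded = tf.concat([encoded, local_embeddings], 3)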

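CycleGAN (general image translation; the snippet comes from the PyTorch port in pytorch-CycleGAN-and-pix2pix, since the original CycleGAN repository is written in Lua Torch):

def forward(self, input):
    return self.model(input)

def backward_D_basic(self, netD, real, fake):
    pred_real = netD(real)
    loss_D_real = self.criterionGAN(pred_real, True)
    pred_fake = netD(fake.detach())  # detach: no gradient flows into the generator
    loss_D_fake = self.criterionGAN(pred_fake, False)
    loss_D = (loss_D_real + loss_D_fake) * 0.5
    loss_D.backward()
    return loss_D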

The code snippets highlight zi2zi's conditioning of the generator on a learned font-category embedding, while CycleGAN's code is label-free: its generator maps an image directly to the other domain, and each discriminator is trained with a generic real/fake loss.

Keras-GAN

9,237

Keras implementations of Generative Adversarial Networks.

Pros of Keras-GAN

Implements multiple GAN architectures in a single repository
Uses Keras, which is more beginner-friendly and has a simpler API
Provides a broader range of GAN applications beyond font generation

Cons of Keras-GAN

Less specialized for Chinese character generation
May require more customization for specific font-related tasks
Lacks some of the domain-specific features found in zi2zi

Code Comparison

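zi2zi (TensorFlow 1.x):

class UNet(object):
    # a single model class that builds both the U-Net generator and the
    # discriminator in one TensorFlow graph (argument list abridged)
    def __init__(self, experiment_dir=None, experiment_id=0, batch_size=16,
                 embedding_num=40, embedding_dim=128):
        pass  # network construction omitted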

Keras-GAN (Keras):

def build_generator(self):
    model = Sequential()
    model.add(Dense(256, input_dim=self.latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    # Generator architecture implementation

The zi2zi repository uses TensorFlow and implements a U-Net-based model specifically for font generation, while Keras-GAN uses Keras and provides more general GAN implementations that can be adapted for various tasks.

stargan

5,282

StarGAN - Official PyTorch Implementation (CVPR 2018)

Pros of StarGAN

More versatile, capable of handling multiple domains in a single model
Supports both image-to-image translation and attribute manipulation
Offers better scalability for diverse datasets

Cons of StarGAN

More complex architecture, potentially harder to implement and fine-tune
May require more computational resources for training and inference
Less specialized for specific tasks like Chinese character generation

Code Comparison

StarGAN:

def build_model(self):
    self.G = Generator(self.g_conv_dim, self.c_dim, self.g_repeat_num)
    self.D = Discriminator(self.image_size, self.d_conv_dim, self.c_dim, self.d_repeat_num)

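zi2zi (simplified TensorFlow 1.x sketch, not verbatim repository code; real_A, real_AB and embedding_ids stand for the graph's input tensors):

def build_model(self, is_training=True):
    embedding = tf.get_variable("embedding", [self.embedding_num, self.embedding_dim])
    # U-Net generator conditioned on the font-category embedding
    fake_B, encoded_real_A = self.generator(real_A, embedding, embedding_ids, is_training=is_training)
    # DCGAN-style discriminator that also outputs font-category logits (AC-GAN)
    real_D, real_D_logits, real_category_logits = self.discriminator(real_AB, is_training=is_training, reuse=False)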

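StarGAN conditions a generic Generator on a target-domain label vector (c_dim) and has its Discriminator classify the domain, while zi2zi employs a UNet-based generator conditioned on a learned font-category embedding and a DCGAN-style discriminator with an auxiliary category loss, reflecting its focus on character generation.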

README

zi2zi: Master Chinese Calligraphy with Conditional Adversarial Networks

Introduction

Learning East Asian language typefaces with GAN. zi2zi (字到字, meaning "from character to character") is an application and extension of the recently popular pix2pix model to Chinese characters.

Details can be found in this blog post.

Network Structure

Original Model

The network structure is based on pix2pix, with the addition of a category embedding and two other losses, category loss and constant loss, from AC-GAN and DTN respectively.
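To make the loss composition concrete, here is a minimal pure-Python sketch of how a generator objective could combine these terms, assuming flat lists stand in for image and encoder tensors. The function names are hypothetical; mean squared error for the constant loss and a weight of 1.0 for the category term are assumptions, while the weights 100 and 15 mirror the L1_penalty and Lconst_penalty values in the training command. The repository itself computes its losses on TensorFlow tensors.

```python
import math


def l1_loss(fake, real):
    # pix2pix L1 term: mean absolute pixel difference to the target glyph
    return sum(abs(f - r) for f, r in zip(fake, real)) / len(fake)


def const_loss(enc_source, enc_fake):
    # DTN-style constant loss: the encoder should map the source glyph and the
    # generated glyph to nearby codes (mean squared error assumed here)
    return sum((a - b) ** 2 for a, b in zip(enc_source, enc_fake)) / len(enc_source)


def category_loss(logits, label):
    # AC-GAN-style auxiliary loss: softmax cross-entropy of the discriminator's
    # font-category logits against the true font label (numerically stable form)
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def generator_loss(adv_loss, fake, real, enc_source, enc_fake, cat_logits, label,
                   l1_penalty=100.0, lconst_penalty=15.0, lcategory_penalty=1.0):
    # weighted sum of the adversarial, L1, constant and category terms
    return (adv_loss
            + l1_penalty * l1_loss(fake, real)
            + lconst_penalty * const_loss(enc_source, enc_fake)
            + lcategory_penalty * category_loss(cat_logits, label))
```

The large L1 weight keeps generated glyphs close to their targets pixel by pixel, while the constant and category terms push the encoder and discriminator to preserve character identity and font style respectively.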

Updated Model with Label Shuffling

After sufficient training, d_loss drops to near zero and the model's performance plateaus. Label shuffling mitigates this problem by presenting new challenges to the model.

Specifically, within a given minibatch, for the same set of source characters, we generate two sets of target characters: one with the correct embedding labels, the other with shuffled labels. The shuffled set will likely not have corresponding target images with which to compute L1_Loss, but it can still serve as a good source for all the other losses, forcing the model to generalize beyond the limited set of provided examples. Empirically, label shuffling improves the model's generalization on unseen data, with better details, and reduces the number of characters required for training.
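The minibatch logic can be sketched as follows, as an illustration only: the function name, the loss labels and the use of Python's random.shuffle are assumptions, not the repository's code. Shuffling only permutes the labels within the batch, so each source glyph is usually paired with a font other than its own target; because a permutation can leave some labels in place, a few shuffled pairs may coincide with correct ones.

```python
import random


def make_label_sets(embedding_ids, rng=random):
    # Same source glyphs, two sets of font labels:
    #  - "correct": the true labels, whose target images exist, so every loss applies
    #  - "shuffled": a permutation of the same labels with no matching target
    #    image, so the L1 term is dropped while the remaining losses still apply
    shuffled = list(embedding_ids)
    rng.shuffle(shuffled)
    return {
        "correct": (list(embedding_ids), ["adversarial", "l1", "constant", "category"]),
        "shuffled": (shuffled, ["adversarial", "constant", "category"]),
    }
```

For example, `make_label_sets([0, 0, 3, 5], random.Random(42))` returns the original labels alongside a reordering of the same four labels.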

You can enable label shuffling by setting the flip_labels=1 option in the train.py script. It is recommended that you enable it for further tuning once d_loss flatlines around zero.
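Deciding when d_loss has "flatlined" can be automated with a simple moving-average check. This is a hypothetical helper; the window size and threshold are illustrative assumptions, not values from the README.

```python
def d_loss_flatlined(history, window=100, threshold=0.05):
    # True once the mean of the last `window` recorded d_loss values
    # falls below `threshold`; False while there is too little history
    if len(history) < window:
        return False
    recent = history[-window:]
    return sum(recent) / window < threshold
```

When it returns True for your logged d_loss values, that is the point at which the README suggests turning on flip_labels=1.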

Gallery

Compare with Ground Truth

Brush Writing Fonts

Cursive Script (Requested by SNS audience)

Mingchao Style (宋体/明朝体)

Korean

Interpolation

Animation

How to Use

Step Zero

Download tons of fonts as you please

Requirement

Python 2.7
CUDA
cudnn
Tensorflow >= 1.0.1
Pillow(PIL)
numpy >= 1.12.1
scipy >= 0.18.1
imageio

Preprocess

To avoid an IO bottleneck, preprocessing is necessary: your data is pickled into binary form and kept in memory during training.

First, run the command below to get the font images:

python font2img.py --src_font=src.ttf \
                   --dst_font=tgt.otf \
                   --charset=CN \
                   --sample_count=1000 \
                   --sample_dir=dir \
                   --label=0 \
                   --filter=1 \
                   --shuffle=1

Four default charsets are offered: CN, CN_T (traditional), JP and KR. You can also point --charset to a file containing a single line of characters, and images will be generated for the characters in it. The filter option is highly recommended: it pre-samples some characters and filters out all images that share the same hash, which usually indicates that the character is missing from the font. label is the index in the category embeddings that this font is associated with; it defaults to 0.
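The idea behind the filter option can be sketched in a few lines: characters a font does not contain typically render as the same placeholder (or blank) image, so a hash shared by several sampled characters marks missing glyphs. This is a hypothetical illustration operating on already-rendered image bytes; the function name, the use of MD5 and the "seen more than once" threshold are assumptions, not font2img.py's exact behavior.

```python
import hashlib
import random
from collections import Counter


def filter_missing_glyphs(rendered, sample_count=2000, seed=0):
    # rendered: dict mapping each character to its rendered image bytes
    rng = random.Random(seed)
    chars = sorted(rendered)
    sample = rng.sample(chars, min(sample_count, len(chars)))
    # count how many sampled characters produce each distinct image hash
    counts = Counter(hashlib.md5(rendered[c]).hexdigest() for c in sample)
    # a hash shared by several characters is most likely a placeholder glyph
    recurring = {h for h, n in counts.items() if n > 1}
    return {c: img for c, img in rendered.items()
            if hashlib.md5(img).hexdigest() not in recurring}
```

Sampling keeps the check cheap on large charsets; the trade-off is that a placeholder image which never repeats within the sample slips through.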

After obtaining all images, run package.py to pickle the images and their corresponding labels into binary format:

python package.py --dir=image_directories \
                  --save_dir=binary_save_directory \
                  --split_ratio=[0,1]

After running this, you will find two objects train.obj and val.obj under the save_dir for training and validation, respectively.
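The packaging step can be pictured with the sketch below, which streams individually pickled examples into two files. It assumes each example is a (label, image_bytes) tuple and that split_ratio is the fraction of examples sent to validation; both are assumptions about package.py, and the helper names are hypothetical.

```python
import os
import pickle
import random


def package_examples(examples, save_dir, split_ratio=0.1, seed=0):
    # examples: iterable of (label, image_bytes) tuples
    rng = random.Random(seed)
    train_path = os.path.join(save_dir, "train.obj")
    val_path = os.path.join(save_dir, "val.obj")
    with open(train_path, "wb") as ft, open(val_path, "wb") as fv:
        for label, img_bytes in examples:
            # route each example at random, then append it as its own pickle
            target = fv if rng.random() < split_ratio else ft
            pickle.dump((label, img_bytes), target)
    return train_path, val_path


def load_examples(path):
    # read the stream of pickled examples back until end of file
    out = []
    with open(path, "rb") as f:
        while True:
            try:
                out.append(pickle.load(f))
            except EOFError:
                return out
```

Storing raw image bytes rather than decoded arrays keeps the binaries compact enough to hold in memory during training.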

Experiment Layout

experiment/
└── data
    ├── train.obj
    └── val.obj

Create an experiment directory under the root of the project, with a data directory inside it holding the two binaries. This layout enforces better data isolation, especially if you have multiple experiments running.
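A small Python 3 helper can create this layout and copy in the binaries produced by package.py. It is a hypothetical convenience, not part of the repository.

```python
import shutil
from pathlib import Path


def setup_experiment(project_root, save_dir):
    # create <project_root>/experiment/data (no error if it already exists)
    data_dir = Path(project_root) / "experiment" / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    # copy the two binaries from package.py's save_dir into the data directory
    for name in ("train.obj", "val.obj"):
        shutil.copy(str(Path(save_dir) / name), str(data_dir / name))
    return data_dir
```

Copying rather than moving leaves the original binaries in place, so several experiments can be seeded from the same packaged data.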

Train

To start training, run the following command:

python train.py --experiment_dir=experiment \
                --experiment_id=0 \
                --batch_size=16 \
                --lr=0.001 \
                --epoch=40 \
                --sample_steps=50 \
                --schedule=20 \
                --L1_penalty=100 \
                --Lconst_penalty=15

Here, schedule is the number of epochs between learning-rate decays; each decay halves the learning rate. The train command creates sample, logs and checkpoint directories under experiment_dir if they do not already exist, where you can check and manage the progress of your training.
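The decay rule can be written as a one-line function. This sketch assumes 0-indexed epochs and plain halving with no lower bound; the exact point at which train.py applies the decay, and any floor it enforces, may differ.

```python
def learning_rate(epoch, lr=0.001, schedule=20):
    # epoch is 0-indexed; the rate halves once per `schedule` completed epochs
    return lr * 0.5 ** (epoch // schedule)
```

With --lr=0.001 and --schedule=20 as in the command above, epochs 0 to 19 train at 0.001 and epochs 20 to 39 at 0.0005.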

Infer and Interpolate

After training is done, run the command below to infer test data:

python infer.py --model_dir=checkpoint_dir/ \
                --batch_size=16 \
                --source_obj=binary_obj_path \
                --embedding_ids=<label[s] of the font, comma-separated> \
                --save_dir=save_dir/

You can also run interpolation with this command:

python infer.py --model_dir=checkpoint_dir/ \
                --batch_size=10 \
                --source_obj=obj_path \
                --embedding_ids=<label[s] of the font, comma-separated> \
                --save_dir=frames/ \
                --output_gif=gif_path \
                --interpolate=1 \
                --steps=10 \
                --uroboros=1

It runs through all pairs of fonts specified in embedding_ids and interpolates between each pair over the number of steps given by --steps.
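The interpolation can be sketched as linear blending between category embeddings. This assumes "pairs" means consecutive ids in the --embedding_ids list and that uroboros=1 closes the loop from the last font back to the first; the README does not document uroboros, so that reading, like the helper names, is an assumption.

```python
def parse_embedding_ids(arg):
    # "--embedding_ids=0,3,7" -> [0, 3, 7]
    return [int(s) for s in arg.split(",")]


def font_pairs(ids, uroboros=False):
    # consecutive pairs; with uroboros, the last font loops back to the first
    chain = list(ids) + ([ids[0]] if uroboros else [])
    return list(zip(chain, chain[1:]))


def interpolate(a, b, steps):
    # steps + 1 frames blending embedding a into embedding b (both ends included)
    return [[(1 - t / steps) * x + (t / steps) * y for x, y in zip(a, b)]
            for t in range(steps + 1)]
```

Feeding each blended embedding to the generator yields one frame per step, which is how a GIF that morphs one font into the next can be assembled.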

Pretrained Model

The pretrained model can be downloaded here. It was trained with 27 fonts, and only the generator is saved to reduce the model size. You can use the encoder in this pretrained model to accelerate the training process.

Acknowledgements

Code derived and rehashed from:

License

Apache 2.0
