
soumith/ganhacks

starter from "How to Train a GAN?" at NIPS2016


Top Related Projects

  • DCGAN-tensorflow: A tensorflow implementation of "Deep Convolutional Generative Adversarial Networks"
  • CycleGAN: Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
  • Keras-GAN: Keras implementations of Generative Adversarial Networks.
  • the-gan-zoo: A list of all named GANs!
  • pix2pix: Image-to-image translation with conditional adversarial nets
  • fast-neural-style: Feedforward style transfer

Quick Overview

The soumith/ganhacks repository is a collection of tips and tricks for training Generative Adversarial Networks (GANs). It provides a curated list of best practices, heuristics, and techniques gathered from various sources and experiences in the field of GAN research and development.

Pros

  • Comprehensive collection of practical tips for GAN training
  • Community-driven content with contributions from experienced researchers
  • Regularly updated with new insights and techniques
  • Applicable to various GAN architectures and applications

Cons

  • Not a structured tutorial or course, requiring prior knowledge of GANs
  • Some tips may be specific to certain architectures or datasets
  • Lacks detailed explanations or theoretical background for each tip
  • May not cover all edge cases or specific problems in GAN training

Note: As this is not a code library but rather a collection of tips and best practices, there are no code examples or getting started instructions to provide.

Competitor Comparisons

DCGAN-tensorflow: A tensorflow implementation of "Deep Convolutional Generative Adversarial Networks"

Pros of DCGAN-tensorflow

  • Provides a complete implementation of DCGAN in TensorFlow
  • Includes pre-trained models and sample outputs
  • Offers detailed documentation and usage instructions

Cons of DCGAN-tensorflow

  • Focuses on a specific GAN architecture (DCGAN)
  • May require more computational resources to run
  • Less flexible for experimenting with different GAN variants

Code Comparison

DCGAN-tensorflow (model definition):

import tensorflow as tf  # TensorFlow 1.x API

def discriminator(image, reuse=False):
    with tf.variable_scope("discriminator") as scope:
        if reuse:
            scope.reuse_variables()
        # Discriminator network architecture
        # ...

ganhacks (tip implementation):

# Use a spherical Z: project uniform samples onto the unit hypersphere
import numpy as np

batchsize, z_dim = 64, 100
z = np.random.uniform(-1, 1, (batchsize, z_dim))
z /= np.sqrt(np.sum(z**2, axis=1, keepdims=True))

Summary

DCGAN-tensorflow provides a complete implementation of DCGAN with pre-trained models and detailed documentation. It's ideal for those looking to work specifically with DCGAN. ganhacks, on the other hand, offers a collection of tips and tricks for training GANs in general, making it more versatile for experimenting with different GAN architectures and techniques.


CycleGAN: Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.

Pros of CycleGAN

  • Focuses on a specific GAN architecture for unpaired image-to-image translation
  • Provides a complete implementation with training and testing scripts
  • Includes pre-trained models and datasets for immediate use

Cons of CycleGAN

  • Limited to a single GAN architecture, less versatile than ganhacks
  • Requires more computational resources and dataset preparation
  • Steeper learning curve for beginners compared to ganhacks

Code Comparison

CycleGAN (PyTorch implementation):

class CycleGANModel(BaseModel):
    def __init__(self, opt):
        BaseModel.__init__(self, opt)
        self.loss_names = ['D_A', 'G_A', 'cycle_A', 'idt_A', 'D_B', 'G_B', 'cycle_B', 'idt_B']
        self.visual_names = ['real_A', 'fake_B', 'rec_A', 'real_B', 'fake_A', 'rec_B']
        self.model_names = ['G_A', 'G_B', 'D_A', 'D_B']

ganhacks (general tips, no specific implementation):

# Example tip: use LeakyReLU (Keras)
model.add(LeakyReLU(alpha=0.2))

# Another tip: add noise to the discriminator's inputs (TensorFlow 1.x)
noise = tf.random_normal([batch_size, 1, 1, 1])

Summary

CycleGAN provides a complete implementation of a specific GAN architecture, while ganhacks offers general tips and tricks for improving GAN performance across various architectures.

Keras-GAN: Keras implementations of Generative Adversarial Networks.

Pros of Keras-GAN

  • Provides implementations of multiple GAN architectures in Keras
  • Includes ready-to-use code examples for various GAN models
  • Offers a more structured and comprehensive approach to GAN implementation

Cons of Keras-GAN

  • Focuses solely on Keras implementations, limiting flexibility for other frameworks
  • May not cover the latest GAN techniques or optimizations
  • Less emphasis on general GAN training tips and tricks

Code Comparison

ganhacks:

# No specific code provided, mainly text-based tips

Keras-GAN:

def build_generator(self):
    model = Sequential()
    model.add(Dense(256, input_dim=self.latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Dense(512))
    model.add(LeakyReLU(alpha=0.2))

Summary

ganhacks provides a collection of tips and tricks for training GANs, while Keras-GAN offers concrete implementations of various GAN architectures using the Keras framework. ganhacks is more general and framework-agnostic, focusing on best practices, while Keras-GAN provides ready-to-use code examples for specific GAN models. The choice between the two depends on whether you're looking for general guidance or specific Keras implementations.

the-gan-zoo: A list of all named GANs!

Pros of the-gan-zoo

  • Comprehensive list of GAN variants with links to papers and code
  • Regularly updated with new GAN architectures
  • Categorized by application areas (e.g., image, video, text)

Cons of the-gan-zoo

  • Lacks practical implementation tips and tricks
  • No detailed explanations or tutorials on GAN concepts
  • Primarily a reference list rather than a hands-on guide

Code Comparison

ganhacks:

# Example: normalize inputs to the [-1, 1] range (assumes x is in [0, 1])
def normalize(x):
    return (x - 0.5) * 2

the-gan-zoo:

| Paper | Architecture | Code |
|-------|--------------|------|
| [DCGAN](https://arxiv.org/abs/1511.06434) | Deep Convolutional GAN | [Official](https://github.com/Newmu/dcgan_code) |

Summary

ganhacks focuses on practical tips for implementing and training GANs, offering code snippets and best practices. the-gan-zoo serves as a comprehensive catalog of GAN variants, providing links to papers and implementations. While ganhacks is more suitable for developers looking to improve their GAN implementations, the-gan-zoo is an excellent resource for researchers and practitioners seeking an overview of the GAN landscape and finding specific architectures for their needs.


pix2pix: Image-to-image translation with conditional adversarial nets

Pros of pix2pix

  • Focused on image-to-image translation tasks
  • Provides a complete implementation with training and testing scripts
  • Includes pre-trained models for various applications

Cons of pix2pix

  • Limited to specific image translation tasks
  • Requires paired datasets for training
  • More complex setup and usage compared to general GAN tips

Code Comparison

pix2pix (model definition):

class UnetGenerator(nn.Module):
    def __init__(self, input_nc, output_nc, num_downs, ngf=64):
        super(UnetGenerator, self).__init__()
        # U-Net architecture implementation

ganhacks (tip implementation):

# Virtual batch normalization: mean and var are statistics computed
# from a fixed reference batch rather than the current mini-batch
import torch

def virtual_batch_normalization(x, gamma, beta, mean, var, eps=1e-5):
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta

Summary

pix2pix is a specialized framework for image-to-image translation tasks, offering a complete implementation with pre-trained models. It's more focused but requires paired datasets and has a steeper learning curve. ganhacks, on the other hand, provides general tips and tricks for improving GAN performance across various applications, making it more versatile but less specialized. The code comparison highlights the difference in scope, with pix2pix showing a full model architecture and ganhacks demonstrating a specific optimization technique.

fast-neural-style: Feedforward style transfer

Pros of fast-neural-style

  • Focuses specifically on neural style transfer, providing a more specialized and optimized solution
  • Includes pre-trained models for quick style transfer applications
  • Offers both CPU and GPU support for broader accessibility

Cons of fast-neural-style

  • Limited to style transfer tasks, whereas ganhacks covers a wider range of GAN-related techniques
  • Less frequently updated compared to ganhacks, which may impact its relevance to current research

Code Comparison

fast-neural-style:

local cmd = torch.CmdLine()
cmd:option('-style_image', 'examples/inputs/seated-nude.jpg', 'Style target image')
cmd:option('-content_image', 'examples/inputs/tubingen.jpg', 'Content target image')
cmd:option('-image_size', 512, 'Maximum height / width of generated image')
cmd:option('-gpu', 0, 'Zero-indexed ID of the GPU to use; for CPU mode set -gpu = -1')

ganhacks:

# No specific code snippets available for comparison
# ganhacks primarily consists of textual tips and tricks for training GANs

Note: ganhacks is a collection of best practices and tips for training GANs, while fast-neural-style provides actual implementation code for neural style transfer. This fundamental difference in purpose makes a direct code comparison less relevant.


README

(this list is no longer maintained, and I am not sure how relevant it is in 2020)

How to Train a GAN? Tips and tricks to make GANs work

While research in Generative Adversarial Networks (GANs) continues to improve the fundamental stability of these models, we use a bunch of tricks to train them and make them stable day to day.

Here is a summary of some of the tricks.

The authors of this document are listed at the end.

If you find a trick that is particularly useful in practice, please open a Pull Request to add it to the document. If we find it to be reasonable and verified, we will merge it in.

1: Normalize the inputs

  • normalize the images between -1 and 1
  • use Tanh as the last layer of the generator output (see the sketch below)
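
A minimal PyTorch sketch of this tip; the transform pipeline and the generator's output layer are illustrative assumptions, not code from this repo:

import torch.nn as nn
from torchvision import transforms

# Scale images from [0, 255] to [0, 1], then to [-1, 1]
to_gan_range = transforms.Compose([
    transforms.ToTensor(),                            # [0, 255] -> [0, 1]
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # [0, 1] -> [-1, 1]
])

# Generator ends in Tanh so its outputs land in the same [-1, 1] range
generator_head = nn.Sequential(
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
)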

2: A modified loss function

In GAN papers, the loss function to optimize G is min log(1 - D), but in practice people use max log D

  • because the first formulation has vanishing gradients early on
  • Goodfellow et al. (2014)

In practice, this works well:

  • Flip labels when training the generator: real = fake, fake = real (see the sketch below)
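
A minimal PyTorch sketch of the max log D trick; the tensor d_out_fake is a placeholder for the discriminator's logits on generated samples, and its shape is an arbitrary assumption:

import torch
import torch.nn.functional as F

# d_out_fake stands in for D(G(z)), the discriminator's logits on fakes
d_out_fake = torch.randn(64, 1, requires_grad=True)

# "max log D" trick: label the fakes as real (1) when training G.
# Minimizing this BCE maximizes log D(G(z)) and avoids the vanishing
# gradients of the min log(1 - D) formulation.
loss_g = F.binary_cross_entropy_with_logits(
    d_out_fake, torch.ones_like(d_out_fake))
loss_g.backward()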

3: Use a spherical Z

  • Don't sample from a uniform distribution (the samples fill a cube)
  • Sample from a Gaussian distribution instead (the samples concentrate near a sphere); see the sketch below
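
A short sketch in numpy; batch_size and z_dim are assumed hyperparameters:

import numpy as np

batch_size, z_dim = 64, 100  # assumed hyperparameters

# Gaussian z: in high dimensions the samples concentrate near a
# sphere of radius sqrt(z_dim), rather than filling a cube
z = np.random.randn(batch_size, z_dim)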

4: BatchNorm

  • Construct different mini-batches for real and fake, i.e. each mini-batch needs to contain only all real images or only all generated images (see the sketch below)
  • When batchnorm is not an option, use instance normalization (for each sample, subtract the mean and divide by the standard deviation)
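
A sketch of one discriminator step that respects the separate-batch rule; netD, real_images, and fake_images are placeholder names, an assumption about your training loop:

import torch
import torch.nn.functional as F

def d_step(netD, real_images, fake_images):
    # One all-real forward pass: BatchNorm sees real statistics only
    out_real = netD(real_images)
    loss_real = F.binary_cross_entropy_with_logits(
        out_real, torch.ones_like(out_real))

    # One all-fake forward pass: BatchNorm sees fake statistics only
    out_fake = netD(fake_images.detach())
    loss_fake = F.binary_cross_entropy_with_logits(
        out_fake, torch.zeros_like(out_fake))

    return loss_real + loss_fake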

5: Avoid Sparse Gradients: ReLU, MaxPool

  • the stability of the GAN game suffers if you have sparse gradients
  • LeakyReLU = good (in both G and D)
  • For Downsampling, use: Average Pooling, Conv2d + stride
  • For Upsampling, use: PixelShuffle, ConvTranspose2d + stride (see the sketch below)
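
A PyTorch sketch of these layer choices; the channel sizes are arbitrary assumptions:

import torch.nn as nn

# Downsampling block: strided conv + LeakyReLU instead of MaxPool + ReLU
down = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
)

# Upsampling block: PixelShuffle (or ConvTranspose2d with stride)
up = nn.Sequential(
    nn.Conv2d(128, 64 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(2),  # rearranges channels into a 2x spatial upsample
    nn.LeakyReLU(0.2),
)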

6: Use Soft and Noisy Labels

  • Label smoothing, i.e. if you have two target labels, Real=1 and Fake=0, then for each incoming sample, if it is real, replace the label with a random number between 0.7 and 1.2, and if it is fake, replace it with a random number between 0.0 and 0.3 (for example)
    • Salimans et al. (2016)
  • make the labels noisy for the discriminator: occasionally flip the labels when training the discriminator (see the sketch below)
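
A sketch combining both ideas; the flip probability is an assumption, not a value from this document:

import torch

def soft_noisy_labels(batch_size, is_real, flip_prob=0.05):
    # Soft labels: real in [0.7, 1.2], fake in [0.0, 0.3]
    if is_real:
        labels = 0.7 + 0.5 * torch.rand(batch_size)
    else:
        labels = 0.3 * torch.rand(batch_size)
    # Noisy labels: occasionally flip a few when training D
    flip = torch.rand(batch_size) < flip_prob
    labels[flip] = 1.0 - labels[flip]
    return labels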

7: DCGAN / Hybrid Models

  • Use DCGAN when you can. It works!
  • if you can't use DCGAN and no model is stable, use a hybrid model: KL + GAN or VAE + GAN

8: Use stability tricks from RL

  • Experience Replay
    • Keep a replay buffer of past generations and occasionally show them to the discriminator (a sketch follows below)
    • Keep checkpoints from the past of G and D and occasionally swap them out for a few iterations
  • All stability tricks that work for deep deterministic policy gradients
  • See Pfau & Vinyals (2016)
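
A sketch of a replay buffer for past generations, modeled loosely on the image pool used in CycleGAN-style training; the capacity and the 50/50 swap rule are assumptions:

import random

class ReplayBuffer:
    """Keep a pool of past generated images and occasionally replay
    an old one to the discriminator instead of the newest fake."""
    def __init__(self, capacity=50):
        self.capacity = capacity
        self.pool = []

    def push_and_sample(self, fake):
        if len(self.pool) < self.capacity:
            self.pool.append(fake)
            return fake
        if random.random() < 0.5:  # half the time, replay an old fake
            idx = random.randrange(self.capacity)
            old, self.pool[idx] = self.pool[idx], fake
            return old
        return fake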

9: Use the ADAM Optimizer

  • optim.Adam rules!
    • See Radford et al. (2015)
  • Use SGD for the discriminator and Adam for the generator (see the sketch below)
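
A sketch of this optimizer split; the stand-in networks and the learning rates (Adam betas following Radford et al. 2015 conventions) are assumptions:

import torch.nn as nn
import torch.optim as optim

netG = nn.Linear(100, 784)  # stand-in generator
netD = nn.Linear(784, 1)    # stand-in discriminator

# Adam for G, plain SGD for D, per this tip
optimizerG = optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
optimizerD = optim.SGD(netD.parameters(), lr=2e-4)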

10: Track failures early

  • D loss goes to 0: failure mode
  • check norms of gradients: if they are over 100, things are screwing up (see the sketch below)
  • when things are working, D loss has low variance and goes down over time vs. having huge variance and spiking
  • if the loss of the generator steadily decreases, then it's fooling D with garbage (says Martin)
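
A sketch of a gradient-norm check to catch this early; the threshold of 100 comes from the tip above:

import torch

def grad_norm(model):
    # Global L2 norm over all parameter gradients; call after backward()
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.norm(2).item() ** 2
    return total ** 0.5

# e.g. after loss.backward():
# if grad_norm(netD) > 100: print("warning: D gradients exploding")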

11: Don't balance loss via statistics (unless you have a good reason to)

  • Don't try to find a (number of G / number of D) schedule to uncollapse training
  • It's hard and we've all tried it.
  • If you do try it, have a principled approach to it rather than intuition

For example:

while lossD > A:
  train D
while lossG > B:
  train G

12: If you have labels, use them

  • if you have labels available, train the discriminator to also classify the samples: auxiliary GANs (see the sketch below)
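
A minimal sketch of an auxiliary-classifier discriminator head; the feature extractor and all sizes are illustrative assumptions:

import torch.nn as nn

class AuxDiscriminator(nn.Module):
    def __init__(self, feat_dim=128, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(784, feat_dim), nn.LeakyReLU(0.2))
        self.adv_head = nn.Linear(feat_dim, 1)          # real vs. fake logit
        self.cls_head = nn.Linear(feat_dim, n_classes)  # class logits

    def forward(self, x):
        h = self.features(x)
        return self.adv_head(h), self.cls_head(h)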

13: Add noise to inputs, decay over time
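
A sketch of one way to do this, adding annealed Gaussian noise to the discriminator's inputs; the initial scale and the linear decay schedule are assumptions:

import torch

def noisy(images, step, sigma0=0.1, decay_steps=100_000):
    # Linearly anneal the noise std from sigma0 to 0 over decay_steps
    sigma = sigma0 * max(0.0, 1.0 - step / decay_steps)
    return images + sigma * torch.randn_like(images)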

14: [notsure] Train discriminator more (sometimes)

  • especially when you have noise
  • hard to find a schedule of number of D iterations vs G iterations

15: [notsure] Batch Discrimination

  • Mixed results

16: Discrete variables in Conditional GANs

  • Use an Embedding layer
  • Add as additional channels to images
  • Keep embedding dimensionality low and upsample to match image channel size (see the sketch below)
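
A sketch of this conditioning scheme; all sizes are illustrative assumptions:

import torch
import torch.nn as nn

# Condition on a discrete label by broadcasting a low-dimensional
# embedding into extra image channels
n_classes, embed_dim, H = 10, 8, 64
embed = nn.Embedding(n_classes, embed_dim)

labels = torch.tensor([3, 7])                      # batch of 2 labels
e = embed(labels)                                  # (2, 8)
e_maps = e[:, :, None, None].expand(-1, -1, H, H)  # (2, 8, 64, 64)

images = torch.randn(2, 3, H, H)
x = torch.cat([images, e_maps], dim=1)             # (2, 3 + 8, 64, 64)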

17: Use Dropouts in G in both train and test phase
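
A sketch of dropout kept active at test time in PyTorch; where the dropout sits inside G is an assumption:

import torch.nn as nn
import torch.nn.functional as F

class GeneratorBlock(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        # training=True keeps dropout active even in eval() mode,
        # so G stays stochastic at test time as this tip suggests
        return F.dropout(F.leaky_relu(self.fc(x), 0.2),
                         p=0.5, training=True)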

Authors

  • Soumith Chintala
  • Emily Denton
  • Martin Arjovsky
  • Michael Mathieu