Top Related Projects
- stable-diffusion: A latent text-to-image diffusion model
- ddim: Denoising Diffusion Implicit Models
- denoising-diffusion-pytorch: Implementation of Denoising Diffusion Probabilistic Model in Pytorch
- k-diffusion: Karras et al. (2022) diffusion models for PyTorch
Quick Overview
The hojonathanho/diffusion repository is the official TensorFlow implementation of denoising diffusion probabilistic models (DDPMs) for image generation, released by the authors of the DDPM paper (Ho et al., 2020). It provides a framework for training and sampling from diffusion models, which have shown impressive results in generating high-quality images.
Pros
- Implements state-of-the-art diffusion models for image generation
- Provides a flexible and extensible codebase for experimenting with diffusion models
- Includes pre-trained models and example scripts for quick start and experimentation
- Supports various image datasets and model architectures
Cons
- Limited documentation and explanations of the underlying concepts
- May require significant computational resources for training large models
- Lacks extensive hyperparameter tuning guidelines
- Not actively maintained (last commit was over a year ago at the time of writing)
Code Examples
Note: the repository itself is written in TensorFlow 1.x (see the README below); the examples in this section sketch the equivalent workflow with a PyTorch-style API in the spirit of denoising-diffusion-pytorch, not the repository's actual interface.
- Loading a pre-trained model and generating samples:
import torch
from models import UNet
from diffusion import GaussianDiffusion

# Build the denoising network and wrap it in the diffusion process
model = UNet(
    dim=64,
    dim_mults=(1, 2, 4, 8)
)
diffusion = GaussianDiffusion(
    model,
    image_size=32,
    timesteps=1000,
    loss_type='l1'
)

# Load a trained checkpoint and draw a batch of samples
ckpt = torch.load('path/to/checkpoint.pt')
diffusion.load_state_dict(ckpt['model'])
samples = diffusion.sample(batch_size=16)
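To inspect the result, the sampled tensor can be written straight to disk with torchvision (assuming the sampler returns values in [0, 1]):

from torchvision.utils import save_image

# Arrange the 16 samples in a 4x4 grid and save them
save_image(samples, 'samples.png', nrow=4)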
- Training a new diffusion model (reusing the model and diffusion objects from above):
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize(32),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # scale images to [-1, 1]
])
dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(diffusion.parameters(), lr=1e-4)
for epoch in range(100):
    for images, _ in dataloader:  # CIFAR-10 yields (image, label) pairs; labels are unused
        loss = diffusion(images)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
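Under the hood, a DDPM training step draws a random timestep, noises the clean image with the closed-form forward process, and regresses the injected noise (Ho et al. 2020, Algorithm 1). A minimal sketch of that objective, with illustrative names (extract and T are assumptions, not this repository's API):

import torch
import torch.nn.functional as F

def extract(a, t, x_shape):
    # Gather per-timestep coefficients and reshape them for broadcasting
    return a.gather(-1, t).reshape(-1, *((1,) * (len(x_shape) - 1)))

def ddpm_loss(model, x0, sqrt_alphas_cumprod, sqrt_one_minus_alphas_cumprod, T=1000):
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    # Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    x_t = (extract(sqrt_alphas_cumprod, t, x0.shape) * x0
           + extract(sqrt_one_minus_alphas_cumprod, t, x0.shape) * noise)
    # The network is trained to predict the injected noise
    return F.mse_loss(model(x_t, t), noise)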
- Customizing the noise schedule:
from diffusion import GaussianDiffusion, cosine_beta_schedule

custom_betas = cosine_beta_schedule(timesteps=1000)
diffusion = GaussianDiffusion(
    model,
    image_size=32,
    timesteps=1000,
    loss_type='l1',
    betas=custom_betas
)
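For reference, the cosine schedule popularized by Nichol & Dhariwal (2021) can be written as follows; this is a sketch of the standard formula, not necessarily the exact helper shipped by any of these repositories:

import math
import torch

def cosine_beta_schedule(timesteps, s=0.008):
    # alpha_bar follows a squared-cosine curve; betas come from its stepwise ratios
    t = torch.linspace(0, timesteps, timesteps + 1) / timesteps
    alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(0, 0.999)  # clip to avoid a singular final step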
Getting Started
- Clone the repository:
git clone https://github.com/hojonathanho/diffusion.git
cd diffusion
- Install dependencies:
pip install -r requirements.txt
- Train a model or use a pre-trained one (again in the illustrative PyTorch-style API):
from models import UNet
from diffusion import GaussianDiffusion

model = UNet(dim=64, dim_mults=(1, 2, 4, 8))
diffusion = GaussianDiffusion(model, image_size=32, timesteps=1000)

# Train the model or load a pre-trained checkpoint, then generate samples
samples = diffusion.sample(batch_size=16)
Competitor Comparisons
A latent text-to-image diffusion model
Pros of stable-diffusion
- More advanced and widely adopted in the AI image generation community
- Offers pre-trained models and easier integration with various applications
- Extensive documentation and community support
Cons of stable-diffusion
- Larger model size and higher computational requirements
- More complex architecture, potentially harder for beginners to understand and modify
Code Comparison
diffusion:
def p_sample(self, x, t, clip_denoised=True):
    out = self.p_mean_variance(x, t, clip_denoised=clip_denoised)
    noise = torch.randn_like(x)
    nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
    return out["mean"] + nonzero_mask * torch.exp(0.5 * out["log_variance"]) * noise
stable-diffusion:
@torch.no_grad()
def p_sample_plms(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,
                  temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
                  unconditional_guidance_scale=1., unconditional_conditioning=None, old_eps=None, t_next=None):
    b, *_, device = *x.shape, x.device
    e_t = self.model.apply_model(x, t, c)
    if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
        e_t = e_t
    else:
        e_t = e_t * unconditional_guidance_scale + (1. - unconditional_guidance_scale) * self.model.apply_model(x, t, unconditional_conditioning)
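The guidance arithmetic in the stable-diffusion snippet is classifier-free guidance; the line above is an algebraic rearrangement of its more common form, shown here as a sketch with illustrative names:

import torch

def classifier_free_guidance(e_cond: torch.Tensor, e_uncond: torch.Tensor, scale: float) -> torch.Tensor:
    # Identical to: scale * e_cond + (1 - scale) * e_uncond, the form used above
    return e_uncond + scale * (e_cond - e_uncond)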
Codebase for "Diffusion Models Beat GANs on Image Synthesis" (openai/guided-diffusion)
Pros of guided-diffusion
- More comprehensive documentation and examples
- Supports a wider range of diffusion models and techniques
- Actively maintained with regular updates and contributions
Cons of guided-diffusion
- Higher computational requirements due to more complex models
- Steeper learning curve for beginners
- Less focus on specific image generation tasks
Code Comparison
guided-diffusion:
def create_model(
    image_size,
    num_channels,
    num_res_blocks,
    channel_mult="",
    learn_sigma=False,
    class_cond=False,
    use_checkpoint=False,
    attention_resolutions="16",
    num_heads=1,
    num_head_channels=-1,
    num_heads_upsample=-1,
    use_scale_shift_norm=False,
    dropout=0,
    resblock_updown=False,
    use_fp16=False,
    use_new_attention_order=False,
):
    # ... (implementation details)
diffusion:
def create_model(
    image_size,
    num_channels,
    num_res_blocks,
    channel_mult="",
    learn_sigma=False,
    class_cond=False,
    use_checkpoint=False,
    attention_resolutions="16",
    num_heads=1,
    num_head_channels=-1,
    num_heads_upsample=-1,
    use_scale_shift_norm=False,
    dropout=0,
):
    # ... (implementation details)
The comparison shows that guided-diffusion's create_model exposes additional options (resblock_updown, use_fp16, use_new_attention_order), allowing for greater flexibility and customization.
Denoising Diffusion Implicit Models
Pros of ddim
- Implements Denoising Diffusion Implicit Models (DDIM), offering faster sampling
- Provides a more comprehensive set of pre-trained models
- Includes additional utilities for image manipulation and processing
Cons of ddim
- Less active development and maintenance compared to diffusion
- Fewer options for customizing the diffusion process
- Limited documentation on advanced usage and model fine-tuning
Code Comparison
diffusion:
def p_sample_loop(model, shape, noise=None, clip_denoised=True, denoised_fn=None,
                  model_kwargs=None, device=None, progress=False):
    final = None
    for sample in p_sample_loop_progressive(
        model,
        shape,
        noise=noise,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        model_kwargs=model_kwargs,
        device=device,
        progress=progress,
    ):
        final = sample
    return final["sample"]
ddim:
def p_sample_ddim(model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None):
    out = p_mean_variance(
        model,
        x,
        t,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        model_kwargs=model_kwargs,
    )
    noise = th.randn_like(x)
    nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
    sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
    return {"sample": sample, "pred_xstart": out["pred_xstart"]}
Implementation of Denoising Diffusion Probabilistic Model in Pytorch
Pros of denoising-diffusion-pytorch
- More comprehensive implementation with additional features like conditional generation and image inpainting
- Better documentation and code organization, making it easier for users to understand and modify
- Active development and maintenance, with regular updates and improvements
Cons of denoising-diffusion-pytorch
- Higher complexity, which may be overwhelming for beginners or those seeking a simpler implementation
- Potentially slower execution due to additional features and abstractions
Code Comparison
diffusion:
def p_sample(self, x, t, clip_denoised=True):
    t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
    noise_level = extract(self.sqrt_alphas_cumprod, t_batch, x.shape)
    x_recon = self.predict_start_from_noise(x, t=t_batch, noise=self.denoise_fn(x, t_batch))
    if clip_denoised:
        x_recon.clamp_(-1., 1.)
denoising-diffusion-pytorch:
@torch.no_grad()
def p_sample(self, x, t, t_index):
    betas_t = extract(self.betas, t, x.shape)
    sqrt_one_minus_alphas_cumprod_t = extract(self.sqrt_one_minus_alphas_cumprod, t, x.shape)
    sqrt_recip_alphas_t = extract(self.sqrt_recip_alphas, t, x.shape)
    model_mean = sqrt_recip_alphas_t * (x - betas_t * self.model(x, t) / sqrt_one_minus_alphas_cumprod_t)
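Despite the surface differences, both excerpts compute the same quantity: the posterior mean from the DDPM paper (Eq. 11 in Ho et al. 2020). Written out directly, with illustrative names:

def posterior_mean(x_t, eps_pred, beta_t, alpha_t, alpha_bar_t):
    # mu_theta(x_t, t) = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_theta) / sqrt(alpha_t)
    return (x_t - beta_t / (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_t ** 0.5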
Karras et al. (2022) diffusion models for PyTorch
Pros of k-diffusion
- More flexible and customizable implementation
- Supports a wider range of sampling methods
- Better documentation and examples for usage
Cons of k-diffusion
- Less optimized for performance in some cases
- May require more setup and configuration
- Potentially steeper learning curve for beginners
Code Comparison
k-diffusion:
model = diffusion.get_model(config)
x = torch.randn(1, 3, 64, 64)
samples = diffusion.sample(model, x, steps=1000)
diffusion:
model = create_model(config)
noise = torch.randn(1, 3, 64, 64)
samples = p_sample_loop(model, noise, num_timesteps=1000)
Both repositories implement diffusion models, but k-diffusion offers more flexibility and customization options, while diffusion may be simpler to use out of the box. The snippets above reflect k-diffusion's more modular approach, which separates model construction from a configurable sampler, whereas diffusion drives a fixed ancestral sampling loop.
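For orientation, sampling with k-diffusion's actual utilities looks roughly like this; it is a sketch that assumes model has already been wrapped in one of k-diffusion's Denoiser classes so that it accepts (x, sigma):

import torch
from k_diffusion import sampling

# Karras et al. (2022) noise schedule: 50 sigmas descending from sigma_max to sigma_min
sigmas = sampling.get_sigmas_karras(n=50, sigma_min=0.01, sigma_max=80.0)

x = torch.randn(1, 3, 64, 64) * sigmas[0]         # start from pure noise at sigma_max
samples = sampling.sample_heun(model, x, sigmas)  # second-order Heun sampler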
README
Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, Pieter Abbeel
Paper: https://arxiv.org/abs/2006.11239
Website: https://hojonathanho.github.io/diffusion
Experiments run on Google Cloud TPU v3-8.
Requires TensorFlow 1.15 and Python 3.5, and these dependencies for CPU instances (see requirements.txt):
pip3 install fire
pip3 install scipy
pip3 install pillow
pip3 install tensorflow-probability==0.8
pip3 install tensorflow-gan==0.0.0.dev0
pip3 install tensorflow-datasets==2.1.0
The training and evaluation scripts are in the scripts/ subdirectory.
The commands to run training and evaluation are in comments at the top of the scripts.
Data is stored in GCS buckets. The scripts are written to assume that the bucket names are of the form gs://mybucketprefix-us-central1; i.e. some prefix followed by the region. The prefix should be passed into the scripts using the --bucket_name_prefix flag.
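For example, a CIFAR-10 training run might be launched like this; treat it as a hypothetical invocation to check against the comments in the scripts, since only the --bucket_name_prefix flag is confirmed above (the script name and subcommand are assumptions):

# Hypothetical invocation; verify against the comments at the top of scripts/run_cifar.py
python3 scripts/run_cifar.py train --bucket_name_prefix mybucketprefix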
Models and samples can be found at: https://www.dropbox.com/sh/pm6tn31da21yrx4/AABWKZnBzIROmDjGxpB6vn6Ja
Citation
If you find our work relevant to your research, please cite:
@article{ho2020denoising,
  title={Denoising Diffusion Probabilistic Models},
  author={Jonathan Ho and Ajay Jain and Pieter Abbeel},
  year={2020},
  journal={arXiv preprint arXiv:2006.11239}
}