ermongroup/ddim

Denoising Diffusion Implicit Models

Quick Overview

The ermongroup/ddim repository is an implementation of Denoising Diffusion Implicit Models (DDIMs), a class of non-Markovian diffusion processes that generate high-quality samples faster than their Markovian counterparts. This project provides a PyTorch implementation of DDIMs, along with training and sampling scripts for various datasets.

Pros

  • Faster sampling compared to traditional diffusion models
  • High-quality image generation results
  • Flexible architecture that can be applied to various datasets
  • Well-documented codebase with clear instructions

Cons

  • Requires significant computational resources for training
  • Limited to image generation tasks
  • May require fine-tuning for specific use cases
  • Relatively complex mathematical foundation, which may be challenging for beginners

Code Examples

  1. Loading a pre-trained DDIM model:

    import torch
    from models.diffusion import Model

    # args: the experiment configuration used to build the network
    model = Model(args)
    model.load_state_dict(torch.load('path/to/checkpoint.pth'))
    model.eval()  # switch to inference mode

  2. Generating samples using DDIM:

    from sampling import ddim_sampling

    # x_T: initial Gaussian noise; eta = 0 gives deterministic DDIM sampling
    samples = ddim_sampling(model, x_T, alphas, betas, T, eta)

  3. Training a DDIM model:

    from train import train

    train(args, model, train_loader, optimizer, scheduler)

Getting Started

To get started with DDIM:

  1. Clone the repository:

    git clone https://github.com/ermongroup/ddim.git
    cd ddim
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Train a model or use a pre-trained one:

    from models.diffusion import Model
    from train import train
    
    model = Model(args)
    train(args, model, train_loader, optimizer, scheduler)
    
  4. Generate samples:

    from sampling import ddim_sampling
    
    samples = ddim_sampling(model, x_T, alphas, betas, T, eta)
    

Competitor Comparisons

Pros of guided-diffusion

  • More comprehensive implementation with additional features like classifier guidance
  • Better documentation and code organization
  • Supports a wider range of diffusion models and sampling techniques

Cons of guided-diffusion

  • Higher computational requirements due to more complex models
  • Steeper learning curve for newcomers to diffusion models
  • Less focused on specific DDIM implementation

Code Comparison

guided-diffusion:

def p_sample_loop(
    model,
    shape,
    noise=None,
    clip_denoised=True,
    denoised_fn=None,
    model_kwargs=None,
    device=None,
    progress=False,
):
    ...  # implementation details omitted

ddim:

def p_sample_ddim(model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None):
    out = p_mean_variance(
        model,
        x,
        t,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        model_kwargs=model_kwargs,
    )
    ...  # implementation details omitted

The guided-diffusion repository offers a more flexible sampling loop with additional parameters, while ddim provides a more focused implementation of the DDIM sampling process. guided-diffusion's approach allows for easier integration of various sampling techniques, but ddim's implementation is more straightforward for those specifically interested in DDIM.

A latent text-to-image diffusion model

Pros of stable-diffusion

  • More advanced and versatile image generation capabilities
  • Larger community and active development
  • Supports text-to-image generation

Cons of stable-diffusion

  • Higher computational requirements
  • More complex architecture and implementation
  • Larger model size, requiring more storage and memory

Code Comparison

DDIM:

def p_sample(self, x, t, clip_denoised=True):
    out = self.p_mean_variance(x, t)
    noise = torch.randn_like(x)
    nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
    return out["mean"] + nonzero_mask * torch.exp(0.5 * out["log_variance"]) * noise

stable-diffusion:

@torch.no_grad()
def p_sample_plms(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,
                  temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
                  unconditional_guidance_scale=1., unconditional_conditioning=None, old_eps=None, t_next=None):
    b, *_, device = *x.shape, x.device

The code snippets show that stable-diffusion has a more complex sampling function with additional parameters and features compared to DDIM's simpler implementation.

Denoising Diffusion Probabilistic Models

Pros of diffusion

  • More comprehensive implementation of diffusion models
  • Includes additional features like conditional sampling and various noise schedules
  • Better documentation and code organization

Cons of diffusion

  • Larger codebase, potentially more complex to understand and modify
  • May require more computational resources due to additional features
  • Less focused on specific applications compared to ddim

Code Comparison

diffusion:

def p_sample(self, model, x, t, clip_denoised=True, denoised_fn=None, cond_fn=None, model_kwargs=None):
    out = self.p_mean_variance(
        model,
        x,
        t,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        model_kwargs=model_kwargs,
    )

ddim:

def p_sample(self, model, x, t, t_index, betas):
    e_t = model(x, t)
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)
    x_recon = self._predict_xstart_from_eps(x, t_index, e_t)

The code snippets show different approaches to sampling in the diffusion process, with diffusion offering more flexibility and options, while ddim focuses on a specific implementation.

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Pros of denoising-diffusion-pytorch

  • More comprehensive implementation with additional features like conditional generation and image inpainting
  • Better documentation and code organization, making it easier for users to understand and modify
  • Active development and regular updates, incorporating latest research and improvements

Cons of denoising-diffusion-pytorch

  • Higher complexity, which may be overwhelming for beginners or those seeking a simpler implementation
  • Potentially slower inference time due to the additional features and flexibility

Code Comparison

ddim:

def p_sample(self, x, t, clip_denoised=True):
    out = self.p_mean_variance(x, t)
    noise = torch.randn_like(x)
    nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
    return out["mean"] + nonzero_mask * torch.exp(0.5 * out["log_variance"]) * noise

denoising-diffusion-pytorch:

@torch.no_grad()
def p_sample(self, x, t: int, x_self_cond = None, clip_denoised = True):
    b, *_, device = *x.shape, x.device
    model_mean, _, model_log_variance = self.p_mean_variance(x = x, t = t, x_self_cond = x_self_cond)
    noise = torch.randn_like(x) if t > 0 else 0.
    pred_img = model_mean + (0.5 * model_log_variance).exp() * noise
    return pred_img, x_self_cond

Karras et al. (2022) diffusion models for PyTorch

Pros of k-diffusion

  • More flexible sampling algorithms, including advanced methods like DPM-Solver
  • Better support for conditional generation and guidance techniques
  • More active development and community support

Cons of k-diffusion

  • Steeper learning curve due to more complex architecture
  • Less focus on theoretical foundations compared to DDIM
  • May require more computational resources for some advanced sampling methods

Code Comparison

k-diffusion:

model = diffusion.DiffusionModel(...)
x = torch.randn(...)
samples = diffusion.sample(model, x, steps=20, eta=0.0)

DDIM:

model = DDIM(...)
x_T = torch.randn(...)
samples = model.sample(x_T, num_steps=20)

Both repositories provide implementations of diffusion models, but k-diffusion offers a wider range of sampling algorithms and more flexibility in model architecture. DDIM focuses on a specific sampling technique (Denoising Diffusion Implicit Models) and provides a more straightforward implementation. k-diffusion is generally more suitable for advanced users and researchers exploring various diffusion model variants, while DDIM may be preferable for those specifically interested in the DDIM algorithm or seeking a simpler starting point.

README

Denoising Diffusion Implicit Models (DDIM)

Jiaming Song, Chenlin Meng and Stefano Ermon, Stanford

This repository implements sampling from an implicit model that is trained with the same procedure as a Denoising Diffusion Probabilistic Model (DDPM) but takes much less time and compute to sample from.
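To make the sampling idea concrete, here is a minimal sketch of one deterministic DDIM update (the eta = 0 case from the paper), written on plain floats rather than image tensors. The function name `ddim_step` and the toy values are illustrative assumptions, not code from this repository; `abar` stands for the cumulative product of (1 - beta).

```python
import math

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """One deterministic (eta = 0) DDIM update on a scalar sample.

    Hypothetical helper for illustration; not part of the repository.
    """
    # Predict the clean sample x_0 from the noisy sample and the noise estimate.
    x0_pred = (x_t - math.sqrt(1.0 - abar_t) * eps_pred) / math.sqrt(abar_t)
    # Move to the earlier (less noisy) level, reusing the same noise estimate.
    return math.sqrt(abar_prev) * x0_pred + math.sqrt(1.0 - abar_prev) * eps_pred

# Toy check: noise a known x0, then step back with a perfect noise estimate.
x0, eps = 0.7, -1.3
abar_t, abar_prev = 0.2, 0.9
x_t = math.sqrt(abar_t) * x0 + math.sqrt(1.0 - abar_t) * eps
x_prev = ddim_step(x_t, eps, abar_t, abar_prev)
x_final = ddim_step(x_t, eps, abar_t, 1.0)  # abar_prev = 1 means no noise left
```

Because the step only needs the noise levels at its two endpoints, the sampler can jump over many training timesteps at once, which is where the savings in time and compute come from.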

Integration with 🤗 Diffusers library

DDIM is now also available in 🧨 Diffusers and accessible via the DDIMPipeline. Diffusers allows you to test DDIM in PyTorch in just a couple of lines of code.

You can install diffusers as follows:

pip install diffusers torch accelerate

And then try out the model with just a couple of lines of code:

from diffusers import DDIMPipeline

model_id = "google/ddpm-cifar10-32"

# load model and scheduler
ddim = DDIMPipeline.from_pretrained(model_id)

# run pipeline in inference (sample random noise and denoise)
image = ddim(num_inference_steps=50).images[0]

# save image
image.save("ddim_generated_image.png")

More DDPM/DDIM models compatible with the DDIM pipeline can be found directly on the Hub.

To better understand the DDIM scheduler, you can check out this introductory Google Colab.

The DDIM scheduler can also be used with more powerful diffusion models such as Stable Diffusion.

You simply need to accept the license on the Hub, log in with huggingface-cli login, and install transformers:

pip install transformers

Then you can run:

from diffusers import StableDiffusionPipeline, DDIMScheduler

ddim = DDIMScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", scheduler=ddim)

image = pipeline("An astronaut riding a horse.").images[0]

image.save("astronaut_riding_a_horse.png")

Running the Experiments

The code has been tested on PyTorch 1.6.

Train a model

Training is exactly the same as for DDPM and is launched with the following command:

python main.py --config {DATASET}.yml --exp {PROJECT_PATH} --doc {MODEL_NAME} --ni
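For readers unfamiliar with the DDPM objective this refers to, the following sketch shows how a training pair is built: a clean sample is noised to a random level and the network is asked to predict the noise. The helper names and the list-of-floats representation are assumptions for illustration only; the actual training code operates on image batches and may scale the loss differently.

```python
import math
import random

def make_training_pair(x0, abar_t, rng):
    """Return (x_t, eps): the noisy input and the noise the network should predict."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(abar_t) * x + math.sqrt(1.0 - abar_t) * e
           for x, e in zip(x0, eps)]
    return x_t, eps

def mse(pred, target):
    """Mean squared error between a noise prediction and the true noise."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]                 # stand-in for a flattened image
x_t, eps = make_training_pair(x0, abar_t=0.5, rng=rng)

perfect_loss = mse(eps, eps)           # a perfect noise predictor has zero loss
zero_pred_loss = mse([0.0] * len(eps), eps)
```

Since DDIM changes only the sampling procedure, a network trained this way can be sampled with either DDPM or DDIM.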

Sampling from the model

Sampling from the generalized model for FID evaluation

python main.py --config {DATASET}.yml --exp {PROJECT_PATH} --doc {MODEL_NAME} --sample --fid --timesteps {STEPS} --eta {ETA} --ni

where

  • ETA controls the scale of the variance (0 is DDIM, and 1 is one type of DDPM).
  • STEPS controls how many timesteps are used in the sampling process.
  • MODEL_NAME is used to locate the pre-trained checkpoint at its inferred path.
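The effect of ETA and STEPS can be sketched as follows. The noise scale follows the formula in the DDIM paper; the evenly spaced timestep selection is one simple choice and is assumed here for illustration. Both function names are hypothetical, and `abar` denotes the cumulative product of (1 - beta).

```python
import math

def uniform_timesteps(num_train_steps, num_sample_steps):
    """Evenly spaced subsequence of the training timesteps (one possible choice)."""
    skip = num_train_steps // num_sample_steps
    return list(range(0, num_train_steps, skip))

def ddim_sigma(eta, abar_t, abar_prev):
    """Per-step noise scale: eta = 0 is deterministic DDIM, eta = 1 is DDPM-like."""
    return (eta
            * math.sqrt((1.0 - abar_prev) / (1.0 - abar_t))
            * math.sqrt(1.0 - abar_t / abar_prev))

steps = uniform_timesteps(1000, 50)     # 50 sampling steps out of 1000
sigma_ddim = ddim_sigma(0.0, abar_t=0.5, abar_prev=0.9)
sigma_ddpm = ddim_sigma(1.0, abar_t=0.5, abar_prev=0.9)
```

With eta = 0 no fresh noise is injected, so the same starting noise always yields the same sample; intermediate values of eta interpolate between the two behaviours.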

If you want to use the DDPM pretrained model:

python main.py --config {DATASET}.yml --exp {PROJECT_PATH} --use_pretrained --sample --fid --timesteps {STEPS} --eta {ETA} --ni

The --use_pretrained option will automatically load the model according to the dataset.

We provide a CelebA 64x64 model here, and use the DDPM version for CIFAR10 and LSUN.

To use the version with the larger variance in DDPM, pass the --sample_type ddpm_noisy option.
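DDPM admits two common choices for the reverse-process variance: beta_t itself and the smaller posterior variance. The sketch below compares them, assuming that "larger variance" refers to the beta_t choice and assuming the linear schedule from 1e-4 to 0.02 over 1000 steps used in the DDPM paper. The helper names are hypothetical.

```python
def linear_betas(beta_start, beta_end, num_steps):
    """Linearly spaced noise schedule (assumed here for illustration)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def variance_choices(betas, t):
    """Return (beta_t, beta_tilde_t): the larger and smaller DDPM variances at step t."""
    abar = 1.0       # cumulative product of (1 - beta) up to and including t
    abar_prev = 1.0  # the same product up to t - 1
    for b in betas[: t + 1]:
        abar_prev = abar
        abar *= 1.0 - b
    beta_tilde = (1.0 - abar_prev) / (1.0 - abar) * betas[t]
    return betas[t], beta_tilde

betas = linear_betas(1e-4, 0.02, 1000)
large, small = variance_choices(betas, t=500)
```

The two choices differ most at small t and become nearly equal at large t, where the cumulative product is close to zero.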

Sampling from the model for image interpolation

Use the --interpolation option instead of --fid.
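In the DDIM paper, interpolation is performed in the latent space: two starting noises are blended with spherical linear interpolation (slerp) and each blend is decoded with the deterministic sampler. The sketch below shows slerp on plain lists, assuming the --interpolation option follows the same idea; the function is illustrative and not copied from the repository.

```python
import math

def slerp(z1, z2, alpha):
    """Spherical linear interpolation between two latent vectors, alpha in [0, 1]."""
    dot = sum(a * b for a, b in zip(z1, z2))
    norm1 = math.sqrt(sum(a * a for a in z1))
    norm2 = math.sqrt(sum(b * b for b in z2))
    # Clamp to guard against rounding slightly outside acos's domain.
    theta = math.acos(max(-1.0, min(1.0, dot / (norm1 * norm2))))
    if theta < 1e-8:
        # Nearly parallel vectors: fall back to ordinary linear interpolation.
        return [(1.0 - alpha) * a + alpha * b for a, b in zip(z1, z2)]
    w1 = math.sin((1.0 - alpha) * theta) / math.sin(theta)
    w2 = math.sin(alpha * theta) / math.sin(theta)
    return [w1 * a + w2 * b for a, b in zip(z1, z2)]

start = slerp([1.0, 0.0], [0.0, 1.0], 0.0)
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
end = slerp([1.0, 0.0], [0.0, 1.0], 1.0)
```

Slerp keeps the interpolated latent at a norm similar to that of Gaussian noise, which a straight linear blend would shrink.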

Sampling from the sequence of images that lead to the sample

Use the --sequence option instead.

The above two cases contain some hard-coded lines specific to producing the image, so modify them according to your needs.

References and Acknowledgements

@article{song2020denoising,
  title={Denoising Diffusion Implicit Models},
  author={Song, Jiaming and Meng, Chenlin and Ermon, Stefano},
  journal={arXiv:2010.02502},
  year={2020},
  month={October},
  abbr={Preprint},
  url={https://arxiv.org/abs/2010.02502}
}

This implementation is based on / inspired by: