
lucidrains/denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Top Related Projects

A latent text-to-image diffusion model

Denoising Diffusion Probabilistic Models

Denoising Diffusion Implicit Models

Karras et al. (2022) diffusion models for PyTorch

Quick Overview

The lucidrains/denoising-diffusion-pytorch repository is a PyTorch implementation of the Denoising Diffusion Probabilistic Model (DDPM), a generative model used primarily for image generation. The repository also includes a 1D variant for sequence data. The model is based on the paper "Denoising Diffusion Probabilistic Models" by Jonathan Ho, Ajay Jain, and Pieter Abbeel.

Pros

  • Flexible and Extensible: The codebase is well-structured and modular, making it easy to extend and customize the model for different applications.
  • Strong Sample Quality: DDPMs have been shown to produce high-quality samples on image generation benchmarks, as reported in the original paper.
  • Active Development and Community: The repository is actively maintained, and the project has a growing community of contributors and users.
  • Documentation: The README provides usage examples for the core classes, the Trainer, multi-GPU training, and 1D sequence data, which helps users get started.

Cons

  • Computational Complexity: The DDPM model can be computationally intensive, especially for large-scale tasks, which may limit its applicability in certain real-time or resource-constrained environments.
  • Hyperparameter Tuning: The performance of the DDPM model can be sensitive to the choice of hyperparameters, which may require extensive experimentation and tuning to achieve optimal results.
  • Limited Support for Specific Tasks: While the DDPM model is a general-purpose generative model, it may not be the best choice for certain specialized tasks, such as high-resolution image generation or text generation with specific stylistic requirements.
  • Steep Learning Curve: Fully understanding and effectively using the DDPM model may require a significant amount of background knowledge in machine learning and generative modeling, which could be a barrier for some users.

Code Examples

Here are a few code examples from the lucidrains/denoising-diffusion-pytorch repository:

  1. Defining the DDPM Model:
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    channels = 3
)

diffusion = GaussianDiffusion(
    model,
    image_size = 64,
    timesteps = 1000
)

This code defines a DDPM model using the Unet and GaussianDiffusion classes from the library.

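  2. Training the DDPM Model:
import torch
from torch.optim import Adam

optimizer = Adam(diffusion.parameters(), lr = 3e-4)

for step in range(100):
    training_images = torch.rand(8, 3, 64, 64) # replace with real image batches, normalized from 0 to 1
    loss = diffusion(training_images)          # calling the diffusion module returns the training loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f'Step {step}, Loss: {loss.item()}')

This code shows a minimal manual training loop: calling diffusion on a batch of images returns the loss, which an Adam optimizer then minimizes.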

  3. Generating Samples:
samples = diffusion.sample(batch_size=16)

This code generates 16 samples from the trained DDPM model.

  4. Evaluating the Model:
from denoising_diffusion_pytorch import Trainer

trainer = Trainer(diffusion, 'path/to/your/images', calculate_fid = True)
trainer.train()

This code enables Fréchet Inception Distance (FID) calculation during training through the Trainer class, which is how the README exposes FID evaluation.
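For reference, FID compares Gaussian fits to Inception features of real and generated images. With feature means \mu_r, \mu_g and covariances \Sigma_r, \Sigma_g (symbols introduced here for illustration), the standard definition is:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature statistics are closer to the real ones.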

Getting Started

To get started with the lucidrains/denoising-diffusion-pytorch repository, follow these steps:

  1. Clone the repository:
git clone https://github.com/lucidrains/denoising-diffusion-pytorch.git
  2. Install the package in editable mode:
cd denoising-diffusion-pytorch
pip install -e .
  3. Prepare your dataset:
    • The Trainer class takes a path to a folder of images, so place your training images in a single directory.

Competitor Comparisons

A latent text-to-image diffusion model

Pros of stable-diffusion

  • More advanced and feature-rich, offering state-of-the-art image generation capabilities
  • Includes pre-trained models and extensive documentation for easier implementation
  • Supports various conditioning methods, including text-to-image generation

Cons of stable-diffusion

  • Higher computational requirements and more complex setup process
  • Less flexible for customization and experimentation with core diffusion algorithms
  • Larger codebase, which may be more challenging to understand and modify

Code Comparison

denoising-diffusion-pytorch:

def p_losses(x_start, t, noise=None):
    noise = default(noise, lambda: torch.randn_like(x_start))
    x_noisy = q_sample(x_start=x_start, t=t, noise=noise)
    model_out = model(x_noisy, t)
    return F.mse_loss(model_out, noise)

stable-diffusion:

def p_losses(self, x_start, cond, t, noise=None):
    noise = default(noise, lambda: torch.randn_like(x_start))
    x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
    model_output = self.apply_model(x_noisy, t, cond)
    loss_dict = {}
    prefix = 'train' if self.training else 'val'
    loss_dict[f'{prefix}/loss_simple'] = self.get_loss(model_output, noise)
    return loss_dict

Pros of guided-diffusion

  • More comprehensive implementation with additional features like classifier guidance
  • Better documentation and explanations of the underlying concepts
  • Includes pre-trained models and evaluation scripts

Cons of guided-diffusion

  • Less modular and harder to integrate into existing projects
  • Requires more computational resources due to its complexity
  • Steeper learning curve for beginners

Code Comparison

denoising-diffusion-pytorch:

def p_losses(x_start, t, noise=None):
    noise = default(noise, lambda: torch.randn_like(x_start))
    x_noisy = q_sample(x_start=x_start, t=t, noise=noise)
    model_out = model(x_noisy, t)
    return F.mse_loss(model_out, noise)

guided-diffusion:

def p_losses(model, x_start, t, noise=None):
    if noise is None:
        noise = th.randn_like(x_start)
    x_t = q_sample(x_start, t, noise=noise)
    model_output = model(x_t, self._scale_timesteps(t))
    return mean_flat((noise - model_output)**2)

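Both implementations compute the noise-prediction loss for the diffusion process, but the guided-diffusion snippet rescales the timesteps before calling the model and uses mean_flat to average the squared error over each sample's non-batch dimensions, returning one loss value per sample rather than a single scalar.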
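The difference between a batch-wide mean squared error and a per-sample one, as in the two snippets above, can be sketched as follows. The mean_flat helper here is a small reimplementation for illustration and the tensors are random placeholders, not real model outputs.

```python
import torch
import torch.nn.functional as F

def mean_flat(tensor: torch.Tensor) -> torch.Tensor:
    """Average over every dimension except the batch dimension."""
    return tensor.mean(dim=list(range(1, tensor.dim())))

torch.manual_seed(0)
noise = torch.randn(4, 3, 8, 8)          # placeholder for the true noise
model_output = torch.randn(4, 3, 8, 8)   # placeholder for the predicted noise

per_sample = mean_flat((noise - model_output) ** 2)  # one loss per sample, shape (4,)
scalar = F.mse_loss(model_output, noise)             # single scalar over the whole batch

print(per_sample.shape, scalar.shape)
```

Keeping one value per sample makes it possible to weight or log the loss per timestep before reducing it to a scalar.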

Denoising Diffusion Probabilistic Models

Pros of diffusion

  • Implements multiple diffusion models, including DDPM and DDIM
  • Provides extensive configuration options and hyperparameters
  • Includes pre-trained models and evaluation scripts

Cons of diffusion

  • Less actively maintained (last update over 2 years ago)
  • More complex codebase, potentially harder for beginners to understand
  • Fewer examples and documentation compared to denoising-diffusion-pytorch

Code Comparison

diffusion:

def p_sample(self, model, x, t, clip_denoised=True, repeat_noise=False):
    b, *_, device = *x.shape, x.device
    model_mean, _, model_log_variance = self.p_mean_variance(
        model, x, t, clip_denoised=clip_denoised)
    noise = noise_like(x.shape, device, repeat_noise)
    # no noise when t == 0
    nonzero_mask = (1 - (t == 0).float()).reshape(b, *((1,) * (len(x.shape) - 1)))
    return model_mean + nonzero_mask * (0.5 * model_log_variance).exp() * noise

denoising-diffusion-pytorch:

@torch.no_grad()
def p_sample(self, x, t, clip_denoised = True):
    model_mean, _, model_log_variance = self.p_mean_variance(x = x, t = t)
    noise = torch.randn_like(x) if t > 0 else 0.
    return model_mean + (0.5 * model_log_variance).exp() * noise

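Both implementations follow the same logic. The diffusion version takes the model as an argument, supports repeat_noise, and masks out the noise per sample where t == 0, whereas the denoising-diffusion-pytorch snippet branches on t directly.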

Denoising Diffusion Implicit Models

Pros of DDIM

  • DDIM (Denoising Diffusion Implicit Models) provides a more efficient sampling process compared to the original denoising diffusion probabilistic models (DDPM), reducing the number of steps required for image generation.
  • DDIM offers a more flexible and controllable sampling process, allowing for better manipulation of the generated images.
  • The DDIM implementation in the ermongroup/ddim repository includes a well-documented and easy-to-use codebase, making it accessible for researchers and developers.
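To make the step reduction concrete, here is a minimal sketch of how a DDIM-style sampler might pick a shorter timestep sequence for a model trained with 1000 steps. The function name ddim_time_pairs and the evenly spaced selection are assumptions for illustration, not code taken from either repository.

```python
def ddim_time_pairs(total_timesteps: int, sampling_timesteps: int):
    """Return (t, t_next) pairs from most to least noisy.

    t_next == -1 marks the final step, which yields the clean sample.
    """
    if not 1 <= sampling_timesteps <= total_timesteps:
        raise ValueError("sampling_timesteps must be between 1 and total_timesteps")
    # Evenly spaced integer points from -1 up to total_timesteps - 1.
    points = [
        -1 + (i * total_timesteps) // sampling_timesteps
        for i in range(sampling_timesteps + 1)
    ]
    times = list(reversed(points))
    return list(zip(times[:-1], times[1:]))

pairs = ddim_time_pairs(total_timesteps=1000, sampling_timesteps=50)
print(len(pairs), pairs[0], pairs[-1])  # 50 (999, 979) (19, -1)
```

Each pair corresponds to one network evaluation, so this example needs 50 evaluations per sample instead of 1000.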

Cons of DDIM

  • The DDIM repository may have a smaller community and fewer contributions compared to the more popular lucidrains/denoising-diffusion-pytorch repository.
  • The DDIM implementation may not have the same level of feature support or pre-trained models as the lucidrains/denoising-diffusion-pytorch repository.
  • The performance and quality of the generated images may not be as consistently high as the lucidrains/denoising-diffusion-pytorch repository, depending on the specific use case and dataset.

Code Comparison

Here's a brief code comparison between the two repositories:

lucidrains/denoising-diffusion-pytorch:

from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    channels = 3
)

diffusion = GaussianDiffusion(
    model,
    image_size = 64,
    timesteps = 1000
)

trainer = Trainer(
    diffusion,
    'path/to/dataset',
    train_batch_size = 32,
    train_lr = 2e-5,
    train_num_steps = 100000
)

trainer.train()

ermongroup/ddim:

from ddim.unet import UNet
from ddim.ddim_sampler import DDIMSampler

model = UNet(
    in_channels=3,
    out_channels=3,
    model_channels=128,
    num_res_blocks=2,
    attention_resolutions=(16,),
    dropout=0.1,
    channel_mult=(1, 2, 4, 4)
)

sampler = DDIMSampler(model)
samples = sampler.sample(batch_size=4, image_size=64, num_steps=50)

Karras et al. (2022) diffusion models for PyTorch

Pros of k-diffusion

  • More advanced sampling techniques, including ancestral sampling and SDE solvers
  • Better support for conditional generation and guidance
  • More flexible architecture allowing easier customization

Cons of k-diffusion

  • Less beginner-friendly, with a steeper learning curve
  • Fewer examples and tutorials available
  • Less active community support compared to denoising-diffusion-pytorch

Code Comparison

k-diffusion:

model = diffusion.DiffusionModel(
    unet, sigma_data=1.0, sigma_min=0.02, sigma_max=80.0, rho=7.0
)
x = diffusion.sample(model, (1, 3, 64, 64), steps=20, method='euler')

denoising-diffusion-pytorch:

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
)
diffusion = GaussianDiffusion(
    model,
    image_size = 64,
    timesteps = 1000
)
sampled_images = diffusion.sample(batch_size = 4)

Both repositories provide implementations of diffusion models, but k-diffusion offers more advanced features and flexibility at the cost of complexity. denoising-diffusion-pytorch is more straightforward and easier to use for beginners, but may lack some of the more advanced capabilities found in k-diffusion.
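The sigma_min, sigma_max, and rho values in the k-diffusion snippet correspond to the noise-level schedule proposed by Karras et al. (2022). A minimal sketch of that formula follows. The function name karras_sigmas is hypothetical, and the library's own helper may differ in details such as appending a final zero.

```python
def karras_sigmas(n: int, sigma_min: float = 0.02, sigma_max: float = 80.0, rho: float = 7.0):
    """Noise levels from sigma_max down to sigma_min, spaced as in Karras et al. (2022)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then map back with the power rho.
    return [
        (max_inv_rho + i / (n - 1) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(n)
    ]

sigmas = karras_sigmas(20)
print([round(s, 3) for s in sigmas[:3]], round(sigmas[-1], 3))
```

A larger rho concentrates more of the steps at low noise levels, where fine image detail is resolved.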

README

Denoising Diffusion Probabilistic Model, in Pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch. It is a new approach to generative modeling that may have the potential to rival GANs. It uses denoising score matching to estimate the gradient of the data distribution, followed by Langevin sampling to sample from the true distribution.
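In the DDPM formulation, training images are corrupted by a fixed noising process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise, and the network learns to undo it. As a rough sketch, assuming the linear beta schedule from Ho et al. (1e-4 to 0.02 over 1000 steps; the library's default schedule may differ), the signal and noise weights can be computed like this. The helper names are illustrative only.

```python
import math

def linear_betas(timesteps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linearly spaced noise variances, as in the original DDPM paper."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]

def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta) up to and including step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

alphas_cumprod = cumulative_alphas(linear_betas())

for t in (0, 249, 499, 999):
    signal = math.sqrt(alphas_cumprod[t])        # weight on the clean image
    noise = math.sqrt(1.0 - alphas_cumprod[t])   # weight on the Gaussian noise
    print(f"t={t:4d}  signal={signal:.4f}  noise={noise:.4f}")
```

By the last timestep almost no signal remains, which is why sampling can start from pure Gaussian noise.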

This implementation was inspired by the official TensorFlow version.

Youtube AI Educators - Yannic Kilcher | AI Coffeebreak with Letitia | Outlier

Flax implementation from YiYi Xu

Annotated code by Research Scientists / Engineers from 🤗 Huggingface

Update: Turns out none of the technicalities really matters at all | "Cold Diffusion" paper | Muse

Install

$ pip install denoising_diffusion_pytorch

Usage

import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000    # number of steps
)

training_images = torch.rand(8, 3, 128, 128) # images are normalized from 0 to 1
loss = diffusion(training_images)
loss.backward()

# after a lot of training

sampled_images = diffusion.sample(batch_size = 4)
sampled_images.shape # (4, 3, 128, 128)

Or, if you simply want to pass in a folder name and the desired image dimensions, you can use the Trainer class to easily train a model.

from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,           # number of steps
    sampling_timesteps = 250    # number of sampling timesteps (using ddim for faster inference [see citation for ddim paper])
)

trainer = Trainer(
    diffusion,
    'path/to/your/images',
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = True,                       # turn on mixed precision
    calculate_fid = True              # whether to calculate fid during training
)

trainer.train()

Samples and model checkpoints will be logged to ./results periodically
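Two of the Trainer settings above are simple arithmetic. The sketch below shows the effective batch size implied by gradient accumulation and a plain exponential moving average update with ema_decay = 0.995. The function ema_update and the toy parameter lists are assumptions for illustration; the library's actual EMA helper may add details such as warmup.

```python
def ema_update(ema_params, model_params, decay: float = 0.995):
    """Move each averaged parameter a small step toward the current model parameter."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]

# Gradients from 2 batches of 32 are accumulated before each optimizer step.
train_batch_size = 32
gradient_accumulate_every = 2
effective_batch_size = train_batch_size * gradient_accumulate_every

# Toy parameters: the averaged copy starts at 0 and the model sits at 1.
ema_params = [0.0, 1.0]
model_params = [1.0, 1.0]
for _ in range(3):
    ema_params = ema_update(ema_params, model_params)

print(effective_batch_size, ema_params)
```

With a decay of 0.995 the averaged weights change slowly, which smooths out step-to-step noise in the weights used for sampling.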

Multi-GPU Training

The Trainer class is now equipped with 🤗 Accelerator. You can easily do multi-GPU training in two steps using their accelerate CLI.

At the project root directory, where the training script is, run

$ accelerate config

Then, in the same directory

$ accelerate launch train.py

Miscellaneous

1D Sequence

By popular request, a 1D Unet + Gaussian Diffusion implementation.

import torch
from denoising_diffusion_pytorch import Unet1D, GaussianDiffusion1D, Trainer1D, Dataset1D

model = Unet1D(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    channels = 32
)

diffusion = GaussianDiffusion1D(
    model,
    seq_length = 128,
    timesteps = 1000,
    objective = 'pred_v'
)

training_seq = torch.rand(64, 32, 128) # features are normalized from 0 to 1

loss = diffusion(training_seq)
loss.backward()

# Or using trainer

dataset = Dataset1D(training_seq)  # this is just an example, but you can formulate your own Dataset and pass it into the `Trainer1D` below

trainer = Trainer1D(
    diffusion,
    dataset = dataset,
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    amp = True,                       # turn on mixed precision
)
trainer.train()

# after a lot of training

sampled_seq = diffusion.sample(batch_size = 4)
sampled_seq.shape # (4, 32, 128)
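The objective = 'pred_v' setting above refers to v-prediction, introduced in the Progressive Distillation paper listed in the citations. As a scalar sketch (the function and variable names are illustrative, and this is not the library's code), the target and its inverse look like this:

```python
import math

def v_target(x0: float, noise: float, alpha_cumprod: float) -> float:
    """v = sqrt(alpha_bar) * noise - sqrt(1 - alpha_bar) * x0."""
    return math.sqrt(alpha_cumprod) * noise - math.sqrt(1.0 - alpha_cumprod) * x0

def x0_from_v(x_t: float, v: float, alpha_cumprod: float) -> float:
    """Recover the clean value from the noisy value and v."""
    return math.sqrt(alpha_cumprod) * x_t - math.sqrt(1.0 - alpha_cumprod) * v

x0, noise, alpha_cumprod = 0.3, -1.2, 0.7
# Noisy value at this noise level, following the DDPM forward process.
x_t = math.sqrt(alpha_cumprod) * x0 + math.sqrt(1.0 - alpha_cumprod) * noise
v = v_target(x0, noise, alpha_cumprod)
recovered = x0_from_v(x_t, v, alpha_cumprod)
print(recovered)
```

The round trip works because the two coefficients satisfy a^2 + b^2 = 1, so a model that predicts v implicitly predicts both the clean signal and the noise.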

Trainer1D does not evaluate the generated samples in any way since the type of data is not known.

You could consider adding a suitable metric to the training loop yourself after doing an editable install of this package (pip install -e .).
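As one possible starting point, the sketch below compares simple per-channel statistics of real and generated sequences. The function moment_gap is hypothetical and is not part of the package, and whether such a metric is meaningful depends entirely on your data.

```python
import torch

def moment_gap(real: torch.Tensor, generated: torch.Tensor) -> dict:
    """Mean absolute gap in per-channel mean and std for (batch, channels, length) tensors."""
    real_mean, gen_mean = real.mean(dim=(0, 2)), generated.mean(dim=(0, 2))
    real_std, gen_std = real.std(dim=(0, 2)), generated.std(dim=(0, 2))
    return {
        "mean_gap": (real_mean - gen_mean).abs().mean().item(),
        "std_gap": (real_std - gen_std).abs().mean().item(),
    }

torch.manual_seed(0)
real_seq = torch.rand(64, 32, 128)      # placeholder for held-out training sequences
generated_seq = torch.rand(4, 32, 128)  # placeholder for diffusion.sample(batch_size = 4)

print(moment_gap(real_seq, generated_seq))
```

Values near zero only show that first and second moments match, so a domain-specific metric is usually a better final choice.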

Citations

@inproceedings{NEURIPS2020_4c5bcfec,
    author      = {Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
    booktitle   = {Advances in Neural Information Processing Systems},
    editor      = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
    pages       = {6840--6851},
    publisher   = {Curran Associates, Inc.},
    title       = {Denoising Diffusion Probabilistic Models},
    url         = {https://proceedings.neurips.cc/paper/2020/file/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf},
    volume      = {33},
    year        = {2020}
}
@InProceedings{pmlr-v139-nichol21a,
    title       = {Improved Denoising Diffusion Probabilistic Models},
    author      = {Nichol, Alexander Quinn and Dhariwal, Prafulla},
    booktitle   = {Proceedings of the 38th International Conference on Machine Learning},
    pages       = {8162--8171},
    year        = {2021},
    editor      = {Meila, Marina and Zhang, Tong},
    volume      = {139},
    series      = {Proceedings of Machine Learning Research},
    month       = {18--24 Jul},
    publisher   = {PMLR},
    pdf         = {http://proceedings.mlr.press/v139/nichol21a/nichol21a.pdf},
    url         = {https://proceedings.mlr.press/v139/nichol21a.html},
}
@inproceedings{kingma2021on,
    title       = {On Density Estimation with Diffusion Models},
    author      = {Diederik P Kingma and Tim Salimans and Ben Poole and Jonathan Ho},
    booktitle   = {Advances in Neural Information Processing Systems},
    editor      = {A. Beygelzimer and Y. Dauphin and P. Liang and J. Wortman Vaughan},
    year        = {2021},
    url         = {https://openreview.net/forum?id=2LdBqxc1Yv}
}
@article{Karras2022ElucidatingTD,
    title   = {Elucidating the Design Space of Diffusion-Based Generative Models},
    author  = {Tero Karras and Miika Aittala and Timo Aila and Samuli Laine},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2206.00364}
}
@article{Song2021DenoisingDI,
    title   = {Denoising Diffusion Implicit Models},
    author  = {Jiaming Song and Chenlin Meng and Stefano Ermon},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2010.02502}
}
@misc{chen2022analog,
    title   = {Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning},
    author  = {Ting Chen and Ruixiang Zhang and Geoffrey Hinton},
    year    = {2022},
    eprint  = {2208.04202},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
@article{Salimans2022ProgressiveDF,
    title   = {Progressive Distillation for Fast Sampling of Diffusion Models},
    author  = {Tim Salimans and Jonathan Ho},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2202.00512}
}
@article{Ho2022ClassifierFreeDG,
    title   = {Classifier-Free Diffusion Guidance},
    author  = {Jonathan Ho},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2207.12598}
}
@article{Sunkara2022NoMS,
    title   = {No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects},
    author  = {Raja Sunkara and Tie Luo},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2208.03641}
}
@inproceedings{Jabri2022ScalableAC,
    title   = {Scalable Adaptive Computation for Iterative Generation},
    author  = {A. Jabri and David J. Fleet and Ting Chen},
    year    = {2022}
}
@article{Cheng2022DPMSolverPlusPlus,
    title   = {DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models},
    author  = {Cheng Lu and Yuhao Zhou and Fan Bao and Jianfei Chen and Chongxuan Li and Jun Zhu},
    journal = {NeurIPS 2022 Oral},
    year    = {2022},
    volume  = {abs/2211.01095}
}
@inproceedings{Hoogeboom2023simpleDE,
    title   = {simple diffusion: End-to-end diffusion for high resolution images},
    author  = {Emiel Hoogeboom and Jonathan Heek and Tim Salimans},
    year    = {2023}
}
@misc{https://doi.org/10.48550/arxiv.2302.01327,
    doi     = {10.48550/ARXIV.2302.01327},
    url     = {https://arxiv.org/abs/2302.01327},
    author  = {Kumar, Manoj and Dehghani, Mostafa and Houlsby, Neil},
    title   = {Dual PatchNorm},
    publisher = {arXiv},
    year    = {2023},
    copyright = {Creative Commons Attribution 4.0 International}
}
@inproceedings{Hang2023EfficientDT,
    title   = {Efficient Diffusion Training via Min-SNR Weighting Strategy},
    author  = {Tiankai Hang and Shuyang Gu and Chen Li and Jianmin Bao and Dong Chen and Han Hu and Xin Geng and Baining Guo},
    year    = {2023}
}
@misc{Guttenberg2023,
    author  = {Nicholas Guttenberg},
    url     = {https://www.crosslabs.org/blog/diffusion-with-offset-noise}
}
@inproceedings{Lin2023CommonDN,
    title   = {Common Diffusion Noise Schedules and Sample Steps are Flawed},
    author  = {Shanchuan Lin and Bingchen Liu and Jiashi Li and Xiao Yang},
    year    = {2023}
}
@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}
@article{Bondarenko2023QuantizableTR,
    title   = {Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing},
    author  = {Yelysei Bondarenko and Markus Nagel and Tijmen Blankevoort},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2306.12929},
    url     = {https://api.semanticscholar.org/CorpusID:259224568}
}
@article{Karras2023AnalyzingAI,
    title   = {Analyzing and Improving the Training Dynamics of Diffusion Models},
    author  = {Tero Karras and Miika Aittala and Jaakko Lehtinen and Janne Hellsten and Timo Aila and Samuli Laine},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2312.02696},
    url     = {https://api.semanticscholar.org/CorpusID:265659032}
}
@article{Li2024ImmiscibleDA,
    title   = {Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment},
    author  = {Yiheng Li and Heyang Jiang and Akio Kodaira and Masayoshi Tomizuka and Kurt Keutzer and Chenfeng Xu},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2406.12303},
    url     = {https://api.semanticscholar.org/CorpusID:270562607}
}
@article{Chung2024CFGMC,
    title   = {CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models},
    author  = {Hyungjin Chung and Jeongsol Kim and Geon Yeong Park and Hyelin Nam and Jong Chul Ye},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2406.08070},
    url     = {https://api.semanticscholar.org/CorpusID:270391454}
}
@inproceedings{Sadat2024EliminatingOA,
    title   = {Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models},
    author  = {Seyedmorteza Sadat and Otmar Hilliges and Romann M. Weber},
    year    = {2024},
    url     = {https://api.semanticscholar.org/CorpusID:273098845}
}