Top Related Projects
- stable-diffusion: A latent text-to-image diffusion model
- ddim: Denoising Diffusion Implicit Models
- denoising-diffusion-pytorch: Implementation of Denoising Diffusion Probabilistic Model in Pytorch
- k-diffusion: Karras et al. (2022) diffusion models for PyTorch
Quick Overview
The hojonathanho/diffusion repository is the official TensorFlow implementation of denoising diffusion probabilistic models (DDPMs) for image generation, released by the authors of the DDPM paper (Ho et al., 2020). It provides a framework for training and sampling from diffusion models, which have shown impressive results in generating high-quality images.
Pros
- Implements state-of-the-art diffusion models for image generation
- Provides a flexible and extensible codebase for experimenting with diffusion models
- Includes pre-trained models and example scripts for quick start and experimentation
- Supports various image datasets and model architectures
Cons
- Limited documentation and explanations of the underlying concepts
- May require significant computational resources for training large models
- Lacks extensive hyperparameter tuning guidelines
- Not actively maintained (last commit was over a year ago at the time of writing)
Code Examples
Note: the repository itself is written in TensorFlow 1.x (see the README below); the examples in this section sketch the equivalent workflow with a PyTorch-style API in the spirit of denoising-diffusion-pytorch, not the repository's actual interface.
- Loading a pre-trained model and generating samples:
import torch
from models import UNet
from diffusion import GaussianDiffusion

# Build the denoising network and wrap it in the diffusion process
model = UNet(
    dim=64,
    dim_mults=(1, 2, 4, 8)
)
diffusion = GaussianDiffusion(
    model,
    image_size=32,
    timesteps=1000,
    loss_type='l1'
)

# Load a trained checkpoint and draw a batch of samples
ckpt = torch.load('path/to/checkpoint.pt')
diffusion.load_state_dict(ckpt['model'])
samples = diffusion.sample(batch_size=16)
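To inspect the result, the sampled tensor can be written straight to disk with torchvision (assuming the sampler returns values in [0, 1]):

from torchvision.utils import save_image

# Arrange the 16 samples in a 4x4 grid and save them
save_image(samples, 'samples.png', nrow=4)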
- Training a new diffusion model (reusing the model and diffusion objects from above):
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize(32),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # scale images to [-1, 1]
])
dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(diffusion.parameters(), lr=1e-4)
for epoch in range(100):
    for images, _ in dataloader:  # CIFAR-10 yields (image, label) pairs; labels are unused
        loss = diffusion(images)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
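Under the hood, a DDPM training step draws a random timestep, noises the clean image with the closed-form forward process, and regresses the injected noise (Ho et al. 2020, Algorithm 1). A minimal sketch of that objective, with illustrative names (extract and T are assumptions, not this repository's API):

import torch
import torch.nn.functional as F

def extract(a, t, x_shape):
    # Gather per-timestep coefficients and reshape them for broadcasting
    return a.gather(-1, t).reshape(-1, *((1,) * (len(x_shape) - 1)))

def ddpm_loss(model, x0, sqrt_alphas_cumprod, sqrt_one_minus_alphas_cumprod, T=1000):
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    # Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    x_t = (extract(sqrt_alphas_cumprod, t, x0.shape) * x0
           + extract(sqrt_one_minus_alphas_cumprod, t, x0.shape) * noise)
    # The network is trained to predict the injected noise
    return F.mse_loss(model(x_t, t), noise)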
- Customizing the noise schedule:
from diffusion import GaussianDiffusion, cosine_beta_schedule

custom_betas = cosine_beta_schedule(timesteps=1000)
diffusion = GaussianDiffusion(
    model,
    image_size=32,
    timesteps=1000,
    loss_type='l1',
    betas=custom_betas
)
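For reference, the cosine schedule popularized by Nichol & Dhariwal (2021) can be written as follows; this is a sketch of the standard formula, not necessarily the exact helper shipped by any of these repositories:

import math
import torch

def cosine_beta_schedule(timesteps, s=0.008):
    # alpha_bar follows a squared-cosine curve; betas come from its stepwise ratios
    t = torch.linspace(0, timesteps, timesteps + 1) / timesteps
    alphas_cumprod = torch.cos((t + s) / (1 + s) * math.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return betas.clamp(0, 0.999)  # clip to avoid a singular final step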
Getting Started
- Clone the repository:
git clone https://github.com/hojonathanho/diffusion.git
cd diffusion
- Install dependencies:
pip install -r requirements.txt
- Train a model or use a pre-trained one (again in the illustrative PyTorch-style API):
from models import UNet
from diffusion import GaussianDiffusion

model = UNet(dim=64, dim_mults=(1, 2, 4, 8))
diffusion = GaussianDiffusion(model, image_size=32, timesteps=1000)

# Train the model or load a pre-trained checkpoint, then generate samples
samples = diffusion.sample(batch_size=16)
Competitor Comparisons
A latent text-to-image diffusion model
Pros of stable-diffusion
- More advanced and widely adopted in the AI image generation community
- Offers pre-trained models and easier integration with various applications
- Extensive documentation and community support
Cons of stable-diffusion
- Larger model size and higher computational requirements
- More complex architecture, potentially harder for beginners to understand and modify
Code Comparison
diffusion:
def p_sample(self, x, t, clip_denoised=True):
    out = self.p_mean_variance(x, t, clip_denoised=clip_denoised)
    noise = torch.randn_like(x)
    nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
    return out["mean"] + nonzero_mask * torch.exp(0.5 * out["log_variance"]) * noise
stable-diffusion:
@torch.no_grad()
def p_sample_plms(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,
                  temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
                  unconditional_guidance_scale=1., unconditional_conditioning=None, old_eps=None, t_next=None):
    b, *_, device = *x.shape, x.device
    e_t = self.model.apply_model(x, t, c)
    if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
        e_t = e_t
    else:
        e_t = e_t * unconditional_guidance_scale + (1. - unconditional_guidance_scale) * self.model.apply_model(x, t, unconditional_conditioning)
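The guidance arithmetic in the stable-diffusion snippet is classifier-free guidance; the line above is an algebraic rearrangement of its more common form, shown here as a sketch with illustrative names:

import torch

def classifier_free_guidance(e_cond: torch.Tensor, e_uncond: torch.Tensor, scale: float) -> torch.Tensor:
    # Identical to: scale * e_cond + (1 - scale) * e_uncond, the form used above
    return e_uncond + scale * (e_cond - e_uncond)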
Codebase for "Diffusion Models Beat GANs on Image Synthesis" (openai/guided-diffusion)
Pros of guided-diffusion
- More comprehensive documentation and examples
- Supports a wider range of diffusion models and techniques
- Actively maintained with regular updates and contributions
Cons of guided-diffusion
- Higher computational requirements due to more complex models
- Steeper learning curve for beginners
- Less focus on specific image generation tasks
Code Comparison
guided-diffusion:
def create_model(
    image_size,
    num_channels,
    num_res_blocks,
    channel_mult="",
    learn_sigma=False,
    class_cond=False,
    use_checkpoint=False,
    attention_resolutions="16",
    num_heads=1,
    num_head_channels=-1,
    num_heads_upsample=-1,
    use_scale_shift_norm=False,
    dropout=0,
    resblock_updown=False,
    use_fp16=False,
    use_new_attention_order=False,
):
    # ... (implementation details)
diffusion:
def create_model(
    image_size,
    num_channels,
    num_res_blocks,
    channel_mult="",
    learn_sigma=False,
    class_cond=False,
    use_checkpoint=False,
    attention_resolutions="16",
    num_heads=1,
    num_head_channels=-1,
    num_heads_upsample=-1,
    use_scale_shift_norm=False,
    dropout=0,
):
    # ... (implementation details)
The comparison shows that guided-diffusion's create_model exposes additional options (resblock_updown, use_fp16, use_new_attention_order), allowing for greater flexibility and customization.
Denoising Diffusion Implicit Models
Pros of ddim
- Implements Denoising Diffusion Implicit Models (DDIM), offering faster sampling
- Provides a more comprehensive set of pre-trained models
- Includes additional utilities for image manipulation and processing
Cons of ddim
- Less active development and maintenance compared to diffusion
- Fewer options for customizing the diffusion process
- Limited documentation on advanced usage and model fine-tuning
Code Comparison
diffusion:
def p_sample_loop(model, shape, noise=None, clip_denoised=True, denoised_fn=None,
                  model_kwargs=None, device=None, progress=False):
    final = None
    for sample in p_sample_loop_progressive(
        model,
        shape,
        noise=noise,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        model_kwargs=model_kwargs,
        device=device,
        progress=progress,
    ):
        final = sample
    return final["sample"]
ddim:
def p_sample_ddim(model, x, t, clip_denoised=True, denoised_fn=None, model_kwargs=None):
    out = p_mean_variance(
        model,
        x,
        t,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        model_kwargs=model_kwargs,
    )
    noise = th.randn_like(x)
    nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
    sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
    return {"sample": sample, "pred_xstart": out["pred_xstart"]}
Implementation of Denoising Diffusion Probabilistic Model in Pytorch
Pros of denoising-diffusion-pytorch
- More comprehensive implementation with additional features like conditional generation and image inpainting
- Better documentation and code organization, making it easier for users to understand and modify
- Active development and maintenance, with regular updates and improvements
Cons of denoising-diffusion-pytorch
- Higher complexity, which may be overwhelming for beginners or those seeking a simpler implementation
- Potentially slower execution due to additional features and abstractions
Code Comparison
diffusion:
def p_sample(self, x, t, clip_denoised=True):
    t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
    noise_level = extract(self.sqrt_alphas_cumprod, t_batch, x.shape)
    x_recon = self.predict_start_from_noise(x, t=t_batch, noise=self.denoise_fn(x, t_batch))
    if clip_denoised:
        x_recon.clamp_(-1., 1.)
denoising-diffusion-pytorch:
@torch.no_grad()
def p_sample(self, x, t, t_index):
    betas_t = extract(self.betas, t, x.shape)
    sqrt_one_minus_alphas_cumprod_t = extract(self.sqrt_one_minus_alphas_cumprod, t, x.shape)
    sqrt_recip_alphas_t = extract(self.sqrt_recip_alphas, t, x.shape)
    model_mean = sqrt_recip_alphas_t * (x - betas_t * self.model(x, t) / sqrt_one_minus_alphas_cumprod_t)
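Despite the surface differences, both excerpts compute the same quantity: the posterior mean from the DDPM paper (Eq. 11 in Ho et al. 2020). Written out directly, with illustrative names:

def posterior_mean(x_t, eps_pred, beta_t, alpha_t, alpha_bar_t):
    # mu_theta(x_t, t) = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_theta) / sqrt(alpha_t)
    return (x_t - beta_t / (1 - alpha_bar_t) ** 0.5 * eps_pred) / alpha_t ** 0.5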
Karras et al. (2022) diffusion models for PyTorch
Pros of k-diffusion
- More flexible and customizable implementation
- Supports a wider range of sampling methods
- Better documentation and examples for usage
Cons of k-diffusion
- Less optimized for performance in some cases
- May require more setup and configuration
- Potentially steeper learning curve for beginners
Code Comparison
k-diffusion:
model = diffusion.get_model(config)
x = torch.randn(1, 3, 64, 64)
samples = diffusion.sample(model, x, steps=1000)
diffusion:
model = create_model(config)
noise = torch.randn(1, 3, 64, 64)
samples = p_sample_loop(model, noise, num_timesteps=1000)
Both repositories implement diffusion models, but k-diffusion offers more flexibility and customization options, while diffusion may be simpler to use out of the box. The snippets above reflect k-diffusion's more modular approach, which separates model construction from a configurable sampler, whereas diffusion drives a fixed ancestral sampling loop.
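For orientation, sampling with k-diffusion's actual utilities looks roughly like this; it is a sketch that assumes model has already been wrapped in one of k-diffusion's Denoiser classes so that it accepts (x, sigma):

import torch
from k_diffusion import sampling

# Karras et al. (2022) noise schedule: 50 sigmas descending from sigma_max to sigma_min
sigmas = sampling.get_sigmas_karras(n=50, sigma_min=0.01, sigma_max=80.0)

x = torch.randn(1, 3, 64, 64) * sigmas[0]         # start from pure noise at sigma_max
samples = sampling.sample_heun(model, x, sigmas)  # second-order Heun sampler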
README
Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, Pieter Abbeel
Paper: https://arxiv.org/abs/2006.11239
Website: https://hojonathanho.github.io/diffusion
Experiments run on Google Cloud TPU v3-8.
Requires TensorFlow 1.15 and Python 3.5, and these dependencies for CPU instances (see requirements.txt):
pip3 install fire
pip3 install scipy
pip3 install pillow
pip3 install tensorflow-probability==0.8
pip3 install tensorflow-gan==0.0.0.dev0
pip3 install tensorflow-datasets==2.1.0
The training and evaluation scripts are in the scripts/ subdirectory.
The commands to run training and evaluation are in comments at the top of the scripts.
Data is stored in GCS buckets. The scripts are written to assume that the bucket names are of the form gs://mybucketprefix-us-central1; i.e. some prefix followed by the region. The prefix should be passed into the scripts using the --bucket_name_prefix flag.
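For example, a CIFAR-10 training run might be launched like this; treat it as a hypothetical invocation to check against the comments in the scripts, since only the --bucket_name_prefix flag is confirmed above (the script name and subcommand are assumptions):

# Hypothetical invocation; verify against the comments at the top of scripts/run_cifar.py
python3 scripts/run_cifar.py train --bucket_name_prefix mybucketprefix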
Models and samples can be found at: https://www.dropbox.com/sh/pm6tn31da21yrx4/AABWKZnBzIROmDjGxpB6vn6Ja
Citation
If you find our work relevant to your research, please cite:
@article{ho2020denoising,
  title={Denoising Diffusion Probabilistic Models},
  author={Jonathan Ho and Ajay Jain and Pieter Abbeel},
  year={2020},
  journal={arXiv preprint arXiv:2006.11239}
}