dome272/Diffusion-Models-pytorch

Pytorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)

Top Related Projects

  • stable-diffusion: A latent text-to-image diffusion model
  • denoising-diffusion-pytorch: Implementation of Denoising Diffusion Probabilistic Model in Pytorch
  • diffusion: Denoising Diffusion Probabilistic Models
  • ddim: Denoising Diffusion Implicit Models

Quick Overview

The dome272/Diffusion-Models-pytorch repository is an implementation of diffusion models in PyTorch. It provides a simple and educational codebase for understanding and experimenting with diffusion models, which are a class of generative models that have gained popularity in recent years for their high-quality image generation capabilities.
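
To make that idea concrete, the forward ("noising") process of a diffusion model can be written in closed form. The sketch below uses a linear beta schedule and illustrative names; it is not taken verbatim from the repository.

import torch

# Forward diffusion q(x_t | x_0): blend a clean image with Gaussian noise according to the schedule.
# All names and values here are illustrative assumptions.
T = 1000
beta = torch.linspace(1e-4, 0.02, T)              # noise schedule
alpha_hat = torch.cumprod(1.0 - beta, dim=0)      # cumulative product of (1 - beta)

def noise_images(x0, t):
    # Sample x_t ~ q(x_t | x_0) for a batch of images x0 and integer timesteps t.
    sqrt_ah = torch.sqrt(alpha_hat[t])[:, None, None, None]
    sqrt_one_minus_ah = torch.sqrt(1.0 - alpha_hat[t])[:, None, None, None]
    eps = torch.randn_like(x0)
    return sqrt_ah * x0 + sqrt_one_minus_ah * eps, eps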

Pros

  • Clear and concise implementation of diffusion models
  • Educational resource for understanding diffusion model concepts
  • Includes both training and sampling scripts
  • Built with PyTorch, a popular deep learning framework

Cons

  • Limited documentation and explanations
  • May not include the latest optimizations or techniques in diffusion models
  • Example configurations target specific datasets (the Landscape dataset and CIFAR-10 in the README), which may limit out-of-the-box applicability
  • Not actively maintained (last commit over a year ago)

Code Examples

  1. Defining the UNet model architecture:
class UNet(nn.Module):
    def __init__(self, c_in=1, c_out=1, time_dim=256):
        super().__init__()
        self.time_dim = time_dim
        # Encoder: downsampling blocks, each followed by self-attention at the reduced resolution
        self.inc = DoubleConv(c_in, 64)
        self.down1 = Down(64, 128)
        self.sa1 = SelfAttention(128, 32)
        self.down2 = Down(128, 256)
        self.sa2 = SelfAttention(256, 16)
        self.down3 = Down(256, 256)
        self.sa3 = SelfAttention(256, 8)

        # Bottleneck convolutions at the lowest resolution
        self.bot1 = DoubleConv(256, 512)
        self.bot2 = DoubleConv(512, 512)
        self.bot3 = DoubleConv(512, 256)

        # Decoder: upsampling blocks with skip connections, again followed by self-attention
        self.up1 = Up(512, 128)
        self.sa4 = SelfAttention(128, 16)
        self.up2 = Up(256, 64)
        self.sa5 = SelfAttention(64, 32)
        self.up3 = Up(128, 64)
        self.sa6 = SelfAttention(64, 64)
        self.outc = nn.Conv2d(64, c_out, kernel_size=1)
  2. Implementing the forward pass of the diffusion model:
def forward(self, x, t):
    # Embed the integer timestep so every block can condition on it
    t = self.pos_encoding(t)
    x1 = self.inc(x)
    x2 = self.down1(x1, t)
    x2 = self.sa1(x2)
    x3 = self.down2(x2, t)
    x3 = self.sa2(x3)
    x4 = self.down3(x3, t)
    x4 = self.sa3(x4)

    x4 = self.bot1(x4)
    x4 = self.bot2(x4)
    x4 = self.bot3(x4)

    # Decoder path with skip connections from the encoder (x3, x2, x1)
    x = self.up1(x4, x3, t)
    x = self.sa4(x)
    x = self.up2(x, x2, t)
    x = self.sa5(x)
    x = self.up3(x, x1, t)
    x = self.sa6(x)
    output = self.outc(x)
    return output
  3. Training loop for the diffusion model:
for epoch in range(args.epochs):
    for step, batch in enumerate(dataloader):
        optimizer.zero_grad()

        # Sample one random timestep per image and compute the noise-prediction loss
        t = torch.randint(0, T, (BATCH_SIZE,), device=device).long()
        images = batch[0].to(device)
        loss = diffusion.p_losses(model, images, t, loss_type="huber")

        loss.backward()
        optimizer.step()

        # Periodically log the loss and visualize intermediate samples
        if epoch % 10 == 0 and step == 0:
            print(f"Epoch {epoch} | step {step:03d} Loss: {loss.item()}")
            sample_plot_image()
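
To tie the snippets together, the UNet defined above can be instantiated and called on a batch of noised images. This is a minimal sketch: it assumes the helper modules DoubleConv, Down, Up, and SelfAttention from the repository are importable, and uses a 64x64 single-channel input to match the self-attention resolutions above.

import torch

# Illustrative usage of the UNet above; shapes and timestep range are assumptions.
model = UNet(c_in=1, c_out=1, time_dim=256)
x = torch.randn(4, 1, 64, 64)          # a batch of (already noised) images
t = torch.randint(0, 1000, (4,))       # one diffusion timestep per image
predicted_noise = model(x, t)          # output has the same shape as x
print(predicted_noise.shape)           # expected: torch.Size([4, 1, 64, 64])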

Getting Started

Competitor Comparisons

A latent text-to-image diffusion model

Pros of stable-diffusion

  • More advanced and feature-rich, capable of generating high-quality images
  • Actively maintained with regular updates and improvements
  • Extensive documentation and community support

Cons of stable-diffusion

  • Higher computational requirements and more complex setup
  • Steeper learning curve for beginners
  • Larger codebase, which may be overwhelming for some users

Code Comparison

Diffusion-Models-pytorch:

def forward(self, x, t):
    t = t.unsqueeze(-1).type(torch.float)
    t = self.time_mlp(t)
    
    h = self.init_conv(x)
    for block in self.down_blocks:
        h = block(h, t)

stable-diffusion:

def forward(self, x, t, context=None):
    t_emb = timestep_embedding(t, self.model_channels, repeat_only=False)
    emb = self.time_embed(t_emb)

    h = x.type(self.dtype)
    for module in self.input_blocks:
        h = module(h, emb, context)

Both repositories implement diffusion models, but stable-diffusion offers a more sophisticated approach with additional features and flexibility. While Diffusion-Models-pytorch is simpler and easier to understand, stable-diffusion provides a more powerful and production-ready solution for image generation tasks.

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Pros of denoising-diffusion-pytorch

  • More comprehensive implementation with additional features like attention mechanisms and advanced scheduling
  • Better code organization with modular architecture
  • Includes support for various image sizes and flexible configuration options

Cons of denoising-diffusion-pytorch

  • Higher complexity, which may be challenging for beginners
  • Requires more computational resources due to advanced features
  • Less focused on educational purposes compared to Diffusion-Models-pytorch

Code Comparison

Diffusion-Models-pytorch:

def forward(self, x, t):
    t = t.unsqueeze(-1).type(torch.float)
    return self.linear(x)

denoising-diffusion-pytorch:

def forward(self, x, time):
    t = self.time_mlp(time)
    h = self.init_conv(x)
    for block1, block2, attn, downsample in self.downs:
        h = block1(h, t)
        h = block2(h, t)
        h = attn(h)
        if downsample:
            h = downsample(h)
    return h

The code comparison shows that denoising-diffusion-pytorch's forward pass explicitly chains time embedding, attention, and downsampling, while the Diffusion-Models-pytorch excerpt shown here reduces to a single linear transformation (a heavily simplified excerpt; the full model uses the U-Net architecture shown earlier).

Pros of guided-diffusion

  • More advanced and feature-rich implementation of diffusion models
  • Includes classifier guidance and other advanced techniques
  • Better documentation and code organization

Cons of guided-diffusion

  • More complex and potentially harder to understand for beginners
  • Requires more computational resources due to its advanced features

Code Comparison

Diffusion-Models-pytorch:

def p_sample_loop(self, x, continous=False):
    device = self.betas.device
    sample = x.to(device)
    if not continous:
        for i in reversed(range(0, self.n_T)):
            sample = self.p_sample(sample, torch.full((1,), i, device=device, dtype=torch.long))
    else:
        for i in reversed(range(0, self.n_T)):
            sample = self.p_sample(sample, torch.full((1,), i, device=device, dtype=torch.long))
            if i % self.log_every_t == 0 or i == self.n_T-1:
                yield sample
    return sample

guided-diffusion:

def p_sample_loop(
    self,
    model,
    shape,
    noise=None,
    clip_denoised=True,
    denoised_fn=None,
    cond_fn=None,
    model_kwargs=None,
    device=None,
    progress=False,
):
    final = None
    for sample in self.p_sample_loop_progressive(
        model,
        shape,
        noise=noise,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        cond_fn=cond_fn,
        model_kwargs=model_kwargs,
        device=device,
        progress=progress,
    ):
        final = sample
    return final["sample"]
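
For reference, the classifier guidance mentioned above (the cond_fn hook in the p_sample_loop signature is where such a guidance function is plugged in) shifts each reverse step toward a target class by adding the scaled gradient of a classifier's log-probability to the predicted mean. A hedged sketch with illustrative names, not guided-diffusion's exact API:

import torch

def classifier_guided_mean(mean, variance, x_t, t, y, classifier, scale=1.0):
    # Nudge the reverse-step mean toward images the classifier assigns to class y.
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[range(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]
    return mean + scale * variance * grad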

Denoising Diffusion Probabilistic Models

Pros of diffusion

  • More comprehensive implementation with additional features like sampling and evaluation
  • Better documentation and code organization
  • Supports multiple architectures and datasets

Cons of diffusion

  • More complex codebase, potentially harder for beginners to understand
  • Less focused on educational purposes compared to Diffusion-Models-pytorch

Code Comparison

diffusion:

def p_sample(self, model, x, t, clip_denoised=True, denoised_fn=None, cond_fn=None, model_kwargs=None):
    out = self.p_mean_variance(
        model,
        x,
        t,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        model_kwargs=model_kwargs,
    )

Diffusion-Models-pytorch:

def p_sample(self, model, x, t, t_index):
    betas_t = extract(self.betas, t, x.shape)
    sqrt_one_minus_alphas_cumprod_t = extract(
        self.sqrt_one_minus_alphas_cumprod, t, x.shape
    )
    sqrt_recip_alphas_t = extract(self.sqrt_recip_alphas, t, x.shape)

The diffusion repository offers a more flexible implementation with additional parameters and options, while Diffusion-Models-pytorch provides a simpler, more straightforward approach. This reflects the overall differences in complexity and focus between the two projects.

Denoising Diffusion Implicit Models

Pros of ddim

  • More comprehensive implementation of diffusion models, including DDIM sampling
  • Better documentation and examples for various use cases
  • Supports both unconditional and conditional generation

Cons of ddim

  • Less beginner-friendly, with a steeper learning curve
  • Requires more computational resources due to its complexity

Code Comparison

ddim:

def p_sample_ddim(self, x, t, clip_denoised=True, condition_x=None):
    out = self.p_mean_variance(x, t, clip_denoised, condition_x)
    noise = torch.randn_like(x)
    nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
    return out["mean"] + nonzero_mask * torch.exp(0.5 * out["log_variance"]) * noise

Diffusion-Models-pytorch:

def p_sample(self, x, t, t_index):
    betas_t = extract(self.betas, t, x.shape)
    sqrt_one_minus_alphas_cumprod_t = extract(
        self.sqrt_one_minus_alphas_cumprod, t, x.shape
    )
    sqrt_recip_alphas_t = extract(self.sqrt_recip_alphas, t, x.shape)
    
    model_mean = sqrt_recip_alphas_t * (
        x - betas_t * self.model(x, t) / sqrt_one_minus_alphas_cumprod_t
    )
    
    if t_index == 0:
        return model_mean
    else:
        posterior_variance_t = extract(self.posterior_variance, t, x.shape)
        noise = torch.randn_like(x)
        return model_mean + torch.sqrt(posterior_variance_t) * noise
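
For contrast, the DDIM sampler referenced above replaces this stochastic update with a (typically deterministic) step: it first estimates x0 from the predicted noise, then projects back to the previous timestep. A hedged sketch with illustrative names, not the ddim repository's exact code:

import torch

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    # alpha_bar_t / alpha_bar_prev: cumulative products of (1 - beta) at steps t and t-1 (tensors).
    # Deterministic DDIM update (eta = 0): estimate x0, then re-noise it to step t-1.
    x0_pred = (x_t - torch.sqrt(1 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_bar_t)
    return torch.sqrt(alpha_bar_prev) * x0_pred + torch.sqrt(1 - alpha_bar_prev) * eps_pred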

README

Diffusion Models

This is an easy-to-understand implementation of diffusion models within 100 lines of code. Different from other implementations, this code doesn't use the lower-bound formulation for sampling and strictly follows Algorithm 1 from the DDPM paper, which makes it extremely short and easy to follow. There are two implementations: conditional and unconditional. Furthermore, the conditional code also implements Classifier-Free-Guidance (CFG) and Exponential-Moving-Average (EMA). Below you can find two explanation videos for the theory behind diffusion models and the implementation.

[Two video thumbnails: the theory and implementation explanation videos]
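
Concretely, a training iteration that follows Algorithm 1 samples a timestep, noises the image in closed form, and regresses the injected noise with an MSE loss. A minimal sketch with illustrative variable names (model, images, device, alpha_hat, and noise_steps are assumed to exist; this is not the repository's exact code):

    import torch
    import torch.nn.functional as F

    # Algorithm 1 (DDPM) training step, sketched with assumed variables:
    # `images` is a batch on `device`, `alpha_hat` the cumulative product of (1 - beta).
    t = torch.randint(1, noise_steps, (images.shape[0],), device=device)
    eps = torch.randn_like(images)
    x_t = torch.sqrt(alpha_hat[t])[:, None, None, None] * images \
        + torch.sqrt(1 - alpha_hat[t])[:, None, None, None] * eps
    predicted_noise = model(x_t, t)
    loss = F.mse_loss(eps, predicted_noise)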

Train a Diffusion Model on your own data:

Unconditional Training

  1. (optional) Configure Hyperparameters in ddpm.py (an illustrative sketch follows this list)
  2. Set path to dataset in ddpm.py
  3. python ddpm.py
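
For step 1, the hyperparameter block typically looks something like the sketch below; the names and values are assumptions for illustration, so check ddpm.py for the actual ones.

    from types import SimpleNamespace

    # Illustrative hyperparameters; verify the real names and defaults in ddpm.py.
    args = SimpleNamespace(
        run_name="DDPM_Unconditional",
        epochs=500,
        batch_size=12,
        image_size=64,
        dataset_path="/path/to/your/image/folder",   # step 2: point this at your dataset
        device="cuda",
        lr=3e-4,
    )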

Conditional Training

  1. (optional) Configure Hyperparameters in ddpm_conditional.py
  2. Set path to dataset in ddpm_conditional.py
  3. python ddpm_conditional.py

Sampling

The following examples show how to sample images using the models trained in the video on the Landscape Dataset. You can download the checkpoints for the models here.

Unconditional Model

    device = "cuda"
    model = UNet().to(device)
    ckpt = torch.load("unconditional_ckpt.pt")
    model.load_state_dict(ckpt)
    diffusion = Diffusion(img_size=64, device=device)
    x = diffusion.sample(model, n=16)
    plot_images(x)

Conditional Model

This model was trained on CIFAR-10 at 64x64 with 10 classes: airplane:0, auto:1, bird:2, cat:3, deer:4, dog:5, frog:6, horse:7, ship:8, truck:9.

    n = 10
    device = "cuda"
    model = UNet_conditional(num_classes=10).to(device)
    ckpt = torch.load("conditional_ema_ckpt.pt")
    model.load_state_dict(ckpt)
    diffusion = Diffusion(img_size=64, device=device)
    y = torch.Tensor([6] * n).long().to(device)
    x = diffusion.sample(model, n, y, cfg_scale=3)
    plot_images(x)
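
Here cfg_scale controls Classifier-Free Guidance: at every sampling step the network is evaluated both with and without the class label, and the two noise predictions are blended (values above 1 push samples further toward the conditioned prediction). A hedged sketch of that combination step, with illustrative names rather than the repository's exact code:

    # Hedged sketch of the CFG combination inside the sampling loop.
    conditional_noise = model(x, t, labels)          # prediction conditioned on class labels
    unconditional_noise = model(x, t, None)          # prediction with the label dropped
    predicted_noise = torch.lerp(unconditional_noise, conditional_noise, cfg_scale)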

A more advanced version of this code, by @tcapelle, can be found here. It introduces better logging, faster and more efficient training, and other nice features, and is also accompanied by a nice write-up.