dome272/Diffusion-Models-pytorch

Pytorch implementation of Diffusion Models (https://arxiv.org/pdf/2006.11239.pdf)

Top Related Projects

  • stable-diffusion: A latent text-to-image diffusion model
  • denoising-diffusion-pytorch: Implementation of Denoising Diffusion Probabilistic Model in Pytorch
  • diffusion: Denoising Diffusion Probabilistic Models
  • ddim: Denoising Diffusion Implicit Models

Quick Overview

The dome272/Diffusion-Models-pytorch repository is an implementation of diffusion models in PyTorch. It provides a simple and educational codebase for understanding and experimenting with diffusion models, which are a class of generative models that have gained popularity in recent years for their high-quality image generation capabilities.
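
To make that idea concrete, the forward ("noising") process of a diffusion model can be written in closed form. The sketch below uses a linear beta schedule and illustrative names; it is not taken verbatim from the repository.

import torch

# Forward diffusion q(x_t | x_0): blend a clean image with Gaussian noise according to the schedule.
# All names and values here are illustrative assumptions.
T = 1000
beta = torch.linspace(1e-4, 0.02, T)              # noise schedule
alpha_hat = torch.cumprod(1.0 - beta, dim=0)      # cumulative product of (1 - beta)

def noise_images(x0, t):
    # Sample x_t ~ q(x_t | x_0) for a batch of images x0 and integer timesteps t.
    sqrt_ah = torch.sqrt(alpha_hat[t])[:, None, None, None]
    sqrt_one_minus_ah = torch.sqrt(1.0 - alpha_hat[t])[:, None, None, None]
    eps = torch.randn_like(x0)
    return sqrt_ah * x0 + sqrt_one_minus_ah * eps, eps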

Pros

  • Clear and concise implementation of diffusion models
  • Educational resource for understanding diffusion model concepts
  • Includes both training and sampling scripts
  • Built with PyTorch, a popular deep learning framework

Cons

  • Limited documentation and explanations
  • May not include the latest optimizations or techniques in diffusion models
  • Example configurations target specific datasets (the Landscape dataset and CIFAR-10 in the README), which may limit out-of-the-box applicability
  • Not actively maintained (last commit over a year ago)

Code Examples

  1. Defining the UNet model architecture:
class UNet(nn.Module):
    def __init__(self, c_in=1, c_out=1, time_dim=256):
        super().__init__()
        self.time_dim = time_dim
        # Encoder: downsampling blocks, each followed by self-attention at the reduced resolution
        self.inc = DoubleConv(c_in, 64)
        self.down1 = Down(64, 128)
        self.sa1 = SelfAttention(128, 32)
        self.down2 = Down(128, 256)
        self.sa2 = SelfAttention(256, 16)
        self.down3 = Down(256, 256)
        self.sa3 = SelfAttention(256, 8)

        # Bottleneck convolutions at the lowest resolution
        self.bot1 = DoubleConv(256, 512)
        self.bot2 = DoubleConv(512, 512)
        self.bot3 = DoubleConv(512, 256)

        # Decoder: upsampling blocks with skip connections, again followed by self-attention
        self.up1 = Up(512, 128)
        self.sa4 = SelfAttention(128, 16)
        self.up2 = Up(256, 64)
        self.sa5 = SelfAttention(64, 32)
        self.up3 = Up(128, 64)
        self.sa6 = SelfAttention(64, 64)
        self.outc = nn.Conv2d(64, c_out, kernel_size=1)
  2. Implementing the forward pass of the diffusion model:
def forward(self, x, t):
    # Embed the integer timestep so every block can condition on it
    t = self.pos_encoding(t)
    x1 = self.inc(x)
    x2 = self.down1(x1, t)
    x2 = self.sa1(x2)
    x3 = self.down2(x2, t)
    x3 = self.sa2(x3)
    x4 = self.down3(x3, t)
    x4 = self.sa3(x4)

    x4 = self.bot1(x4)
    x4 = self.bot2(x4)
    x4 = self.bot3(x4)

    # Decoder path with skip connections from the encoder (x3, x2, x1)
    x = self.up1(x4, x3, t)
    x = self.sa4(x)
    x = self.up2(x, x2, t)
    x = self.sa5(x)
    x = self.up3(x, x1, t)
    x = self.sa6(x)
    output = self.outc(x)
    return output
  3. Training loop for the diffusion model:
for epoch in range(args.epochs):
    for step, batch in enumerate(dataloader):
        optimizer.zero_grad()

        # Sample one random timestep per image and compute the noise-prediction loss
        t = torch.randint(0, T, (BATCH_SIZE,), device=device).long()
        images = batch[0].to(device)
        loss = diffusion.p_losses(model, images, t, loss_type="huber")

        loss.backward()
        optimizer.step()

        # Periodically log the loss and visualize intermediate samples
        if epoch % 10 == 0 and step == 0:
            print(f"Epoch {epoch} | step {step:03d} Loss: {loss.item()}")
            sample_plot_image()
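
To tie the snippets together, the UNet defined above can be instantiated and called on a batch of noised images. This is a minimal sketch: it assumes the helper modules DoubleConv, Down, Up, and SelfAttention from the repository are importable, and uses a 64x64 single-channel input to match the self-attention resolutions above.

import torch

# Illustrative usage of the UNet above; shapes and timestep range are assumptions.
model = UNet(c_in=1, c_out=1, time_dim=256)
x = torch.randn(4, 1, 64, 64)          # a batch of (already noised) images
t = torch.randint(0, 1000, (4,))       # one diffusion timestep per image
predicted_noise = model(x, t)          # output has the same shape as x
print(predicted_noise.shape)           # expected: torch.Size([4, 1, 64, 64])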

Getting Started

Competitor Comparisons

A latent text-to-image diffusion model

Pros of stable-diffusion

  • More advanced and feature-rich, capable of generating high-quality images
  • Actively maintained with regular updates and improvements
  • Extensive documentation and community support

Cons of stable-diffusion

  • Higher computational requirements and more complex setup
  • Steeper learning curve for beginners
  • Larger codebase, which may be overwhelming for some users

Code Comparison

Diffusion-Models-pytorch:

def forward(self, x, t):
    t = t.unsqueeze(-1).type(torch.float)
    t = self.time_mlp(t)
    
    h = self.init_conv(x)
    for block in self.down_blocks:
        h = block(h, t)

stable-diffusion:

def forward(self, x, t, context=None):
    t_emb = timestep_embedding(t, self.model_channels, repeat_only=False)
    emb = self.time_embed(t_emb)

    h = x.type(self.dtype)
    for module in self.input_blocks:
        h = module(h, emb, context)

Both repositories implement diffusion models, but stable-diffusion offers a more sophisticated approach with additional features and flexibility. While Diffusion-Models-pytorch is simpler and easier to understand, stable-diffusion provides a more powerful and production-ready solution for image generation tasks.

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Pros of denoising-diffusion-pytorch

  • More comprehensive implementation with additional features like attention mechanisms and advanced scheduling
  • Better code organization with modular architecture
  • Includes support for various image sizes and flexible configuration options

Cons of denoising-diffusion-pytorch

  • Higher complexity, which may be challenging for beginners
  • Requires more computational resources due to advanced features
  • Less focused on educational purposes compared to Diffusion-Models-pytorch

Code Comparison

Diffusion-Models-pytorch:

def forward(self, x, t):
    t = t.unsqueeze(-1).type(torch.float)
    return self.linear(x)

denoising-diffusion-pytorch:

def forward(self, x, time):
    t = self.time_mlp(time)
    h = self.init_conv(x)
    for block1, block2, attn, downsample in self.downs:
        h = block1(h, t)
        h = block2(h, t)
        h = attn(h)
        if downsample:
            h = downsample(h)
    return h

The code comparison shows that denoising-diffusion-pytorch's forward pass explicitly chains time embedding, attention, and downsampling, while the Diffusion-Models-pytorch excerpt shown here reduces to a single linear transformation (a heavily simplified excerpt; the full model uses the U-Net architecture shown earlier).

Pros of guided-diffusion

  • More advanced and feature-rich implementation of diffusion models
  • Includes classifier guidance and other advanced techniques
  • Better documentation and code organization

Cons of guided-diffusion

  • More complex and potentially harder to understand for beginners
  • Requires more computational resources due to its advanced features

Code Comparison

Diffusion-Models-pytorch:

def p_sample_loop(self, x, continous=False):
    device = self.betas.device
    sample = x.to(device)
    if not continous:
        for i in reversed(range(0, self.n_T)):
            sample = self.p_sample(sample, torch.full((1,), i, device=device, dtype=torch.long))
    else:
        for i in reversed(range(0, self.n_T)):
            sample = self.p_sample(sample, torch.full((1,), i, device=device, dtype=torch.long))
            if i % self.log_every_t == 0 or i == self.n_T-1:
                yield sample
    return sample

guided-diffusion:

def p_sample_loop(
    self,
    model,
    shape,
    noise=None,
    clip_denoised=True,
    denoised_fn=None,
    cond_fn=None,
    model_kwargs=None,
    device=None,
    progress=False,
):
    final = None
    for sample in self.p_sample_loop_progressive(
        model,
        shape,
        noise=noise,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        cond_fn=cond_fn,
        model_kwargs=model_kwargs,
        device=device,
        progress=progress,
    ):
        final = sample
    return final["sample"]
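
For reference, the classifier guidance mentioned above (the cond_fn hook in the p_sample_loop signature is where such a guidance function is plugged in) shifts each reverse step toward a target class by adding the scaled gradient of a classifier's log-probability to the predicted mean. A hedged sketch with illustrative names, not guided-diffusion's exact API:

import torch

def classifier_guided_mean(mean, variance, x_t, t, y, classifier, scale=1.0):
    # Nudge the reverse-step mean toward images the classifier assigns to class y.
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in, t), dim=-1)
        selected = log_probs[range(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]
    return mean + scale * variance * grad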

Denoising Diffusion Probabilistic Models

Pros of diffusion

  • More comprehensive implementation with additional features like sampling and evaluation
  • Better documentation and code organization
  • Supports multiple architectures and datasets

Cons of diffusion

  • More complex codebase, potentially harder for beginners to understand
  • Less focused on educational purposes compared to Diffusion-Models-pytorch

Code Comparison

diffusion:

def p_sample(self, model, x, t, clip_denoised=True, denoised_fn=None, cond_fn=None, model_kwargs=None):
    out = self.p_mean_variance(
        model,
        x,
        t,
        clip_denoised=clip_denoised,
        denoised_fn=denoised_fn,
        model_kwargs=model_kwargs,
    )

Diffusion-Models-pytorch:

def p_sample(self, model, x, t, t_index):
    betas_t = extract(self.betas, t, x.shape)
    sqrt_one_minus_alphas_cumprod_t = extract(
        self.sqrt_one_minus_alphas_cumprod, t, x.shape
    )
    sqrt_recip_alphas_t = extract(self.sqrt_recip_alphas, t, x.shape)

The diffusion repository offers a more flexible implementation with additional parameters and options, while Diffusion-Models-pytorch provides a simpler, more straightforward approach. This reflects the overall differences in complexity and focus between the two projects.

Denoising Diffusion Implicit Models

Pros of ddim

  • More comprehensive implementation of diffusion models, including DDIM sampling
  • Better documentation and examples for various use cases
  • Supports both unconditional and conditional generation

Cons of ddim

  • Less beginner-friendly, with a steeper learning curve
  • Requires more computational resources due to its complexity

Code Comparison

ddim:

def p_sample_ddim(self, x, t, clip_denoised=True, condition_x=None):
    out = self.p_mean_variance(x, t, clip_denoised, condition_x)
    noise = torch.randn_like(x)
    nonzero_mask = (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
    return out["mean"] + nonzero_mask * torch.exp(0.5 * out["log_variance"]) * noise

Diffusion-Models-pytorch:

def p_sample(self, x, t, t_index):
    betas_t = extract(self.betas, t, x.shape)
    sqrt_one_minus_alphas_cumprod_t = extract(
        self.sqrt_one_minus_alphas_cumprod, t, x.shape
    )
    sqrt_recip_alphas_t = extract(self.sqrt_recip_alphas, t, x.shape)
    
    model_mean = sqrt_recip_alphas_t * (
        x - betas_t * self.model(x, t) / sqrt_one_minus_alphas_cumprod_t
    )
    
    if t_index == 0:
        return model_mean
    else:
        posterior_variance_t = extract(self.posterior_variance, t, x.shape)
        noise = torch.randn_like(x)
        return model_mean + torch.sqrt(posterior_variance_t) * noise
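
For contrast, the DDIM sampler referenced above replaces this stochastic update with a (typically deterministic) step: it first estimates x0 from the predicted noise, then projects back to the previous timestep. A hedged sketch with illustrative names, not the ddim repository's exact code:

import torch

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    # alpha_bar_t / alpha_bar_prev: cumulative products of (1 - beta) at steps t and t-1 (tensors).
    # Deterministic DDIM update (eta = 0): estimate x0, then re-noise it to step t-1.
    x0_pred = (x_t - torch.sqrt(1 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_bar_t)
    return torch.sqrt(alpha_bar_prev) * x0_pred + torch.sqrt(1 - alpha_bar_prev) * eps_pred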

README

Diffusion Models

This is an easy-to-understand implementation of diffusion models within 100 lines of code. Different from other implementations, this code doesn't use the lower-bound formulation for sampling and strictly follows Algorithm 1 from the DDPM paper, which makes it extremely short and easy to follow. There are two implementations: conditional and unconditional. Furthermore, the conditional code also implements Classifier-Free-Guidance (CFG) and Exponential-Moving-Average (EMA). Below you can find two explanation videos for the theory behind diffusion models and the implementation.

[Two video thumbnails: the theory and implementation explanation videos]
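
Concretely, a training iteration that follows Algorithm 1 samples a timestep, noises the image in closed form, and regresses the injected noise with an MSE loss. A minimal sketch with illustrative variable names (model, images, device, alpha_hat, and noise_steps are assumed to exist; this is not the repository's exact code):

    import torch
    import torch.nn.functional as F

    # Algorithm 1 (DDPM) training step, sketched with assumed variables:
    # `images` is a batch on `device`, `alpha_hat` the cumulative product of (1 - beta).
    t = torch.randint(1, noise_steps, (images.shape[0],), device=device)
    eps = torch.randn_like(images)
    x_t = torch.sqrt(alpha_hat[t])[:, None, None, None] * images \
        + torch.sqrt(1 - alpha_hat[t])[:, None, None, None] * eps
    predicted_noise = model(x_t, t)
    loss = F.mse_loss(eps, predicted_noise)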

Train a Diffusion Model on your own data:

Unconditional Training

  1. (optional) Configure Hyperparameters in ddpm.py (an illustrative sketch follows this list)
  2. Set path to dataset in ddpm.py
  3. python ddpm.py
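
For step 1, the hyperparameter block typically looks something like the sketch below; the names and values are assumptions for illustration, so check ddpm.py for the actual ones.

    from types import SimpleNamespace

    # Illustrative hyperparameters; verify the real names and defaults in ddpm.py.
    args = SimpleNamespace(
        run_name="DDPM_Unconditional",
        epochs=500,
        batch_size=12,
        image_size=64,
        dataset_path="/path/to/your/image/folder",   # step 2: point this at your dataset
        device="cuda",
        lr=3e-4,
    )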

Conditional Training

  1. (optional) Configure Hyperparameters in ddpm_conditional.py
  2. Set path to dataset in ddpm_conditional.py
  3. python ddpm_conditional.py

Sampling

The following examples show how to sample images using the models trained in the video on the Landscape Dataset. You can download the checkpoints for the models here.

Unconditional Model

    device = "cuda"
    model = UNet().to(device)
    ckpt = torch.load("unconditional_ckpt.pt")
    model.load_state_dict(ckpt)
    diffusion = Diffusion(img_size=64, device=device)
    x = diffusion.sample(model, n=16)
    plot_images(x)

Conditional Model

This model was trained on CIFAR-10 at 64x64 with 10 classes: airplane:0, auto:1, bird:2, cat:3, deer:4, dog:5, frog:6, horse:7, ship:8, truck:9.

    n = 10
    device = "cuda"
    model = UNet_conditional(num_classes=10).to(device)
    ckpt = torch.load("conditional_ema_ckpt.pt")
    model.load_state_dict(ckpt)
    diffusion = Diffusion(img_size=64, device=device)
    y = torch.Tensor([6] * n).long().to(device)
    x = diffusion.sample(model, n, y, cfg_scale=3)
    plot_images(x)
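
Here cfg_scale controls Classifier-Free Guidance: at every sampling step the network is evaluated both with and without the class label, and the two noise predictions are blended (values above 1 push samples further toward the conditioned prediction). A hedged sketch of that combination step, with illustrative names rather than the repository's exact code:

    # Hedged sketch of the CFG combination inside the sampling loop.
    conditional_noise = model(x, t, labels)          # prediction conditioned on class labels
    unconditional_noise = model(x, t, None)          # prediction with the label dropped
    predicted_noise = torch.lerp(unconditional_noise, conditional_noise, cfg_scale)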

A more advanced version of this code, by @tcapelle, can be found here. It introduces better logging, faster and more efficient training, and other nice features, and is also accompanied by a nice write-up.