generative-models

Generative Models by Stability AI

Top Related Projects

diffusers

29,520

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

CLIP

29,576

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

imagen-pytorch

8,329

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Quick Overview

Stability-AI/generative-models is an open-source repository containing the official implementation of Stable Diffusion XL (SDXL) and other generative models by Stability AI. It provides a framework for training and deploying state-of-the-art text-to-image and image-to-image models, allowing researchers and developers to explore and build upon these advanced AI technologies.

Pros

Offers access to cutting-edge generative AI models, including SDXL
Provides a flexible and extensible framework for training and fine-tuning models
Includes comprehensive documentation and examples for ease of use
Supports various image generation tasks and customization options

Cons

Requires significant computational resources for training and inference
May have a steep learning curve for users new to generative AI
Limited to specific model architectures and may not be suitable for all use cases
Potential ethical concerns regarding the generation of synthetic media

Code Examples

Loading and using a pre-trained SDXL model:

# Names below follow sgm/inference/api.py as best understood; verify against your checkout.
from sgm.inference.api import ModelArchitecture, SamplingParams, SamplingPipeline

# By default the pipeline looks for the checkpoint in checkpoints/.
pipeline = SamplingPipeline(ModelArchitecture.SDXL_V1_BASE)

prompt = "A beautiful sunset over a calm ocean"
params = SamplingParams(steps=50)
images = pipeline.text_to_image(params=params, prompt=prompt, samples=1)

Training a model from a config (class-conditional diffusion on MNIST, as in the README):

python main.py --base configs/example_training/toy/mnist_cond.yaml

Performing image-to-image generation:

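from PIL import Image
from sgm.inference.helpers import get_input_image_tensor

# Reuses `pipeline` and SamplingParams from the first example; names are unverified here.
init_image = get_input_image_tensor(Image.open("path/to/input_image.jpg"))
prompt = "A watercolor painting of the same scene"
params = SamplingParams(img2img_strength=0.75)
images = pipeline.image_to_image(params=params, image=init_image, prompt=prompt)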

Getting Started

To get started with Stability-AI/generative-models:

Clone the repository:

git clone https://github.com/Stability-AI/generative-models.git
cd generative-models

Install dependencies:
```
pip3 install -r requirements/pt2.txt
pip3 install .
```

Download pre-trained weights from Hugging Face and place them in checkpoints/:

https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/

Run inference with the streamlit demo:

streamlit run scripts/demo/sampling.py --server.port <your_port>

Competitor Comparisons

diffusers

29,520

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Pros of diffusers

Extensive model support: Includes a wide range of pre-trained models and architectures
User-friendly API: Offers a high-level interface for easy integration and use
Active community: Regular updates and contributions from a large user base

Cons of diffusers

Less focused on cutting-edge research: May not always include the latest advancements
Higher-level abstraction: Can be less flexible for custom implementations

Code Comparison

diffusers:

from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipeline("A beautiful sunset over the ocean").images[0]

generative-models:

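# Names as in sgm/inference/api.py, as best understood; SD 1.5 is not among the supported models.
from sgm.inference.api import ModelArchitecture, SamplingParams, SamplingPipeline

pipeline = SamplingPipeline(ModelArchitecture.SD_2_1)
image = pipeline.text_to_image(SamplingParams(), prompt="A beautiful sunset over the ocean")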

Both repositories provide tools for working with generative AI models, but diffusers offers a more user-friendly approach with its pipeline abstraction, while generative-models may provide more flexibility for advanced users and researchers.

CLIP

29,576

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Pros of CLIP

Focused on image-text understanding, making it highly specialized for tasks like image classification and retrieval
Simpler architecture, potentially easier to implement and fine-tune for specific use cases
Extensive documentation and examples provided in the repository

Cons of CLIP

Limited to image-text tasks, less versatile compared to generative-models' broader capabilities
Smaller community and fewer updates, potentially slower development and improvement cycle

Code Comparison

CLIP (Python):

import torch
from PIL import Image
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("image.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

generative-models (Python):

import torch
from omegaconf import OmegaConf
from sgm.util import instantiate_from_config

device = "cuda" if torch.cuda.is_available() else "cpu"
# The config file name is an assumption; pick one from configs/inference/.
config = OmegaConf.load("configs/inference/sd_xl_base.yaml")
model = instantiate_from_config(config.model).to(device)
model.eval()
# Sampling itself is driven by the demo scripts, e.g. scripts/demo/sampling.py.

imagen-pytorch

8,329

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Pros of imagen-pytorch

Lightweight and focused implementation of the Imagen architecture
Easier to understand and modify for research purposes
More flexible and customizable for specific use cases

Cons of imagen-pytorch

Less comprehensive and feature-rich compared to generative-models
May lack some optimizations and performance improvements
Potentially less stable or production-ready

Code Comparison

imagen-pytorch:

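from imagen_pytorch import Imagen

# unet1, unet2 and unet3 are imagen_pytorch Unet instances defined elsewhere.
imagen = Imagen(
    unets = (unet1, unet2, unet3),
    image_sizes = (64, 256, 1024),
    timesteps = 1000,
    cond_drop_prob = 0.1
)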

generative-models:

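# Names as in sgm/inference/api.py, as best understood; verify against your checkout.
from sgm.inference.api import ModelArchitecture, SamplingParams, SamplingPipeline

pipeline = SamplingPipeline(ModelArchitecture.SDXL_V1_BASE)
images = pipeline.text_to_image(
    SamplingParams(steps=50),
    prompt="a photo of an astronaut riding a horse on mars",
)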

The imagen-pytorch example shows a more low-level approach to model configuration, while generative-models provides a higher-level API for easy use of pre-trained models. generative-models offers a more comprehensive suite of tools and models, but imagen-pytorch may be more suitable for researchers looking to experiment with the core architecture.

stable-diffusion-webui

153,957

Stable Diffusion web UI

Pros of stable-diffusion-webui

User-friendly web interface for easy interaction with Stable Diffusion models
Extensive customization options and a wide range of features
Active community with frequent updates and extensions

Cons of stable-diffusion-webui

Primarily focused on image generation, with limited support for other modalities
May require more setup and configuration for advanced users

Code Comparison

stable-diffusion-webui:

import modules.scripts
from modules import sd_samplers
from modules.processing import process_images, Processed
from modules.shared import opts, cmd_opts, state

generative-models:

import torch
from omegaconf import OmegaConf
from sgm.util import instantiate_from_config
from sgm.inference.api import ModelArchitecture

The code snippets show that stable-diffusion-webui is more focused on providing a user interface and processing images, while generative-models is geared towards model architecture and configuration. generative-models appears to be more flexible and adaptable to various generative tasks, while stable-diffusion-webui is optimized for Stable Diffusion image generation workflows.

README

Generative Models by Stability AI

News

May 20, 2025

We are releasing Stable Video 4D 2.0 (SV4D 2.0), an enhanced video-to-4D diffusion model for high-fidelity novel-view video synthesis and 4D asset generation. For research purposes:
- SV4D 2.0 was trained to generate 48 frames (12 video frames x 4 camera views) at 576x576 resolution, given a 12-frame input video of the same size, ideally consisting of white-background images of a moving object.
- Compared to our previous 4D model SV4D, SV4D 2.0 can generate videos with higher fidelity, sharper details during motion, and better spatio-temporal consistency. It also generalizes much better to real-world videos. Moreover, it does not rely on reference multi-views of the first frame generated by SV3D, making it more robust to self-occlusions.
- To generate longer novel-view videos, we autoregressively generate 12 frames at a time and use the previous generation as conditioning views for the remaining frames.
- Please check our project page, arxiv paper and video summary for more details.

QUICKSTART :

python scripts/sampling/simple_video_sample_4d2.py --input_path assets/sv4d_videos/camel.gif --output_folder outputs (after downloading sv4d2.safetensors from HuggingFace into checkpoints/)

To run SV4D 2.0 on a single input video of 21 frames:

Download SV4D 2.0 model (sv4d2.safetensors) from here to checkpoints/: huggingface-cli download stabilityai/sv4d2.0 sv4d2.safetensors --local-dir checkpoints
Run inference: python scripts/sampling/simple_video_sample_4d2.py --input_path <path/to/video>
- input_path : The input video <path/to/video> can be
  - a single video file in gif or mp4 format, such as assets/sv4d_videos/camel.gif, or
  - a folder containing images of video frames in .jpg, .jpeg, or .png format, or
  - a file name pattern matching images of video frames.
- num_steps : default is 50, can decrease it to shorten sampling time.
- elevations_deg : specified elevations (relative to input view), default is 0.0 (same as input view).
- Background removal : For input videos with plain background, (optionally) use rembg to remove background and crop video frames by setting --remove_bg=True. To obtain higher quality outputs on real-world input videos with noisy background, try segmenting the foreground object using Clipdrop or SAM2 before running SV4D.
- Low VRAM environment : To run on GPUs with low VRAM, try setting --encoding_t=1 (number of frames encoded at a time) and --decoding_t=1 (number of frames decoded at a time) or lower video resolution like --img_size=512.

Notes:

We also train an 8-view model that generates 5 frames x 8 views at a time (same as SV4D).
- Download the model from huggingface: huggingface-cli download stabilityai/sv4d2.0 sv4d2_8views.safetensors --local-dir checkpoints
- Run inference: python scripts/sampling/simple_video_sample_4d2.py --model_path checkpoints/sv4d2_8views.safetensors --input_path assets/sv4d_videos/chest.gif --output_folder outputs
- The 5x8 model takes 5 frames of input at a time. Because the inference scripts for both models take a 21-frame video as input by default (same as SV3D and SV4D), we run the model autoregressively until we generate 21 frames.
Install dependencies before running:

python3.10 -m venv .generativemodels
source .generativemodels/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # check CUDA version
pip3 install -r requirements/pt2.txt
pip3 install .
pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata

July 24, 2024

We are releasing Stable Video 4D (SV4D), a video-to-4D diffusion model for novel-view video synthesis. For research purposes:
- SV4D was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video), and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
- To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D, by first sampling 5 anchor frames and then densely sampling the remaining frames while maintaining temporal consistency.
- To run the community-built gradio demo locally, run python -m scripts.demo.gradio_app_sv4d.
- Please check our project page, tech report and video summary for more details.

QUICKSTART : python scripts/sampling/simple_video_sample_4d.py --input_path assets/sv4d_videos/test_video1.mp4 --output_folder outputs/sv4d (after downloading sv4d.safetensors and sv3d_u.safetensors from HuggingFace into checkpoints/)

To run SV4D on a single input video of 21 frames:

Download SV3D models (sv3d_u.safetensors and sv3d_p.safetensors) from here and SV4D model (sv4d.safetensors) from here to checkpoints/
Run python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>
- input_path : The input video <path/to/video> can be
  - a single video file in gif or mp4 format, such as assets/sv4d_videos/test_video1.mp4, or
  - a folder containing images of video frames in .jpg, .jpeg, or .png format, or
  - a file name pattern matching images of video frames.
- num_steps : default is 20, can increase to 50 for better quality but longer sampling time.
- sv3d_version : To specify the SV3D model to generate reference multi-views, set --sv3d_version=sv3d_u for SV3D_u or --sv3d_version=sv3d_p for SV3D_p.
- elevations_deg : To generate novel-view videos at a specified elevation (default elevation is 10) using SV3D_p (default is SV3D_u), run python scripts/sampling/simple_video_sample_4d.py --input_path assets/sv4d_videos/test_video1.mp4 --sv3d_version sv3d_p --elevations_deg 30.0
- Background removal : For input videos with plain background, (optionally) use rembg to remove background and crop video frames by setting --remove_bg=True. To obtain higher quality outputs on real-world input videos with noisy background, try segmenting the foreground object using Clipdrop or SAM2 before running SV4D.
- Low VRAM environment : To run on GPUs with low VRAM, try setting --encoding_t=1 (number of frames encoded at a time) and --decoding_t=1 (number of frames decoded at a time) or lower video resolution like --img_size=512.

March 18, 2024

We are releasing SV3D, an image-to-video model for novel multi-view synthesis, for research purposes:
- SV3D was trained to generate 21 frames at resolution 576x576, given 1 context frame of the same size, ideally a white-background image with one object.
- SV3D_u: This variant generates orbital videos based on single image inputs without camera conditioning.
- SV3D_p: Extending the capability of SV3D_u, this variant accommodates both single images and orbital views, allowing for the creation of 3D video along specified camera paths.
- We extend the streamlit demo scripts/demo/video_sampling.py and the standalone python script scripts/sampling/simple_video_sample.py for inference of both models.
- Please check our project page, tech report and video summary for more details.

To run SV3D_u on a single image:

Download sv3d_u.safetensors from https://huggingface.co/stabilityai/sv3d to checkpoints/sv3d_u.safetensors
Run python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_u

To run SV3D_p on a single image:

Download sv3d_p.safetensors from https://huggingface.co/stabilityai/sv3d to checkpoints/sv3d_p.safetensors

Generate static orbit at a specified elevation, e.g. 10.0: python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg 10.0
Generate dynamic orbit at specified elevations and azimuths: pass a sequence of 21 elevations (in degrees, within [-90, 90]) to elevations_deg, and a sequence of 21 azimuths (in degrees, within [0, 360], sorted from 0 to 360) to azimuths_deg. For example: python scripts/sampling/simple_video_sample.py --input_path <path/to/image.png> --version sv3d_p --elevations_deg [<list of 21 elevations in degrees>] --azimuths_deg [<list of 21 azimuths in degrees>]
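
One way to build the two 21-element lists for a dynamic orbit is sketched below. The helper name make_dynamic_orbit, the sinusoidal elevation profile, and the even azimuth spacing are illustrative assumptions; only the list lengths and value ranges come from the description above.

```python
import math


def make_dynamic_orbit(num_frames=21, max_elevation_deg=10.0):
    """Return (elevations, azimuths) for a hypothetical dynamic orbit."""
    # Azimuths: evenly spaced and sorted, staying within [0, 360].
    azimuths = [360.0 * (i + 1) / num_frames for i in range(num_frames)]
    # Elevations: one sine period over the orbit, staying within [-90, 90].
    elevations = [
        max_elevation_deg * math.sin(2.0 * math.pi * i / num_frames)
        for i in range(num_frames)
    ]
    return elevations, azimuths


elevations, azimuths = make_dynamic_orbit()
# Print the two lists for inspection before passing them to the script.
print("elevations_deg:", [round(e, 2) for e in elevations])
print("azimuths_deg:", [round(a, 2) for a in azimuths])
```

How the lists need to be quoted on the command line depends on your shell, so check the script's argument parsing before pasting them in.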

To run SVD or SV3D on a streamlit server: streamlit run scripts/demo/video_sampling.py

November 30, 2023

Following the launch of SDXL-Turbo, we are releasing SD-Turbo.

November 28, 2023

We are releasing SDXL-Turbo, a lightning fast text-to-image model. Alongside the model, we release a technical report.
- Usage:
  - Follow the installation instructions or update the existing environment with pip install streamlit-keyup.
  - Download the weights and place them in the checkpoints/ directory.
  - Run streamlit run scripts/demo/turbo.py.

November 21, 2023

We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
- SVD: This model was trained to generate 14 frames at resolution 576x1024 given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware deflickering decoder.
- SVD-XT: Same architecture as SVD but finetuned for 25 frame generation.
- You can run the community-built gradio demo locally by running python -m scripts.demo.gradio_app.
- We provide a streamlit demo scripts/demo/video_sampling.py and a standalone python script scripts/sampling/simple_video_sample.py for inference of both models.
- Alongside the model, we release a technical report.

July 26, 2023

We are releasing two new open models with a permissive CreativeML Open RAIL++-M license (see Inference for file hashes):
- SDXL-base-1.0: An improved version over SDXL-base-0.9.
- SDXL-refiner-1.0: An improved version over SDXL-refiner-0.9.

July 4, 2023

A technical report on SDXL is now available here.

June 22, 2023

We are releasing two new diffusion models for research purposes:
- SDXL-base-0.9: The base model was trained on a variety of aspect ratios on images with resolution 1024^2. The base model uses OpenCLIP-ViT/G and CLIP-ViT/L for text encoding whereas the refiner model only uses the OpenCLIP model.
- SDXL-refiner-0.9: The refiner has been trained to denoise small noise levels of high quality data and as such is not expected to work as a text-to-image model; instead, it should only be used as an image-to-image model.

If you would like to access these models for your research, please apply using one of the following links: SDXL-0.9-Base model, and SDXL-0.9-Refiner. You can apply through either link; if your request is granted, you get access to both models. Please log in to your Hugging Face Account with your organization email to request access. We plan to do a full release soon (July).

The codebase

General Philosophy

Modularity is king. This repo implements a config-driven approach where we build and combine submodules by calling instantiate_from_config() on objects defined in yaml configs. See configs/ for many examples.
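
The idea behind config-driven instantiation can be sketched in a few lines. This is a stand-alone re-implementation for illustration, not the repository's instantiate_from_config(); the function name instantiate_from_config_sketch is hypothetical, the target/params layout is assumed to mirror the YAML files in configs/, and fractions.Fraction merely stands in for an sgm class.

```python
import importlib


def instantiate_from_config_sketch(config):
    """Build an object from a {"target": "pkg.mod.Class", "params": {...}} mapping."""
    module_name, _, attr_name = config["target"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), attr_name)
    return cls(**config.get("params", {}))


# A stand-in config: in the repo, the target would be an sgm class named in a YAML file.
config = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = instantiate_from_config_sketch(config)
print(obj)
```

Because the class is named by a string, swapping one submodule for another only requires editing the config, not the code that builds the model.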

Changelog from the old `ldm` codebase

For training, we use PyTorch Lightning, but it should be easy to use other training wrappers around the base modules. The core diffusion model class (formerly LatentDiffusion, now DiffusionEngine) has been cleaned up:

No more extensive subclassing! We now handle all types of conditioning inputs (vectors, sequences and spatial conditionings, and all combinations thereof) in a single class: GeneralConditioner, see sgm/modules/encoders/modules.py.
We separate guiders (such as classifier-free guidance, see sgm/modules/diffusionmodules/guiders.py) from the samplers (sgm/modules/diffusionmodules/sampling.py), and the samplers are independent of the model.
We adopt the "denoiser framework" for both training and inference (most notable change is probably now the option to train continuous time models):
- Discrete-time models (denoisers) are simply a special case of continuous-time models (denoisers); see sgm/modules/diffusionmodules/denoiser.py.
- The following features are now independent: weighting of the diffusion loss function (sgm/modules/diffusionmodules/denoiser_weighting.py), preconditioning of the network (sgm/modules/diffusionmodules/denoiser_scaling.py), and sampling of noise levels during training (sgm/modules/diffusionmodules/sigma_sampling.py).
Autoencoding models have also been cleaned up.
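
The separation of guider, sampler, and model described above can be illustrated with a toy example. Everything here is a hypothetical sketch: the names ClassifierFreeGuider, euler_sample, and toy_denoiser are not the classes in guiders.py or sampling.py, and plain floats stand in for tensors.

```python
class ClassifierFreeGuider:
    """Combines unconditional and conditional denoiser outputs."""

    def __init__(self, scale):
        self.scale = scale

    def __call__(self, uncond, cond):
        return uncond + self.scale * (cond - uncond)


def euler_sample(denoise, guider, x, sigmas, cond):
    """Euler steps over decreasing noise levels; knows nothing about the model."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = guider(denoise(x, sigma, None), denoise(x, sigma, cond))
        d = (x - denoised) / sigma        # derivative estimate at this noise level
        x = x + d * (sigma_next - sigma)  # step to the next noise level
    return x


def toy_denoiser(x, sigma, cond):
    """Stand-in model: predicts 1.0 when conditioned, 0.0 otherwise."""
    return 0.0 if cond is None else 1.0


sample = euler_sample(
    toy_denoiser,
    ClassifierFreeGuider(scale=2.0),
    x=5.0,
    sigmas=[1.0, 0.5, 0.0],
    cond="a label",
)
print(sample)
```

The sampler only receives callables, so a different guider or denoiser can be passed in without touching the stepping loop.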

Installation:

1. Clone the repo

git clone https://github.com/Stability-AI/generative-models.git
cd generative-models

2. Setting up the virtualenv

This is assuming you have navigated to the generative-models root after cloning it.

NOTE: This is tested under python3.10. For other python versions, you might encounter version conflicts.

PyTorch 2.0

# install required packages from pypi
python3 -m venv .pt2
source .pt2/bin/activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install -r requirements/pt2.txt

3. Install `sgm`

pip3 install .

4. Install `sdata` for training

pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata

Packaging

This repository uses PEP 517 compliant packaging using Hatch.

To build a distributable wheel, install hatch and run hatch build (specifying -t wheel will skip building a sdist, which is not necessary).

pip install hatch
hatch build -t wheel

You will find the built package in dist/. You can install the wheel with pip install dist/*.whl.

Note that the package does not currently specify dependencies; you will need to install the required packages, depending on your use case and PyTorch version, manually.

Inference

We provide a streamlit demo for text-to-image and image-to-image sampling in scripts/demo/sampling.py. We provide file hashes for the complete file as well as for only the saved tensors in the file (see Model Spec for a script to evaluate that). The following models are currently supported:

SDXL-base-1.0

File Hash (sha256): 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b
Tensordata Hash (sha256): 0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7

SDXL-refiner-1.0

File Hash (sha256): 7440042bbdc8a24813002c09b6b69b64dc90fded4472613437b7f55f9b7d9c5f
Tensordata Hash (sha256): 0x1a77d21bebc4b4de78c474a90cb74dc0d2217caf4061971dbfa75ad406b75d81

SDXL-base-0.9
SDXL-refiner-0.9
SD-2.1-512
SD-2.1-768

Weights for SDXL:

SDXL-1.0: The weights of SDXL-1.0 are available (subject to a CreativeML Open RAIL++-M license) here:

base model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/
refiner model: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/

SDXL-0.9: The weights of SDXL-0.9 are available and subject to a research license. If you would like to access these models for your research, please apply using one of the following links: SDXL-base-0.9 model, and SDXL-refiner-0.9. You can apply through either link; if your request is granted, you get access to both models. Please log in to your Hugging Face Account with your organization email to request access.

After obtaining the weights, place them into checkpoints/. Next, start the demo using

streamlit run scripts/demo/sampling.py --server.port <your_port>

Invisible Watermark Detection

Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Please note that this watermark is not the same as in previous Stable Diffusion 1.x/2.x versions.

To run the script you need to either have a working installation as above or try an experimental import using only a minimal amount of packages:

python -m venv .detect
source .detect/bin/activate

pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"
pip install --no-deps invisible-watermark

With either setup in place, the script can be used in the following ways (don't forget to activate your virtual environment beforehand, e.g. source .pt2/bin/activate):

# test a single file
python scripts/demo/detect.py <your filename here>
# test multiple files at once
python scripts/demo/detect.py <filename 1> <filename 2> ... <filename n>
# test all files in a specific folder
python scripts/demo/detect.py <your folder name here>/*

Training:

We are providing example training configs in configs/example_training. To launch a training, run

python main.py --base configs/<config1.yaml> configs/<config2.yaml>

where configs are merged from left to right (later configs overwrite the same values). This can be used to combine model, training and data configs. However, all of them can also be defined in a single config. For example, to run a class-conditional pixel-based diffusion model training on MNIST, run

python main.py --base configs/example_training/toy/mnist_cond.yaml
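
The left-to-right merge rule can be sketched with plain dictionaries standing in for the YAML files. The function merge_configs and all keys and values below are hypothetical; the training script's own merging may differ in details such as list handling.

```python
def merge_configs(*configs):
    """Merge nested dicts left to right; later values overwrite earlier ones."""
    merged = {}
    for config in configs:
        for key, value in config.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_configs(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical stand-ins for a model config and a later override config.
model_cfg = {"model": {"base_learning_rate": 1e-4, "params": {"scale_factor": 1.0}}}
override_cfg = {"model": {"base_learning_rate": 5e-5}, "data": {"batch_size": 8}}
merged = merge_configs(model_cfg, override_cfg)
print(merged)
```

Values that appear only in the earlier config survive, which is what makes it practical to keep model, training, and data settings in separate files.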

NOTE 1: Using the non-toy-dataset configs configs/example_training/imagenet-f8_cond.yaml, configs/example_training/txt2img-clipl.yaml and configs/example_training/txt2img-clipl-legacy-ucg-training.yaml for training will require edits depending on the dataset used (which is expected to be stored in tar files in the webdataset format). To find the parts which have to be adapted, search for comments containing USER: in the respective config.

NOTE 2: This repository supports both pytorch1.13 and pytorch2 for training generative models. However, for autoencoder training as e.g. in configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml, only pytorch1.13 is supported.

NOTE 3: Training latent generative models (as e.g. in configs/example_training/imagenet-f8_cond.yaml) requires retrieving the checkpoint from Hugging Face and replacing the CKPT_PATH placeholder in this line. The same is to be done for the provided text-to-image configs.

Building New Diffusion Models

Conditioner

The GeneralConditioner is configured through the conditioner_config. Its only attribute is emb_models, a list of different embedders (all inherited from AbstractEmbModel) that are used to condition the generative model. All embedders should define whether or not they are trainable (is_trainable, default False), a classifier-free guidance dropout rate (ucg_rate, default 0), and an input key (input_key), for example, txt for text-conditioning or cls for class-conditioning. When computing conditionings, the embedder will get batch[input_key] as input. We currently support two- to four-dimensional conditionings, and conditionings of different embedders are concatenated appropriately. Note that the order of the embedders in the conditioner_config is important.
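
The roles of input_key, ucg_rate, and embedder order can be seen in a toy sketch. ToyEmbedder and ToyConditioner are hypothetical classes, not GeneralConditioner or AbstractEmbModel; lists of floats stand in for tensors, and zeroing a dropped embedding is an assumption about how the dropout is applied.

```python
import random


class ToyEmbedder:
    """Hypothetical embedder: maps batch[input_key] to a fixed-length vector."""

    def __init__(self, input_key, dim, ucg_rate=0.0, is_trainable=False):
        self.input_key = input_key
        self.dim = dim
        self.ucg_rate = ucg_rate
        self.is_trainable = is_trainable

    def __call__(self, value):
        # Toy embedding: the length of the input's string form, repeated dim times.
        return [float(len(str(value)))] * self.dim


class ToyConditioner:
    """Runs each embedder on its own batch entry and concatenates the results."""

    def __init__(self, emb_models, seed=0):
        self.emb_models = emb_models
        self.rng = random.Random(seed)

    def __call__(self, batch):
        out = []
        for embedder in self.emb_models:  # order determines the output layout
            emb = embedder(batch[embedder.input_key])
            if self.rng.random() < embedder.ucg_rate:
                emb = [0.0] * len(emb)  # drop this conditioning
            out.extend(emb)
        return out


conditioner = ToyConditioner([
    ToyEmbedder("txt", dim=2),
    ToyEmbedder("cls", dim=1, ucg_rate=1.0),
])
conditioning = conditioner({"txt": "a cat", "cls": 7})
print(conditioning)
```

Swapping the two embedders would change which positions of the concatenated output carry the text signal, which is why the order in the config matters.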

Network

The neural network is set through the network_config. This used to be called unet_config, which is not general enough as we plan to experiment with transformer-based diffusion backbones.

Loss

The loss is configured through loss_config. For standard diffusion model training, you will have to set sigma_sampler_config.

Sampler config

As discussed above, the sampler is independent of the model. In the sampler_config, we set the type of numerical solver, number of steps, type of discretization, as well as, for example, guidance wrappers for classifier-free guidance.

Dataset Handling

For large scale training we recommend using the data pipelines from our data pipelines project. The project is listed in the requirements and is installed automatically when following the steps from the Installation section. Small map-style datasets should be defined here in the repository (e.g., MNIST, CIFAR-10, ...), and return a dict of data keys/values, e.g.,

example = {"jpg": x,  # this is a tensor -1...1 chw
           "txt": "a beautiful image"}

where we expect images in -1...1, channel-first format.
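
A small map-style dataset that follows this convention might look like the sketch below. ToyImageTextDataset is a hypothetical class, and nested lists stand in for the tensors a real dataset would return; the point is the HWC-to-CHW reordering and the rescaling from 0...255 to -1...1.

```python
class ToyImageTextDataset:
    """Hypothetical map-style dataset returning {"jpg": ..., "txt": ...} examples."""

    def __init__(self, images_hwc_uint8, captions):
        assert len(images_hwc_uint8) == len(captions)
        self.images = images_hwc_uint8
        self.captions = captions

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        hwc = self.images[index]
        height, width, channels = len(hwc), len(hwc[0]), len(hwc[0][0])
        # Reorder HWC -> CHW and rescale 0..255 -> -1..1.
        chw = [
            [[hwc[y][x][c] / 127.5 - 1.0 for x in range(width)] for y in range(height)]
            for c in range(channels)
        ]
        return {"jpg": chw, "txt": self.captions[index]}


# One 1x2 RGB image: a black pixel next to a white pixel.
dataset = ToyImageTextDataset([[[[0, 0, 0], [255, 255, 255]]]], ["a beautiful image"])
example = dataset[0]
```

The keys of the returned dict are what an embedder's input_key refers to, so a text embedder configured with txt would receive the caption from this example.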
