TencentARC / T2I-Adapter

Top Related Projects

  • GFPGAN: GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
  • ControlNet: Let us control diffusion models!
  • stablediffusion: High-Resolution Image Synthesis with Latent Diffusion Models

Quick Overview

T2I-Adapter is an open-source project that enhances text-to-image generation models by introducing adapters. These adapters allow for more precise control over the generated images, enabling users to incorporate additional conditions such as color, depth, and sketch information into the image generation process.

Pros

  • Improves control and flexibility in text-to-image generation
  • Compatible with popular models like Stable Diffusion
  • Supports various condition types (color, depth, sketch, etc.)
  • Relatively easy to integrate into existing pipelines

Cons

  • Requires additional training for new condition types
  • May increase computational requirements
  • Limited documentation for advanced usage
  • Potential compatibility issues with future model updates

Code Examples

  1. Loading and using a T2I-Adapter:
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

# Load a depth adapter that matches the SD-1.5 base model
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_depth_sd15v2", torch_dtype=torch.float16
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    adapter=adapter,
    torch_dtype=torch.float16
).to("cuda")

# `depth_image` is a depth map prepared beforehand (a sketch for producing one appears in Getting Started below)
image = pipe(
    "a photo of a cat",
    image=depth_image,
    negative_prompt="lowres, bad anatomy, worst quality, low quality"
).images[0]
  2. Generating images with multiple conditions:
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter, MultiAdapter

# Combine two SD-1.5 adapters so color and sketch conditions are applied together
color_adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_color_sd14v1", torch_dtype=torch.float16)
sketch_adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_sketch_sd15v2", torch_dtype=torch.float16)

multi_adapter = MultiAdapter([color_adapter, sketch_adapter])

pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    adapter=multi_adapter,
    torch_dtype=torch.float16
).to("cuda")

# `color_image` and `sketch_image` are condition images prepared beforehand,
# passed in the same order as the adapters above
image = pipe(
    "a colorful landscape",
    image=[color_image, sketch_image],
    negative_prompt="grayscale, monochrome"
).images[0]
  3. Fine-tuning an adapter:
from diffusers import T2IAdapter, AutoencoderKL

adapter = T2IAdapter(
    in_channels=3,
    channels=[320, 640, 1280, 1280],
    num_res_blocks=2,
    downscale_factor=16,
    adapter_type="full_adapter_xl"
)

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Only the adapter is trained; the VAE (and the base UNet) stay frozen
adapter.train()
vae.requires_grad_(False)

# Training loop code here (see the sketch below)
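The repository's actual training scripts handle much more (data loading, EMA, logging), but as a rough, hedged sketch of what one adapter-training step could look like with diffusers: it assumes an SD-1.5 base, an adapter configured to match that UNet (full_adapter with downscale factor 8, rather than the SDXL-style config above), and a hypothetical `dataloader` yielding image tensors, condition tensors, and tokenized prompts. The keyword for injecting adapter features into the UNet has changed across diffusers versions (older releases used down_block_additional_residuals), so treat this as illustrative rather than the project's official recipe.

import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, T2IAdapter, UNet2DConditionModel
from transformers import CLIPTextModel

base = "runwayml/stable-diffusion-v1-5"
unet = UNet2DConditionModel.from_pretrained(base, subfolder="unet")
vae = AutoencoderKL.from_pretrained(base, subfolder="vae")
text_encoder = CLIPTextModel.from_pretrained(base, subfolder="text_encoder")
noise_scheduler = DDPMScheduler.from_pretrained(base, subfolder="scheduler")

# Adapter config matching the SD-1.5 UNet, trained from scratch
adapter = T2IAdapter(in_channels=3, channels=[320, 640, 1280, 1280],
                     num_res_blocks=2, downscale_factor=8, adapter_type="full_adapter")

# Only the adapter is optimized; everything else stays frozen
unet.requires_grad_(False)
vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-5)

for batch in dataloader:  # hypothetical dataloader: pixel_values, conditioning_pixel_values, input_ids
    # Encode target images into latents
    latents = vae.encode(batch["pixel_values"]).latent_dist.sample()
    latents = latents * vae.config.scaling_factor

    # Add noise at a random timestep
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # Multi-scale adapter features from the condition image
    adapter_states = adapter(batch["conditioning_pixel_values"])

    # Predict the noise with adapter features injected into the UNet's down blocks
    # (keyword name varies across diffusers versions)
    encoder_hidden_states = text_encoder(batch["input_ids"])[0]
    noise_pred = unet(noisy_latents, timesteps,
                      encoder_hidden_states=encoder_hidden_states,
                      down_intrablock_additional_residuals=[s.clone() for s in adapter_states]).sample

    loss = F.mse_loss(noise_pred.float(), noise.float())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()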

Getting Started

To get started with T2I-Adapter:

  1. Install the required packages:
pip install diffusers transformers accelerate
  2. Load a pre-trained adapter and pipeline:
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_depth_sd15v2", torch_dtype=torch.float16
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    adapter=adapter,
    torch_dtype=torch.float16
).to("cuda")
  3. Generate an image using the adapter (assuming `depth_image` is a depth map prepared beforehand; a sketch for producing one follows these steps):
image = pipe(
    "a photo of a mountain landscape",
    image=depth_image,
    negative_prompt="lowres, bad anatomy, worst quality, low quality"
).images[0]
image.save("output.png")
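
The `depth_image` above is simply a depth map of some source photo. One way to produce it is with a monocular depth estimator; a minimal sketch using MidasDetector from the controlnet_aux package (the input URL is a placeholder):

from controlnet_aux import MidasDetector
from diffusers.utils import load_image

# Estimate a depth map from an ordinary photo to use as the adapter condition
midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
source = load_image("https://example.com/mountains.jpg")  # placeholder: any input photo
depth_image = midas(source)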

Competitor Comparisons

GFPGAN

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

Pros of GFPGAN

  • Specialized in face restoration and enhancement
  • Produces high-quality results for facial images
  • Easier to use for specific face-related tasks

Cons of GFPGAN

  • Limited to face-related applications
  • Less versatile compared to T2I-Adapter's broader image generation capabilities

Code Comparison

GFPGAN usage:

from gfpgan import GFPGANer

restorer = GFPGANer(model_path='experiments/pretrained_models/GFPGANv1.3.pth', upscale=2)
# enhance() returns (cropped_faces, restored_faces, restored_img); `img` is a BGR numpy array
_, _, restored_img = restorer.enhance(img, has_aligned=False, only_center_face=False, paste_back=True)

T2I-Adapter usage:

from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter

adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-lineart-sdxl-1.0")
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", adapter=adapter
)
# `adapter_image` is a lineart condition image prepared beforehand
image = pipe(prompt="a cat", image=adapter_image).images[0]

GFPGAN focuses on face restoration, while T2I-Adapter offers more flexibility for various image generation tasks. GFPGAN is more straightforward for face-specific applications, but T2I-Adapter provides broader capabilities for general image generation and manipulation. The code examples demonstrate the different approaches: GFPGAN directly enhances facial images, while T2I-Adapter integrates with diffusion models for diverse image generation tasks.

ControlNet

Let us control diffusion models!

Pros of ControlNet

  • More versatile, supporting a wider range of control types and applications
  • Better integration with popular diffusion models like Stable Diffusion
  • More active community and frequent updates

Cons of ControlNet

  • Higher computational requirements and longer processing times
  • Steeper learning curve for beginners due to its complexity
  • Less flexibility in fine-tuning specific aspects of the control

Code Comparison

ControlNet:

from share import *
import config

import cv2
import einops
import gradio as gr
import numpy as np
import torch
import random

from pytorch_lightning import seed_everything
from annotator.util import resize_image, HWC3
from annotator.canny import CannyDetector
from cldm.model import create_model, load_state_dict
from cldm.ddim_hacked import DDIMSampler

T2I-Adapter:

import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

model_id = "stabilityai/stable-diffusion-2-1-base"
adapter_id = "TencentARC/t2iadapter_depth_sd21"

adapter = T2IAdapter.from_pretrained(adapter_id, torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(model_id, adapter=adapter, torch_dtype=torch.float16)

stablediffusion

High-Resolution Image Synthesis with Latent Diffusion Models

Pros of stablediffusion

  • More comprehensive and versatile text-to-image generation capabilities
  • Larger community and ecosystem, with more resources and pre-trained models
  • Supports various fine-tuning and customization options

Cons of stablediffusion

  • Higher computational requirements and longer inference times
  • More complex setup and configuration process
  • Steeper learning curve for beginners

Code Comparison

T2I-Adapter:

adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_depth_sd15v2")
pipe = StableDiffusionAdapterPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", adapter=adapter)
image = pipe(prompt="A painting of a cat", image=depth_image).images[0]

stablediffusion:

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe(prompt="A painting of a cat").images[0]

T2I-Adapter focuses on adapting existing models with additional inputs, while stablediffusion provides a more general-purpose text-to-image generation pipeline. T2I-Adapter's code emphasizes the use of adapters and additional input images, whereas stablediffusion's code is more straightforward for basic text-to-image generation.


README

👉 T2I-Adapter for [SD-1.4/1.5], for [SDXL]

Huggingface T2I-Adapter-SDXL | Blog T2I-Adapter-SDXL | arXiv


Official implementation of T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models based on Stable Diffusion-XL.

The diffusers team and the T2I-Adapter authors have been collaborating to bring the support of T2I-Adapters for Stable Diffusion XL (SDXL) in diffusers! It achieves impressive results in both performance and efficiency.



🚩 New Features/Updates

  • ✅ Sep. 8, 2023. We collaborate with the diffusers team to bring the support of T2I-Adapters for Stable Diffusion XL (SDXL) in diffusers! It achieves impressive results in both performance and efficiency. We release T2I-Adapter-SDXL models for sketch, canny, lineart, openpose, depth-zoe, and depth-mid. We release two online demos: Huggingface T2I-Adapter-SDXL and Huggingface T2I-Adapter-SDXL Doodle.

  • ✅ Aug. 21, 2023. We release T2I-Adapter-SDXL, including sketch, canny, and keypoint. We still use the original recipe (77M parameters, a single inference) to drive StableDiffusion-XL. Due to the limited computing resources, those adapters still need further improvement. We are collaborating with HuggingFace, and a more powerful adapter is in the works.

  • ✅ Jul. 13, 2023. Stability AI released Stable Doodle, a groundbreaking sketch-to-image tool based on T2I-Adapter and SDXL. It makes drawing easier.

  • ✅ Mar. 16, 2023. We add CoAdapter (Composable Adapter). The online Huggingface Gradio demo has been updated: Huggingface Gradio (CoAdapter). You can also try the local gradio demo.

  • ✅ Mar. 16, 2023. We have shrunk the git repo with bfg. If you encounter any issues when pulling or pushing, you can try re-cloning the repository. Sorry for the inconvenience.

  • ✅ Mar. 3, 2023. Add a color adapter (spatial palette), which has only 17M parameters.

  • ✅ Mar. 3, 2023. Add four new adapters style, color, openpose and canny. See more info in the Adapter Zoo.

  • ✅ Feb. 23, 2023. Add the depth adapter t2iadapter_depth_sd14v1.pth. See more info in the Adapter Zoo.

  • ✅ Feb. 15, 2023. Release T2I-Adapter.


🔥🔥🔥 Why T2I-Adapter-SDXL?

The Original Recipe Drives Larger SD.

              SD-V1.4/1.5    SD-XL    T2I-Adapter    T2I-Adapter-SDXL
Parameters    860M           2.6B     77M            77/79M
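
To sanity-check the adapter parameter counts in the table, you can count them directly on a released checkpoint; a small sketch using one of the SDXL adapters:

from diffusers import T2IAdapter

adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-lineart-sdxl-1.0")
num_params = sum(p.numel() for p in adapter.parameters())
print(f"{num_params / 1e6:.1f}M parameters")  # expect roughly 77-79M for the SDXL adapters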

Inherit High-quality Generation from SDXL.

  • Lineart-guided

Model from TencentARC/t2i-adapter-lineart-sdxl-1.0

  • Keypoint-guided

Model from openpose_sdxl_1.0

  • Sketch-guided

Model from TencentARC/t2i-adapter-sketch-sdxl-1.0

  • Depth-guided

Depth guided models from TencentARC/t2i-adapter-depth-midas-sdxl-1.0 and TencentARC/t2i-adapter-depth-zoe-sdxl-1.0 respectively

🔧 Dependencies and Installation

pip install -r requirements.txt

⏬ Download Models

All models will be automatically downloaded. You can also choose to download manually from this url.
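
If you would rather fetch a checkpoint ahead of time (for offline use, say), one option is the huggingface_hub client; a small sketch, using the SDXL sketch adapter as an example:

from huggingface_hub import snapshot_download

# Download the whole adapter repository into the local Hugging Face cache
local_dir = snapshot_download(repo_id="TencentARC/t2i-adapter-sketch-sdxl-1.0")
print(local_dir)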

🔥 How to Train

Here we take sketch guidance as an example; you can also prepare your own dataset following the same method.

accelerate launch train_sketch.py \
  --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
  --output_dir experiments/adapter_sketch_xl \
  --config configs/train/Adapter-XL-sketch.yaml \
  --mixed_precision="fp16" \
  --resolution=1024 \
  --learning_rate=1e-5 \
  --max_train_steps=60000 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --report_to="wandb" \
  --seed=42 \
  --num_train_epochs 100

We train with FP16 data precision on 4 NVIDIA A100 GPUs.

💻 How to Test

Inference requires at least 15GB of GPU memory.
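
If that is more VRAM than you have, the usual diffusers memory-saving switches can help at some speed cost; a hedged sketch, assuming `pipe` is the StableDiffusionXLAdapterPipeline built in the quick start below:

# Offload sub-models to CPU between forward passes (requires accelerate)
pipe.enable_model_cpu_offload()
# Compute attention in slices to lower peak memory usage
pipe.enable_attention_slicing()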

Quick start with diffusers

To get started, first install the required dependencies:

pip install git+https://github.com/huggingface/diffusers.git@t2iadapterxl # for now
pip install -U controlnet_aux==0.0.7 # for conditioning models and detectors  
pip install transformers accelerate safetensors
  1. Input images are first converted into the appropriate control image format.
  2. The control image and prompt are passed to the StableDiffusionXLAdapterPipeline.

Let's have a look at a simple example using the LineArt Adapter.

  • Dependency
from diffusers import StableDiffusionXLAdapterPipeline, T2IAdapter, EulerAncestralDiscreteScheduler, AutoencoderKL
from diffusers.utils import load_image, make_image_grid
from controlnet_aux.lineart import LineartDetector
import torch

# load adapter
adapter = T2IAdapter.from_pretrained(
  "TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16, varient="fp16"
).to("cuda")

# load euler_a scheduler
model_id = 'stabilityai/stable-diffusion-xl-base-1.0'
euler_a = EulerAncestralDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
vae=AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
    model_id, vae=vae, adapter=adapter, scheduler=euler_a, torch_dtype=torch.float16, variant="fp16", 
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

line_detector = LineartDetector.from_pretrained("lllyasviel/Annotators").to("cuda")
  • Condition Image
url = "https://huggingface.co/Adapter/t2iadapter/resolve/main/figs_SDXLV1.0/org_lin.jpg"
image = load_image(url)
image = line_detector(
    image, detect_resolution=384, image_resolution=1024
)

  • Generation
prompt = "Ice dragon roar, 4k photo"
negative_prompt = "anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured"
gen_images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image,
    num_inference_steps=30,
    adapter_conditioning_scale=0.8,
    guidance_scale=7.5, 
).images[0]
gen_images.save('out_lin.png')
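
To compare the line-art condition with the generated result side by side, the make_image_grid helper imported above can be used; a small usage sketch:

# Place the condition image and the generated image next to each other
grid = make_image_grid([image, gen_images], rows=1, cols=2)
grid.save("out_lin_comparison.png")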

Online Demo Huggingface T2I-Adapter-SDXL

Online Doodle Demo Huggingface T2I-Adapter-SDXL

Tutorials on HuggingFace:

...

Other Sources

Jul. 13, 2023. Stability AI released Stable Doodle, a groundbreaking sketch-to-image tool based on T2I-Adapter and SDXL. It makes drawing easier.

https://user-images.githubusercontent.com/73707470/253800159-c7e12362-1ea1-4b20-a44e-bd6c8d546765.mp4

🤗 Acknowledgements

  • Thanks to HuggingFace for their support of T2I-Adapter.
  • T2I-Adapter is co-hosted by Tencent ARC Lab and Peking University VILLA.

BibTeX

@article{mou2023t2i,
  title={T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models},
  author={Mou, Chong and Wang, Xintao and Xie, Liangbin and Wu, Yanze and Zhang, Jian and Qi, Zhongang and Shan, Ying and Qie, Xiaohu},
  journal={arXiv preprint arXiv:2302.08453},
  year={2023}
}