PhotoMaker

PhotoMaker [CVPR 2024]

Quick Overview

PhotoMaker is a tool for generating personalized photos of a specific person from a text prompt and one or more reference (ID) images. It encodes the reference images into a stacked ID embedding, so customization takes seconds and needs no additional LoRA training. The result is a portrait or scene that keeps the person's identity while following the style and attributes described in the prompt.

Pros

Offers high-quality, personalized image generation
Combines text prompts with reference images for precise control
Supports various image editing and manipulation tasks
Ships a local Gradio demo in addition to a diffusers-style Python pipeline

Cons

Needs a GPU with at least 11 GB of memory, according to the README
May have limitations in generating certain complex or highly specific scenes
Potential ethical concerns regarding the creation of synthetic images
Learning curve for achieving desired results with complex prompts

Code Examples

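# `pipe` is a PhotoMakerStableDiffusionXLPipeline with the PhotoMaker adapter
# loaded, and `input_id_images` is a list of reference photos. Both are set up
# as in the README's "How to Test" section below.

# Generate an image from a text prompt and reference images.
# The trigger word `img` must follow the class word ("woman").
result = pipe(
    prompt="A portrait of a woman img with long blonde hair in a forest",
    input_id_images=input_id_images,
).images[0]

# Change clothing and background through the prompt
edited_image = pipe(
    prompt="A portrait of a woman img wearing a red scarf, snowy background",
    input_id_images=input_id_images,
).images[0]

# Stylize: per the README, swap the base model and add LoRA modules before
# generating (see the commented `load_lora_weights` lines in "How to Test").
styled_image = pipe(
    prompt="A sketch of a woman img",
    input_id_images=input_id_images,
).images[0]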

Getting Started

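To get started with PhotoMaker:

Install the library from GitHub, as the README instructs:
```
pip install git+https://github.com/TencentARC/PhotoMaker.git
```

Import the pipeline class:

from photomaker import PhotoMakerStableDiffusionXLPipeline

Generate an image, once the pipeline (`pipe`) and the reference images (`input_id_images`) are set up as described under "How to Test" in the README below:

result = pipe(
    prompt="a photo of a man img",
    input_id_images=input_id_images,
).images[0]
result.save("output.jpg")

For more advanced usage and options, refer to the README below.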

Competitor Comparisons

ControlNet

32,165

Let us control diffusion models!

Pros of ControlNet

More versatile, supporting various conditioning types (edges, depth maps, poses, etc.)
Offers finer control over image generation process
Extensive documentation and community support

Cons of ControlNet

Steeper learning curve due to its complexity
Requires more computational resources
May be overkill for simple image editing tasks

Code Comparison

ControlNet:

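from share import *
import config

import cv2
import torch
from cldm.model import create_model, load_state_dict

# Build the model, load the Canny checkpoint, and move it to the GPU
model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict('./models/control_sd15_canny.pth'))
model = model.cuda()

img = cv2.imread('path/to/input.png')  # placeholder path
detect_edge = cv2.Canny(img, 100, 200)  # single-channel edge map, shape (H, W)
control = torch.from_numpy(detect_edge).float().cuda() / 255.0
# The control hint has 3 channels: (H, W) -> (1, 3, H, W)
control = control.unsqueeze(0).unsqueeze(0).repeat(1, 3, 1, 1)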

PhotoMaker:

# `pipe` is a PhotoMakerStableDiffusionXLPipeline with the PhotoMaker adapter loaded,
# and `input_id_images` is a list of reference photos (see "How to Test" in the README below).
image = pipe(
    prompt="a photo of a person img",
    input_id_images=input_id_images,
    num_inference_steps=50,
).images[0]

ControlNet offers more granular control over the image generation process through conditioning inputs such as edges, depth maps, and poses. PhotoMaker instead conditions generation on reference photos of a person plus a text prompt, through a pipeline that is called like a diffusers pipeline.

stable-diffusion-webui

153,957

Stable Diffusion web UI

Pros of stable-diffusion-webui

More comprehensive and feature-rich, offering a wide range of image generation and manipulation tools
Highly customizable with a large ecosystem of extensions and models
Active community support and frequent updates

Cons of stable-diffusion-webui

Steeper learning curve due to its extensive features and options
Requires more computational resources for optimal performance
Setup process can be more complex, especially for beginners

Code Comparison

PhotoMaker:

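# `pipe`: a PhotoMakerStableDiffusionXLPipeline set up as in the README's "How to Test" section
# `input_id_images`: reference photos loaded with diffusers.utils.load_image
images = pipe(prompt=prompt, input_id_images=input_id_images, num_images_per_prompt=4).images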

stable-diffusion-webui:

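import modules.scripts as scripts
import gradio as gr
from modules.processing import process_images

class Script(scripts.Script):
    def title(self):
        # Name shown in the web UI's script dropdown
        return "Custom script"

    def run(self, p, *args):
        # Custom processing logic here, then hand off to the standard pipeline
        return process_images(p)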

The code snippets highlight the difference in approach: PhotoMaker offers a more straightforward API for photo generation, while stable-diffusion-webui provides a framework for creating custom scripts and extensions, offering greater flexibility but requiring more setup and coding knowledge.

diffusers

29,520

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Pros of diffusers

Broader scope, supporting various diffusion models and tasks
Extensive documentation and community support
Seamless integration with other Hugging Face libraries

Cons of diffusers

Steeper learning curve for beginners
May require more setup and configuration for specific tasks

Code Comparison

PhotoMaker:

# `pipe` and `input_id_images` are prepared as in the README's "How to Test" section
image = pipe(
    prompt="A portrait of a woman img",
    input_id_images=input_id_images,
    num_images_per_prompt=1,
).images[0]

diffusers:

from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipeline("A portrait of a woman").images[0]

Summary

PhotoMaker focuses specifically on customizing photos of people from reference images, and its pipeline is used like a diffusers pipeline. diffusers provides a more comprehensive toolkit for various diffusion models, making it more versatile but potentially more complex to use. PhotoMaker may be the easier choice for those focused solely on personalized photo generation, while diffusers offers more flexibility and integration with the broader Hugging Face ecosystem.

stablediffusion

40,867

High-Resolution Image Synthesis with Latent Diffusion Models

Pros of stablediffusion

More versatile, capable of generating a wide range of images beyond portraits
Larger community and ecosystem, with more resources and integrations available
Open-source nature allows for more customization and fine-tuning

Cons of stablediffusion

Less specialized in portrait generation compared to PhotoMaker
May require more prompt engineering to achieve specific results
Generally requires more computational resources for inference

Code Comparison

PhotoMaker:

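# `pipe` (PhotoMakerStableDiffusionXLPipeline) and `input_id_images` come from
# the setup in the README's "How to Test" section
image = pipe(
    prompt="A smiling woman img with blonde hair",
    input_id_images=input_id_images,
    num_images_per_prompt=1,
).images[0]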

stablediffusion:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("A smiling woman with blonde hair").images[0]

Both repositories offer straightforward APIs for image generation, but PhotoMaker is more focused on portrait creation, while stablediffusion provides a more general-purpose image generation pipeline.

README

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

[Paper] [Project Page] [Model Card]

[🔥New 🤗 Demo (PhotoMaker V2)] [🤗 Demo (Realistic)] [🤗 Demo (Stylization)]

[Replicate Demo (Realistic)] [Replicate Demo (Stylization)] [Jittor version]

PhotoMaker-V2 is supported by the HunyuanDiT team.

🥳 We release PhotoMaker V2. Please refer to comparisons between PhotoMaker V1, PhotoMaker V2, IP-Adapter-FaceID-plus-V2, and InstantID. Please watch this video for how to use our demo. For PhotoMaker V2 ComfyUI nodes, please refer to the Related Resources.

Key Features:

Rapid customization within seconds, with no additional LoRA training.
Ensures impressive ID fidelity, offering diversity, promising text controllability, and high-quality generation.
Can serve as an Adapter to collaborate with other Base Models alongside LoRA modules from the community.

Note: If you know of any PhotoMaker-based resources or applications, please leave them in the discussion and we will list them in the Related Resources section of the README file. So far we know of implementations for Replicate, Windows, ComfyUI, and WebUI. Thank you all!

🚩 New Features/Updates

July 22, 2024. 🔥 We release PhotoMaker V2 with improved ID fidelity. It still maintains the generation quality, editability, and plugin compatibility that PhotoMaker V1 offers. We have also provided scripts for integration with ControlNet, T2I-Adapter, and IP-Adapter to offer excellent control capabilities. Users can further customize the scripts, for example by combining them with LCM for acceleration or by integrating IP-Adapter-FaceID or InstantID to further improve ID fidelity. We will release a technical report on PhotoMaker V2 soon. Please refer to this doc for a quick preview.
January 20, 2024. An important note: for GPUs that do not support bfloat16, change this line to torch_dtype = torch.float16; this greatly improves speed (1 min/img before vs. 14 s/img after on a V100). The minimum GPU memory requirement for PhotoMaker is 11 GB (refer to this link for saving GPU memory).
January 15, 2024. We release PhotoMaker.
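The bfloat16 note above can be turned into a small selection helper. This is a minimal sketch, not part of PhotoMaker: `choose_dtype_name` is a hypothetical name, and falling back to `float32` when no GPU is available is an assumption rather than something the README states.

```python
def choose_dtype_name(cuda_available: bool, bf16_supported: bool) -> str:
    """Return the name of the torch dtype to pass as `torch_dtype`.

    Hypothetical helper: bfloat16 where the GPU supports it, float16
    otherwise (the case the January 20 note describes).
    """
    if not cuda_available:
        # Assumption: fall back to full precision when there is no GPU.
        return "float32"
    return "bfloat16" if bf16_supported else "float16"


# Usage with PyTorch installed (shown as comments so the sketch runs anywhere):
#   import torch
#   has_cuda = torch.cuda.is_available()
#   name = choose_dtype_name(has_cuda, has_cuda and torch.cuda.is_bf16_supported())
#   torch_dtype = getattr(torch, name)
```

Keeping the decision in a pure function makes it easy to check without a GPU; the resulting `torch_dtype` would replace the hard-coded `torch.bfloat16` in the pipeline setup.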

🔥 Examples

Realistic generation

PhotoMaker notebook demo

Stylization generation

Note: only change the base model and add the LoRA modules for better stylization

PhotoMaker-Style notebook demo

🔧 Dependencies and Installation

Python >= 3.8 (Anaconda or Miniconda recommended)
PyTorch >= 2.0.0
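The two version requirements above can be checked before installing. The following is a rough sketch; `parse_version` and `meets_requirements` are hypothetical helpers written for this illustration, and the parsing only looks at the leading digits of the first three version components.

```python
import re
import sys


def parse_version(text: str) -> tuple:
    """Turn a version string such as '2.1.0+cu118' into (2, 1, 0)."""
    numbers = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad '2.0' to (2, 0, 0)
        numbers.append(0)
    return tuple(numbers)


def meets_requirements(python_version: tuple, torch_version: str) -> bool:
    """Check Python >= 3.8 and PyTorch >= 2.0.0."""
    return tuple(python_version[:2]) >= (3, 8) and parse_version(torch_version) >= (2, 0, 0)


# Usage with PyTorch installed (comment only, so the sketch runs without it):
#   import torch
#   print(meets_requirements(sys.version_info, torch.__version__))
```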

conda create --name photomaker python=3.10
conda activate photomaker
pip install -U pip

# Install requirements
pip install -r requirements.txt

# Install photomaker
pip install git+https://github.com/TencentARC/PhotoMaker.git

Then you can import the pipeline class to use it:

from photomaker import PhotoMakerStableDiffusionXLPipeline

⏬ Download Models

The model will be automatically downloaded through the following two lines:

from huggingface_hub import hf_hub_download
photomaker_path = hf_hub_download(repo_id="TencentARC/PhotoMaker", filename="photomaker-v1.bin", repo_type="model")

You can also choose to download manually from this url.

💻 How to Test

Use like diffusers

Dependency

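import os

import torch
from diffusers import EulerDiscreteScheduler
from diffusers.utils import load_image
from photomaker import PhotoMakerStableDiffusionXLPipeline

# Placeholders: set both values for your environment
base_model_path = "path/or/hub-id/of/an-sdxl-base-model"
device = "cuda"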

### Load base model
pipe = PhotoMakerStableDiffusionXLPipeline.from_pretrained(
    base_model_path,  # can change to any base model based on SDXL
    torch_dtype=torch.bfloat16, 
    use_safetensors=True, 
    variant="fp16"
).to(device)

### Load PhotoMaker checkpoint
pipe.load_photomaker_adapter(
    os.path.dirname(photomaker_path),
    subfolder="",
    weight_name=os.path.basename(photomaker_path),
    trigger_word="img"  # define the trigger word
)     

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

### Also can cooperate with other LoRA modules
# pipe.load_lora_weights(os.path.dirname(lora_path), weight_name=lora_model_name, adapter_name="xl_more_art-full")
# pipe.set_adapters(["photomaker", "xl_more_art-full"], adapter_weights=[1.0, 0.5])

pipe.fuse_lora()

Input ID Images

### define the input ID images
input_folder_name = './examples/newton_man'
image_basename_list = os.listdir(input_folder_name)
image_path_list = sorted([os.path.join(input_folder_name, basename) for basename in image_basename_list])

input_id_images = []
for image_path in image_path_list:
    input_id_images.append(load_image(image_path))
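Because `os.listdir` returns every entry in the folder, a stray text file or hidden file would be passed to `load_image` as well. The sketch below filters by extension first; `list_id_image_paths` and the `IMAGE_EXTENSIONS` set are assumptions made for this example, not part of the PhotoMaker code.

```python
import os

# Assumed set of extensions; adjust to the image formats you actually use.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def list_id_image_paths(folder: str) -> list:
    """Return sorted paths of the image files in `folder`, skipping other files."""
    paths = []
    for basename in sorted(os.listdir(folder)):
        extension = os.path.splitext(basename)[1].lower()
        if extension in IMAGE_EXTENSIONS:
            paths.append(os.path.join(folder, basename))
    return paths
```

The returned list can stand in for `image_path_list` in the loop above.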

Generation

# Note that the trigger word `img` must follow the class word for personalization
prompt = "a half-body portrait of a man img wearing the sunglasses in Iron man suit, best quality"
negative_prompt = "(asymmetry, worst quality, low quality, illustration, 3d, 2d, painting, cartoons, sketch), open mouth, grayscale"
num_steps = 50  # example value; choose the number of sampling steps you need
generator = torch.Generator(device=device).manual_seed(42)
image = pipe(
    prompt=prompt,
    input_id_images=input_id_images,
    negative_prompt=negative_prompt,
    num_images_per_prompt=1,
    num_inference_steps=num_steps,
    start_merge_step=10,
    generator=generator,
).images[0]
image.save('out_photomaker.png')
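The rule that the trigger word `img` must follow the class word can be checked before calling the pipeline. This is an illustrative sketch only: `find_class_word` is a hypothetical helper, and rejecting prompts with more than one trigger word is an assumption rather than documented behavior.

```python
def find_class_word(prompt: str, trigger_word: str = "img") -> str:
    """Return the class word that directly precedes the trigger word.

    Raises ValueError when the trigger word is missing, repeated
    (an assumption of this sketch), or not preceded by a plain word.
    """
    tokens = prompt.split()
    positions = [i for i, token in enumerate(tokens) if token == trigger_word]
    if len(positions) != 1:
        raise ValueError(
            f"expected exactly one '{trigger_word}' in the prompt, found {len(positions)}"
        )
    index = positions[0]
    if index == 0 or not tokens[index - 1].isalpha():
        raise ValueError(f"'{trigger_word}' must directly follow a class word")
    return tokens[index - 1]
```

For the README prompt above, the helper returns the class word `man`, while a prompt with no `img` in it raises a `ValueError` before any GPU time is spent.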

Start a local gradio demo

Run the following command:

python gradio_demo/app.py

You could customize this script in this file.

To run it on a Mac, follow this Instruction and then run app.py.

Usage Tips:

Upload more photos of the person to be customized to improve ID fidelity. If the input faces are Asian, consider adding 'Asian' before the class word, e.g., Asian woman img.
If the generated face looks too realistic when stylizing, adjust the Style strength to 30-50. The larger the number, the lower the ID fidelity but the stronger the stylization. You can also try other base models or LoRAs with good stylization effects.
Reduce the number of generated images and sampling steps for faster generation. Keep in mind, however, that reducing the sampling steps may compromise ID fidelity.
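The tips above do not say how the demo's Style strength relates to the `start_merge_step` argument used in the Generation snippet. One plausible mapping, stated here purely as an assumption, treats the strength as the percentage of sampling steps that run before the ID embedding is merged in. The function name and the `max_step` cap are hypothetical.

```python
def start_merge_step_from_style(style_strength: float, num_steps: int, max_step: int = 30) -> int:
    """Map a 0-100 style strength to a merge step (assumed mapping, see lead-in).

    A larger strength delays the merge, which leaves more steps for the
    style to form and fewer for the ID embedding to take effect.
    """
    if not 0 <= style_strength <= 100:
        raise ValueError("style_strength must be between 0 and 100")
    step = int(style_strength * num_steps / 100)
    return min(step, max_step)  # hypothetical cap on how late the merge may start
```

Under this assumption, a strength of 20 with 50 sampling steps gives the `start_merge_step=10` used in the Generation snippet, and the suggested 30-50 range gives steps 15 to 25.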

Related Resources

Replicate demo of PhotoMaker:

Demo link, run PhotoMaker on replicate, provided by @yorickvP and @jd7h.
Demo link (style version).

WebUI version of PhotoMaker:

stable-diffusion-webui-forge: https://github.com/lllyasviel/stable-diffusion-webui-forge provided by @Lvmin Zhang
Fooocus App: Fooocus-inswapper provided by @machineminded

Windows version of PhotoMaker:

bmaltais/PhotoMaker by @bmaltais, easy to deploy PhotoMaker on Windows. The description can be found in this link.
sdbds/PhotoMaker-for-windows by @sdbds.

ComfyUI:

🔥 Official Implementation by ComfyUI: https://github.com/comfyanonymous/ComfyUI/commit/d1533d9c0f1dde192f738ef1b745b15f49f41e02
https://github.com/ZHO-ZHO-ZHO/ComfyUI-PhotoMaker
https://github.com/StartHua/Comfyui-Mine-PhotoMaker
https://github.com/shiimizu/ComfyUI-PhotoMaker

ComfyUI (for PhotoMaker V2):

Purely C/C++/CUDA version of PhotoMaker:

stable-diffusion.cpp by @bssrdf.

Other Applications / Web Demos

Wisemodel 始智 (Easy to use in China): https://wisemodel.cn/space/gradio/photomaker
OpenXLab (Easy to use in China): https://openxlab.org.cn/apps/detail/camenduru/PhotoMaker by @camenduru.
Colab: https://github.com/camenduru/PhotoMaker-colab by @camenduru
Monster API: https://monsterapi.ai/playground?model=photo-maker
Pinokio: https://pinokio.computer/item?uri=https://github.com/cocktailpeanutlabs/photomaker

Gradio demo in 45 lines

Provided by @Gradio

🤗 Acknowledgements

PhotoMaker is co-hosted by Tencent ARC Lab and Nankai University MCG-NKU.
Inspired by many excellent demos and repos, including IP-Adapter, multimodalart/Ip-Adapter-FaceID, FastComposer, and T2I-Adapter. Thanks for their great work!
Thanks to the HunyuanDiT team for their generous support and suggestions!
Thanks to the Venus team in Tencent PCG for their feedback and suggestions.
Thanks to the HuggingFace team for their generous support!

Disclaimer

This project strives to impact the domain of AI-driven image generation positively. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

BibTeX

If you find PhotoMaker useful for your research and applications, please cite using this BibTeX:

@inproceedings{li2023photomaker,
  title={PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding},
  author={Li, Zhen and Cao, Mingdeng and Wang, Xintao and Qi, Zhongang and Cheng, Ming-Ming and Shan, Ying},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
