
borisdayma/dalle-mini

DALL·E Mini - Generate images from a text prompt


Top Related Projects


PyTorch package for the discrete VAE used for DALL·E.

A latent text-to-image diffusion model

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

High-Resolution Image Synthesis with Latent Diffusion Models


馃 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Quick Overview

DALL·E Mini is an open-source AI model that generates images from text prompts. It's a smaller, more accessible version of OpenAI's DALL·E, designed to run on less powerful hardware while still producing creative and diverse images based on textual descriptions.

Pros

  • Open-source and freely available for use and modification
  • Runs on less powerful hardware compared to full-scale DALL·E
  • Produces creative and diverse images from text prompts
  • Active community and ongoing development

Cons

  • Lower image quality compared to more advanced models like DALL·E 2 or Midjourney
  • Limited training data compared to larger models
  • Can produce inconsistent or inaccurate results for complex prompts
  • Slower generation times compared to more optimized models

Code Examples

# Install the dalle-mini library
!pip install git+https://github.com/borisdayma/dalle-mini.git

# Import necessary modules
from dalle_mini import DalleBart, DalleBartProcessor
from vqgan_jax.modeling_flax_vqgan import VQModel
import jax
import jax.numpy as jnp

# Load the text-to-image-token model, the text processor, and the VQGAN image decoder
model = DalleBart.from_pretrained("dalle-mini/dalle-mini")
processor = DalleBartProcessor.from_pretrained("dalle-mini/dalle-mini")
vqgan = VQModel.from_pretrained("dalle-mini/vqgan_imagenet_f16_16384")

# Generate image tokens from a text prompt
prompt = "a cute cat wearing a hat"
tokenized_prompt = processor([prompt])  # the processor expects a list of prompts
generated = model.generate(**tokenized_prompt, prng_key=jax.random.PRNGKey(0))

# Drop the BOS token and decode the image tokens into pixels with the VQGAN
decoded_images = vqgan.decode_code(generated.sequences[..., 1:])
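
The decoded output can then be saved or displayed. A minimal sketch, assuming decoded_images is a float array of shape (batch, 256, 256, 3) with values roughly in [0, 1] as in the project's inference notebook (the filename pattern is just an example):

import numpy as np
from PIL import Image

# Clip to [0, 1], convert each sample to an 8-bit RGB image, and save it
for i, img in enumerate(np.asarray(decoded_images.clip(0.0, 1.0))):
    Image.fromarray((img * 255).astype(np.uint8)).save(f"generated_{i}.png")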

Getting Started

To get started with DALL·E Mini:

  1. For inference only, install the package from PyPI:

    pip install dalle-mini
    
  2. For development, clone the repository and install it in editable mode:

    git clone https://github.com/borisdayma/dalle-mini.git
    cd dalle-mini
    pip install -e ".[dev]"
    
  3. Generate images by working through the inference pipeline notebook at tools/inference/inference_pipeline.ipynb, which loads the pretrained model and walks through producing images from a text prompt.

The notebook generates images for your prompts step by step, so you can inspect and adjust each stage of the pipeline.

Competitor Comparisons


PyTorch package for the discrete VAE used for DALL·E.

Pros of DALL-E

  • Higher quality image generation with more detailed and coherent results
  • Larger model with more extensive training data, leading to better understanding of complex prompts
  • Supports inpainting and outpainting features for image editing

Cons of DALL-E

  • Closed-source and not publicly available for free use or modification
  • Limited API access, requiring approval and potentially high costs for usage
  • Less flexibility for customization or fine-tuning on specific datasets

Code Comparison

While a direct code comparison is not possible due to DALL-E's closed-source nature, we can compare the usage of DALL-E Mini with a hypothetical DALL-E API:

DALL-E Mini:

from dalle_mini import DalleBart, DalleBartProcessor

model = DalleBart.from_pretrained("dalle-mini/dalle-mini")
processor = DalleBartProcessor.from_pretrained("dalle-mini/dalle-mini")
# generate() returns image tokens, which a VQGAN decoder turns into pixels
image_tokens = model.generate(**processor(["A cat wearing a hat"]))

Hypothetical DALL-E API:

import openai
openai.api_key = "your_api_key"
response = openai.Image.create(prompt="A cat wearing a hat")
image_url = response['data'][0]['url']

DALL-E Mini offers a more accessible and open approach, allowing for local usage and modifications. DALL-E, while potentially more powerful, requires API access and may have usage restrictions.

A latent text-to-image diffusion model

Pros of Stable-Diffusion

  • Higher image quality and more detailed outputs
  • Faster inference time, especially on GPU
  • More flexible architecture allowing for various applications (inpainting, image-to-image, etc.)

Cons of Stable-Diffusion

  • Requires more computational resources and VRAM
  • More complex setup and installation process
  • Larger model size, making it less suitable for edge devices

Code Comparison

DALLE-mini:

from dalle_mini import DalleBart, DalleBartProcessor

model = DalleBart.from_pretrained("dalle-mini/dalle-mini")
processor = DalleBartProcessor.from_pretrained("dalle-mini/dalle-mini")
# Sample image tokens from a tokenized prompt (a VQGAN decodes them into pixels)
image_tokens = model.generate(**processor(["A cute cat"]))

Stable-Diffusion:

from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
image = pipe("A cute cat").images[0]

Both repositories offer text-to-image generation capabilities, but Stable-Diffusion generally produces higher quality results with faster inference times. However, it requires more computational resources and has a more complex setup process. DALLE-mini is easier to use and more lightweight, making it suitable for simpler applications or resource-constrained environments. The code comparison shows that Stable-Diffusion has a more streamlined API, while DALLE-mini's approach is slightly more verbose but potentially more flexible for advanced users.

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

Pros of DALLE2-pytorch

  • Implements the more advanced DALL-E 2 architecture
  • Offers more flexibility and customization options
  • Potentially higher quality image generation

Cons of DALLE2-pytorch

  • More complex implementation, requiring deeper understanding
  • Heavier computational requirements
  • Less documentation and community support compared to dalle-mini

Code Comparison

DALLE2-pytorch:

from dalle2_pytorch import DALLE2, CLIP, DiffusionPriorNetwork, DiffusionPrior, Unet, Decoder

# DALLE2 is assembled from separately configured components:
# a CLIP model, a diffusion prior, and a U-Net based decoder
clip = CLIP(dim_text=512, dim_image=512, dim_latent=512, num_text_tokens=49408,
            text_enc_depth=6, text_seq_len=256, text_heads=8,
            visual_enc_depth=6, visual_image_size=256, visual_patch_size=32, visual_heads=8)
prior_network = DiffusionPriorNetwork(dim=512, depth=6, dim_head=64, heads=8)
diffusion_prior = DiffusionPrior(net=prior_network, clip=clip, timesteps=100, cond_drop_prob=0.2)
unet = Unet(dim=128, image_embed_dim=512, cond_dim=128, channels=3, dim_mults=(1, 2, 4, 8))
decoder = Decoder(unet=unet, clip=clip, timesteps=100, image_cond_drop_prob=0.1, text_cond_drop_prob=0.5)

model = DALLE2(prior=diffusion_prior, decoder=decoder)

dalle-mini:

import jax.numpy as jnp
from dalle_mini import DalleBart

model = DalleBart.from_pretrained(
    "dalle-mini/dalle-mini",
    revision=None,
    dtype=jnp.float16,
    abstract_init=True
)

DALLE2-pytorch offers more granular control over model architecture, while dalle-mini provides a simpler, pre-configured approach. DALLE2-pytorch requires manual setup of various parameters, whereas dalle-mini uses a pre-trained model with fewer configuration options. The DALLE2-pytorch implementation is more suitable for researchers and advanced users looking to experiment with the architecture, while dalle-mini is more accessible for quick deployment and testing.

High-Resolution Image Synthesis with Latent Diffusion Models

Pros of latent-diffusion

  • More efficient training and inference due to compression in latent space
  • Supports various conditioning types (text, image, class labels)
  • Flexible architecture allowing for different applications (image generation, inpainting, super-resolution)

Cons of latent-diffusion

  • More complex implementation and setup
  • May require more computational resources for initial training
  • Less straightforward for beginners to understand and modify

Code Comparison

latent-diffusion:

model = LatentDiffusion(
    unet_config,
    cond_stage_config,
    first_stage_config,
    num_timesteps_cond=num_timesteps_cond,
    cond_stage_key="caption",
    cond_stage_trainable=True,
    concat_mode=True,
)

dalle-mini:

from dalle_mini import DalleBart

model = DalleBart.from_pretrained(
    pretrained_model_name_or_path,
    revision=revision,
    dtype=dtype
)

The latent-diffusion code shows a more complex model initialization with various configurations, while dalle-mini uses a simpler pretrained model loading approach. This reflects the difference in complexity and flexibility between the two projects.


馃 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Pros of diffusers

  • Broader scope, supporting multiple diffusion models and techniques
  • More active development and larger community support
  • Extensive documentation and integration with Hugging Face ecosystem

Cons of diffusers

  • Higher complexity due to supporting multiple models
  • Potentially steeper learning curve for beginners
  • May require more computational resources for some models

Code comparison

dalle-mini:

from dalle_mini import DalleBart, DalleBartProcessor
model = DalleBart.from_pretrained("dalle-mini/dalle-mini")
processor = DalleBartProcessor.from_pretrained("dalle-mini/dalle-mini")

diffusers:

from diffusers import StableDiffusionPipeline
model = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

Summary

diffusers offers a more comprehensive solution for working with various diffusion models, while dalle-mini focuses specifically on the DALL-E Mini model. diffusers benefits from broader community support and integration with the Hugging Face ecosystem, but may be more complex for users seeking a simpler, specific implementation. dalle-mini provides a more straightforward approach for those specifically interested in the DALL-E Mini model, but with potentially less flexibility and ongoing development compared to diffusers.


README

DALL·E Mini

How to use it?

You can use the model on 🖍️ craiyon

How does it work?

Refer to our reports:

Development

Dependencies Installation

For inference only, use pip install dalle-mini.

For development, clone the repo and use pip install -e ".[dev]". Before making a PR, check style with make style.

You can experiment with the pipeline step by step through our inference pipeline notebook (tools/inference/inference_pipeline.ipynb), which can also be opened in Colab.

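In brief, the notebook loads the BART-style model together with the VQGAN decoder, tokenizes the prompts, samples image tokens, and decodes them into images. A condensed sketch of those steps follows (checkpoint names are illustrative; the notebook itself also replicates parameters across devices and wraps generation in jax.pmap):

import jax
import jax.numpy as jnp
from dalle_mini import DalleBart, DalleBartProcessor
from vqgan_jax.modeling_flax_vqgan import VQModel

# Load module definitions and weights separately
model, params = DalleBart.from_pretrained(
    "dalle-mini/dalle-mini", dtype=jnp.float16, _do_init=False
)
vqgan, vqgan_params = VQModel.from_pretrained(
    "dalle-mini/vqgan_imagenet_f16_16384", _do_init=False
)
processor = DalleBartProcessor.from_pretrained("dalle-mini/dalle-mini")

# Tokenize the prompt, sample image tokens, then decode them to pixels
tokenized = processor(["an armchair in the shape of an avocado"])
generated = model.generate(**tokenized, prng_key=jax.random.PRNGKey(0), params=params)
images = vqgan.decode_code(generated.sequences[..., 1:], params=vqgan_params)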

Training of DALL·E mini

Use tools/train/train.py.

You can also adjust the sweep configuration file if you need to perform a hyperparameter search.
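
The sweep configuration is typically a Weights & Biases sweep definition. As a rough illustration of the format, expressed here in Python rather than the repo's YAML file, and with placeholder parameter names and ranges rather than the project's actual settings:

import wandb

# Placeholder sweep definition; see the repository's sweep configuration
# file for the parameters actually used by DALL·E mini.
sweep_config = {
    "method": "random",
    "metric": {"name": "eval/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-3},
        "gradient_accumulation_steps": {"values": [1, 2, 4]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="dalle-mini")
# Each wandb agent run then launches tools/train/train.py with the sampled values.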

FAQ

Where to find the latest models?

Trained models are on 🤗 Model Hub:

Where does the logo come from?

The "armchair in the shape of an avocado" was used by OpenAI when releasing DALL脗路E to illustrate the model's capabilities. Having successful predictions on this prompt represents a big milestone for us.

Contributing

Join the community on the LAION Discord. Any contribution is welcome, from reporting issues to proposing fixes/improvements or testing the model with cool prompts!

You can also use these great projects from the community:

Acknowledgements

Authors & Contributors

DALL·E mini was initially developed by:

Many thanks to the people who helped make it better:

Citing DALL·E mini

If you find DALL·E mini useful in your research or wish to refer, please use the following BibTeX entry.

@misc{Dayma_DALL·E_Mini_2021,
      author = {Dayma, Boris and Patil, Suraj and Cuenca, Pedro and Saifullah, Khalid and Abraham, Tanishq and Lê Khắc, Phúc and Melas, Luke and Ghosh, Ritobrata},
      doi = {10.5281/zenodo.5146400},
      month = {7},
      title = {DALL·E Mini},
      url = {https://github.com/borisdayma/dalle-mini},
      year = {2021}
}

References

Original DALL·E from "Zero-Shot Text-to-Image Generation" with image quantization from "Learning Transferable Visual Models From Natural Language Supervision".

Image encoder from "Taming Transformers for High-Resolution Image Synthesis".

Sequence to sequence model based on "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension" with implementation of a few variants:

Main optimizer (Distributed Shampoo) from "Scalable Second Order Optimization for Deep Learning".

Citations

@misc{
  title={Zero-Shot Text-to-Image Generation}, 
  author={Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
  year={2021},
  eprint={2102.12092},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@misc{
  title={Learning Transferable Visual Models From Natural Language Supervision}, 
  author={Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
  year={2021},
  eprint={2103.00020},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@misc{
  title={Taming Transformers for High-Resolution Image Synthesis}, 
  author={Patrick Esser and Robin Rombach and Björn Ommer},
  year={2021},
  eprint={2012.09841},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@misc{
  title={BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension}, 
  author={Mike Lewis and Yinhan Liu and Naman Goyal and Marjan Ghazvininejad and Abdelrahman Mohamed and Omer Levy and Ves Stoyanov and Luke Zettlemoyer},
  year={2019},
  eprint={1910.13461},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
@misc{
  title={Scalable Second Order Optimization for Deep Learning},
  author={Rohan Anil and Vineet Gupta and Tomer Koren and Kevin Regan and Yoram Singer},
  year={2021},
  eprint={2002.09018},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
@misc{
  title={GLU Variants Improve Transformer},
  author={Noam Shazeer},
  year={2020},
  url={https://arxiv.org/abs/2002.05202}    
}
@misc{
  title={DeepNet: Scaling Transformers to 1,000 Layers},
  author={Wang, Hongyu and Ma, Shuming and Dong, Li and Huang, Shaohan and Zhang, Dongdong and Wei, Furu},
  year={2022},
  eprint={2203.00555},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
@misc{
  title={NormFormer: Improved Transformer Pretraining with Extra Normalization},
  author={Sam Shleifer and Jason Weston and Myle Ott},
  year={2021},
  eprint={2110.09456},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
@inproceedings{
  title={Swin Transformer V2: Scaling Up Capacity and Resolution}, 
  author={Ze Liu and Han Hu and Yutong Lin and Zhuliang Yao and Zhenda Xie and Yixuan Wei and Jia Ning and Yue Cao and Zheng Zhang and Li Dong and Furu Wei and Baining Guo},
  booktitle={International Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
@misc{
  title = {CogView: Mastering Text-to-Image Generation via Transformers},
  author = {Ming Ding and Zhuoyi Yang and Wenyi Hong and Wendi Zheng and Chang Zhou and Da Yin and Junyang Lin and Xu Zou and Zhou Shao and Hongxia Yang and Jie Tang},
  year = {2021},
  eprint = {2105.13290},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}
@misc{
  title = {Root Mean Square Layer Normalization},
  author = {Biao Zhang and Rico Sennrich},
  year = {2019},
  eprint = {1910.07467},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
@misc{
  title = {Sinkformers: Transformers with Doubly Stochastic Attention},
  url = {https://arxiv.org/abs/2110.11773},
  author = {Sander, Michael E. and Ablin, Pierre and Blondel, Mathieu and Peyré, Gabriel},
  publisher = {arXiv},
  year = {2021},
}
@misc{
  title = {Smooth activations and reproducibility in deep networks},
  url = {https://arxiv.org/abs/2010.09931},
  author = {Shamir, Gil I. and Lin, Dong and Coviello, Lorenzo},
  publisher = {arXiv},
  year = {2020},
}
@misc{
  title = {Foundation Transformers},
  url = {https://arxiv.org/abs/2210.06423},
  author = {Wang, Hongyu and Ma, Shuming and Huang, Shaohan and Dong, Li and Wang, Wenhui and Peng, Zhiliang and Wu, Yu and Bajaj, Payal and Singhal, Saksham and Benhaim, Alon and Patra, Barun and Liu, Zhun and Chaudhary, Vishrav and Song, Xia and Wei, Furu},
  publisher = {arXiv},
  year = {2022},
}