screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
Top Related Projects
Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.
A Gradio web UI for Large Language Models.
Stable Diffusion web UI
Robust Speech Recognition via Large-Scale Weak Supervision
PyTorch package for the discrete VAE used for DALL·E.
Quick Overview
Screenshot-to-code is an open-source project that uses AI to convert screenshots of user interfaces into functional front-end code (HTML, Tailwind, React, Vue, and other stacks). It passes the image to a vision-capable large language model, such as Claude Sonnet 3.5 or GPT-4o, which generates the corresponding code, aiming to streamline the translation of designs into web implementations.
Pros
- Accelerates the design-to-code process, potentially saving developers significant time
- Provides a useful tool for rapid prototyping and concept visualization
- Supports multiple frontend frameworks and can generate code for various technologies
- Continuously improving with community contributions and AI advancements
Cons
- Generated code may require manual refinement for production-ready implementations
- Accuracy can vary depending on the complexity of the input screenshot
- May not capture all nuances of responsive design or advanced UI interactions
- Reliance on external AI services could raise privacy concerns for sensitive designs
Code Examples
This project is a tool that generates code rather than a code library, so the examples below show the kind of output it might produce:
<!-- Example of generated HTML structure -->
<div class="container">
  <header>
    <h1>Welcome to My Website</h1>
    <nav>
      <ul>
        <li><a href="#home">Home</a></li>
        <li><a href="#about">About</a></li>
        <li><a href="#contact">Contact</a></li>
      </ul>
    </nav>
  </header>
  <main>
    <section id="content">
      <p>This is the main content area.</p>
    </section>
  </main>
</div>
/* Example of generated CSS styles */
.container {
  max-width: 1200px;
  margin: 0 auto;
  padding: 20px;
}
header {
  display: flex;
  justify-content: space-between;
  align-items: center;
}
nav ul {
  display: flex;
  list-style-type: none;
}
nav ul li {
  margin-left: 20px;
}
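Because generated markup may need manual refinement, one quick review aid is to check that the tags in a generated snippet balance. The following is a minimal sketch using Python's standard html.parser module. The names TagBalanceChecker and check_markup are hypothetical and not part of screenshot-to-code, and the check is deliberately stricter than HTML itself, which allows some closing tags (such as </p> and </li>) to be omitted.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records unbalanced tags in a generated snippet."""

    def __init__(self):
        super().__init__()
        self.stack = []      # tags that are currently open
        self.problems = []   # human-readable mismatch messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_markup(html: str) -> list[str]:
    """Return a list of problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in checker.stack]
```

Calling check_markup on the HTML example above should return an empty list; a non-empty result points at the places that most likely need hand editing.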
Getting Started
To use screenshot-to-code:
- Clone the repository:
git clone https://github.com/abi/screenshot-to-code.git
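- Install the backend dependencies with Poetry:
cd backend
poetry install
- Set your OpenAI API key (and, optionally, an Anthropic key) in backend/.env or in the settings dialog of the UI
- Run the backend:
poetry run uvicorn main:app --reload --port 7001
- In a second terminal, from the repository root, install and start the frontend, then open http://localhost:5173:
cd frontend
yarn
yarn dev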
- Upload a screenshot through the web interface
- Review and download the generated code
Note: Detailed setup instructions and requirements are available in the project's README file on GitHub.
Competitor Comparisons
Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.
Pros of DiffusionBee-Stable-Diffusion-UI
- Focuses on image generation using Stable Diffusion models
- Provides a user-friendly GUI for easy interaction
- Supports various image generation features like inpainting and outpainting
Cons of DiffusionBee-Stable-Diffusion-UI
- Limited to image generation tasks, not web development
- May require more computational resources for running Stable Diffusion models
- Less versatile in terms of output formats compared to Screenshot-to-Code
Code Comparison
A direct code comparison is not particularly relevant because the two projects do different things. The hypothetical signatures below (not taken from either codebase) only illustrate the difference in purpose:
Screenshot-to-Code:
def generate_code(screenshot, code_template):
    # AI-based code generation from a screenshot
    ...
DiffusionBee-Stable-Diffusion-UI:
def generate_image(prompt, model):
    # Stable Diffusion image generation
    ...
These snippets illustrate the fundamental difference in purpose between the two projects. Screenshot-to-Code focuses on generating code from visual input, while DiffusionBee-Stable-Diffusion-UI is designed for image generation based on text prompts.
A Gradio web UI for Large Language Models.
Pros of text-generation-webui
- Supports a wide range of language models and architectures
- Offers a user-friendly web interface for text generation tasks
- Provides extensive customization options and parameters
Cons of text-generation-webui
- Requires more setup and configuration compared to screenshot-to-code
- May have a steeper learning curve for users new to language models
- Focuses solely on text generation, lacking image processing capabilities
Code Comparison
text-generation-webui:
def generate_reply(
    prompt, state, stopping_strings=None, is_chat=False
):
    # Generate text based on prompt and parameters
    ...
screenshot-to-code (hypothetical signature):
def generate_code(image_path, model):
    # Process image and generate HTML/CSS code
    ...
The code snippets highlight the different focus areas of the two projects. text-generation-webui is centered around text generation with various parameters, while screenshot-to-code emphasizes image processing and code generation based on visual input.
Stable Diffusion web UI
Pros of stable-diffusion-webui
- More comprehensive and feature-rich, offering a wide range of image generation and manipulation tools
- Highly customizable with a large ecosystem of extensions and models
- Active community support and frequent updates
Cons of stable-diffusion-webui
- Steeper learning curve due to its extensive features and options
- Requires more computational resources and setup time
- Primarily focused on image generation, not web development or UI creation
Code Comparison
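While a direct code comparison isn't particularly relevant due to the different purposes of these projects, the pseudocode below sketches how each might be invoked. Both snippets are illustrative and should not be read as the projects' actual APIs; screenshot-to-code in particular runs as a web app (React/Vite frontend, FastAPI backend) rather than as an importable package: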
screenshot-to-code:
from screenshot_to_code import generate_code
code = generate_code("screenshot.png")
print(code)
stable-diffusion-webui:
import modules.scripts as scripts
from modules import sd_samplers
result = scripts.process_images(prompt="A beautiful landscape")
result.images[0].save("output.png")
screenshot-to-code is focused on converting UI designs to code, while stable-diffusion-webui is primarily used for generating and manipulating images using AI models. The choice between them depends on the specific task at hand: UI development vs. image generation.
Pros of TaskMatrix
- Broader scope: Handles a wide range of tasks beyond UI generation
- Multi-modal capabilities: Integrates vision, language, and action
- More flexible: Can adapt to various types of inputs and outputs
Cons of TaskMatrix
- Less specialized: May not produce as refined UI code as Screenshot-to-code
- Potentially more complex to use due to its broader functionality
- Might require more computational resources for its diverse capabilities
Code Comparison
TaskMatrix (illustrative Python pseudocode, not a documented API):
from taskmatrix import TaskMatrix
tm = TaskMatrix()
result = tm.process_image_and_generate_task("image.jpg", "Generate UI code")
print(result)
Screenshot-to-code (illustrative JavaScript pseudocode; the README describes a web app, not an importable package):
import { generateCode } from 'screenshot-to-code';
const screenshot = 'path/to/screenshot.png';
const code = await generateCode(screenshot);
console.log(code);
Summary
TaskMatrix offers a more versatile approach to AI-driven tasks, including UI generation, while Screenshot-to-code focuses specifically on translating UI designs into code. TaskMatrix's broader scope may appeal to users needing multi-modal AI capabilities, while Screenshot-to-code might be preferred for its specialized UI code generation. The choice between them depends on the specific use case and desired level of specialization.
Robust Speech Recognition via Large-Scale Weak Supervision
Pros of Whisper
- Highly accurate speech recognition across multiple languages
- Versatile model capable of transcription, translation, and language identification
- Extensive research and development backing from OpenAI
Cons of Whisper
- Focused solely on audio processing, lacking visual or UI generation capabilities
- Requires significant computational resources for optimal performance
Code Comparison
While a direct code comparison isn't particularly relevant due to the different nature of these projects, here's a brief example of how each might be used:
Whisper:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Screenshot-to-code (hypothetical usage; the project runs as a web app, not an importable package):
from screenshot_to_code import generate_code
screenshot = "screenshot.png"
code = generate_code(screenshot)
print(code)
Summary
Whisper excels in audio processing and speech recognition, offering a robust solution for transcription and translation tasks. Screenshot-to-code, on the other hand, focuses on converting visual designs into code, addressing a different set of challenges in the realm of UI development. While both projects showcase impressive AI capabilities, they serve distinct purposes in the developer ecosystem.
PyTorch package for the discrete VAE used for DALL·E.
Pros of DALL-E
- Generates unique and creative images from text descriptions
- Capable of producing a wide variety of artistic styles and concepts
- Useful for brainstorming visual ideas and inspiration
Cons of DALL-E
- Does not generate functional code or UI elements
- Limited to image generation, not suitable for web development tasks
- Requires careful prompt engineering to achieve desired results
Code Comparison
While a direct code comparison is not relevant due to the different nature of these projects, here's a brief overview of how they might be used:
DALL-E (example using the legacy OpenAI Python SDK, versions before 1.0):
import openai
response = openai.Image.create(
prompt="A website homepage for a coffee shop",
n=1,
size="1024x1024"
)
image_url = response['data'][0]['url']
Screenshot-to-code (hypothetical Python usage; no such package is described in the README):
from screenshot_to_code import generate_code
screenshot_path = "coffee_shop_homepage.png"
generated_code = generate_code(screenshot_path)
print(generated_code)
DALL-E is focused on image generation from text prompts, while Screenshot-to-code aims to convert visual designs into functional code. They serve different purposes in the development process, with DALL-E being more suited for creative ideation and Screenshot-to-code for implementation.
README
screenshot-to-code
A simple tool to convert screenshots, mockups and Figma designs into clean, functional code using AI. Now supporting Claude Sonnet 3.5 and GPT-4o!
https://github.com/abi/screenshot-to-code/assets/23818/6cebadae-2fe3-4986-ac6a-8fb9db030045
Supported stacks:
- HTML + Tailwind
- HTML + CSS
- React + Tailwind
- Vue + Tailwind
- Bootstrap
- Ionic + Tailwind
- SVG
Supported AI models:
- Claude Sonnet 3.5 - Best model!
- GPT-4o - also recommended!
- DALL-E 3 or Flux Schnell (using Replicate) for image generation
See the Examples section below for more demos.
We also just added experimental support for taking a video/screen recording of a website in action and turning that into a functional prototype.
Follow me on Twitter for updates.
Hosted Version
Try it live on the hosted version (paid). If you're a large or medium enterprise (50+ employees), book a meeting to explore custom enterprise plans.
Getting Started
The app has a React/Vite frontend and a FastAPI backend.
Keys needed:
- OpenAI API key with access to GPT-4 or Anthropic key (optional)
- Both are recommended so you can compare results from both Claude and GPT-4o
If you'd like to run the app with Ollama open source models (not recommended due to poor quality results), follow this comment.
Run the backend (I use Poetry for package management; run pip install poetry if you don't have it):
cd backend
echo "OPENAI_API_KEY=sk-your-key" > .env
echo "ANTHROPIC_API_KEY=your-key" >> .env
poetry install
poetry shell
poetry run uvicorn main:app --reload --port 7001
You can also set up the keys using the settings dialog on the front-end (click the gear icon after loading the frontend).
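When both keys go into the same .env file, keep in mind that a shell redirect with > truncates the file, while >> appends to it. As an alternative to editing the file by hand, here is a minimal Python sketch that sets or updates keys without discarding other entries. The helper name upsert_env is hypothetical and not part of the project.

```python
from pathlib import Path


def upsert_env(path: Path, values: dict[str, str]) -> None:
    """Set KEY=value lines in a .env file, keeping unrelated lines intact."""
    lines = path.read_text().splitlines() if path.exists() else []
    remaining = dict(values)
    updated = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key in remaining:
            # Replace the existing entry in place.
            updated.append(f"{key}={remaining.pop(key)}")
        else:
            updated.append(line)
    # Append keys that were not already present in the file.
    updated.extend(f"{k}={v}" for k, v in remaining.items())
    path.write_text("\n".join(updated) + "\n")
```

For example, upsert_env(Path("backend/.env"), {"OPENAI_API_KEY": "sk-your-key", "ANTHROPIC_API_KEY": "your-key"}) writes both entries in one call.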
Run the frontend:
cd frontend
yarn
yarn dev
Open http://localhost:5173 to use the app.
If you prefer to run the backend on a different port, update VITE_WS_BACKEND_URL in frontend/.env.local
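As a rough illustration of that setting, the sketch below builds the contents of frontend/.env.local from a single HTTP backend URL. It assumes the WebSocket URL uses the same host and port with the ws scheme (wss for https), which is the usual convention but is not stated in this README. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ws_url(http_url: str) -> str:
    """Map an http(s) URL to the matching ws(s) URL (assumed convention)."""
    parts = urlsplit(http_url)
    scheme = {"http": "ws", "https": "wss"}[parts.scheme]
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


def frontend_env(http_url: str) -> str:
    """Build .env.local contents pointing the frontend at the given backend."""
    return (
        f"VITE_HTTP_BACKEND_URL={http_url}\n"
        f"VITE_WS_BACKEND_URL={to_ws_url(http_url)}\n"
    )
```

If the backend were started on port 7002, frontend_env("http://127.0.0.1:7002") would produce the two lines to place in frontend/.env.local.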
For debugging purposes, if you don't want to waste GPT-4 Vision credits, you can run the backend in mock mode (which streams a pre-recorded response):
MOCK=true poetry run uvicorn main:app --reload --port 7001
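To make the idea of mock mode concrete, here is a small Python sketch of how an environment flag like MOCK could switch a backend to replaying a stored response in chunks. This is an illustration only, not the project's implementation; mock_enabled and stream_recorded are hypothetical names, and the chunk size is arbitrary.

```python
import os
from typing import Iterator, Mapping


def mock_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Treat MOCK=true (case-insensitive) as 'mock mode on'."""
    return env.get("MOCK", "").strip().lower() == "true"


def stream_recorded(text: str, chunk_size: int = 16) -> Iterator[str]:
    """Yield a pre-recorded response in small chunks, like a token stream."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]
```

A handler could then branch on mock_enabled() and send the chunks from stream_recorded(...) to the client instead of calling a paid model API.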
Docker
If you have Docker installed on your system, in the root directory, run:
echo "OPENAI_API_KEY=sk-your-key" > .env
docker-compose up -d --build
The app will be up and running at http://localhost:5173. Note that you can't develop the application with this setup as the file changes won't trigger a rebuild.
FAQs
- I'm running into an error when setting up the backend. How can I fix it? Try this. If that still doesn't work, open an issue.
- How do I get an OpenAI API key? See https://github.com/abi/screenshot-to-code/blob/main/Troubleshooting.md
- How can I configure an OpenAI proxy? - If you're not able to access the OpenAI API directly (e.g. due to country restrictions), you can try a VPN or configure the OpenAI base URL to use a proxy: set OPENAI_BASE_URL in backend/.env or directly in the UI in the settings dialog. Make sure the URL has "v1" in the path, so it should look like this: https://xxx.xxxxx.xxx/v1
- How can I update the backend host that my front-end connects to? - Configure VITE_HTTP_BACKEND_URL and VITE_WS_BACKEND_URL in frontend/.env.local. For example, set VITE_HTTP_BACKEND_URL=http://124.10.20.1:7001
- Seeing UTF-8 errors when running the backend? - On Windows, open the .env file with Notepad++, then go to Encoding and select UTF-8.
- How can I provide feedback? For feedback, feature requests and bug reports, open an issue or ping me on Twitter.
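The proxy FAQ above asks that the base URL contain "v1" in its path. A quick way to check a candidate value before putting it in OPENAI_BASE_URL is sketched below; has_v1_path is a hypothetical helper, and the example hostnames in the usage note are placeholders.

```python
from urllib.parse import urlsplit


def has_v1_path(base_url: str) -> bool:
    """True if one of the URL's path segments is exactly 'v1'."""
    return "v1" in urlsplit(base_url).path.split("/")
```

For instance, has_v1_path("https://proxy.example.com/v1") is true, while a URL whose path is empty, such as "https://proxy.example.com", fails the check.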
Examples
NYTimes
(Side-by-side images: original page and generated replica)
Instagram page (with not Taylor Swift pics)
https://github.com/abi/screenshot-to-code/assets/23818/503eb86a-356e-4dfc-926a-dabdb1ac7ba1
Hacker News but it gets the colors wrong at first so we nudge it
https://github.com/abi/screenshot-to-code/assets/23818/3fec0f77-44e8-4fb3-a769-ac7410315e5d