Top Related Projects
- Screenshot-to-code: a neural network that transforms a design mock-up into a static website.
- sketch-code: a Keras model that generates HTML code from hand-drawn website mockups using an image captioning architecture.
- neuraltalk2: efficient image captioning code in Torch that runs on a GPU.
- Lona: a tool for defining design systems and using them to generate cross-platform UI code, Sketch files, and other artifacts.
Quick Overview
pix2code is a research project that uses deep learning to generate code from graphical user interface screenshots. It aims to automate the process of translating design mockups into functional code, potentially streamlining the development workflow for front-end applications.
Pros
- Innovative approach to automating UI development
- Potential to significantly reduce time and effort in translating designs to code
- Demonstrates the capabilities of machine learning in software development
- Could bridge the gap between designers and developers
Cons
- Still in the research phase, not ready for production use
- Limited to specific types of user interfaces and design patterns
- May require fine-tuning or additional training for different design styles
- Potential concerns about code quality and maintainability of generated code
Code Examples
As pix2code is a research project and not a code library, there are no specific code examples to provide. The repository contains implementation details and models, but it's not intended for direct use as a library.
Getting Started
Since pix2code is a research project, there isn't a straightforward "getting started" process for using it as a tool. However, for those interested in exploring the project:
- Clone the repository:
git clone https://github.com/tonybeltramelli/pix2code.git
- Install dependencies (TensorFlow, Keras, etc.) as specified in the project documentation
- Explore the provided datasets and model implementations
- Refer to the research paper for detailed information on the approach and methodology
Note that this project is primarily for research purposes and may require significant expertise in machine learning and computer vision to understand and potentially adapt for specific use cases.
Competitor Comparisons
A neural network that transforms a design mock-up into a static website.
Pros of Screenshot-to-code
- More flexible and adaptable to different types of UI designs
- Supports multiple output formats (HTML/CSS, React, Vue)
- Actively maintained with recent updates
Cons of Screenshot-to-code
- Requires more computational resources due to its complexity
- May have a steeper learning curve for beginners
- Less focused on mobile app development compared to pix2code
Code Comparison
Screenshot-to-code:
def generate_html(screenshot):
    encoded_image = encode_image(screenshot)
    model_output = get_model_output(encoded_image)
    return parse_model_output(model_output)
pix2code:
def generate_code(gui_image):
    tokens = tokenize_gui(gui_image)
    dsl_code = generate_dsl(tokens)
    return compile_to_target_language(dsl_code)
Both projects aim to convert UI designs into code, but Screenshot-to-code offers more flexibility in terms of output formats and design types. However, pix2code may be more suitable for mobile app development and could be easier for beginners to use. The code comparison shows that Screenshot-to-code uses a more direct approach, while pix2code employs a domain-specific language as an intermediate step.
Keras model to generate HTML code from hand-drawn website mockups, applying an image captioning architecture to the drawn source images.
Pros of sketch-code
- More recent and actively maintained repository
- Includes a web-based GUI for easier interaction and visualization
- Supports multiple output formats (HTML/CSS, Android XML, iOS Swift)
Cons of sketch-code
- Less comprehensive documentation compared to pix2code
- Smaller dataset for training, potentially affecting accuracy
- Limited to specific design patterns and layouts
Code Comparison
sketch-code:
def get_model_outputs(input_path, output_path, model_json_path, model_weights_path):
    model = load_model(model_json_path, model_weights_path)
    img = preprocessing.image.load_img(input_path, target_size=(224, 224))
    x = preprocessing.image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    pred = model.predict(x)[0]
    return pred
pix2code:
def run(input_path, output_path, model_json_path, model_weights_path):
    model = model_from_json(open(model_json_path, 'r').read())
    model.load_weights(model_weights_path)
    img_width, img_height = 256, 256
    img = image.load_img(input_path, target_size=(img_width, img_height))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    predicted = model.predict(x)
    return predicted
Both projects aim to convert design mockups into code, but sketch-code offers a more user-friendly interface and broader output options. However, pix2code provides more extensive documentation and potentially better accuracy due to its larger dataset. The code comparison shows similar approaches to loading and processing images, with minor differences in implementation details.
Efficient Image Captioning code in Torch, runs on GPU
Pros of neuraltalk2
- Focuses on image captioning, providing more specialized functionality
- Utilizes a more advanced deep learning architecture (LSTM-based RNN)
- Offers pre-trained models for immediate use
Cons of neuraltalk2
- Limited to image captioning, less versatile than pix2code
- Requires more computational resources due to complex architecture
- Less actively maintained, with fewer recent updates
Code Comparison
neuraltalk2:
# Extract image features
fc7 = self.feature_extractor(images)
# LSTM forward pass
lstm_output, _ = self.lstm(fc7.unsqueeze(0))
# Generate caption
scores = self.linear(lstm_output.squeeze(0))
pix2code:
# Generate GUI code from image
generated_gui = self.model.predict(image)
# Convert GUI representation to code
code = self.compiler.compile(generated_gui)
Summary
neuraltalk2 excels at image captioning thanks to its specialized architecture, while pix2code translates GUI designs into code. neuraltalk2 employs more advanced deep learning techniques but requires more resources and has seen less recent development. pix2code, on the other hand, targets the specific use case of GUI code generation, making it more directly useful for developers working on user interfaces.
A tool for defining design systems and using them to generate cross-platform UI code, Sketch files, and other artifacts.
Pros of Lona
- Focuses on design systems and component libraries, offering a more comprehensive approach to UI development
- Provides a visual editor for creating and managing design tokens, making it easier to maintain consistency across projects
- Supports multiple platforms, including iOS, Android, and web, allowing for greater flexibility in cross-platform development
Cons of Lona
- Requires more setup and configuration compared to pix2code's simpler approach
- Has a steeper learning curve due to its more extensive feature set
- May be overkill for smaller projects or teams that don't require a full design system
Code Comparison
Lona (JSON configuration):
{
  "type": "View",
  "parameters": {
    "backgroundColor": "blue100"
  },
  "children": [
    {
      "type": "Text",
      "parameters": {
        "text": "Hello, World!"
      }
    }
  ]
}
pix2code (DSL output):
header {
  btn-active, btn-inactive
}
row {
  single {
    small-title, text, btn-green
  }
}
While pix2code generates a simple DSL representation of the UI, Lona uses a more detailed JSON configuration that includes styling information and a hierarchical structure. This reflects Lona's focus on design systems and more complex UI components.
README
pix2code
Generating Code from a Graphical User Interface Screenshot
- A video demo of the system can be seen here
- The paper is available at https://arxiv.org/abs/1705.07962
- Official research page: https://uizard.io/research#pix2code
Abstract
Transforming a graphical user interface screenshot created by a designer into computer code is a typical task conducted by a developer in order to build customized software, websites, and mobile applications. In this paper, we show that deep learning methods can be leveraged to train a model end-to-end to automatically generate code from a single input image with over 77% accuracy for three different platforms (i.e. iOS, Android and web-based technologies).
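The model is an encoder-decoder: a CNN encodes the GUI screenshot, an LSTM encodes the DSL tokens generated so far, and a decoder LSTM predicts the next DSL token. The Keras sketch below loosely follows that description; the layer sizes, CONTEXT_LENGTH, and VOCAB_SIZE values are illustrative assumptions, not the exact configuration used in this repository.

# Minimal sketch of the encoder-decoder idea; sizes are illustrative assumptions
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Flatten, Dense,
                                     Dropout, RepeatVector, LSTM, concatenate)
from tensorflow.keras.models import Model

CONTEXT_LENGTH = 48  # number of previous DSL tokens used as context (assumption)
VOCAB_SIZE = 20      # size of the DSL vocabulary (assumption)

# Vision encoder: screenshot -> feature vector, repeated once per context step
image_input = Input(shape=(256, 256, 3))
v = Conv2D(32, (3, 3), activation='relu')(image_input)
v = MaxPooling2D(pool_size=(2, 2))(v)
v = Conv2D(64, (3, 3), activation='relu')(v)
v = MaxPooling2D(pool_size=(2, 2))(v)
v = Conv2D(128, (3, 3), activation='relu')(v)
v = MaxPooling2D(pool_size=(2, 2))(v)
v = Flatten()(v)
v = Dense(1024, activation='relu')(v)
v = Dropout(0.3)(v)
visual_features = RepeatVector(CONTEXT_LENGTH)(v)

# Language encoder: one-hot encoded context tokens -> sequence of hidden states
context_input = Input(shape=(CONTEXT_LENGTH, VOCAB_SIZE))
encoded_context = LSTM(128, return_sequences=True)(context_input)

# Decoder: fuse both streams and predict the next DSL token
decoder = concatenate([visual_features, encoded_context])
decoder = LSTM(512, return_sequences=False)(decoder)
next_token = Dense(VOCAB_SIZE, activation='softmax')(decoder)

model = Model(inputs=[image_input, context_input], outputs=next_token)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')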
Citation
@article{beltramelli2017pix2code,
  title={pix2code: Generating Code from a Graphical User Interface Screenshot},
  author={Beltramelli, Tony},
  journal={arXiv preprint arXiv:1705.07962},
  year={2017}
}
Disclaimer
The following software is shared for educational purposes only. The author and its affiliated institution are not responsible in any manner whatsoever for any damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of the use or inability to use this software.
The project pix2code is a research project demonstrating an application of deep neural networks to generate code from visual inputs. The current implementation is not, in any way, intended, nor able to generate code in a real-world context. We could not emphasize enough that this project is experimental and shared for educational purposes only. Both the source code and the datasets are provided to foster future research in machine intelligence and are not designed for end users.
Setup
Prerequisites
- Python 2 or 3
- pip
Install dependencies
pip install -r requirements.txt
Usage
Prepare the data:
# reassemble and unzip the data
cd datasets
zip -F pix2code_datasets.zip --out datasets.zip
unzip datasets.zip
cd ../model
# split training set and evaluation set while ensuring no training example in the evaluation set
# usage: build_datasets.py <input path> <distribution (default: 6)>
./build_datasets.py ../datasets/ios/all_data
./build_datasets.py ../datasets/android/all_data
./build_datasets.py ../datasets/web/all_data
# transform images (normalized pixel values and resized pictures) in training dataset to numpy arrays (smaller files if you need to upload the set to train your model in the cloud)
# usage: convert_imgs_to_arrays.py <input path> <output path>
./convert_imgs_to_arrays.py ../datasets/ios/training_set ../datasets/ios/training_features
./convert_imgs_to_arrays.py ../datasets/android/training_set ../datasets/android/training_features
./convert_imgs_to_arrays.py ../datasets/web/training_set ../datasets/web/training_features
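For reference, the image-to-array conversion boils down to resizing each screenshot, normalizing its pixel values, and saving the result as a compressed numpy file. The sketch below illustrates the idea; the actual convert_imgs_to_arrays.py script may differ in naming and details.

# Sketch of the preprocessing step: resize, normalize, save as compressed arrays
import os
import numpy as np
from PIL import Image

def convert_folder(input_path, output_path, size=(256, 256)):
    os.makedirs(output_path, exist_ok=True)
    for name in os.listdir(input_path):
        if not name.endswith('.png'):
            continue
        img = Image.open(os.path.join(input_path, name)).resize(size)
        arr = np.asarray(img, dtype=np.float32) / 255.0  # normalize pixel values to [0, 1]
        np.savez_compressed(os.path.join(output_path, name[:-4]), features=arr)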
Train the model:
mkdir bin
cd model
# provide input path to training data and output path to save trained model and metadata
# usage: train.py <input path> <output path> <is memory intensive (default: 0)> <pretrained weights (optional)>
./train.py ../datasets/web/training_set ../bin
# train on images pre-processed as arrays
./train.py ../datasets/web/training_features ../bin
# train with generator to avoid having to fit all the data in memory (RECOMMENDED)
./train.py ../datasets/web/training_features ../bin 1
# train on top of pretrained weights
./train.py ../datasets/web/training_features ../bin 1 ../bin/pix2code.h5
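The generator option exists so that the full set of (image, context, next-token) training samples never has to fit in memory at once. The sketch below shows the general idea of such a batch generator; load_sample is a hypothetical helper, not a function from this repository.

# Sketch of generator-based training: yield one batch at a time from disk
import numpy as np

def batch_generator(sample_paths, load_sample, batch_size=64):
    # load_sample(path) -> (image_array, context_array, next_token_one_hot); hypothetical helper
    while True:  # Keras-style generators loop indefinitely
        np.random.shuffle(sample_paths)
        for start in range(0, len(sample_paths) - batch_size + 1, batch_size):
            batch = [load_sample(p) for p in sample_paths[start:start + batch_size]]
            images, contexts, targets = (np.array(part) for part in zip(*batch))
            yield [images, contexts], targets

# e.g. model.fit(batch_generator(paths, load_sample), steps_per_epoch=len(paths) // 64)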
Generate code for batch of GUIs:
mkdir code
cd model
# generate DSL code (.gui file), the default search method is greedy
# usage: generate.py <trained weights path> <trained model name> <input image> <output path> <search method (default: greedy)>
./generate.py ../bin pix2code ../gui_screenshots ../code
# equivalent to command above
./generate.py ../bin pix2code ../gui_screenshots ../code greedy
# generate DSL code with beam search and a beam width of size 3
./generate.py ../bin pix2code ../gui_screenshots ../code 3
Generate code for a single GUI image:
mkdir code
cd model
# generate DSL code (.gui file), the default search method is greedy
# usage: sample.py <trained weights path> <trained model name> <input image> <output path> <search method (default: greedy)>
./sample.py ../bin pix2code ../test_gui.png ../code
# equivalent to command above
./sample.py ../bin pix2code ../test_gui.png ../code greedy
# generate DSL code with beam search and a beam width of size 3
./sample.py ../bin pix2code ../test_gui.png ../code 3
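With greedy search, decoding proceeds token by token: the screenshot and the tokens generated so far are fed to the model, the single most probable next token is kept, and generation stops at the end marker or a length cap. Beam search instead keeps the k most probable partial sequences at each step (a beam width of 3 in the example above). The sketch below illustrates greedy decoding; the vocab helper and the start/end token names are assumptions for illustration.

# Sketch of greedy decoding over DSL tokens; vocab is a hypothetical helper object
import numpy as np

def greedy_decode(model, image, vocab, max_length=150):
    tokens = [vocab.token_to_id('<START>')]
    for _ in range(max_length):
        context = vocab.one_hot_context(tokens)           # shape (1, CONTEXT_LENGTH, VOCAB_SIZE)
        probs = model.predict([image[np.newaxis], context], verbose=0)[0]
        next_id = int(np.argmax(probs))                    # greedy: take the single best token
        tokens.append(next_id)
        if vocab.id_to_token(next_id) == '<END>':
            break
    return [vocab.id_to_token(t) for t in tokens[1:]]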
Compile generated code to target language:
cd compiler
# compile .gui file to Android XML UI
./android-compiler.py <input file path>.gui
# compile .gui file to iOS Storyboard
./ios-compiler.py <input file path>.gui
# compile .gui file to HTML/CSS (Bootstrap style)
./web-compiler.py <input file path>.gui
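Conceptually, each compiler walks the brace-delimited DSL and replaces every token with a snippet of target markup, nesting children inside their parent's placeholder. The sketch below illustrates this for an HTML-like target; the MAPPING table is made up for illustration and is not the mapping file shipped with the repository.

# Hypothetical token-to-snippet mapping; the repository ships its own mapping files
MAPPING = {
    'header': '<div class="header">{}</div>',
    'row': '<div class="row">{}</div>',
    'single': '<div class="col">{}</div>',
    'btn-active': '<button class="btn active">Button</button>',
    'small-title': '<h4>Title</h4>',
    'text': '<p>Lorem ipsum</p>',
    'btn-green': '<button class="btn btn-success">Button</button>',
}

def compile_dsl(tokens, i=0):
    """Compile a flat list of DSL tokens (including '{' and '}') into markup."""
    output = ''
    while i < len(tokens):
        tok = tokens[i]
        if tok == '}':
            return output, i + 1
        if i + 1 < len(tokens) and tokens[i + 1] == '{':
            children, i = compile_dsl(tokens, i + 2)   # recurse into the nested block
            output += MAPPING[tok].format(children)
        else:
            output += MAPPING.get(tok, '')
            i += 1
    return output, i

html, _ = compile_dsl(['header', '{', 'btn-active', '}', 'row', '{', 'single', '{', 'text', '}', '}'])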
FAQ
Will pix2code support other target platforms/languages?
No, pix2code is only a research project and will stay in the state described in the paper for consistency reasons. This project is really just a toy example but you are of course more than welcome to fork the repo and experiment yourself with other target platforms/languages.
Will I be able to use pix2code for my own frontend projects?
No, pix2code is experimental and won't work for your specific use cases.
How is the model performance measured?
The accuracy/error reported in the paper is measured at the DSL level by comparing each generated token with each expected token. Any difference in length between the generated token sequence and the expected token sequence is also counted as error.
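In other words, the metric is a straightforward token-level comparison, as sketched below; the exact normalization used in the paper may differ.

# Sketch of the token-level error: positional mismatches plus any length difference
def token_error_rate(generated, expected):
    mismatches = sum(1 for g, e in zip(generated, expected) if g != e)
    length_difference = abs(len(generated) - len(expected))
    return (mismatches + length_difference) / max(len(expected), 1)

# e.g. token_error_rate(['header', '{', 'btn-active', '}'],
#                       ['header', '{', 'btn-inactive', '}'])  ->  0.25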
How long does it take to train the model?
On a Nvidia Tesla K80 GPU, it takes a little less than 5 hours to optimize the 109 * 10^6 parameters for one dataset; so expect around 15 hours if you want to train the model for the three target platforms.
I am a front-end developer, will I soon lose my job?
(I have genuinely been asked this question multiple times)
TL;DR Not anytime soon will AI replace front-end developers.
Even assuming a mature version of pix2code able to generate GUI code with 100% accuracy for every platform and language in the universe, front-enders will still be needed to implement the logic, the interactive parts, the advanced graphics and animations, and all the features users love. The product we are building at Uizard Technologies is intended to bridge the gap between UI/UX designers and front-end developers, not replace either of them. We want to rethink the traditional workflow that too often results in more frustration than innovation. We want designers to be as creative as possible to better serve end users, and developers to dedicate their time to programming the core functionality and forget about repetitive tasks such as UI implementation. We believe in a future where AI collaborates with humans, not one where it replaces them.
Media coverage
- Wired UK
- The Next Web
- Fast Company
- NVIDIA Developer News
- Lifehacker Australia
- Two Minute Papers (web series)
- NLP Highlights (podcast)
- Data Skeptic (podcast)
- Read comments on Hacker News