microsoft/VoTT

Visual Object Tagging Tool: An Electron app for building end-to-end object detection models from images and videos.

Top Related Projects

  • CVAT (12,814): Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
  • labelme (13,720): Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
  • labelImg (22,953): LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.
  • Label Studio: Label Studio is a multi-type data labeling and annotation tool with standardized output format

Quick Overview

VoTT (Visual Object Tagging Tool) is an open-source annotation and labeling tool for image and video assets. It's designed to provide a simple and fast way to build end-to-end machine learning models from a catalog of assets and labels. VoTT is particularly useful for computer vision tasks and can export to various formats for use with different machine learning frameworks.

Pros

  • User-friendly interface with support for both image and video annotation
  • Cross-platform compatibility (Windows, macOS, Linux)
  • Supports multiple export formats (CNTK, TensorFlow, YOLO, etc.)
  • Extensible architecture allowing for custom plugins and integrations

Cons

  • Limited advanced features compared to some commercial annotation tools
  • Occasional performance issues with large datasets or complex projects
  • Learning curve for setting up and configuring projects
  • Limited built-in collaboration features for team-based annotation

Getting Started

To get started with VoTT:

  1. Download the latest release from the GitHub releases page.
  2. Install the application on your system.
  3. Launch VoTT and create a new project:
    • Set your source connection (local folder or cloud storage)
    • Configure your target connection for exports
    • Define your tags/labels
  4. Start annotating your images or videos:
    • Use rectangle, polygon, or point annotations
    • Apply tags to your annotations
  5. Export your annotations in your chosen format.

For more detailed instructions, refer to the official documentation.

Competitor Comparisons

Pros of CVAT

  • More comprehensive annotation tools, supporting a wider range of tasks
  • Web-based interface allows for easier collaboration and remote access
  • Integrates with popular ML frameworks and supports model-assisted annotation

Cons of CVAT

  • Steeper learning curve due to more complex features
  • Requires server setup, which can be challenging for some users
  • May be overkill for simple labeling tasks

Code Comparison

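VoTT (TypeScript):

import * as React from "react";

// Props for a rectangular region rendered as an SVG <rect>
interface IRectProps {
    width: number; height: number;
    left: number; top: number;
    featureStyleName: string;
}

export default class Rect extends React.Component<IRectProps> {
    public render() {
        const { width, height, left, top, featureStyleName } = this.props;
        return (
            <rect className={`${featureStyleName} rect`}
                  x={left} y={top}
                  width={width} height={height} />
        );
    }
}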

CVAT (Python):

class Shape:
    """Minimal stand-in base class so the snippet runs on its own."""


class RectangleShape(Shape):
    def __init__(self, x, y, w, h):
        self.xtl = x
        self.ytl = y
        self.xbr = x + w
        self.ybr = y + h

    def area(self):
        return (self.xbr - self.xtl) * (self.ybr - self.ytl)

Both repositories provide tools for image and video annotation, but CVAT offers more advanced features and better scalability for larger projects. VoTT is simpler to set up and use, making it suitable for smaller-scale labeling tasks or individual users.

Pros of labelme

  • Simpler and more lightweight interface, easier to get started quickly
  • Supports a wider variety of annotation types (polygons, lines, points, etc.)
  • Exports to multiple formats including COCO, YOLO, and VOC XML

Cons of labelme

  • Less robust project management and workflow features
  • Fewer built-in ML model integration options
  • Limited video annotation capabilities compared to VoTT

Code Comparison

labelme example:

import labelme
from labelme import utils

# Load image and annotations
json_file = labelme.LabelFile(filename='annotations.json')
img = utils.img_data_to_arr(json_file.imageData)

# Access annotations
for shape in json_file.shapes:
    label = shape['label']
    points = shape['points']

VoTT example:

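import { IAsset, IProject } from "vott-react";

// Hypothetical helper (not defined in the original snippet), declared so the call below resolves
declare function loadProject(path: string): IProject;

// Load project and assets
const project: IProject = loadProject("project.vott");
const assets: IAsset[] = project.assets;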

// Access annotations
assets.forEach(asset => {
  asset.regions.forEach(region => {
    const label = region.tags[0];
    const boundingBox = region.boundingBox;
  });
});

Both tools offer annotation capabilities, but VoTT provides more robust project management features, while labelme offers a simpler interface and more diverse annotation types.

Pros of labelImg

  • Lightweight and easy to install with minimal dependencies
  • Supports multiple annotation formats (YOLO, PascalVOC, CreateML)
  • Faster for simple bounding box annotations

Cons of labelImg

  • Limited to bounding box annotations only
  • Less advanced project management features
  • Fewer export options and integrations

Code Comparison

labelImg:

def saveFile(self, _value=False):
    if self.filePath:
        try:
            self.saveLabels(self.filePath)
            self.setClean()
            return True
        except Exception:
            self.errorMessage(u'Error saving file', u'Error saving labels to file')
            return False
    return False

VoTT:

public async save(): Promise<void> {
    if (this.project.isDirty || this.state.assets.length !== this.project.assets.length) {
        await this.projectService.save(this.project);
        this.setState({ project: this.project });
    }
}

Both examples show save logic. labelImg saves synchronously and reports failures through an error dialog, while VoTT's version is asynchronous, saves only when the project has changed, and then updates the component state.

Pros of Label Studio

  • Supports a wider range of data types and labeling tasks
  • More customizable and flexible annotation interface
  • Active community and frequent updates

Cons of Label Studio

  • Steeper learning curve due to increased complexity
  • Requires more setup and configuration compared to VoTT

Code Comparison

Label Studio configuration example:

<View>
  <Image name="image" value="$image"/>
  <RectangleLabels name="label" toName="image">
    <Label value="Car"/>
    <Label value="Pedestrian"/>
  </RectangleLabels>
</View>

VoTT configuration example:

{
  "tags": [
    { "name": "Car", "color": "#FF0000" },
    { "name": "Pedestrian", "color": "#00FF00" }
  ]
}

Label Studio offers more flexibility in defining labeling tasks through its XML-based configuration, while VoTT uses a simpler JSON structure for tag definitions. Label Studio's approach allows for more complex annotation scenarios, but may require more effort to set up initially.

README

VoTT is no longer being maintained!

An open source annotation and labeling tool for image and video assets.

VoTT is a React + Redux Web application, written in TypeScript. This project was bootstrapped with Create React App.

Features include:

  • The ability to label images or video frames
  • Extensible model for importing data from local or cloud storage providers
  • Extensible model for exporting labeled data to local or cloud storage providers

VoTT helps facilitate an end-to-end machine learning pipeline.

Getting Started

VoTT can be installed as a native application or run from source. VoTT is also available as a stand-alone Web application and can be used in any modern Web browser.

Download and install a release package for your platform (recommended)

VoTT is available for Windows, Linux and OSX. Download the appropriate platform package/installer from GitHub Releases. v2 releases will be prefixed by 2.x.

Build and run from source

VoTT requires Node.js (>= 10.x, Dubnium) and npm.

 git clone https://github.com/Microsoft/VoTT.git
 cd VoTT
 npm ci
 npm start

IMPORTANT

When running locally with npm, both the Electron and the browser versions of the application will start. One major difference is that the Electron version can access the local file system.

Run as Web Application

Using a modern Web browser, VoTT can be loaded from: https://vott.z22.web.core.windows.net

As noted above, the Web version of VoTT cannot access the local file system; all assets must be imported/exported through a Cloud project.

V1 & V2

VoTT V2 is a refactor and refresh of the original Electron-based application. As the usage and demand for VoTT grew, V2 was started as an initiative to make VoTT more extensible and maintainable. In addition, V2 uses more modern development frameworks and patterns (React, Redux) and is authored in TypeScript.

A number of code quality practices have been adopted, including:

All V2 efforts are on the master branch

Where is V1

V1 will be on the v1 branch. There will not be any fixes or updates.

V1 releases

1.x releases can still be found under GitHub Releases.

V1 projects in V2

There is support for converting a V1 project into V2 format. Upon opening the JSON file, a window will pop up to confirm that the app should convert the project before redirecting to the editor screen. In this process, a .vott file will be generated in the same project directory, which may be used as the main project file going forward. We recommend backing up the V1 project file before converting the project.

Using VoTT

Creating Connections

VoTT is a 'Bring Your Own Data' (BYOD) application. In VoTT, connections are used to configure and manage source (the assets to label) and target (the location to which labels should be exported).

Connections can be set up and shared across projects. They use an extensible provider model, so new source/target providers can easily be added.
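The extensible provider model described above can be pictured as a small registry of factories. This is only a sketch under assumptions: `AssetProvider`, `registerProvider`, `createProvider`, and the `inMemory` provider are hypothetical names chosen for illustration, not VoTT's actual interfaces, and asset listing is kept synchronous for brevity.

```typescript
// Hypothetical provider contract; VoTT's real provider interfaces may differ.
interface AssetProvider {
  listAssets(): string[];
}

type ProviderOptions = { [key: string]: string };
type ProviderFactory = (options: ProviderOptions) => AssetProvider;

// Registry keyed by provider type name.
const providerRegistry: { [type: string]: ProviderFactory } = {};

function registerProvider(type: string, factory: ProviderFactory): void {
  providerRegistry[type] = factory;
}

function createProvider(type: string, options: ProviderOptions): AssetProvider {
  const factory = providerRegistry[type];
  if (factory === undefined) {
    throw new Error("No provider registered for type: " + type);
  }
  return factory(options);
}

// Adding a new provider does not require changing createProvider.
registerProvider("inMemory", (options) => ({
  listAssets: () =>
    (options["files"] || "").split(",").filter((f) => f.length > 0),
}));
```

With a design like this, a connection only needs to store a provider type and its options; the registry resolves the rest at runtime.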

Currently, VoTT supports:

To create a new connection, click the New Connections (plug) icon in the left-hand navigation bar.

Creating a New Project

Labeling workflows in VoTT revolve around projects - a collection of configurations and settings that persist.

Projects define source and target connections, and project metadata - including tags to be used when labeling source assets.

As mentioned above, all projects require a source and target connection:

  • Source Connection - Where to pull assets from
  • Target Connection - Where project files and exported data should be stored

Project Settings

Project settings can be modified after a project has been created by clicking the Project Settings (slider) icon in the left-hand navigation bar. Project metrics, such as Visited Assets, Tagged Assets, and Average Tags Per Asset, can also be viewed on this screen.
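As a rough illustration of how metrics like these could be derived, the sketch below counts visited and tagged assets and averages the tag count. The `AssetSummary` shape is hypothetical, and averaging over tagged assets only is an assumption; the source does not say how VoTT computes the average.

```typescript
// Hypothetical summary shape; VoTT's real asset model may differ.
interface AssetSummary {
  visited: boolean;
  tagsPerRegion: string[][]; // one list of tags per drawn region
}

interface ProjectMetrics {
  visitedAssets: number;
  taggedAssets: number;
  averageTagsPerAsset: number;
}

function computeMetrics(assets: AssetSummary[]): ProjectMetrics {
  const visitedCount = assets.filter((a) => a.visited).length;
  // Total number of tags applied across all regions of each asset.
  const tagCounts = assets.map((a) =>
    a.tagsPerRegion.reduce((sum, tags) => sum + tags.length, 0)
  );
  const tagged = tagCounts.filter((n) => n > 0);
  const totalTags = tagged.reduce((sum, n) => sum + n, 0);
  return {
    visitedAssets: visitedCount,
    taggedAssets: tagged.length,
    // Assumption: the average is taken over tagged assets only.
    averageTagsPerAsset: tagged.length === 0 ? 0 : totalTags / tagged.length,
  };
}
```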

Security Tokens

Some project settings can include sensitive values, such as API keys or other shared secrets. Each project will generate a security token that can be used to encrypt/decrypt sensitive project settings.

Security tokens can be found in Application Settings by clicking the gear icon in the lower corner of the left hand navigation bar.

NOTE: Project files can be shared among multiple people. In order to share sensitive project settings, all parties must have/use the same security token.

The token name and key must match in order for sensitive values to be successfully decrypted.

Labeling an Image

When a project is created or opened, the main tag editor window opens. The tag editor consists of three main parts:

  • A resizeable preview pane that contains a scrollable list of images and videos, from the source connection
  • The main editor pane that allows tags to be applied to drawn regions
  • The tags editor pane that allows users to modify, lock, reorder, and delete tags

Selecting an image or video on the left will load that asset in the main tag editor. Regions can then be drawn on the loaded asset and a tag can be applied.

As desired, repeat this process for any additional assets.

Labeling a Video

Labeling a video is much like labeling a series of images. When a video is selected from the left, it begins playing automatically, and the player provides several controls.

In addition to the normal video playback controls, there are two extra pairs of buttons.

On the left, there are the previous and next frame buttons. Clicking these will pause the video, and move to the next appropriate frame as determined by the project settings. For example, if the project settings have a frame extraction rate of 1, these buttons will cause the video to be moved back or forward 1 second, while if the frame extraction rate is 10, the video will be moved back or forward a tenth of a second.
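The relationship between the frame extraction rate and the step size can be written down directly: at a rate of r frames per second, each step is 1/r seconds. The sketch below assumes timestamps in seconds; `stepFrame` and its parameters are hypothetical names, not VoTT functions.

```typescript
// Seconds between two extracted frames for a given rate (frames per second).
function frameStepSeconds(frameExtractionRate: number): number {
  if (!(frameExtractionRate > 0)) {
    throw new RangeError("frameExtractionRate must be positive");
  }
  return 1 / frameExtractionRate;
}

// Move one extracted frame backward (-1) or forward (+1), clamped to the video.
function stepFrame(
  currentTime: number,
  frameExtractionRate: number,
  direction: 1 | -1,
  duration: number
): number {
  frameStepSeconds(frameExtractionRate); // validates the rate
  // Work in whole frame indices to limit floating-point drift.
  const index = Math.round(currentTime * frameExtractionRate) + direction;
  const target = index / frameExtractionRate;
  return Math.min(Math.max(target, 0), duration);
}
```

For example, `stepFrame(5, 1, 1, 60)` moves from 5 s to 6 s, while `stepFrame(5, 10, -1, 60)` moves back a tenth of a second to 4.9 s.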

On the right, there are the previous and next tagged frame buttons. Clicking these will pause the video and move to the next or previous frame that has a previously tagged region on it, if a tagged frame exists.
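Jumping between tagged frames amounts to searching a list of tagged timestamps for the nearest one on either side of the current position. A minimal sketch, assuming timestamps in seconds and hypothetical function names:

```typescript
// Returns the first tagged timestamp strictly after currentTime, if any.
function nextTaggedFrame(
  taggedTimes: number[],
  currentTime: number
): number | undefined {
  const sorted = taggedTimes.slice().sort((a, b) => a - b);
  for (const t of sorted) {
    if (t > currentTime) {
      return t;
    }
  }
  return undefined; // no tagged frame ahead: stay where we are
}

// Returns the last tagged timestamp strictly before currentTime, if any.
function previousTaggedFrame(
  taggedTimes: number[],
  currentTime: number
): number | undefined {
  const sorted = taggedTimes.slice().sort((a, b) => b - a);
  for (const t of sorted) {
    if (t < currentTime) {
      return t;
    }
  }
  return undefined;
}
```

Returning `undefined` mirrors the "if a tagged frame exists" condition: the caller can simply leave the playhead untouched in that case.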

Colored lines will also be visible along the video's timeline. These indicate the video frames that have already been visited. A yellow line denotes a frame that has been visited only, while a green line denotes a frame that has been both visited and tagged. The colored lines can be clicked for quick navigation to the indicated frame.
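The color rule for timeline markers is a small decision function. The sketch below encodes it, with `FrameState` and `markerColor` as hypothetical names used only for illustration.

```typescript
type MarkerColor = "yellow" | "green";

// Hypothetical per-frame state; VoTT's internal representation may differ.
interface FrameState {
  timestamp: number;
  visited: boolean;
  tagged: boolean;
}

// Unvisited frames get no marker; visited frames are yellow, or green once tagged.
function markerColor(frame: FrameState): MarkerColor | undefined {
  if (!frame.visited) {
    return undefined;
  }
  return frame.tagged ? "green" : "yellow";
}
```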

The timeline can also be used to manually scrub through the video to an arbitrary point, though the project's frame extraction rate is always obeyed: pausing the video moves to the closest frame according to this setting. This way, a very low frame extraction rate, such as 1 frame per second, can be set for sections of the video known to contain few taggable items, and a much higher rate, such as 30 frames per second, can be set where fine-grained control is needed.
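Snapping an arbitrary pause position to the closest extracted frame can be sketched as rounding to the nearest multiple of the frame interval. `snapToFrame` is a hypothetical helper, and rounding to the nearest frame (rather than always rounding down) is an assumption based on the word "closest".

```typescript
// Snap a time in seconds to the nearest frame at the given extraction rate.
function snapToFrame(time: number, frameExtractionRate: number): number {
  if (!(frameExtractionRate > 0)) {
    throw new RangeError("frameExtractionRate must be positive");
  }
  // Round in frame-index space, then convert back to seconds.
  return Math.round(time * frameExtractionRate) / frameExtractionRate;
}
```

At 1 frame per second, pausing at 12.34 s lands on 12 s; at 10 frames per second, the same pause lands on 12.3 s.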

Tagging and drawing regions is not possible while the video is playing.

Exporting Labels

Once assets have been labeled, they can be exported into a variety of formats:

In addition, users may choose to export:

  • all assets
  • only visited assets
  • only tagged assets

Click the Export (arrow) icon in the left-hand navigation bar. Select the appropriate export provider and which assets to export. The percentage of assets assigned to the testing and training sets can also be adjusted here.
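The export choices above combine a filter (all, visited, or tagged) with a percentage split. The sketch below shows one way to express both; the type and function names are hypothetical, and the split is deliberately deterministic (no shuffling), which may not match what VoTT's export providers do.

```typescript
type ExportScope = "all" | "visited" | "tagged";

// Hypothetical asset shape used only for this illustration.
interface ExportableAsset {
  name: string;
  visited: boolean;
  tagged: boolean;
}

function selectAssets(
  assets: ExportableAsset[],
  scope: ExportScope
): ExportableAsset[] {
  if (scope === "visited") {
    return assets.filter((a) => a.visited);
  }
  if (scope === "tagged") {
    return assets.filter((a) => a.tagged);
  }
  return assets.slice();
}

// Assign the first testPercent of items to the test set and the rest to training.
function splitTrainTest<T>(
  items: T[],
  testPercent: number
): { train: T[]; test: T[] } {
  if (testPercent < 0 || testPercent > 100) {
    throw new RangeError("testPercent must be between 0 and 100");
  }
  const testCount = Math.round((items.length * testPercent) / 100);
  return { test: items.slice(0, testCount), train: items.slice(testCount) };
}
```

A real exporter would probably shuffle before splitting so that the test set is not biased toward the first assets in the list.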

Keyboard Shortcuts

VoTT supports a number of keyboard shortcuts that make it easier to keep one hand on the mouse while tagging. Most common shortcuts are supported:

  • Ctrl or Cmd + C - copy
  • Ctrl or Cmd + X - cut
  • Ctrl or Cmd + V - paste
  • Ctrl or Cmd + A - select all
  • Ctrl or Cmd + Z - undo
  • Ctrl or Cmd + Shift + Z - redo

Tag Ordering

Hotkeys 1 through 0 are assigned to the first ten tags. Tags can be reordered using the up/down arrow icons in the tag editor pane.

Tag Locking

A tag can be locked for repeated tagging using the lock icon at the top of the tag editor pane. Tags can also be locked by combining Ctrl or Cmd with the tag hotkey; for example, Ctrl+2 locks the second tag in the list.
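The hotkey rules for tags (keys 1 through 9 for the first nine tags, 0 for the tenth, and Ctrl or Cmd plus the hotkey to lock) can be sketched as a pure function over the tag list. The names below are hypothetical, and treating Ctrl/Cmd plus a hotkey as a lock toggle is an assumption.

```typescript
// Map a digit key to a zero-based tag index: "1" -> 0, ..., "9" -> 8, "0" -> 9.
function tagIndexForHotkey(key: string): number | undefined {
  if (key.length !== 1 || key < "0" || key > "9") {
    return undefined;
  }
  const digit = key.charCodeAt(0) - "0".charCodeAt(0);
  return digit === 0 ? 9 : digit - 1;
}

interface HotkeyResult {
  action: "apply" | "toggleLock" | "none";
  tag: string | undefined;
}

// Decide what a key press means for the current (ordered) tag list.
function resolveTagHotkey(
  tags: string[],
  key: string,
  ctrlOrCmd: boolean
): HotkeyResult {
  const index = tagIndexForHotkey(key);
  if (index === undefined || index >= tags.length) {
    return { action: "none", tag: undefined };
  }
  return { action: ctrlOrCmd ? "toggleLock" : "apply", tag: tags[index] };
}
```

Because the mapping depends only on list position, reordering tags in the tag editor pane changes which tag each hotkey targets.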

Editor Shortcuts

In addition, the editor page has some special shortcuts to select tagging tools:

  • V - Pointer/Select
  • R - Draw Rectangle
  • P - Draw Polygon
  • Ctrl or Cmd + S - Save Project
  • Ctrl or Cmd + E - Export Project

VoTT allows you to fine-tune the bounding boxes using the arrow keys in a few different ways. While a region is selected:

  • Ctrl + Arrowkey - Move Region
  • Ctrl + Alt + Arrowkey - Shrink Region
  • Ctrl + Shift + Arrowkey - Expand Region
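The three arrow-key adjustments can be sketched as transformations of a bounding box. The source does not say which edge moves when a region shrinks or expands, so this sketch assumes horizontal arrows change the width and vertical arrows change the height, with the top-left corner fixed; all names and the 1-pixel default step are hypothetical.

```typescript
interface RegionBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

type ArrowKey = "ArrowUp" | "ArrowDown" | "ArrowLeft" | "ArrowRight";
type AdjustMode = "move" | "shrink" | "expand";

// Ctrl = move, Ctrl + Alt = shrink, Ctrl + Shift = expand.
function modeFromModifiers(
  ctrl: boolean,
  alt: boolean,
  shift: boolean
): AdjustMode | undefined {
  if (!ctrl) {
    return undefined;
  }
  if (alt) {
    return "shrink";
  }
  return shift ? "expand" : "move";
}

function adjustRegion(
  box: RegionBox,
  key: ArrowKey,
  mode: AdjustMode,
  step: number = 1
): RegionBox {
  const horizontal = key === "ArrowLeft" || key === "ArrowRight";
  if (mode === "move") {
    const sign = key === "ArrowLeft" || key === "ArrowUp" ? -1 : 1;
    return {
      left: horizontal ? box.left + sign * step : box.left,
      top: horizontal ? box.top : box.top + sign * step,
      width: box.width,
      height: box.height,
    };
  }
  // Assumption: resizing keeps the top-left corner fixed and never goes below 0.
  const delta = mode === "expand" ? step : -step;
  return {
    left: box.left,
    top: box.top,
    width: horizontal ? Math.max(0, box.width + delta) : box.width,
    height: horizontal ? box.height : Math.max(0, box.height + delta),
  };
}
```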

The slide viewer can be navigated from the keyboard as follows:

  • W or ArrowUp - Previous Asset
  • S or ArrowDown - Next Asset

When the video playback bar is present, it allows the following shortcuts to select frames:

  • A or ArrowLeft - Previous Frame
  • D or ArrowRight - Next Frame
  • Q - Previous Tagged Frame
  • E - Next Tagged Frame

Mouse Controls

  • Two-point mode - Hold down Ctrl while creating a region
  • Square mode - Hold down Shift while creating a region
  • Multi-select - Hold down Shift while selecting regions
  • Exclusive Tracking mode - Ctrl + N to block frame UI allowing a user to create a region on top of existing regions
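Square mode can be illustrated as a constraint applied while converting a mouse drag into a box. This sketch assumes the square's side is the smaller of the two drag distances and that it grows from the drag's starting point; `dragToBox` and the related types are hypothetical.

```typescript
interface DragPoint {
  x: number;
  y: number;
}

interface DragBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Convert a drag from start to current into a box; if square is true
// (Shift held), force equal width and height.
function dragToBox(
  start: DragPoint,
  current: DragPoint,
  square: boolean
): DragBox {
  let dx = current.x - start.x;
  let dy = current.y - start.y;
  if (square) {
    // Assumption: use the shorter side so the box stays under the cursor.
    const side = Math.min(Math.abs(dx), Math.abs(dy));
    dx = dx < 0 ? -side : side;
    dy = dy < 0 ? -side : side;
  }
  return {
    left: Math.min(start.x, start.x + dx),
    top: Math.min(start.y, start.y + dy),
    width: Math.abs(dx),
    height: Math.abs(dy),
  };
}
```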

Release Process

For more details on GitHub/web releases and versions, please review our release process document.

To build the VoTT executable, run:

npm run release

For details on packaging the executable for release, please review our PACKAGING.md.

Collaborators

VoTT was originally developed by the Commercial Software Engineering (CSE) group at Microsoft in Israel.

V2 is developed by the CSE group at Microsoft in Redmond, Washington.

Contributing to VoTT

There are many ways to contribute to VoTT -- please review our contribution guidelines.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
