label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Top Related Projects
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Visual Object Tagging Tool: An electron app for building end to end Object Detection Models from Images and Videos.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.
Open source annotation tool for machine learning practitioners.
Quick Overview
Label Studio is an open-source data labeling tool that allows users to create and manage custom data annotation projects. It provides a web-based interface for annotating various types of data, including images, text, audio, and video, and supports a wide range of annotation tasks such as classification, segmentation, and bounding box detection.
Pros
- Flexible and Customizable: Label Studio can be easily customized to fit the specific needs of a project, with support for a wide range of data types and annotation tasks.
- Collaborative Workflow: The tool supports collaborative labeling, allowing multiple users to work on the same project simultaneously.
- Extensible: Label Studio is built on a modular architecture, making it easy to extend with custom components and integrations.
- Open-Source: The project is open-source, allowing users to contribute to the codebase and benefit from the community's efforts.
Cons
- Learning Curve: Setting up and configuring Label Studio may require some technical expertise, especially for users unfamiliar with web development and data annotation.
- Limited Offline Support: Label Studio runs as a web application backed by a server process rather than as a standalone desktop app. It can be self-hosted on a machine without internet access, but annotators always need a running server to connect to.
- Performance Limitations: For large-scale data annotation projects, the performance of Label Studio may be a concern, as it relies on a web-based interface and may not scale as efficiently as desktop-based tools.
- Maintenance Overhead: As an open-source project, Label Studio requires ongoing maintenance and updates, which may be a burden for some users or organizations.
Getting Started
To get started with Label Studio, follow these steps:
- Install Label Studio:
pip install label-studio
- Initialize a new Label Studio project:
label-studio init my_project
- Start the Label Studio server for that project:
label-studio start my_project
- Open the Label Studio web interface in your browser at http://localhost:8080.
- Create a new project and configure the data sources, annotation tasks, and other settings as needed.
- Invite collaborators to the project and start annotating your data.
For more detailed instructions and advanced configuration options, please refer to the Label Studio documentation.
Competitor Comparisons
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Pros of label-studio
- Comprehensive data labeling platform with support for various data types
- Active development and regular updates
- Extensive documentation and community support
Cons of label-studio
- Larger codebase, potentially more complex to set up and maintain
- May have more features than needed for simple labeling tasks
Code comparison
label-studio:
from label_studio_sdk import Client
ls = Client(url='http://localhost:8080', api_key='your-api-key')
project = ls.start_project(
    title='My Project',
    label_config='<View><Text name="text" value="$text"/></View>'
)
As both repositories refer to the same project, there is no code comparison to be made. The code snippet above demonstrates how to use the label-studio SDK to create a new project.
Summary
label-studio is a comprehensive data labeling platform that supports various data types and offers extensive documentation. It's actively developed and has strong community support. However, its larger codebase may be more complex to set up and maintain, and it might offer more features than necessary for simple labeling tasks. The provided code example shows how to create a new project using the label-studio SDK.
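The `label_config` string passed to `start_project` above is plain XML, so it can be assembled programmatically instead of written by hand. The sketch below is one possible way to do that with the Python standard library. The helper name `build_text_classification_config`, the control name `sentiment`, and the label values are hypothetical; the tags (`View`, `Text`, `Choices`, `Choice`) follow the style of the configuration snippets shown on this page.

```python
import xml.etree.ElementTree as ET


def build_text_classification_config(labels):
    """Return a labeling config XML string for single-text classification."""
    view = ET.Element("View")
    # Object tag: displays the "text" field of each task.
    ET.SubElement(view, "Text", name="text", value="$text")
    # Control tag: one Choice per label, attached to the Text tag via toName.
    choices = ET.SubElement(view, "Choices", name="sentiment", toName="text")
    for label in labels:
        ET.SubElement(choices, "Choice", value=label)
    return ET.tostring(view, encoding="unicode")


# Hypothetical labels, for illustration only.
config = build_text_classification_config(["Positive", "Negative"])
print(config)
```

The resulting string could be passed as `label_config` in the SDK call shown earlier. Building it from a list keeps the label set in one place when it changes.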
Visual Object Tagging Tool: An electron app for building end to end Object Detection Models from Images and Videos.
Pros of VoTT
- Specialized for video and image annotation tasks
- Offers offline functionality, allowing work without an internet connection
- Provides a more streamlined interface for specific annotation types
Cons of VoTT
- Limited to visual data annotation, lacking support for text or audio
- Less flexible configuration options compared to Label Studio
- Fewer export formats and integration options
Code Comparison
VoTT (TypeScript):
export interface ITag {
    name: string;
    color: string;
}
export interface IRegion {
    id: string;
    type: RegionType;
    tags: string[];
    boundingBox: IBoundingBox;
}
Label Studio (Python):
from django.db import models
from django.utils.translation import gettext_lazy as _

class Label(models.Model):
    # Timestamps are maintained automatically by Django
    created_at = models.DateTimeField(_('created at'), auto_now_add=True)
    updated_at = models.DateTimeField(_('updated at'), auto_now=True)
    project = models.ForeignKey('projects.Project', related_name='labels', on_delete=models.CASCADE)
    organization = models.ForeignKey('organizations.Organization', on_delete=models.CASCADE)
Both repositories offer data annotation tools, but they cater to different use cases. VoTT focuses on visual data annotation with a streamlined interface and offline capabilities. Label Studio provides a more versatile platform supporting various data types and offering extensive configuration options. The code snippets highlight the different approaches: VoTT uses TypeScript and focuses on defining interfaces for tags and regions, while Label Studio uses Python with Django models for a more comprehensive data structure.
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Pros of CVAT
- More advanced annotation tools for complex tasks like video segmentation
- Supports a wider range of annotation types, including 3D point cloud annotations
- Better suited for large-scale, enterprise-level annotation projects
Cons of CVAT
- Steeper learning curve and more complex setup process
- Less intuitive user interface for beginners
- Requires more system resources to run effectively
Code Comparison
Label Studio configuration example:
<View>
  <Image name="image" value="$image"/>
  <RectangleLabels name="label" toName="image">
    <Label value="Car"/>
    <Label value="Pedestrian"/>
  </RectangleLabels>
</View>
CVAT task creation example:
task = Task.objects.create(
    name="Vehicle Detection",
    overlap=0,
    segment_size=100,
    project=project,
    owner=user,
    assignee=user
)
Both projects are open-source data annotation tools, but they cater to different use cases. Label Studio is more user-friendly and suitable for a wide range of annotation tasks, while CVAT offers more advanced features for complex annotation projects, particularly in computer vision domains.
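To make the Label Studio configuration above more concrete, it helps to look at what a drawn rectangle looks like in the exported JSON. The sketch below assumes the layout Label Studio uses for `RectangleLabels` regions, where `x`, `y`, `width`, and `height` are percentages of the image size and `original_width` / `original_height` hold the pixel dimensions. The helper `to_pixel_box` is hypothetical and the sample region is made up.

```python
def to_pixel_box(result):
    """Convert one rectangle region (percent units) to pixel (xmin, ymin, xmax, ymax)."""
    value = result["value"]
    img_w = result["original_width"]
    img_h = result["original_height"]
    xmin = value["x"] / 100.0 * img_w
    ymin = value["y"] / 100.0 * img_h
    xmax = xmin + value["width"] / 100.0 * img_w
    ymax = ymin + value["height"] / 100.0 * img_h
    return tuple(round(v) for v in (xmin, ymin, xmax, ymax))


# Made-up region matching the "label" / "image" names in the config above.
sample = {
    "from_name": "label",
    "to_name": "image",
    "type": "rectanglelabels",
    "original_width": 800,
    "original_height": 600,
    "value": {"x": 25.0, "y": 50.0, "width": 50.0, "height": 25.0,
              "rectanglelabels": ["Car"]},
}
print(to_pixel_box(sample))  # (200, 300, 600, 450)
```

Percent-based coordinates keep regions valid when the image is displayed at a different size, which is why a conversion step like this is usually needed before training a detector on pixel boxes.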
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
Pros of labelme
- Lightweight and easy to install with minimal dependencies
- Supports polygon, rectangle, circle, line, and point annotations
- Allows for custom label categories and attributes
Cons of labelme
- Limited support for complex annotation tasks and workflows
- Less extensive documentation and community support
- Fewer built-in integrations with machine learning frameworks
Code Comparison
labelme:
from labelme import utils
img = utils.img_data_to_arr(img_data)
label_name_to_value = {'_background_': 0, 'person': 1, 'car': 2}
lbl = utils.shapes_to_label(img.shape, shapes, label_name_to_value)
Label Studio:
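from label_studio_sdk import Client
ls = Client(url='http://localhost:8080', api_key='your-api-key')
# Define the labeling interface before creating the project
label_config = '<View><Image name="image" value="$image"/><Choices name="choice" toName="image"><Choice value="cat"/><Choice value="dog"/></Choices></View>'
project = ls.start_project(title='Image Classification', label_config=label_config)
# Import one task; the URL is a placeholder for an image the server can reach
project.import_tasks([{'image': 'https://example.com/image.jpg'}])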
Label Studio offers a more comprehensive solution with a wider range of annotation types, project management features, and integration capabilities. It's better suited for large-scale, collaborative projects. labelme, on the other hand, is simpler and more lightweight, making it a good choice for smaller projects or quick annotations. The code comparison shows that labelme focuses on image processing and shape annotations, while Label Studio provides a higher-level API for project management and data labeling.
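Moving annotations between the two tools is mostly a matter of coordinate systems. The sketch below assumes that a labelme JSON file stores polygon vertices in pixels under `shapes[*].points`, together with `imageWidth` and `imageHeight`, and that Label Studio polygon regions store vertices as percentages of the image size. The function name `labelme_to_label_studio` and the `from_name` / `to_name` defaults are hypothetical and would have to match the labeling config of the target project.

```python
def labelme_to_label_studio(doc, from_name="label", to_name="image"):
    """Convert labelme polygon shapes to Label Studio-style polygon regions."""
    width, height = doc["imageWidth"], doc["imageHeight"]
    results = []
    for shape in doc["shapes"]:
        # This sketch handles polygons only; other shape types are skipped.
        if shape.get("shape_type", "polygon") != "polygon":
            continue
        # Pixel coordinates become percentages of the image size.
        points = [[100.0 * x / width, 100.0 * y / height] for x, y in shape["points"]]
        results.append({
            "from_name": from_name,
            "to_name": to_name,
            "type": "polygonlabels",
            "original_width": width,
            "original_height": height,
            "value": {"points": points, "polygonlabels": [shape["label"]]},
        })
    return results


# Made-up labelme document with one polygon and one rectangle.
labelme_doc = {
    "imageWidth": 200,
    "imageHeight": 100,
    "shapes": [
        {"label": "person", "shape_type": "polygon",
         "points": [[0, 0], [100, 0], [100, 50]]},
        {"label": "car", "shape_type": "rectangle",
         "points": [[10, 10], [20, 20]]},
    ],
}
print(labelme_to_label_studio(labelme_doc))
```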
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source data labeling tool for images, text, hypertext, audio, video and time-series data.
Pros of labelImg
- Lightweight and focused specifically on image annotation
- Simple interface for quick labeling of rectangular bounding boxes
- Supports multiple image formats (JPEG, PNG, BMP, etc.)
Cons of labelImg
- Limited to bounding box annotations only
- Lacks advanced project management and collaboration features
- No built-in support for machine learning model integration
Code Comparison
labelImg:
def saveFile(self, _value=False):
    if self.filePath:
        try:
            self.saveLabels(self.filePath)
            self.setClean()
            return True
        except:
            self.errorMessage(u'Error saving file', u'Error saving labels to file')
            return False
label-studio:
def save_annotation(self, annotation):
    self.annotations.append(annotation)
    self.save()
    return {'id': annotation['id']}
Summary
labelImg is a lightweight tool focused on bounding box annotations for images, while label-studio is a more comprehensive platform supporting various data types and annotation tasks. labelImg offers simplicity and ease of use for specific image labeling needs, but lacks the advanced features and flexibility provided by label-studio's broader ecosystem.
Open source annotation tool for machine learning practitioners.
Pros of doccano
- Lightweight and simpler to set up and use
- Supports common text annotation tasks (classification, sequence labeling, sequence-to-sequence) out of the box
- More focused on text annotation tasks
Cons of doccano
- Less extensive customization options
- Fewer integrations with external tools and services
- Limited support for image and audio annotation
Code Comparison
doccano:
from doccano_client import DoccanoClient
client = DoccanoClient('http://localhost:8000')
client.login(username='admin', password='password')
project = client.find_project_by_id(1)
Label Studio:
from label_studio_sdk import Client
ls = Client(url='http://localhost:8080', api_key='your-api-key')
project = ls.get_project(1)
tasks = project.get_tasks()
Both repositories provide Python clients for interacting with their respective APIs. doccano's client is more straightforward, while Label Studio's SDK offers more extensive functionality and integration options.
README
Website • Docs • Twitter • Join Slack Community
What is Label Studio?
Label Studio is an open source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can be used to prepare raw data or improve existing training data to get more accurate ML models.
- Try out Label Studio
- What you get from Label Studio
- Included templates for labeling data in Label Studio
- Set up machine learning models with Label Studio
- Integrate Label Studio with your existing tools
Have a custom dataset? You can customize Label Studio to fit your needs. Read an introductory blog post to learn more.
Try out Label Studio
Install Label Studio locally, or deploy it in a cloud instance. Or, sign up for a free trial of our Enterprise edition.
- Install locally with Docker
- Run with Docker Compose (Label Studio + Nginx + PostgreSQL)
- Install locally with pip
- Install locally with poetry
- Install locally with Anaconda
- Install for local development
- Deploy in a cloud instance
Install locally with Docker
The official Label Studio Docker image is available on Docker Hub and can be downloaded with docker pull.
Run Label Studio in a Docker container and access it at http://localhost:8080.
docker pull heartexlabs/label-studio:latest
docker run -it -p 8080:8080 -v $(pwd)/mydata:/label-studio/data heartexlabs/label-studio:latest
You can find all the generated assets, including the SQLite3 database file label_studio.sqlite3 and uploaded files, in the ./mydata directory.
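Because the default storage is an ordinary SQLite file, it can be inspected with Python's standard library. The following is a minimal sketch, assuming the container was started with the volume mount shown above so that the database sits at `./mydata/label_studio.sqlite3`. The helper `list_tables` is hypothetical, and no particular table names are assumed.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in an SQLite database file."""
    # Open read-only so a running Label Studio instance is not disturbed.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db_file = Path("mydata") / "label_studio.sqlite3"
    if db_file.exists():
        print(list_tables(db_file))
    else:
        print(f"{db_file} not found; start the container first")
```

Read-only access is a deliberate choice here: writing to the file while the server is running risks lock conflicts or corrupted state.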
Override default Docker install
You can override the default launch command by appending the new arguments:
docker run -it -p 8080:8080 -v $(pwd)/mydata:/label-studio/data heartexlabs/label-studio:latest label-studio --log-level DEBUG
Build a local image with Docker
If you want to build a local image, run:
docker build -t heartexlabs/label-studio:latest .
Run with Docker Compose
The Docker Compose script provides a production-ready stack consisting of the following components:
- Label Studio
- Nginx - proxy web server used to load various static data, including uploaded audio, images, etc.
- PostgreSQL - production-ready database that replaces less performant SQLite3.
To start using the app at http://localhost, run this command:
docker-compose up
Run with Docker Compose + MinIO
You can also run it with an additional MinIO server for local S3 storage. This is particularly useful when you want to test the behavior with S3 storage on your local system. To start Label Studio in this way, you need to run the following command:
# Add sudo on Linux if you are not a member of the docker group
docker compose -f docker-compose.yml -f docker-compose.minio.yml up -d
If you do not have a static IP address, you must create an entry in your hosts file so that both Label Studio and your browser can access the MinIO server. For more detailed instructions, please refer to our guide on storing data.
Install locally with pip
# Requires Python >=3.8
pip install label-studio
# Start the server at http://localhost:8080
label-studio
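When the server is launched from a script, for example before importing tasks, it can be useful to wait until it accepts connections. The sketch below is one way to do that, assuming the default address `localhost:8080` mentioned above. `wait_for_server` is a hypothetical helper; it only checks that the TCP port is open and makes no assumption about any Label Studio HTTP endpoint.

```python
import socket
import time


def wait_for_server(host="localhost", port=8080, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A call such as `wait_for_server(timeout=60.0)` after starting `label-studio` in the background returns `True` as soon as the port is open. An open port does not guarantee that the application has finished starting, so a follow-up request to the web interface is still a sensible check.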
Install locally with poetry
### install poetry
pip install poetry
### set poetry environment
poetry new my-label-studio
cd my-label-studio
poetry add label-studio
### activate poetry environment
poetry shell
### Start the server at http://localhost:8080
label-studio
Install locally with Anaconda
conda create --name label-studio
conda activate label-studio
conda install psycopg2
pip install label-studio
Install for local development
You can run the latest Label Studio version locally without installing the package from PyPI.
# Install all package dependencies
pip install poetry
poetry install
# Run database migrations
python label_studio/manage.py migrate
python label_studio/manage.py collectstatic
# Start the server in development mode at http://localhost:8080
python label_studio/manage.py runserver
Deploy in a cloud instance
You can deploy Label Studio with one click in Heroku, Microsoft Azure, or Google Cloud Platform.
Apply frontend changes
For information about updating the frontend, see label-studio/web/README.md.
Install dependencies on Windows
To run Label Studio on Windows, download and install the following wheel packages from Gohlke builds to ensure you're using the correct version of Python:
# Upgrade pip
pip install -U pip
# If you're running Win64 with Python 3.8, install the packages downloaded from Gohlke:
pip install lxml-4.5.0-cp38-cp38-win_amd64.whl
# Install label studio
pip install label-studio
Run test suite
To add the tests' dependencies to your local install:
poetry install --with test
Alternatively, it is possible to run the unit tests from a Docker container in which the test dependencies are installed:
make build-testing-image
make docker-testing-shell
In either case, to run the unit tests:
cd label_studio
# sqlite3
DJANGO_DB=sqlite DJANGO_SETTINGS_MODULE=core.settings.label_studio pytest -vv
# postgres (assumes default postgres user,db,pass. Will not work in Docker
# testing container without additional configuration)
DJANGO_DB=default DJANGO_SETTINGS_MODULE=core.settings.label_studio pytest -vv
What you get from Label Studio
- Multi-user labeling: sign up and log in; every annotation you create is tied to your account.
- Multiple projects: work on all your datasets in one instance.
- Streamlined design: helps you focus on your task, not on how to use the software.
- Configurable label formats: customize the visual interface to meet your specific labeling needs.
- Support for multiple data types: images, audio, text, HTML, time-series, and video.
- Import from files or from cloud storage: Amazon AWS S3, Google Cloud Storage, or JSON, CSV, TSV, RAR, and ZIP archives.
- Integration with machine learning models: visualize and compare predictions from different models and perform pre-labeling.
- Embed it in your data pipeline: the REST API makes it easy to make Label Studio part of your pipeline.
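As an illustration of the import formats listed above, the sketch below converts CSV text into the JSON task structure Label Studio accepts, where each task is an object with a `data` dictionary whose keys are referenced from the labeling config (for example `$text`). The helper `csv_to_tasks` and the sample columns are hypothetical.

```python
import csv
import io
import json


def csv_to_tasks(csv_text):
    """Turn CSV text into a list of task dicts, one per row, keyed by column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"data": dict(row)} for row in reader]


# Made-up rows, for illustration only.
sample_csv = "text,source\nGreat product,web\nArrived late,email\n"
tasks = csv_to_tasks(sample_csv)
print(json.dumps(tasks, indent=2))
```

Label Studio can also import CSV directly, so a conversion like this is mainly useful when rows need filtering or extra fields before import.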
Included templates for labeling data in Label Studio
Label Studio includes a variety of templates to help you label your data, or you can create your own using a specifically designed configuration language. The included templates cover the most common labeling use cases.
Set up machine learning models with Label Studio
Connect your favorite machine learning model using the Label Studio Machine Learning SDK. Follow these steps:
- Start your own machine learning backend server. See more detailed instructions.
- Connect Label Studio to the server on the model page found in project settings.
This lets you:
- Pre-label your data using model predictions.
- Do online learning and retrain your model while new annotations are being created.
- Do active learning by labeling only the most complex examples in your data.
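The pre-labeling and active learning ideas above can be sketched without a running backend. The example assumes the prediction layout Label Studio uses for pre-annotated tasks (a `predictions` list whose items carry `model_version`, `score`, and `result`) and a labeling config with a `Choices` control named `sentiment` attached to a `Text` object named `text`; those names, the helper functions, and the scores are hypothetical. The ordering shown is plain uncertainty sampling, which is only one possible active learning strategy.

```python
def make_prediction(label, score, model_version="sketch-v0"):
    """Build one prediction in the shape expected for a Choices control."""
    return {
        "model_version": model_version,
        "score": score,
        "result": [{
            "from_name": "sentiment",
            "to_name": "text",
            "type": "choices",
            "value": {"choices": [label]},
        }],
    }


def least_confident_first(tasks):
    """Order tasks so the ones the model is least sure about come first."""
    return sorted(tasks, key=lambda task: task["predictions"][0]["score"])


# Made-up tasks with made-up model scores.
tasks = [
    {"data": {"text": "Great product"}, "predictions": [make_prediction("Positive", 0.97)]},
    {"data": {"text": "It is fine I guess"}, "predictions": [make_prediction("Positive", 0.51)]},
    {"data": {"text": "Arrived late"}, "predictions": [make_prediction("Negative", 0.70)]},
]

for task in least_confident_first(tasks):
    print(task["predictions"][0]["score"], task["data"]["text"])
```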
Integrate Label Studio with your existing tools
You can use Label Studio as an independent part of your machine learning workflow or integrate the frontend or backend into your existing tools.
Ecosystem
Project | Description |
---|---|
label-studio | Server, distributed as a pip package |
Frontend library | The Label Studio frontend library. This uses React to build the UI and mobx-state-tree for state management. |
Data Manager library | A library for the Data Manager, our data exploration tool. |
label-studio-converter | Encode labels in the format of your favorite machine learning library |
label-studio-transformers | Transformers library connected and configured for use with Label Studio |
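To show the kind of transformation a converter performs, the sketch below flattens a JSON export into `(text, label)` pairs for a classifier. It does not use the label-studio-converter API. It assumes the JSON export layout in which each task carries its `data` plus an `annotations` list, and each annotation holds a `result` list of regions. The function `export_to_pairs` and the sample export are hypothetical.

```python
def export_to_pairs(exported_tasks, text_field="text"):
    """Flatten exported tasks into (text, label) pairs from "choices" regions."""
    pairs = []
    for task in exported_tasks:
        for annotation in task.get("annotations", []):
            for region in annotation.get("result", []):
                # Only classification regions are handled in this sketch.
                if region.get("type") != "choices":
                    continue
                for choice in region["value"]["choices"]:
                    pairs.append((task["data"][text_field], choice))
    return pairs


# Made-up export with one annotated task and one unannotated task.
sample_export = [
    {
        "data": {"text": "Great product"},
        "annotations": [
            {"result": [{"from_name": "sentiment", "to_name": "text",
                         "type": "choices", "value": {"choices": ["Positive"]}}]}
        ],
    },
    {"data": {"text": "Not labeled yet"}, "annotations": []},
]
print(export_to_pairs(sample_export))  # [('Great product', 'Positive')]
```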
Roadmap
Want to use The Coolest Feature X but Label Studio doesn't support it? Check out our public roadmap!
Citation
@misc{Label Studio,
title={{Label Studio}: Data labeling software},
url={https://github.com/heartexlabs/label-studio},
note={Open source software available from https://github.com/heartexlabs/label-studio},
author={
Maxim Tkachenko and
Mikhail Malyuk and
Andrey Holmanyuk and
Nikolai Liubimov},
year={2020-2022},
}
License
This software is licensed under the Apache 2.0 LICENSE © Heartex. 2020-2022