deeplab2
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks.
Top Related Projects
Models and examples built with TensorFlow
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
Nvidia Semantic Segmentation monorepo
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Quick Overview
DeepLab2 is an open-source TensorFlow library for deep learning-based image segmentation. It provides state-of-the-art models and tools for dense pixel labeling tasks such as semantic and panoptic segmentation, focusing on high-quality, efficient implementations of the DeepLab family of architectures.
Pros
- Implements multiple advanced DeepLab architectures (e.g., DeepLabV3+, Panoptic-DeepLab)
- Supports various backbone networks and datasets
- Provides pre-trained models for quick deployment
- Offers flexibility for customization and experimentation
Cons
- Requires significant computational resources for training
- Limited documentation for advanced usage scenarios
- Steep learning curve for users new to semantic segmentation
- Dependency on specific versions of TensorFlow and other libraries
Code Examples
- Loading a pre-trained model:
import tensorflow as tf
from deeplab2 import common
from deeplab2.model import deeplab

# The model options below are illustrative; consult the DeepLab2 configs for the
# exact options of the variant you want to run.
model = deeplab.DeepLab(common.ModelOptions(
    model_variant='deeplabv3plus_mobilenetv3_large',
    num_classes=21,
    crop_size=[513, 513],
    backbone=common.BackboneOptions(output_stride=16),
))

# Restore weights from a downloaded checkpoint ('/path/to/checkpoint' is a placeholder).
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.restore('/path/to/checkpoint').expect_partial()
- Performing inference on an image:
import numpy as np
import tensorflow as tf
from PIL import Image

# Resize to the crop size the model was configured with and add a batch dimension.
input_image = np.array(Image.open('image.jpg').resize((513, 513)))
input_tensor = tf.convert_to_tensor(input_image[None, ...], dtype=tf.uint8)

result = model(input_tensor, training=False)
segmentation_map = tf.argmax(result['semantic_logits'], axis=-1)  # per-pixel class ids
- Fine-tuning the model on custom data (a sketch of a possible preprocess_function follows this example):
import tensorflow_datasets as tfds

# 'your_custom_dataset', preprocess_function, batch_size and num_epochs are placeholders.
dataset = tfds.load('your_custom_dataset', split='train')
dataset = dataset.map(preprocess_function).batch(batch_size)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def train_step(images, labels):
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)['semantic_logits']
        loss = loss_fn(labels, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

for epoch in range(num_epochs):
    for images, labels in dataset:
        loss = train_step(images, labels)
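The preprocess_function referenced above is a user-supplied placeholder. A minimal sketch of what it could look like, assuming (hypothetically) that each dataset example is a dictionary with 'image' and 'segmentation_mask' features:

def preprocess_function(example):
    # 'image' and 'segmentation_mask' are assumed feature names; adjust them to your dataset's schema.
    image = tf.image.resize(example['image'], [513, 513])
    image = tf.cast(image, tf.uint8)
    # Nearest-neighbor resizing keeps label ids integral.
    label = tf.image.resize(
        example['segmentation_mask'], [513, 513],
        method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    label = tf.squeeze(tf.cast(label, tf.int32), axis=-1)
    # Depending on the model's output stride, labels may need to be resized to match the logits' resolution.
    return image, label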
Getting Started
- Install DeepLab2 (the repository's Installation guide covers the full setup, including compiling the DeepLab2 protocol buffers and adding the code to your PYTHONPATH):
git clone https://github.com/google-research/deeplab2.git
cd deeplab2
pip install -r requirements.txt
- Download a pre-trained checkpoint: checkpoints are linked from the model zoo pages in the DeepLab2 documentation (e.g., for Panoptic-DeepLab, MaX-DeepLab, and kMaX-DeepLab); pick one that matches your model configuration.
- Run inference on an image (a visualization sketch follows the code):
from deeplab2 import common
from deeplab2.model import deeplab
import tensorflow as tf
import numpy as np
from PIL import Image
model = deeplab.DeepLab(common.ModelOptions(...)) # Configure model options
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.restore('/path/to/checkpoint').expect_partial()
image = np.array(Image.open('image.jpg').resize((513, 513)))
result = model(tf.convert_to_tensor(image[None, ...], dtype=tf.uint8), training=False)
segmentation_map = tf.argmax(result['semantic_logits'], axis=-1)
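For a quick visual check, the per-pixel class ids can be converted to a color image. A minimal sketch, assuming the segmentation_map produced above and an arbitrary, hypothetical 21-color palette:

# Map each predicted class id to an (arbitrary) RGB color and save the result.
palette = np.random.RandomState(0).randint(0, 256, size=(21, 3), dtype=np.uint8)
label_map = segmentation_map[0].numpy()   # drop the batch dimension -> [height, width]
color_map = palette[label_map]            # -> [height, width, 3]
Image.fromarray(color_map).save('segmentation.png')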
Competitor Comparisons
Models and examples built with TensorFlow
Pros of TensorFlow Models
- Broader scope, covering a wide range of machine learning tasks and models
- More extensive documentation and tutorials for various use cases
- Larger community and more frequent updates
Cons of TensorFlow Models
- Less specialized for semantic segmentation tasks
- May require more setup and configuration for specific use cases
- Potentially steeper learning curve due to the breadth of content
Code Comparison
DeepLab2:
model = deeplab2.DeepLab(num_classes=21, backbone='resnet50')
outputs = model(inputs)
TensorFlow Models:
model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
x = tf.keras.layers.GlobalAveragePooling2D()(model.output)
outputs = tf.keras.layers.Dense(21, activation='softmax')(x)
Summary
DeepLab2 is focused on semantic segmentation, offering specialized tools and models for this task. TensorFlow Models provides a broader range of machine learning models and applications, making it more versatile but potentially less optimized for specific tasks like semantic segmentation. DeepLab2 may be easier to use for segmentation tasks, while TensorFlow Models offers more flexibility for various machine learning projects.
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- More comprehensive object detection and segmentation framework
- Extensive model zoo with pre-trained models
- Better documentation and community support
Cons of Detectron2
- Steeper learning curve for beginners
- Less focused on semantic segmentation compared to DeepLab2
Code Comparison
DeepLab2:
model = deeplab2.DeepLab(num_classes=21, backbone='resnet101')
outputs = model(inputs)
loss = criterion(outputs, targets)
Detectron2:
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
model = build_model(cfg)
loss_dict = model(inputs)
Summary
DeepLab2 focuses primarily on semantic segmentation, while Detectron2 offers a broader range of computer vision tasks. Detectron2 provides more pre-trained models and better documentation, but may be more complex for beginners. DeepLab2 is more specialized for semantic segmentation tasks. The code comparison shows that Detectron2 requires more configuration setup, while DeepLab2 has a simpler API for model initialization and training.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Pros of Mask_RCNN
- Easier to use and implement, with a more user-friendly API
- Better documentation and community support
- More suitable for general object detection and instance segmentation tasks
Cons of Mask_RCNN
- Less advanced in semantic segmentation compared to DeepLab2
- May have lower performance on complex scenes with multiple overlapping objects
- Limited in handling high-resolution images efficiently
Code Comparison
Mask_RCNN:
import mrcnn.model as modellib
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(WEIGHTS_PATH, by_name=True)
results = model.detect([image], verbose=1)
DeepLab2:
import deeplab2.model.deeplab as deeplab
model = deeplab.DeepLab(num_classes=21, backbone='resnet_v1_101')
model.load_weights(WEIGHTS_PATH)
outputs = model(inputs)
Both repositories offer powerful tools for image segmentation, but Mask_RCNN is more accessible for general object detection tasks, while DeepLab2 excels in advanced semantic segmentation scenarios. The code snippets demonstrate the difference in API complexity, with Mask_RCNN offering a more straightforward implementation.
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
Pros of mmsegmentation
- More comprehensive model zoo with a wider variety of architectures
- Easier to use and more flexible configuration system
- Better documentation and community support
Cons of mmsegmentation
- Slightly slower inference speed for some models
- Less focus on state-of-the-art architectures like MaX-DeepLab
Code Comparison
mmsegmentation:
from mmseg.apis import inference_segmentor, init_segmentor
config_file = 'configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py'
checkpoint_file = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, img)
deeplab2:
import tensorflow as tf
from deeplab2 import common
from deeplab2.model import deeplab
model = deeplab.DeepLab(common.ModelOptions())
inputs = tf.keras.Input(shape=[None, None, 3])
outputs = model(inputs)
Nvidia Semantic Segmentation monorepo
Pros of semantic-segmentation
- Optimized for NVIDIA GPUs, potentially offering better performance on compatible hardware
- Includes TensorRT integration for faster inference
- Provides pre-trained models for quick deployment
Cons of semantic-segmentation
- Limited to specific architectures (primarily HRNet and DeepLabV3+)
- Less flexibility in terms of model customization compared to DeepLab2
- Smaller community and fewer updates
Code Comparison
DeepLab2:
model = deeplab2.DeepLab(
    config=config,
    dataset_descriptor=dataset,
)
outputs = model(inputs)
semantic-segmentation:
model = network.hrnet(
    config=config,
    pretrained=True
)
outputs = model(inputs)
Both repositories focus on semantic segmentation, but DeepLab2 offers a more comprehensive and flexible framework with support for various architectures and tasks. semantic-segmentation is tailored for NVIDIA hardware and provides optimized performance for specific models.
DeepLab2 is more suitable for research and experimentation, while semantic-segmentation is better for production deployment on NVIDIA GPUs. The choice between the two depends on the specific use case, hardware availability, and required flexibility.
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Pros of Detectron
- More comprehensive object detection framework, supporting multiple tasks beyond semantic segmentation
- Larger community and more extensive documentation
- Includes pre-trained models for various tasks out-of-the-box
Cons of Detectron
- Less focused on semantic segmentation specifically
- May have a steeper learning curve for users primarily interested in segmentation tasks
- Older codebase with potential for deprecated components
Code Comparison
DeepLab2:
model = deeplab2.DeepLab(num_classes=21)
outputs = model(inputs)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=outputs))
Detectron:
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
model = build_model(cfg)
outputs = model(inputs)
Both repositories offer powerful tools for computer vision tasks, with DeepLab2 focusing more on semantic segmentation and Detectron providing a broader range of object detection and instance segmentation capabilities. The choice between them depends on the specific requirements of your project and the level of customization needed.
README
DeepLab2: A TensorFlow Library for Deep Labeling
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks, including, but not limited to semantic segmentation, instance segmentation, panoptic segmentation, depth estimation, or even video panoptic segmentation.
Deep labeling refers to solving computer vision problems by assigning a predicted value for each pixel in an image with a deep neural network. As long as the problem of interest could be formulated in this way, DeepLab2 should serve the purpose. Additionally, this codebase includes our recent and state-of-the-art research models on deep labeling. We hope you will find it useful for your projects.
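As a concrete illustration of per-pixel prediction: a semantic map stores one class id per pixel, while a panoptic map additionally separates instances of the same class. A common way to pack both into a single integer per pixel is a label divisor (DeepLab2's dataset definitions expose this as a panoptic label divisor); the tiny sketch below uses made-up class and instance ids:

import numpy as np

# Toy 2x3 "image": a class id and an instance id for every pixel (ids are illustrative).
semantic = np.array([[0, 7, 7],
                     [0, 7, 11]])
instance = np.array([[0, 1, 1],
                     [0, 2, 0]])   # "stuff" classes keep instance id 0

label_divisor = 256                # must exceed the maximum number of instances per class
panoptic = semantic * label_divisor + instance

# Decoding recovers both maps from the packed panoptic ids.
assert (panoptic // label_divisor == semantic).all()
assert (panoptic % label_divisor == instance).all()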
Change logs
- 10/18/2022: Add kMaX-DeepLab ADE20K panoptic segmentation results in model zoo.
- 10/04/2022: Open-source MOAT model code and ImageNet pretrained weights. We thank Chenglin Yang for their valuable contributions.
- 08/26/2022: Add ViP-DeepLab support for Waymo Open Dataset: Panoramic Video Panoptic Segmentation. We thank Jieru Mei, Alex Zhu, Xinchen Yan, and Hang Yan for their valuable contributions.
- 08/16/2022: Support Colab demo for kMaX-DeepLab.
- 07/12/2022: Open-source k-means Mask Transformer (kMaX-DeepLab) code and model zoo.
- 07/11/2022: Drop support of TensorFlow 2.5. Please update to 2.6.
- 04/27/2022: Add ViP-DeepLab demo and update ViP-DeepLab model zoo.
- 09/07/2021: Add numpy implementation of Segmentation and Tracking Quality. Find it here.
- 09/06/2021: Update Panoptic-DeepLab w/ MobileNetv3 backbone results on Cityscapes.
- 08/13/2021: Open-source MaX-DeepLab-L COCO checkpoints (51.3% PQ on COCO val set).
- 07/26/2021: Add ViP-DeepLab support for SemKITTI-DVPS.
- 07/07/2021: KITTI-STEP and MOTChallenge-STEP are ready to use.
- 06/07/2021: Add Hungarian matching support on TPU for MaX-DeepLab, thanks to the help from Jiquan Ngiam and Amil Merchant.
- 06/01/2021: "Hello, World!", DeepLab2 made publicly available.
Installation
See Installation.
Dataset preparation
The dataset needs to be converted to TFRecord. The repository provides conversion examples for the supported datasets, along with guidance on converting your own dataset.
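A minimal sketch of what such a conversion could look like for a semantic segmentation dataset. The feature keys and file names below are illustrative; when preparing real data, follow the converters shipped with DeepLab2 so the keys match what its input pipeline expects:

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def build_example(image_path, label_path):
    # Read the encoded image and its per-pixel label map as raw bytes.
    image_data = tf.io.gfile.GFile(image_path, 'rb').read()
    label_data = tf.io.gfile.GFile(label_path, 'rb').read()
    features = {
        'image/encoded': _bytes_feature(image_data),
        'image/segmentation/class/encoded': _bytes_feature(label_data),
    }
    return tf.train.Example(features=tf.train.Features(feature=features))

with tf.io.TFRecordWriter('train-00000-of-00001.tfrecord') as writer:
    writer.write(build_example('image.png', 'label.png').SerializeToString())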
Projects
We list a few projects that use DeepLab2.
Colab Demo
- kMaX-DeepLab Colab notebook for off-the-shelf inference with COCO checkpoints.
- Panoptic-DeepLab Colab notebook for off-the-shelf inference with Cityscapes checkpoints.
- ViP-DeepLab Colab notebook for off-the-shelf inference with Cityscapes-DVPS checkpoints.
Note that the exported models used in all the demos are in CPU mode.
Running DeepLab2
See Getting Started. In short, to run DeepLab2 on GPUs, use the following command:
python trainer/train.py \
    --config_file=${CONFIG_FILE} \
    --mode={train | eval | train_and_eval | continuous_eval} \
    --model_dir=${BASE_MODEL_DIRECTORY} \
    --num_gpus=${NUM_GPUS}
Contacts (Maintainers)
Please check the FAQ if you have questions before reporting an issue.
- Mark Weber, github: markweberdev
- Huiyu Wang, github: csrhddlam
- Siyuan Qiao, github: joe-siyuan-qiao
- Jun Xie, github: clairexie
- Maxwell D. Collins, github: mcollinswisc
- YuKun Zhu, github: yknzhu
- Liangzhe Yuan, github: yuanliangzhe
- Dahun Kim, github: mcahny
- Qihang Yu, github: yucornetto
- Liang-Chieh Chen, github: aquariusjay
Disclaimer
- Note that this library contains our re-implemented DeepLab models in TensorFlow 2, and thus may have some minor differences from the published papers (e.g., learning rate).
- This is not an official Google product.
Citing DeepLab2
If you find DeepLab2 useful for your project, please consider citing DeepLab2 along with the relevant DeepLab series.
- DeepLab2:
@article{deeplab2_2021,
  author={Mark Weber and Huiyu Wang and Siyuan Qiao and Jun Xie and Maxwell D. Collins and Yukun Zhu and Liangzhe Yuan and Dahun Kim and Qihang Yu and Daniel Cremers and Laura Leal-Taixe and Alan L. Yuille and Florian Schroff and Hartwig Adam and Liang-Chieh Chen},
  title={{DeepLab2: A TensorFlow Library for Deep Labeling}},
  journal={arXiv: 2106.09748},
  year={2021}
}
References
- Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. "The Cityscapes dataset for semantic urban scene understanding." In CVPR, 2016.
- Andreas Geiger, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? The KITTI vision benchmark suite." In CVPR, 2012.
- Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Jurgen Gall. "SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences." In ICCV, 2019.
- Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Dollar. "Panoptic segmentation." In CVPR, 2019.
- Dahun Kim, Sanghyun Woo, Joon-Young Lee, and In So Kweon. "Video panoptic segmentation." In CVPR, 2020.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. "Microsoft COCO: Common objects in context." In ECCV, 2014.
- Patrick Dendorfer, Aljosa Osep, Anton Milan, Konrad Schindler, Daniel Cremers, Ian Reid, Stefan Roth, and Laura Leal-Taixe. "MOTChallenge: A benchmark for single-camera multiple target tracking." IJCV, 2020.
- Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. "Scene parsing through ADE20K dataset." In CVPR, 2017.