Top Related Projects
- Detectron2: Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
- Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.
- models: Models and examples built with TensorFlow.
- mmsegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark.
- vision: Datasets, Transforms and Models specific to Computer Vision.
- segmentation_models.pytorch: Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Quick Overview
The NVIDIA/semantic-segmentation repository is a collection of deep learning models and tools for performing semantic segmentation, which is the task of assigning a semantic label (e.g., person, car, road) to each pixel in an image. The repository includes pre-trained models, training scripts, and evaluation tools, making it a valuable resource for researchers and developers working on semantic segmentation problems.
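To make the task concrete: a segmentation network turns an image into per-pixel class scores, and the prediction is the highest-scoring class at each pixel. A minimal, framework-level sketch in plain PyTorch (not code from this repository):

```python
import torch

# Hypothetical network output for one image with 19 classes (e.g., Cityscapes)
# at 512x1024 resolution: logits of shape (batch, classes, height, width).
logits = torch.randn(1, 19, 512, 1024)

# The predicted segmentation labels each pixel with its highest-scoring class.
pred = logits.argmax(dim=1)   # shape (1, 512, 1024), integer class ids 0..18
```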
Pros
- Comprehensive Collection of Models: The repository includes state-of-the-art semantic segmentation architectures such as HRNet + OCR with hierarchical multi-scale attention and DeepLabV3+, allowing users to experiment with different approaches.
- Pre-trained Models: The repository provides pre-trained weights for datasets such as Cityscapes and Mapillary Vistas, which can be used as a starting point for fine-tuning or transfer learning.
- Modular and Extensible: The codebase is designed to be modular and extensible, making it easy for users to integrate their own models or datasets into the framework.
- Detailed Documentation: The repository includes comprehensive documentation, including installation instructions, model descriptions, and usage examples, making it accessible for both beginners and experienced users.
Cons
- Limited Support for Newer Architectures: While the repository includes a wide range of models, it may not always be up-to-date with the latest state-of-the-art architectures, which could be a limitation for researchers working on the cutting edge of the field.
- Potential Compatibility Issues: The code targets NVIDIA GPUs (for example, it relies on apex amp for mixed precision), so there may be compatibility issues on non-NVIDIA hardware or with different software versions.
- Limited Support for Deployment: The repository is primarily focused on model training and evaluation, and may not provide comprehensive support for deployment and inference in production environments.
- Potential Licensing Restrictions: The repository includes models and datasets that may have their own licensing requirements, which users should be aware of before using the code in their own projects.
Code Examples
The snippets below illustrate a typical semantic segmentation workflow. Note that they use the third-party segmentation_models_pytorch library; the NVIDIA/semantic-segmentation repository itself is driven through train.py and runx configuration files, as described in the README below.
- Loading a Pre-trained Model:

```python
import segmentation_models_pytorch as smp

# UNet++ with an ImageNet-pretrained ResNet-34 encoder and 19 output
# classes (e.g., the Cityscapes label set).
model = smp.UnetPlusPlus(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=19,
    activation="softmax",
)
```

This builds a UNet++ model whose encoder is initialized with ImageNet weights, a common starting point for fine-tuning.
- Training a Model:

```python
import torch
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.losses import DiceLoss
from segmentation_models_pytorch.utils.metrics import IoU

model = smp.UnetPlusPlus(encoder_name="resnet34", classes=19)
criterion = DiceLoss(mode="multiclass")        # multi-class segmentation
metrics = [IoU(threshold=0.5)]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Training loop; train_step and val_step are user-defined helpers,
# not part of the library.
for epoch in range(num_epochs):
    train_loss = train_step(model, train_loader, criterion, optimizer)
    val_loss, val_iou = val_step(model, val_loader, criterion, metrics)
```

This snippet sets up the training process for a semantic segmentation model: the loss function, the evaluation metric, the optimizer, and the epoch loop.
- Evaluating a Model:

```python
import torch
from segmentation_models_pytorch.utils.metrics import IoU

metric = IoU(threshold=0.5)

# Evaluation loop
model.eval()
with torch.no_grad():
    for images, masks in test_loader:
        preds = model(images)
        iou = metric(preds, masks)
        # ...accumulate iou and any other metrics here
```

This loop uses the IoU metric from the segmentation_models_pytorch library to evaluate the performance of a trained semantic segmentation model on a test dataset.
Getting Started
To get started with the NVIDIA/semantic-segmentation repository, follow these steps:
- Clone the repository:
> git clone https://github.com/NVIDIA/semantic-segmentation.git
- Install the required dependencies (the repository's Dockerfile, described under Installation below, captures the tested environment):
> cd semantic-segmentation
> pip install ...
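Once weights and data are in place, everything in the README below is driven through runx; for example, a dry run of Cityscapes evaluation prints the underlying command without executing it:
> python -m runx.runx scripts/eval_cityscapes.yml -i -n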
Competitor Comparisons
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- More comprehensive, supporting object detection, instance segmentation, and other tasks beyond semantic segmentation
- Modular design allows for easier customization and extension of models
- Extensive documentation and community support
Cons of Detectron2
- Steeper learning curve due to its broader scope and more complex architecture
- May be overkill for projects focused solely on semantic segmentation
- Potentially higher computational requirements for training and inference
Code Comparison
Semantic-segmentation (NVIDIA):

```python
# Illustrative sketch; in practice the repository is driven through
# train.py and runx config files rather than a direct import like this.
from network import DRNSeg

model = DRNSeg(classes=19, pretrained=True)
output = model(input_image)
```
Detectron2 (Facebook):

```python
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("path/to/config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(input_image)
```
Both repositories offer powerful tools for computer vision tasks, with Semantic-segmentation focusing specifically on semantic segmentation, while Detectron2 provides a more versatile framework for various object detection and segmentation tasks. The choice between them depends on the specific requirements of your project and the level of flexibility needed.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Pros of Mask_RCNN
- Provides instance segmentation in addition to semantic segmentation
- Easier to use and implement for beginners
- More extensive documentation and community support
Cons of Mask_RCNN
- Generally slower inference time compared to semantic-segmentation
- Less optimized for real-time applications
Code Comparison
Mask_RCNN:

```python
import mrcnn.model as modellib

model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_MODEL_PATH, by_name=True)
results = model.detect([image], verbose=1)
```
semantic-segmentation:

```python
import torch

# Illustrative sketch (the model class here is hypothetical); the NVIDIA
# repository builds its networks from config files via train.py.
model = SegNet(num_classes=num_classes)
model.load_state_dict(torch.load('segnet.pth'))
output = model(input_tensor)
```
The code snippets demonstrate the difference in model initialization and inference. Mask_RCNN uses a more straightforward approach, while semantic-segmentation requires more setup but offers more flexibility for advanced users.
Models and examples built with TensorFlow
Pros of models
- Broader scope, covering various machine learning tasks beyond just semantic segmentation
- Larger community and more frequent updates
- Official TensorFlow repository, ensuring compatibility and best practices
Cons of models
- Less specialized for semantic segmentation tasks
- Potentially more complex to navigate due to its broader scope
- May require more setup and configuration for specific use cases
Code Comparison
models:

```python
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

model = tf.saved_model.load('path/to/saved_model')
category_index = label_map_util.create_category_index_from_labelmap('path/to/labelmap.pbtxt')
```
semantic-segmentation:

```python
import torch
# Illustrative sketch; this factory module is hypothetical, as the repository
# assembles its networks from config files via train.py.
from models.model_factory import create_model

model = create_model(arch='deeplab', num_classes=21, pretrained='imagenet')
model.eval()
```
The models repository provides a more general-purpose approach using TensorFlow, while semantic-segmentation offers a PyTorch-based solution specifically tailored for semantic segmentation tasks. The code snippets demonstrate the different initialization processes, reflecting the repositories' focuses and underlying frameworks.
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
Pros of mmsegmentation
- More comprehensive model zoo with a wider variety of architectures
- Flexible and modular design, allowing easier customization and extension
- Better documentation and community support
Cons of mmsegmentation
- Steeper learning curve due to its more complex structure
- May have slightly higher overhead for simple use cases
Code Comparison
mmsegmentation:

```python
from mmseg.apis import inference_segmentor, init_segmentor

config_file = 'configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py'
checkpoint_file = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, img)
```
semantic-segmentation:

In practice the NVIDIA repository is driven from the command line through runx config files rather than a Python pipeline API:

> python -m runx.runx scripts/eval_cityscapes.yml -i

Both repositories provide efficient implementations for semantic segmentation tasks, but mmsegmentation offers more flexibility and a broader range of models at the cost of increased complexity. semantic-segmentation provides a more straightforward command-line workflow for quick experiments but with fewer options for customization.
Datasets, Transforms and Models specific to Computer Vision
Pros of vision
- Broader scope, covering various computer vision tasks beyond just semantic segmentation
- More active development and larger community support
- Extensive documentation and tutorials
Cons of vision
- Less specialized for semantic segmentation tasks
- May require more setup and configuration for specific use cases
Code comparison
semantic-segmentation:

```python
# Illustrative sketch (this import path is hypothetical); the repository
# is driven through train.py and runx config files.
from nvidia.segmentation import UNet

model = UNet(num_classes=21)
output = model(input_image)
```
vision:

```python
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(pretrained=True)
output = model(input_image)['out']
```
Key differences
- semantic-segmentation focuses specifically on semantic segmentation, offering optimized models for this task
- vision provides a wider range of pre-trained models and utilities for various computer vision tasks
- semantic-segmentation may offer better performance for NVIDIA GPUs, while vision is more hardware-agnostic
- vision integrates seamlessly with the PyTorch ecosystem, making it easier to use with other PyTorch-based projects
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Pros of segmentation_models.pytorch
- Offers a wide range of pre-trained encoder architectures
- Provides easy-to-use high-level API for model creation
- Supports various loss functions and metrics out-of-the-box
Cons of segmentation_models.pytorch
- Limited to PyTorch framework
- Fewer optimization techniques compared to semantic-segmentation
- May have less focus on NVIDIA-specific hardware optimizations
Code Comparison
segmentation_models.pytorch:

```python
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=1,
    activation="sigmoid",
)
```
semantic-segmentation:

```python
# Illustrative PSPNet-style setup; this constructor is a sketch, not
# necessarily the repository's actual API.
from model import PSPNet

model = PSPNet(
    layers=50,
    bins=(1, 2, 3, 6),
    dropout=0.1,
    classes=21,
    zoom_factor=8,
    use_ppm=True,
    pretrained=True,
)
```
Both repositories provide implementations for semantic segmentation tasks, but they differ in their approach and focus. segmentation_models.pytorch offers a more flexible and user-friendly API with various pre-trained encoders, while semantic-segmentation may provide better performance optimizations for NVIDIA hardware.
README
Paper | YouTube | Cityscapes Score
PyTorch implementation of our paper Hierarchical Multi-Scale Attention for Semantic Segmentation.
Please refer to the sdcnet branch if you are looking for the code corresponding to Improving Semantic Segmentation via Video Prediction and Label Relaxation.
Installation
- The code is tested with PyTorch 1.3 and Python 3.6.
- You can use ./Dockerfile to build an image.
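A typical build command, assuming Docker is installed (the image tag is arbitrary):
> docker build -t nvidia-semantic-segmentation .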
Download Weights
- Create a directory where you can keep large files. Ideally, not in this directory.
> mkdir <large_asset_dir>
- Update __C.ASSETS_PATH in config.py to point at that directory:
__C.ASSETS_PATH=<large_asset_dir>
- Download pretrained weights from Google Drive and put them into <large_asset_dir>/seg_weights
Download/Prepare Data
- If using Cityscapes, download Cityscapes data, then update config.py to set the path:
__C.DATASET.CITYSCAPES_DIR=<path_to_cityscapes>
- If using Cityscapes Autolabelled Images, download the Autolabelled-Data from Google Drive, then update config.py to set the path:
__C.DATASET.CITYSCAPES_CUSTOMCOARSE=<path_to_cityscapes>
- If using Mapillary, download Mapillary data, then update config.py to set the path:
__C.DATASET.MAPILLARY_DIR=<path_to_mapillary>
Running the code
The instructions below make use of a tool called runx, which we find useful to help automate experiment running and summarization. For more information about this tool, please see runx.
In general, you can either use the runx-style command lines shown below, or call python train.py <args ...> directly if you like.
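Each scripts/*.yml file is a runx experiment description. As a rough sketch of the format (the field names follow the runx project; the values here are illustrative, not taken from this repository):

```yaml
CMD: python train.py
HPARAMS:
  dataset: cityscapes
  lr: 0.01
  logdir: LOGDIR
```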
Run inference on Cityscapes
Dry run:
> python -m runx.runx scripts/eval_cityscapes.yml -i -n
This will just print out the command but not run. It's a good way to inspect the commandline.
Real run:
> python -m runx.runx scripts/eval_cityscapes.yml -i
The reported IOU should be 86.92. This evaluates with scales of 0.5, 1.0, and 2.0. You will find evaluation results in ./logs/eval_cityscapes/...
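For intuition, multi-scale evaluation runs the network at several input scales and fuses the predictions. A minimal averaging sketch in plain PyTorch follows; note that the paper's actual method fuses scales with learned hierarchical attention rather than a plain average:

```python
import torch
import torch.nn.functional as F

def multi_scale_inference(model, image, scales=(0.5, 1.0, 2.0)):
    """Average class probabilities over several input scales.

    image: (N, 3, H, W) tensor; model returns (N, C, h, w) logits.
    """
    n, _, h, w = image.shape
    fused = 0.0
    with torch.no_grad():
        for s in scales:
            scaled = F.interpolate(image, scale_factor=s, mode='bilinear',
                                   align_corners=False)
            logits = model(scaled)
            # Resize predictions back to the original resolution before fusing.
            logits = F.interpolate(logits, size=(h, w), mode='bilinear',
                                   align_corners=False)
            fused = fused + logits.softmax(dim=1)
    return (fused / len(scales)).argmax(dim=1)   # (N, H, W) label map
```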
Run inference on Mapillary
> python -m runx.runx scripts/eval_mapillary.yml -i
The reported IOU should be 61.05. Note that this must be run on a 32GB node and the use of 'O3' mode for amp is critical in order to avoid GPU out of memory. Results in logs/eval_mapillary/...
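Here, 'O3' refers to an NVIDIA apex mixed-precision optimization level. Enabling it looks roughly like the following sketch (in this repository the opt level is set through its run configs rather than by hand):

```python
from apex import amp

# O3 casts the model to pure FP16, roughly halving activation memory.
model, optimizer = amp.initialize(model, optimizer, opt_level='O3')
```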
Dump images for Cityscapes
> python -m runx.runx scripts/dump_cityscapes.yml -i
This will dump network output and composited images from running evaluation with the Cityscapes validation set.
Run inference and dump images on a folder of images
> python -m runx.runx scripts/dump_folder.yml -i
You should end up with composited output images for each input image.
Train a model
Train on Cityscapes using HRNet + OCR + multi-scale attention, with fine data and a Mapillary-pretrained model:
> python -m runx.runx scripts/train_cityscapes.yml -i
The first time this command is run, a centroid file has to be built for the dataset. It'll take about 10 minutes. The centroid file is used during training to know how to sample from the dataset in a class-uniform way.
This training run should deliver a model that achieves 84.7 IOU.
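Conceptually, class-uniform sampling records where each class occurs in the dataset, then draws training crops by first picking a class uniformly and then one of its recorded locations. A toy sketch of the idea (not the repository's actual centroid code):

```python
import random
from collections import defaultdict

def build_class_index(dataset):
    """Map each class id to the samples (and centroids) where it appears.

    dataset yields (image_id, {class_id: (cx, cy)}) pairs in this toy setup.
    """
    index = defaultdict(list)
    for image_id, centroids in dataset:
        for class_id, center in centroids.items():
            index[class_id].append((image_id, center))
    return index

def sample_class_uniform(index):
    """Pick a class uniformly, then a stored (image, centroid) for it."""
    class_id = random.choice(list(index))
    image_id, center = random.choice(index[class_id])
    return class_id, image_id, center   # crop the image around `center`
```

Rare classes are sampled as often as common ones under this scheme, which is why the centroid file helps on imbalanced datasets like Cityscapes.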
Train SOTA default train-val split
> python -m runx.runx scripts/train_cityscapes_sota.yml -i
Again, use -n to do a dry run and just print out the command. This should result in a model with 86.8 IOU. If you run out of memory, try lowering the crop size or turning off rmi_loss.