
NVIDIA / semantic-segmentation

Nvidia Semantic Segmentation monorepo


Top Related Projects

  • Detectron2: a platform for object detection, segmentation and other visual recognition tasks.
  • Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.
  • models: models and examples built with TensorFlow.
  • mmsegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark.
  • vision: datasets, transforms and models specific to computer vision.
  • segmentation_models.pytorch: semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.

Quick Overview

The NVIDIA/semantic-segmentation repository is a collection of deep learning models and tools for performing semantic segmentation, which is the task of assigning a semantic label (e.g., person, car, road) to each pixel in an image. The repository includes pre-trained models, training scripts, and evaluation tools, making it a valuable resource for researchers and developers working on semantic segmentation problems.
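As a minimal illustration of what per-pixel prediction looks like in practice (a sketch using torchvision's off-the-shelf FCN model, not this repository's networks):

import torch
from torchvision.models.segmentation import fcn_resnet50

# Off-the-shelf FCN trained on 21 Pascal VOC classes, used here only to
# illustrate the task; the NVIDIA repository ships its own networks.
model = fcn_resnet50(pretrained=True).eval()

image = torch.rand(1, 3, 512, 1024)   # dummy RGB image batch
with torch.no_grad():
    logits = model(image)["out"]      # shape: [1, 21, 512, 1024]
labels = logits.argmax(dim=1)         # one class index per pixel: [1, 512, 1024]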

Pros

  • State-of-the-Art Models: The repository implements strong semantic segmentation networks, notably HRNet + OCR with hierarchical multi-scale attention, giving users a high-accuracy starting point.
  • Pre-trained Models: The repository provides pre-trained weights for Cityscapes and Mapillary, which can be used as a starting point for fine-tuning or transfer learning.
  • Modular and Extensible: The codebase is designed to be modular and extensible, making it easy for users to integrate their own models or datasets into the framework.
  • Detailed Documentation: The repository includes comprehensive documentation, including installation instructions, model descriptions, and usage examples, making it accessible for both beginners and experienced users.

Cons

  • Limited Support for Newer Architectures: While the repository includes a wide range of models, it may not always be up-to-date with the latest state-of-the-art architectures, which could be a limitation for researchers working on the cutting edge of the field.
  • Potential Compatibility Issues: As the repository is maintained by NVIDIA, there may be some compatibility issues when using the code on non-NVIDIA hardware or with different software versions.
  • Limited Support for Deployment: The repository is primarily focused on model training and evaluation, and may not provide comprehensive support for deployment and inference in production environments.
  • Potential Licensing Restrictions: The repository includes models and datasets that may have their own licensing requirements, which users should be aware of before using the code in their own projects.

Code Examples

Here are a few illustrative code examples for common semantic segmentation workflows (these use the segmentation_models_pytorch library for brevity, rather than code taken from the repository itself):

  1. Loading a Pre-trained Model:
from segmentation_models_pytorch import UnetPlusPlus

# Build a UNet++ with a ResNet-34 encoder pretrained on ImageNet;
# 19 output classes matches the Cityscapes label set.
model = UnetPlusPlus(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=19,
    activation="softmax2d",
)

This code builds a UNet++ model with an ImageNet-pretrained encoder using the segmentation_models_pytorch library; it is a generic example of loading pre-trained weights rather than code from the NVIDIA/semantic-segmentation repository itself.

  2. Training a Model:
import torch
from segmentation_models_pytorch import UnetPlusPlus
from segmentation_models_pytorch.losses import DiceLoss
from segmentation_models_pytorch.utils.metrics import IoU  # class-based metric in older smp releases

model = UnetPlusPlus(encoder_name="resnet34", encoder_weights="imagenet", classes=19)
criterion = DiceLoss(mode="multiclass")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
metrics = [IoU(threshold=0.5)]

# Training loop (num_epochs, train_loader, val_loader, train_step and val_step
# are user-defined)
for epoch in range(num_epochs):
    train_loss = train_step(model, train_loader, criterion, optimizer)
    val_loss, val_iou = val_step(model, val_loader, criterion, metrics)

This code snippet shows how to set up the training process for a semantic segmentation model, including defining the loss function, metrics, and the training and validation loops.

  3. Evaluating a Model:
import torch
from segmentation_models_pytorch import UnetPlusPlus
from segmentation_models_pytorch.utils.metrics import IoU

model = UnetPlusPlus(encoder_name="resnet34", encoder_weights="imagenet", classes=19)
metric = IoU(threshold=0.5)

# Evaluation loop (test_loader is a user-defined DataLoader yielding images and masks)
model.eval()
with torch.no_grad():
    for images, masks in test_loader:
        preds = model(images)
        iou = metric(preds, masks)
        # Accumulate IoU and any other metrics over the test set here

This code demonstrates how to use the IoU metric from the segmentation_models_pytorch library to evaluate the performance of a semantic segmentation model on a test dataset.

Getting Started

To get started with the NVIDIA/semantic-segmentation repository, follow these steps:

  1. Clone the repository:
git clone https://github.com/NVIDIA/semantic-segmentation.git
  2. Install the required dependencies:
cd semantic-segmentation

The code is tested with PyTorch 1.3 and Python 3.6, and the repository provides a ./Dockerfile that can be used to build a working image.
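Once the dependencies are in place, a quick sanity check that PyTorch and a GPU are visible (a minimal sketch; the repository's training and evaluation scripts assume an NVIDIA GPU):

import torch

# Verify the installed PyTorch build and that a CUDA-capable GPU is available;
# training and evaluation in this repository are GPU-based.
print(torch.__version__)
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))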

Competitor Comparisons

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

  • More comprehensive, supporting object detection, instance segmentation, and other tasks beyond semantic segmentation
  • Modular design allows for easier customization and extension of models
  • Extensive documentation and community support

Cons of Detectron2

  • Steeper learning curve due to its broader scope and more complex architecture
  • May be overkill for projects focused solely on semantic segmentation
  • Potentially higher computational requirements for training and inference

Code Comparison

Semantic-segmentation (NVIDIA):

from network import DRNSeg
model = DRNSeg(classes=19, pretrained=True)
output = model(input_image)

Detectron2 (Facebook):

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("path/to/config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(input_image)

Both repositories offer powerful tools for computer vision tasks, with Semantic-segmentation focusing specifically on semantic segmentation, while Detectron2 provides a more versatile framework for various object detection and segmentation tasks. The choice between them depends on the specific requirements of your project and the level of flexibility needed.


Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Pros of Mask_RCNN

  • Provides instance segmentation in addition to semantic segmentation
  • Easier to use and implement for beginners
  • More extensive documentation and community support

Cons of Mask_RCNN

  • Generally slower inference time compared to semantic-segmentation
  • Less optimized for real-time applications

Code Comparison

Mask_RCNN:

import mrcnn.model as modellib

model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_MODEL_PATH, by_name=True)
results = model.detect([image], verbose=1)

semantic-segmentation:

import kaolin as kal

model = kal.models.SegNet(num_classes=num_classes)
model.load_state_dict(torch.load('segnet.pth'))
output = model(input_tensor)

The code snippets demonstrate the difference in model initialization and inference. Mask_RCNN uses a more straightforward approach, while semantic-segmentation requires more setup but offers more flexibility for advanced users.


Models and examples built with TensorFlow

Pros of models

  • Broader scope, covering various machine learning tasks beyond just semantic segmentation
  • Larger community and more frequent updates
  • Official TensorFlow repository, ensuring compatibility and best practices

Cons of models

  • Less specialized for semantic segmentation tasks
  • Potentially more complex to navigate due to its broader scope
  • May require more setup and configuration for specific use cases

Code Comparison

models:

import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

model = tf.saved_model.load('path/to/saved_model')
category_index = label_map_util.create_category_index_from_labelmap('path/to/labelmap.pbtxt')

semantic-segmentation:

import torch
from models.model_factory import create_model

model = create_model(arch='deeplab', num_classes=21, pretrained='imagenet')
model.eval()

The models repository provides a more general-purpose approach using TensorFlow, while semantic-segmentation offers a PyTorch-based solution specifically tailored for semantic segmentation tasks. The code snippets demonstrate the different initialization processes, reflecting the repositories' focuses and underlying frameworks.

OpenMMLab Semantic Segmentation Toolbox and Benchmark.

Pros of mmsegmentation

  • More comprehensive model zoo with a wider variety of architectures
  • Flexible and modular design, allowing easier customization and extension
  • Better documentation and community support

Cons of mmsegmentation

  • Steeper learning curve due to its more complex structure
  • May have slightly higher overhead for simple use cases

Code Comparison

mmsegmentation:

from mmseg.apis import inference_segmentor, init_segmentor

config_file = 'configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py'
checkpoint_file = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, img)

semantic-segmentation:

from nvidia.segmentation.pipeline import Pipeline

pipeline = Pipeline(config="configs/cityscapes_pspnet.py")
pipeline.load_checkpoint("checkpoints/cityscapes_pspnet.pth")
result = pipeline.predict(img)

Both repositories provide efficient implementations for semantic segmentation tasks, but mmsegmentation offers more flexibility and a broader range of models at the cost of increased complexity. semantic-segmentation provides a more straightforward API for quick deployment but with fewer options for customization.


Datasets, Transforms and Models specific to Computer Vision

Pros of vision

  • Broader scope, covering various computer vision tasks beyond just semantic segmentation
  • More active development and larger community support
  • Extensive documentation and tutorials

Cons of vision

  • Less specialized for semantic segmentation tasks
  • May require more setup and configuration for specific use cases

Code comparison

semantic-segmentation:

from nvidia.segmentation import UNet

model = UNet(num_classes=21)
output = model(input_image)

vision:

from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(pretrained=True)
output = model(input_image)['out']

Key differences

  • semantic-segmentation focuses specifically on semantic segmentation, offering optimized models for this task
  • vision provides a wider range of pre-trained models and utilities for various computer vision tasks
  • semantic-segmentation may offer better performance for NVIDIA GPUs, while vision is more hardware-agnostic
  • vision integrates seamlessly with the PyTorch ecosystem, making it easier to use with other PyTorch-based projects

Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.

Pros of segmentation_models.pytorch

  • Offers a wide range of pre-trained encoder architectures
  • Provides easy-to-use high-level API for model creation
  • Supports various loss functions and metrics out-of-the-box

Cons of segmentation_models.pytorch

  • Limited to PyTorch framework
  • Fewer optimization techniques compared to semantic-segmentation
  • May have less focus on NVIDIA-specific hardware optimizations

Code Comparison

segmentation_models.pytorch:

import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=1,
    activation="sigmoid",
)

semantic-segmentation:

from model import PSPNet

model = PSPNet(
    layers=50,
    bins=(1, 2, 3, 6),
    dropout=0.1,
    classes=21,
    zoom_factor=8,
    use_ppm=True,
    pretrained=True
)

Both repositories provide implementations for semantic segmentation tasks, but they differ in their approach and focus. segmentation_models.pytorch offers a more flexible and user-friendly API with various pre-trained encoders, while semantic-segmentation may provide better performance optimizations for NVIDIA hardware.


README

Paper | YouTube | Cityscapes Score

PyTorch implementation of our paper Hierarchical Multi-Scale Attention for Semantic Segmentation.
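The core idea is to predict per-pixel attention that fuses segmentation predictions made at different inference scales. A minimal conceptual sketch of attention-weighted fusion of two scales (illustrative only; the actual modules in this repository differ in structure and naming):

import torch
import torch.nn.functional as F

def fuse_two_scales(seg_lo, seg_hi, attn_lo):
    """Attention-weighted fusion of predictions from two inference scales.

    seg_lo:  class logits predicted at the lower scale,  [N, C, h, w]
    seg_hi:  class logits predicted at the higher scale, [N, C, H, W]
    attn_lo: one-channel attention logits predicted at the lower scale, [N, 1, h, w]
    """
    size = seg_hi.shape[-2:]
    # Upsample the low-scale prediction and attention map to the high-scale resolution
    seg_lo_up = F.interpolate(seg_lo, size=size, mode="bilinear", align_corners=False)
    attn = torch.sigmoid(F.interpolate(attn_lo, size=size, mode="bilinear", align_corners=False))
    # Each pixel blends the two scales according to the learned attention
    return attn * seg_lo_up + (1.0 - attn) * seg_hi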

Please refer to the sdcnet branch if you are looking for the code corresponding to Improving Semantic Segmentation via Video Prediction and Label Relaxation.

Installation

  • The code is tested with PyTorch 1.3 and Python 3.6.
  • You can use ./Dockerfile to build an image.

Download Weights

  • Create a directory where you can keep large files. Ideally, not in this directory.
  > mkdir <large_asset_dir>
  • Update __C.ASSETS_PATH in config.py to point at that directory

    __C.ASSETS_PATH=<large_asset_dir>

  • Download pretrained weights from google drive and put into <large_asset_dir>/seg_weights

Download/Prepare Data

If using Cityscapes, download Cityscapes data, then update config.py to set the path:

__C.DATASET.CITYSCAPES_DIR=<path_to_cityscapes>

If using Cityscapes Autolabelled Images, download Cityscapes data, then update config.py to set the path:

__C.DATASET.CITYSCAPES_CUSTOMCOARSE=<path_to_cityscapes>

If using Mapillary, download Mapillary data, then update config.py to set the path:

__C.DATASET.MAPILLARY_DIR=<path_to_mapillary>

Running the code

The instructions below make use of a tool called runx, which we find useful for automating experiment running and summarization. For more information about this tool, please see runx. In general, you can either use the runx-style command lines shown below or call python train.py <args ...> directly.

Run inference on Cityscapes

Dry run:

> python -m runx.runx scripts/eval_cityscapes.yml -i -n

This just prints the command without running it, which is a good way to inspect the command line.

Real run:

> python -m runx.runx scripts/eval_cityscapes.yml -i

The reported IOU should be 86.92. This evaluates with scales of 0.5, 1.0, and 2.0. You will find evaluation results in ./logs/eval_cityscapes/...

Run inference on Mapillary

> python -m runx.runx scripts/eval_mapillary.yml -i

The reported IOU should be 61.05. Note that this must be run on a 32GB node, and the use of 'O3' mode for amp is critical to avoid running out of GPU memory. Results are written to logs/eval_mapillary/...
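For reference, the amp 'O3' mode referred to here comes from NVIDIA Apex mixed precision. A minimal sketch of how a model and optimizer are wrapped at that optimization level (assuming Apex is installed; the repository's own scripts wire this up through their arguments):

import torch
import torch.nn as nn
from apex import amp  # NVIDIA Apex mixed-precision library

# Placeholder network; in the repository the model is built by its own code.
model = nn.Conv2d(3, 19, kernel_size=3, padding=1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 'O3' casts the model to pure FP16, which is what keeps the large Mapillary
# inputs within GPU memory during evaluation.
model, optimizer = amp.initialize(model, optimizer, opt_level="O3")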

Dump images for Cityscapes

> python -m runx.runx scripts/dump_cityscapes.yml -i

This will dump network output and composited images from running evaluation with the Cityscapes validation set.

Run inference and dump images on a folder of images

> python -m runx.runx scripts/dump_folder.yml -i

You should end up seeing images that look like the following:

(example segmentation output image)

Train a model

Train Cityscapes using HRNet + OCR + multi-scale attention with fine data and a Mapillary-pretrained model

> python -m runx.runx scripts/train_cityscapes.yml -i

The first time this command is run, a centroid file has to be built for the dataset; this takes about 10 minutes. The centroid file is used during training to sample from the dataset in a class-uniform way.
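As a rough sketch of what class-uniform sampling means (illustrative only, not this repository's implementation): precompute, for each class, the image locations where it occurs, then draw training crops by first picking a class uniformly and then picking one of its stored centroids.

import random
from collections import defaultdict
import torch

def build_centroids(label_masks):
    """Map each class id to (image_index, (y, x)) centroids where that class occurs.

    label_masks: iterable of [H, W] integer tensors of per-pixel class ids.
    Illustrative only: the repository precomputes and caches a similar structure
    in its centroid file.
    """
    centroids = defaultdict(list)
    for img_idx, mask in enumerate(label_masks):
        for cls in torch.unique(mask).tolist():
            ys, xs = torch.nonzero(mask == cls, as_tuple=True)
            centroids[cls].append((img_idx, (int(ys.float().mean()), int(xs.float().mean()))))
    return centroids

def sample_class_uniform(centroids):
    """Pick a class uniformly at random, then one of its centroids to crop around."""
    cls = random.choice(list(centroids.keys()))
    return cls, random.choice(centroids[cls])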

This training run should deliver a model that achieves 84.7 IOU.

Train SOTA default train-val split

> python -m runx.runx scripts/train_cityscapes_sota.yml -i

Again, use -n to do a dry run and just print out the command. This should result in a model with 86.8 IOU. If you run out of memory, try to lower the crop size or turn off rmi_loss.