Top Related Projects
- Detectron2: Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
- Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.
- models: Models and examples built with TensorFlow.
- mmsegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark.
- vision: Datasets, Transforms and Models specific to Computer Vision.
- segmentation_models.pytorch: Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Quick Overview
The NVIDIA/semantic-segmentation repository is a collection of deep learning models and tools for performing semantic segmentation, which is the task of assigning a semantic label (e.g., person, car, road) to each pixel in an image. The repository includes pre-trained models, training scripts, and evaluation tools, making it a valuable resource for researchers and developers working on semantic segmentation problems.
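To make the task concrete: a segmentation network turns an image into per-pixel class scores, and the prediction is the highest-scoring class at each pixel. A minimal, framework-level sketch in plain PyTorch (not code from this repository):

```python
import torch

# Hypothetical network output for one image with 19 classes (e.g., Cityscapes)
# at 512x1024 resolution: logits of shape (batch, classes, height, width).
logits = torch.randn(1, 19, 512, 1024)

# The predicted segmentation labels each pixel with its highest-scoring class.
pred = logits.argmax(dim=1)   # shape (1, 512, 1024), integer class ids 0..18
```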
Pros
- Comprehensive Collection of Models: The repository includes state-of-the-art semantic segmentation architectures such as HRNet + OCR with hierarchical multi-scale attention and DeepLabV3+, allowing users to experiment with different approaches.
- Pre-trained Models: The repository provides pre-trained weights for datasets such as Cityscapes and Mapillary Vistas, which can be used as a starting point for fine-tuning or transfer learning.
- Modular and Extensible: The codebase is designed to be modular and extensible, making it easy for users to integrate their own models or datasets into the framework.
- Detailed Documentation: The repository includes comprehensive documentation, including installation instructions, model descriptions, and usage examples, making it accessible for both beginners and experienced users.
Cons
- Limited Support for Newer Architectures: While the repository includes a wide range of models, it may not always be up-to-date with the latest state-of-the-art architectures, which could be a limitation for researchers working on the cutting edge of the field.
- Potential Compatibility Issues: The code targets NVIDIA GPUs (for example, it relies on apex amp for mixed precision), so there may be compatibility issues on non-NVIDIA hardware or with different software versions.
- Limited Support for Deployment: The repository is primarily focused on model training and evaluation, and may not provide comprehensive support for deployment and inference in production environments.
- Potential Licensing Restrictions: The repository includes models and datasets that may have their own licensing requirements, which users should be aware of before using the code in their own projects.
Code Examples
The snippets below illustrate a typical semantic segmentation workflow. Note that they use the third-party segmentation_models_pytorch library; the NVIDIA/semantic-segmentation repository itself is driven through train.py and runx configuration files, as described in the README below.
- Loading a Pre-trained Model:

```python
import segmentation_models_pytorch as smp

# UNet++ with an ImageNet-pretrained ResNet-34 encoder and 19 output
# classes (e.g., the Cityscapes label set).
model = smp.UnetPlusPlus(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=19,
    activation="softmax",
)
```

This builds a UNet++ model whose encoder is initialized with ImageNet weights, a common starting point for fine-tuning.
- Training a Model:

```python
import torch
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.losses import DiceLoss
from segmentation_models_pytorch.utils.metrics import IoU

model = smp.UnetPlusPlus(encoder_name="resnet34", classes=19)
criterion = DiceLoss(mode="multiclass")        # multi-class segmentation
metrics = [IoU(threshold=0.5)]
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Training loop; train_step and val_step are user-defined helpers,
# not part of the library.
for epoch in range(num_epochs):
    train_loss = train_step(model, train_loader, criterion, optimizer)
    val_loss, val_iou = val_step(model, val_loader, criterion, metrics)
```

This snippet sets up the training process for a semantic segmentation model: the loss function, the evaluation metric, the optimizer, and the epoch loop.
- Evaluating a Model:

```python
import torch
from segmentation_models_pytorch.utils.metrics import IoU

metric = IoU(threshold=0.5)

# Evaluation loop
model.eval()
with torch.no_grad():
    for images, masks in test_loader:
        preds = model(images)
        iou = metric(preds, masks)
        # ...accumulate iou and any other metrics here
```

This loop uses the IoU metric from the segmentation_models_pytorch library to evaluate the performance of a trained semantic segmentation model on a test dataset.
Getting Started
To get started with the NVIDIA/semantic-segmentation repository, follow these steps:
- Clone the repository:
> git clone https://github.com/NVIDIA/semantic-segmentation.git
- Install the required dependencies (the repository's Dockerfile, described under Installation below, captures the tested environment):
> cd semantic-segmentation
> pip install ...
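Once weights and data are in place, everything in the README below is driven through runx; for example, a dry run of Cityscapes evaluation prints the underlying command without executing it:
> python -m runx.runx scripts/eval_cityscapes.yml -i -n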
Competitor Comparisons
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- More comprehensive, supporting object detection, instance segmentation, and other tasks beyond semantic segmentation
- Modular design allows for easier customization and extension of models
- Extensive documentation and community support
Cons of Detectron2
- Steeper learning curve due to its broader scope and more complex architecture
- May be overkill for projects focused solely on semantic segmentation
- Potentially higher computational requirements for training and inference
Code Comparison
Semantic-segmentation (NVIDIA):

```python
# Illustrative sketch; in practice the repository is driven through
# train.py and runx config files rather than a direct import like this.
from network import DRNSeg

model = DRNSeg(classes=19, pretrained=True)
output = model(input_image)
```
Detectron2 (Facebook):

```python
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("path/to/config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(input_image)
```
Both repositories offer powerful tools for computer vision tasks, with Semantic-segmentation focusing specifically on semantic segmentation, while Detectron2 provides a more versatile framework for various object detection and segmentation tasks. The choice between them depends on the specific requirements of your project and the level of flexibility needed.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Pros of Mask_RCNN
- Provides instance segmentation in addition to semantic segmentation
- Easier to use and implement for beginners
- More extensive documentation and community support
Cons of Mask_RCNN
- Generally slower inference time compared to semantic-segmentation
- Less optimized for real-time applications
Code Comparison
Mask_RCNN:

```python
import mrcnn.model as modellib

model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_MODEL_PATH, by_name=True)
results = model.detect([image], verbose=1)
```
semantic-segmentation:

```python
import torch

# Illustrative sketch (the model class here is hypothetical); the NVIDIA
# repository builds its networks from config files via train.py.
model = SegNet(num_classes=num_classes)
model.load_state_dict(torch.load('segnet.pth'))
output = model(input_tensor)
```
The code snippets demonstrate the difference in model initialization and inference. Mask_RCNN uses a more straightforward approach, while semantic-segmentation requires more setup but offers more flexibility for advanced users.
Models and examples built with TensorFlow
Pros of models
- Broader scope, covering various machine learning tasks beyond just semantic segmentation
- Larger community and more frequent updates
- Official TensorFlow repository, ensuring compatibility and best practices
Cons of models
- Less specialized for semantic segmentation tasks
- Potentially more complex to navigate due to its broader scope
- May require more setup and configuration for specific use cases
Code Comparison
models:

```python
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

model = tf.saved_model.load('path/to/saved_model')
category_index = label_map_util.create_category_index_from_labelmap('path/to/labelmap.pbtxt')
```
semantic-segmentation:

```python
import torch
# Illustrative sketch; this factory module is hypothetical, as the repository
# assembles its networks from config files via train.py.
from models.model_factory import create_model

model = create_model(arch='deeplab', num_classes=21, pretrained='imagenet')
model.eval()
```
The models repository provides a more general-purpose approach using TensorFlow, while semantic-segmentation offers a PyTorch-based solution specifically tailored for semantic segmentation tasks. The code snippets demonstrate the different initialization processes, reflecting the repositories' focuses and underlying frameworks.
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
Pros of mmsegmentation
- More comprehensive model zoo with a wider variety of architectures
- Flexible and modular design, allowing easier customization and extension
- Better documentation and community support
Cons of mmsegmentation
- Steeper learning curve due to its more complex structure
- May have slightly higher overhead for simple use cases
Code Comparison
mmsegmentation:

```python
from mmseg.apis import inference_segmentor, init_segmentor

config_file = 'configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py'
checkpoint_file = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, img)
```
semantic-segmentation:

In practice the NVIDIA repository is driven from the command line through runx config files rather than a Python pipeline API:

> python -m runx.runx scripts/eval_cityscapes.yml -i

Both repositories provide efficient implementations for semantic segmentation tasks, but mmsegmentation offers more flexibility and a broader range of models at the cost of increased complexity. semantic-segmentation provides a more straightforward command-line workflow for quick experiments but with fewer options for customization.
Datasets, Transforms and Models specific to Computer Vision
Pros of vision
- Broader scope, covering various computer vision tasks beyond just semantic segmentation
- More active development and larger community support
- Extensive documentation and tutorials
Cons of vision
- Less specialized for semantic segmentation tasks
- May require more setup and configuration for specific use cases
Code comparison
semantic-segmentation:

```python
# Illustrative sketch (this import path is hypothetical); the repository
# is driven through train.py and runx config files.
from nvidia.segmentation import UNet

model = UNet(num_classes=21)
output = model(input_image)
```
vision:

```python
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(pretrained=True)
output = model(input_image)['out']
```
Key differences
- semantic-segmentation focuses specifically on semantic segmentation, offering optimized models for this task
- vision provides a wider range of pre-trained models and utilities for various computer vision tasks
- semantic-segmentation may offer better performance for NVIDIA GPUs, while vision is more hardware-agnostic
- vision integrates seamlessly with the PyTorch ecosystem, making it easier to use with other PyTorch-based projects
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Pros of segmentation_models.pytorch
- Offers a wide range of pre-trained encoder architectures
- Provides easy-to-use high-level API for model creation
- Supports various loss functions and metrics out-of-the-box
Cons of segmentation_models.pytorch
- Limited to PyTorch framework
- Fewer optimization techniques compared to semantic-segmentation
- May have less focus on NVIDIA-specific hardware optimizations
Code Comparison
segmentation_models.pytorch:

```python
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",
    encoder_weights="imagenet",
    classes=1,
    activation="sigmoid",
)
```
semantic-segmentation:

```python
# Illustrative PSPNet-style setup; this constructor is a sketch, not
# necessarily the repository's actual API.
from model import PSPNet

model = PSPNet(
    layers=50,
    bins=(1, 2, 3, 6),
    dropout=0.1,
    classes=21,
    zoom_factor=8,
    use_ppm=True,
    pretrained=True,
)
```
Both repositories provide implementations for semantic segmentation tasks, but they differ in their approach and focus. segmentation_models.pytorch offers a more flexible and user-friendly API with various pre-trained encoders, while semantic-segmentation may provide better performance optimizations for NVIDIA hardware.
README
Paper | YouTube | Cityscapes Score
PyTorch implementation of our paper Hierarchical Multi-Scale Attention for Semantic Segmentation.
Please refer to the sdcnet branch if you are looking for the code corresponding to Improving Semantic Segmentation via Video Prediction and Label Relaxation.
Installation
- The code is tested with PyTorch 1.3 and Python 3.6.
- You can use ./Dockerfile to build an image.
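A typical build command, assuming Docker is installed (the image tag is arbitrary):
> docker build -t nvidia-semantic-segmentation .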
Download Weights
- Create a directory where you can keep large files. Ideally, not in this directory.
> mkdir <large_asset_dir>
- Update __C.ASSETS_PATH in config.py to point at that directory:
__C.ASSETS_PATH=<large_asset_dir>
- Download pretrained weights from Google Drive and put them into <large_asset_dir>/seg_weights
Download/Prepare Data
- If using Cityscapes, download Cityscapes data, then update config.py to set the path:
__C.DATASET.CITYSCAPES_DIR=<path_to_cityscapes>
- If using Cityscapes Autolabelled Images, download the Autolabelled-Data from Google Drive, then update config.py to set the path:
__C.DATASET.CITYSCAPES_CUSTOMCOARSE=<path_to_cityscapes>
- If using Mapillary, download Mapillary data, then update config.py to set the path:
__C.DATASET.MAPILLARY_DIR=<path_to_mapillary>
Running the code
The instructions below make use of a tool called runx, which we find useful to help automate experiment running and summarization. For more information about this tool, please see runx.
In general, you can either use the runx-style command lines shown below, or call python train.py <args ...> directly if you like.
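Each scripts/*.yml file is a runx experiment description. As a rough sketch of the format (the field names follow the runx project; the values here are illustrative, not taken from this repository):

```yaml
CMD: python train.py
HPARAMS:
  dataset: cityscapes
  lr: 0.01
  logdir: LOGDIR
```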
Run inference on Cityscapes
Dry run:
> python -m runx.runx scripts/eval_cityscapes.yml -i -n
This will just print out the command but not run. It's a good way to inspect the commandline.
Real run:
> python -m runx.runx scripts/eval_cityscapes.yml -i
The reported IOU should be 86.92. This evaluates with scales of 0.5, 1.0, and 2.0. You will find evaluation results in ./logs/eval_cityscapes/...
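For intuition, multi-scale evaluation runs the network at several input scales and fuses the predictions. A minimal averaging sketch in plain PyTorch follows; note that the paper's actual method fuses scales with learned hierarchical attention rather than a plain average:

```python
import torch
import torch.nn.functional as F

def multi_scale_inference(model, image, scales=(0.5, 1.0, 2.0)):
    """Average class probabilities over several input scales.

    image: (N, 3, H, W) tensor; model returns (N, C, h, w) logits.
    """
    n, _, h, w = image.shape
    fused = 0.0
    with torch.no_grad():
        for s in scales:
            scaled = F.interpolate(image, scale_factor=s, mode='bilinear',
                                   align_corners=False)
            logits = model(scaled)
            # Resize predictions back to the original resolution before fusing.
            logits = F.interpolate(logits, size=(h, w), mode='bilinear',
                                   align_corners=False)
            fused = fused + logits.softmax(dim=1)
    return (fused / len(scales)).argmax(dim=1)   # (N, H, W) label map
```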
Run inference on Mapillary
> python -m runx.runx scripts/eval_mapillary.yml -i
The reported IOU should be 61.05. Note that this must be run on a 32GB node and the use of 'O3' mode for amp is critical in order to avoid GPU out of memory. Results in logs/eval_mapillary/...
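Here, 'O3' refers to an NVIDIA apex mixed-precision optimization level. Enabling it looks roughly like the following sketch (in this repository the opt level is set through its run configs rather than by hand):

```python
from apex import amp

# O3 casts the model to pure FP16, roughly halving activation memory.
model, optimizer = amp.initialize(model, optimizer, opt_level='O3')
```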
Dump images for Cityscapes
> python -m runx.runx scripts/dump_cityscapes.yml -i
This will dump network output and composited images from running evaluation with the Cityscapes validation set.
Run inference and dump images on a folder of images
> python -m runx.runx scripts/dump_folder.yml -i
You should end up with composited output images for each input image.
Train a model
Train on Cityscapes using HRNet + OCR + multi-scale attention, with fine data and a Mapillary-pretrained model:
> python -m runx.runx scripts/train_cityscapes.yml -i
The first time this command is run, a centroid file has to be built for the dataset. It'll take about 10 minutes. The centroid file is used during training to know how to sample from the dataset in a class-uniform way.
This training run should deliver a model that achieves 84.7 IOU.
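Conceptually, class-uniform sampling records where each class occurs in the dataset, then draws training crops by first picking a class uniformly and then one of its recorded locations. A toy sketch of the idea (not the repository's actual centroid code):

```python
import random
from collections import defaultdict

def build_class_index(dataset):
    """Map each class id to the samples (and centroids) where it appears.

    dataset yields (image_id, {class_id: (cx, cy)}) pairs in this toy setup.
    """
    index = defaultdict(list)
    for image_id, centroids in dataset:
        for class_id, center in centroids.items():
            index[class_id].append((image_id, center))
    return index

def sample_class_uniform(index):
    """Pick a class uniformly, then a stored (image, centroid) for it."""
    class_id = random.choice(list(index))
    image_id, center = random.choice(index[class_id])
    return class_id, image_id, center   # crop the image around `center`
```

Rare classes are sampled as often as common ones under this scheme, which is why the centroid file helps on imbalanced datasets like Cityscapes.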
Train SOTA default train-val split
> python -m runx.runx scripts/train_cityscapes_sota.yml -i
Again, use -n to do a dry run and just print out the command. This should result in a model with 86.8 IOU. If you run out of memory, try lowering the crop size or turning off rmi_loss.