Mask_RCNN
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Top Related Projects
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Models and examples built with TensorFlow
OpenMMLab Detection Toolbox and Benchmark
Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Quick Overview
Mask_RCNN is a Keras and TensorFlow implementation of the Mask R-CNN framework for object instance segmentation. Mask R-CNN extends Faster R-CNN by adding a branch that predicts a segmentation mask for each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression.
Pros
- High accuracy in object detection and instance segmentation tasks
- Flexible architecture that can be used with various backbone networks
- Supports both training and inference on custom datasets
- Well-documented with example notebooks and pre-trained models
Cons
- Computationally intensive, requiring significant GPU resources
- Complex architecture that may be challenging for beginners to understand and modify
- Requires careful hyperparameter tuning for optimal performance
- Limited to 2D image segmentation, not suitable for 3D or video data
Code Examples
- Loading a pre-trained model and performing inference:
import skimage.io
import mrcnn.model as modellib
from mrcnn import utils
from mrcnn.config import Config
class InferenceConfig(Config):
    NAME = "coco"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1  # batch size = GPU_COUNT * IMAGES_PER_GPU
    NUM_CLASSES = 81  # 80 COCO classes + background
config = InferenceConfig()
model = modellib.MaskRCNN(mode="inference", config=config, model_dir="./")
model.load_weights("mask_rcnn_coco.h5", by_name=True)
image = skimage.io.imread("image.jpg")
results = model.detect([image], verbose=1)
- Visualizing detection results:
import mrcnn.visualize as visualize
# class_names: list of class names indexed by class ID (index 0 is the background)
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
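The `results` list above holds one dictionary per input image. As a rough sketch of working with it, the snippet below builds a mock dictionary with the same keys (`rois`, `masks`, `class_ids`, `scores`) and keeps only the confident detections. The array shapes (boxes as `[N, 4]`, masks as `[H, W, N]`) and the `filter_detections` helper are assumptions made for illustration; the helper is not part of the library.

```python
import numpy as np

def filter_detections(r, min_score=0.9):
    """Keep only detections whose score is at least min_score (hypothetical helper)."""
    keep = r["scores"] >= min_score
    return {
        "rois": r["rois"][keep],
        "class_ids": r["class_ids"][keep],
        "scores": r["scores"][keep],
        "masks": r["masks"][:, :, keep],  # assumed layout: one mask per last-axis slice
    }

# Mock result for a 4x4 image with two detections (illustrative values only).
mock = {
    "rois": np.array([[0, 0, 2, 2], [1, 1, 4, 4]]),
    "class_ids": np.array([1, 3]),
    "scores": np.array([0.95, 0.40]),
    "masks": np.zeros((4, 4, 2), dtype=bool),
}
confident = filter_detections(mock)
print(confident["class_ids"], confident["masks"].shape)
```

The same boolean index is applied to every array so that boxes, classes, scores, and masks stay aligned.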
- Training on a custom dataset:
from mrcnn.config import Config
from mrcnn import utils
import mrcnn.model as modellib
class CustomConfig(Config):
    NAME = "custom"
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 1  # Background + custom class
config = CustomConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="./")
# Skip the class-specific head layers, since the number of classes differs from COCO
model.load_weights("mask_rcnn_coco.h5", by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
# dataset_train and dataset_val are prepared utils.Dataset subclasses
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=30,
            layers='heads')
Getting Started
- Clone the repository:
git clone https://github.com/matterport/Mask_RCNN.git
cd Mask_RCNN
- Install dependencies:
pip install -r requirements.txt
- Download pre-trained COCO weights:
wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
- Run the demo notebook:
jupyter notebook samples/demo.ipynb
Competitor Comparisons
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- More extensive and up-to-date model zoo with state-of-the-art implementations
- Better performance and faster training/inference times
- Modular design allowing easier customization and extension
Cons of Detectron2
- Steeper learning curve due to more complex architecture
- Requires PyTorch, which may not be preferred by TensorFlow users
- Less beginner-friendly documentation compared to Mask_RCNN
Code Comparison
Mask_RCNN:
import mrcnn.model as modellib
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(WEIGHTS_PATH, by_name=True)
results = model.detect([image], verbose=1)
Detectron2:
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
cfg = get_cfg()
cfg.merge_from_file(MODEL_ZOO_CONFIG_PATH)
predictor = DefaultPredictor(cfg)
outputs = predictor(image)
Both repositories provide powerful implementations of instance segmentation models, but Detectron2 offers more flexibility and better performance at the cost of increased complexity. Mask_RCNN may be more suitable for beginners or those preferring TensorFlow, while Detectron2 is better for advanced users seeking cutting-edge performance and customization options.
Models and examples built with TensorFlow
Pros of models
- Broader scope: Includes implementations of various models and architectures
- Official TensorFlow repository: Regularly updated and maintained by the TensorFlow team
- Extensive documentation and examples for different use cases
Cons of models
- Less focused: May require more effort to find and implement specific models
- Potentially steeper learning curve due to the wide range of models and architectures
Code comparison
models:
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
model = tf.saved_model.load('path/to/saved_model')
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS)
Mask_RCNN:
import mrcnn.model as modellib
from mrcnn import utils
import coco  # samples/coco/coco.py; add samples/coco/ to sys.path first
class InferenceConfig(coco.CocoConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir=MODEL_DIR)
model.load_weights(WEIGHTS_PATH, by_name=True)
The models repository provides a more general-purpose framework for various TensorFlow models, while Mask_RCNN focuses specifically on instance segmentation using the Mask R-CNN architecture. models offers greater flexibility but may require more setup, whereas Mask_RCNN provides a more streamlined implementation for its specific use case.
OpenMMLab Detection Toolbox and Benchmark
Pros of mmdetection
- Supports a wider range of object detection algorithms and models
- More actively maintained with frequent updates and contributions
- Provides comprehensive documentation and tutorials
Cons of mmdetection
- Steeper learning curve due to its more complex architecture
- Requires more setup and configuration compared to Mask_RCNN
Code Comparison
Mask_RCNN:
import mrcnn.model as modellib
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_MODEL_PATH, by_name=True)
results = model.detect([image], verbose=1)
mmdetection:
from mmdet.apis import init_detector, inference_detector
config_file = 'configs/mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/mask_rcnn_r50_fpn_1x_coco_20200205-d4b0c5d6.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, img)
Both repositories provide implementations of Mask R-CNN, but mmdetection offers a more flexible and extensive framework for object detection tasks. While Mask_RCNN is simpler to use for beginners, mmdetection provides more advanced features and supports a broader range of models and algorithms.
Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
Pros of maskrcnn-benchmark
- Higher performance and faster training due to PyTorch 1.0 optimizations
- More flexible architecture allowing easier customization and extension
- Better support for distributed training on multiple GPUs
Cons of maskrcnn-benchmark
- Steeper learning curve, requiring more advanced PyTorch knowledge
- Less extensive documentation compared to Mask_RCNN
- Fewer pre-trained models available out-of-the-box
Code Comparison
Mask_RCNN (Keras/TensorFlow):
model = modellib.MaskRCNN(mode="training", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_WEIGHTS_PATH, by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE, epochs=40, layers='heads')
maskrcnn-benchmark (PyTorch):
model = build_detection_model(cfg)
optimizer = make_optimizer(cfg, model)
scheduler = make_lr_scheduler(cfg, optimizer)
arguments = {}
arguments["iteration"] = 0
checkpointer = DetectronCheckpointer(cfg, model, optimizer, scheduler, output_dir, save_to_disk=True)
train(model, data_loader, optimizer, scheduler, checkpointer, device, checkpoint_period, arguments)
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Pros of YOLOv5
- Faster inference speed and real-time object detection capabilities
- More lightweight and efficient, suitable for deployment on edge devices
- Extensive documentation and active community support
Cons of YOLOv5
- Lower accuracy in instance segmentation tasks
- Less suitable for complex, multi-object scenes with overlapping objects
- Limited ability to handle objects with varying scales and aspect ratios
Code Comparison
Mask_RCNN:
import mrcnn.model as modellib
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(WEIGHTS_PATH, by_name=True)
results = model.detect([image], verbose=1)
YOLOv5:
import torch
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
results = model(image)
results.print()
YOLOv5 offers a more straightforward implementation with fewer lines of code, while Mask_RCNN requires more setup and configuration. YOLOv5's simplicity makes it easier to integrate into existing projects, but Mask_RCNN provides more detailed instance segmentation capabilities.
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Pros of Detectron
- More comprehensive and feature-rich, supporting a wider range of object detection algorithms
- Better performance and faster training times due to optimized implementation
- Developed and released by Facebook AI Research (now succeeded by Detectron2)
Cons of Detectron
- Steeper learning curve and more complex setup process
- Requires Caffe2, which is less widely used than TensorFlow
Code Comparison
Mask_RCNN:
import mrcnn.model as modellib
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_MODEL_PATH, by_name=True)
results = model.detect([image], verbose=1)
Detectron (the snippet uses the API of Detectron2, its PyTorch-based successor):
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
predictor = DefaultPredictor(cfg)
outputs = predictor(image)
Both repositories provide powerful tools for object detection and instance segmentation, but Detectron offers more flexibility and performance at the cost of increased complexity.
README
Mask R-CNN for Object Detection and Segmentation
This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.
The repository includes:
- Source code of Mask R-CNN built on FPN and ResNet101.
- Training code for MS COCO
- Pre-trained weights for MS COCO
- Jupyter notebooks to visualize the detection pipeline at every step
- ParallelModel class for multi-GPU training
- Evaluation on MS COCO metrics (AP)
- Example of training on your own dataset
The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (bibtex below). If you work on 3D vision, you might find our recently released Matterport3D dataset useful as well. This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples here.
Getting Started
- demo.ipynb is the easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images. It includes code to run object detection and instance segmentation on arbitrary images.
- train_shapes.ipynb shows how to train Mask R-CNN on your own dataset. This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.
- (model.py, utils.py, config.py): These files contain the main Mask R-CNN implementation.
- inspect_data.ipynb: This notebook visualizes the different pre-processing steps to prepare the training data.
- inspect_model.ipynb: This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.
- inspect_weights.ipynb: This notebook inspects the weights of a trained model and looks for anomalies and odd patterns.
Step by Step Detection
To help with debugging and understanding the model, there are 3 notebooks (inspect_data.ipynb, inspect_model.ipynb, inspect_weights.ipynb) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:
1. Anchor sorting and filtering
Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.
2. Bounding Box Refinement
This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.
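To make the refinement step more concrete, here is a minimal sketch of how a box can be shifted and scaled by predicted deltas. It assumes the common R-CNN parameterization, with boxes as `(y1, x1, y2, x2)` and deltas as `(dy, dx, log(dh), log(dw))`; the `refine_boxes` helper is hypothetical and is not taken from the repository.

```python
import numpy as np

def refine_boxes(boxes, deltas):
    """Apply deltas [N, (dy, dx, log(dh), log(dw))] to boxes [N, (y1, x1, y2, x2)]."""
    boxes = np.asarray(boxes, dtype=np.float64)
    deltas = np.asarray(deltas, dtype=np.float64)
    height = boxes[:, 2] - boxes[:, 0]
    width = boxes[:, 3] - boxes[:, 1]
    center_y = boxes[:, 0] + 0.5 * height
    center_x = boxes[:, 1] + 0.5 * width
    # Shift the center by a fraction of the box size.
    center_y = center_y + deltas[:, 0] * height
    center_x = center_x + deltas[:, 1] * width
    # Scale height and width; the deltas are in log space.
    height = height * np.exp(deltas[:, 2])
    width = width * np.exp(deltas[:, 3])
    y1 = center_y - 0.5 * height
    x1 = center_x - 0.5 * width
    return np.stack([y1, x1, y1 + height, x1 + width], axis=1)

boxes = np.array([[0, 0, 10, 10]])
deltas = np.array([[0.1, 0.0, 0.0, np.log(2.0)]])
refined = refine_boxes(boxes, deltas)
print(refined)
```

In this example the box moves down by one pixel (10% of its height) and doubles in width around its center.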
3. Mask Generation
Examples of generated masks. These then get scaled and placed on the image in the right location.
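The scale-and-place step can be sketched as follows. This is a simplified illustration, not the repository's implementation: it uses a nearest-neighbor resize to stay self-contained (a real pipeline would more likely interpolate), and the `paste_mask` helper and the 0.5 threshold are assumptions.

```python
import numpy as np

def paste_mask(small_mask, box, image_shape, threshold=0.5):
    """Resize a small soft mask to its box and place it on a full-size canvas."""
    y1, x1, y2, x2 = box
    box_h, box_w = y2 - y1, x2 - x1
    # Nearest-neighbor resize: map each output pixel back to a source pixel.
    rows = np.arange(box_h) * small_mask.shape[0] // box_h
    cols = np.arange(box_w) * small_mask.shape[1] // box_w
    resized = small_mask[rows[:, None], cols[None, :]]
    # Binarize and write into the box region of an empty full-image mask.
    full = np.zeros(image_shape[:2], dtype=bool)
    full[y1:y2, x1:x2] = resized >= threshold
    return full

small = np.array([[0.9, 0.1],
                  [0.8, 0.2]])
full_mask = paste_mask(small, box=(2, 4, 6, 8), image_shape=(10, 10))
print(full_mask.astype(int))
```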
4. Layer activations
Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).
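A quick numeric check can complement the visual inspection. The sketch below summarizes an activation array so that dead (all-zero) or invalid outputs stand out; `activation_report` is a hypothetical helper, and the random array merely stands in for the output of a real layer.

```python
import numpy as np

def activation_report(activations, eps=1e-8):
    """Summarize activations to help spot dead (all-zero) or invalid outputs."""
    a = np.asarray(activations, dtype=np.float64)
    return {
        "zero_fraction": float(np.mean(np.abs(a) < eps)),
        "has_nan_or_inf": bool(not np.all(np.isfinite(a))),
        "mean": float(np.mean(a)),
        "std": float(np.std(a)),
    }

# Stand-in tensors shaped [batch, height, width, channels].
dead = activation_report(np.zeros((1, 8, 8, 16)))
healthy = activation_report(np.random.RandomState(0).rand(1, 8, 8, 16))
print(dead)
print(healthy)
```

A zero fraction close to 1.0 suggests a dead layer, while NaN or infinite values usually point to an exploding loss upstream.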
5. Weight Histograms
Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.
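As a minimal text-only sketch of the same idea, the snippet below computes a histogram for one weight tensor with `np.histogram`. The random kernel is a stand-in for real layer weights, and `weight_histogram` is a hypothetical helper rather than something the notebook provides.

```python
import numpy as np

def weight_histogram(weights, bins=10):
    """Return bin counts and bin edges for a flattened weight tensor."""
    flat = np.asarray(weights, dtype=np.float64).ravel()
    counts, edges = np.histogram(flat, bins=bins)
    return counts, edges

rng = np.random.RandomState(0)
kernel = rng.normal(loc=0.0, scale=0.05, size=(3, 3, 64, 64))  # stand-in conv kernel
counts, edges = weight_histogram(kernel)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"[{left:+.3f}, {right:+.3f}): {count}")
```

A histogram that collapses into a single bin, or that has very long tails, is the kind of odd pattern worth a closer look.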
6. Logging to TensorBoard
TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.
7. Composing the different pieces into a final result
Training on MS COCO
We're providing pre-trained weights for MS COCO to make it easier to start. You can use those weights as a starting point to train your own variation on the network. Training and evaluation code is in samples/coco/coco.py. You can import this module in a Jupyter notebook (see the provided notebooks for examples) or you can run it directly from the command line as such:
# Train a new model starting from pre-trained COCO weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=coco
# Train a new model starting from ImageNet weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=imagenet
# Continue training a model that you had trained earlier
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5
# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=last
You can also run the COCO evaluation code with:
# Run COCO evaluation on the last trained model
python3 samples/coco/coco.py evaluate --dataset=/path/to/coco/ --model=last
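COCO's AP metrics are built on intersection over union (IoU) between predicted and ground-truth instances. As a small illustration of that underlying quantity only (not of the full COCO evaluation), the hypothetical `mask_iou` helper below computes IoU for two binary masks.

```python
import numpy as np

def mask_iou(pred, truth):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0  # both masks empty; treated as no overlap here
    return float(np.logical_and(pred, truth).sum() / union)

a = np.zeros((4, 4), dtype=bool)
a[0:2, 0:2] = True  # 4 pixels
b = np.zeros((4, 4), dtype=bool)
b[0:2, 1:3] = True  # 4 pixels, 2 of them shared with a
print(mask_iou(a, b))
```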
The training schedule, learning rate, and other parameters should be set in samples/coco/coco.py.
Training on Your Own Dataset
Start by reading this blog post about the balloon color splash sample. It covers the process starting from annotating images to training to using the results in a sample application.
In summary, to train the model on your own dataset you'll need to extend two classes:
Config
This class contains the default configuration. Subclass it and modify the attributes you need to change.
Dataset
This class provides a consistent way to work with any dataset.
It allows you to use new datasets for training without having to change
the code of the model. It also supports loading multiple datasets at the
same time, which is useful if the objects you want to detect are not
all available in one dataset.
See examples in samples/shapes/train_shapes.ipynb, samples/coco/coco.py, samples/balloon/balloon.py, and samples/nucleus/nucleus.py.
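When writing a Dataset subclass, most of the work is turning annotations into per-instance masks. The sketch below shows one way to build such masks with NumPy. It assumes that the mask-loading method should return a boolean array of shape `[height, width, instances]` together with an array of class IDs; the `rectangles_to_masks` helper and the annotation format are hypothetical.

```python
import numpy as np

def rectangles_to_masks(height, width, annotations):
    """Build masks [H, W, N] and class IDs [N] from rectangle annotations.

    Each annotation is a hypothetical dict: {"class_id": int, "box": (y1, x1, y2, x2)}.
    """
    masks = np.zeros((height, width, len(annotations)), dtype=bool)
    class_ids = np.zeros(len(annotations), dtype=np.int32)
    for i, ann in enumerate(annotations):
        y1, x1, y2, x2 = ann["box"]
        masks[y1:y2, x1:x2, i] = True  # one channel per object instance
        class_ids[i] = ann["class_id"]
    return masks, class_ids

annotations = [
    {"class_id": 1, "box": (0, 0, 2, 3)},
    {"class_id": 1, "box": (4, 4, 8, 8)},
]
masks, class_ids = rectangles_to_masks(8, 8, annotations)
print(masks.shape, class_ids)
```

Real datasets usually store polygons or run-length encodings rather than rectangles, but the returned structure would be the same.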
Differences from the Official Paper
This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.
- Image Resizing: To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.
- Bounding Boxes: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply image augmentations that would otherwise be harder to apply to bounding boxes, such as image rotation.
To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset. We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more, and only 0.01% differed by 10px or more.
- Learning Rate: The paper uses a learning rate of 0.02, but we found that to be too high; it often causes the weights to explode, especially when using a small batch size. It might be related to differences between how Caffe and TensorFlow compute gradients (sum vs mean across batches and GPUs). Or, maybe the official model uses gradient clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively. We found that smaller learning rates converge faster anyway so we go with that.
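The first two differences can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the repository's code: the longer side is scaled to the target size, the padding is split evenly between both sides, and box coordinates use exclusive `y2` and `x2`. Both helper names are hypothetical.

```python
import numpy as np

def square_resize_geometry(height, width, size=1024):
    """Scale so the longer side equals `size`, then center-pad to size x size."""
    scale = size / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    top = (size - new_h) // 2
    left = (size - new_w) // 2
    # Padding as ((top, bottom), (left, right)), suitable for np.pad on a 2D array.
    padding = ((top, size - new_h - top), (left, size - new_w - left))
    return scale, (new_h, new_w), padding

def bbox_from_mask(mask):
    """Smallest box (y1, x1, y2, x2) enclosing all mask pixels; y2 and x2 are exclusive."""
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    if rows.size == 0:
        return (0, 0, 0, 0)  # empty mask: no box
    return (int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1)

scale, new_size, padding = square_resize_geometry(480, 640)
mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:7] = True
box = bbox_from_mask(mask)
print(scale, new_size, padding, box)
```

Because the box is derived from the mask, any augmentation applied to the mask (a rotation, for example) automatically yields a matching box.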
Citation
Use this bibtex to cite this repository:
@misc{matterport_maskrcnn_2017,
title={Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow},
author={Waleed Abdulla},
year={2017},
publisher={Github},
journal={GitHub repository},
howpublished={\url{https://github.com/matterport/Mask_RCNN}},
}
Contributing
Contributions to this repository are welcome. Examples of things you can contribute:
- Speed Improvements. Like re-writing some Python code in TensorFlow or Cython.
- Training on other datasets.
- Accuracy Improvements.
- Visualizations and examples.
You can also join our team and help us build even more projects like this one.
Requirements
Python 3.4, TensorFlow 1.3, Keras 2.0.8 and other common packages listed in requirements.txt.
MS COCO Requirements:
To train or test on MS COCO, you'll also need:
- pycocotools (installation instructions below)
- MS COCO Dataset
- Download the 5K minival and the 35K validation-minus-minival subsets. More details in the original Faster R-CNN implementation.
If you use Docker, the code has been verified to work on this Docker container.
Installation
- Clone this repository
- Install dependencies
pip3 install -r requirements.txt
- Run setup from the repository root directory
python3 setup.py install
- Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.
- (Optional) To train or test on MS COCO install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).
  - Linux: https://github.com/waleedka/coco
  - Windows: https://github.com/philferriere/cocoapi. You must have the Visual C++ 2015 build tools on your path (see the repo for additional details)
Projects Using this Model
If you extend this model to other datasets or build projects that use it, we'd love to hear from you.
4K Video Demo by Karol Majek.
Images to OSM: Improve OpenStreetMap by adding baseball, soccer, tennis, football, and basketball fields.
Splash of Color. A blog post explaining how to train this model from scratch and use it to implement a color splash effect.
Segmenting Nuclei in Microscopy Images. Built for the 2018 Data Science Bowl. Code is in the samples/nucleus directory.
Detection and Segmentation for Surgery Robots by the NUS Control & Mechatronics Lab.
Reconstructing 3D buildings from aerial LiDAR
A proof of concept project by Esri, in collaboration with Nvidia and Miami-Dade County. Along with a great write up and code by Dmitry Kudinov, Daniel Hedges, and Omar Maher.
Usiigaci: Label-free Cell Tracking in Phase Contrast Microscopy
A project from Japan to automatically track cells in a microfluidics platform. Paper is pending, but the source code is released.
Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery
Research project to understand the complex processes between degradations in the Arctic and climate change. By Weixing Zhang, Chandi Witharana, Anna Liljedahl, and Mikhail Kanevskiy.
Mask-RCNN Shiny
A computer vision class project by HU Shiyu to apply the color pop effect on people with beautiful results.
Mapping Challenge: Convert satellite imagery to maps for use by humanitarian organisations.
GRASS GIS Addon to generate vector masks from geospatial imagery. Based on a Master's thesis by Ondřej Pešek.