chuanqi305/MobileNet-SSD

Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.


Quick Overview

MobileNet-SSD is a GitHub repository that implements a lightweight object detection model combining MobileNet and Single Shot Detector (SSD) architectures. It's designed for efficient real-time object detection on mobile and embedded devices, offering a balance between speed and accuracy.


Pros

  • Lightweight and efficient, suitable for mobile and embedded devices
  • Reaches mAP=0.727 on VOC0712 while remaining fast enough for real-time use
  • Implements a popular and well-established object detection architecture
  • Includes pre-trained models for quick deployment


Cons

  • Limited documentation and usage instructions
  • Not actively maintained (last update was several years ago)
  • May not include the latest improvements in object detection techniques
  • Lacks extensive examples and use cases

Code Examples

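import cv2

# Load the MobileNet-SSD model
net = cv2.dnn.readNetFromCaffe('MobileNetSSD_deploy.prototxt', 'MobileNetSSD_deploy.caffemodel')

# Read the input image ('input.jpg' is a placeholder path)
image = cv2.imread('input.jpg')

# Prepare input blob: resize to 300x300, subtract mean 127.5, scale by 0.007843
blob = cv2.dnn.blobFromImage(image, 0.007843, (300, 300), 127.5)

# Set the input and perform forward pass
net.setInput(blob)
detections = net.forward()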

This code snippet demonstrates how to load the MobileNet-SSD model and perform object detection on an input image.
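
The two numeric arguments passed to blobFromImage set the input normalization: OpenCV subtracts the mean (127.5) from each pixel and then multiplies by the scale factor (0.007843, roughly 1/127.5), which maps 8-bit pixel values to approximately [-1, 1]. A minimal sketch of that arithmetic in plain Python follows; normalize_pixel is a hypothetical helper name used only for illustration.

```python
# Sketch of the per-pixel arithmetic behind the blob preparation.
# normalize_pixel is a hypothetical helper, not part of OpenCV or the repository.
SCALE = 0.007843  # approximately 1 / 127.5
MEAN = 127.5


def normalize_pixel(value, scale=SCALE, mean=MEAN):
    """Subtract the mean first, then multiply by the scale factor."""
    return (value - mean) * scale


if __name__ == "__main__":
    # Darkest, mid-gray, and brightest 8-bit values
    for v in (0, 127.5, 255):
        print(v, round(normalize_pixel(v), 4))
```

Feeding the network differently normalized pixels (for example raw 0-255 values) would likely degrade detections, since the weights expect this input range.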

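import numpy as np

# Image height and width, used to scale normalized box coordinates to pixels
(h, w) = image.shape[:2]

# Loop over detections and draw bounding boxes
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    # Keep only detections above the confidence threshold
    if confidence > 0.2:
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        cv2.rectangle(image, (startX, startY), (endX, endY), (0, 255, 0), 2)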

This example shows how to process the detection results and draw bounding boxes around detected objects.
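
The same post-processing can be separated from the drawing code so that it is easier to test. The sketch below assumes the output layout implied by the indices used above, where each row is [image_id, class_id, confidence, x1, y1, x2, y2] with coordinates normalized to [0, 1]. decode_detections is a hypothetical helper, and reading index 1 as the class id is an assumption based on OpenCV's usual detection output layout.

```python
def decode_detections(detections, width, height, threshold=0.2):
    """Convert raw SSD output into (class_id, confidence, box) tuples.

    Assumes detections[0][0][i] is
    [image_id, class_id, confidence, x1, y1, x2, y2]
    with box coordinates normalized to [0, 1].
    Works on nested lists as well as NumPy arrays.
    """
    results = []
    for row in detections[0][0]:
        class_id = int(row[1])
        confidence = float(row[2])
        # Skip weak detections
        if confidence <= threshold:
            continue
        # Scale normalized coordinates to pixels (truncating, like astype("int"))
        box = (
            int(row[3] * width),
            int(row[4] * height),
            int(row[5] * width),
            int(row[6] * height),
        )
        results.append((class_id, confidence, box))
    return results
```

With the variables from the examples above, decode_detections(detections, w, h) would return one tuple per detection that passes the threshold.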

Getting Started

  1. Clone the repository:

    git clone
  2. Download the pre-trained model files:

    • MobileNetSSD_deploy.caffemodel
    • MobileNetSSD_deploy.prototxt
  3. Install dependencies:

    pip install opencv-python numpy
  4. Use the code examples provided above to implement object detection in your Python script.

  5. Run your script with an input image to perform object detection using MobileNet-SSD.

Run



  1. Download the SSD source code and compile it (follow the SSD README).
  2. Download the pretrained deploy weights from the link above.
  3. Put all the files in SSD_HOME/examples/.
  4. Run the demo script to show the detection result.
  5. Optionally, run the provided script that generates a no-BN model; it will be much faster.
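
The last step mentions that a model without batch normalization runs faster. The usual way to remove batch normalization at inference time is to fold its per-channel statistics into the weights and bias of the preceding convolution, so that one layer does the work of two. The sketch below illustrates that general technique for a single output channel in plain Python. It is not the repository's script, and fold_bn, conv, batch_norm and their parameter names are hypothetical.

```python
import math


def fold_bn(weights, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold batch-norm parameters of one output channel into conv weights.

    weights: flat list of the channel's convolution weights
    bias:    the channel's convolution bias (0.0 if the layer has none)
    """
    factor = gamma / math.sqrt(var + eps)
    new_weights = [w * factor for w in weights]
    new_bias = (bias - mean) * factor + beta
    return new_weights, new_bias


def conv(weights, bias, inputs):
    """Dot product standing in for one convolution output position."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def batch_norm(value, mean, var, gamma, beta, eps=1e-5):
    """Inference-time batch normalization of a single value."""
    return gamma * (value - mean) / math.sqrt(var + eps) + beta
```

Applying conv with the folded weights gives the same value as conv followed by batch_norm, up to floating-point rounding, which is why the merged model can skip the normalization layers entirely.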

Create LMDB for your own dataset

  1. Place the Images directory and the Labels directory in the same directory. (Each image in the Images folder should have a unique label file in the Labels folder with the same name.)
  2. cd create_lmdb/code
  3. Modify the labelmap.prototxt file according to your classes.
  4. Modify the paths and directories in the list-creation and LMDB-creation scripts, as specified in the comments of those files.
  5. Run the list-creation script with bash, which will create trainval.txt, test.txt and test_name_size.txt.
  6. Run the LMDB-creation script with bash, which will generate the LMDB in the Dataset directory.
  7. Delete trainval.txt, test.txt and test_name_size.txt before creating the next LMDB.
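
Because every image needs a label file with the same name, it can help to check the pairing before building the LMDB. The following is a minimal sketch using only the standard library. find_unpaired is a hypothetical helper, and it assumes that "same name" means the same file name without its extension.

```python
from pathlib import Path


def find_unpaired(images_dir, labels_dir):
    """Return (images_without_label, labels_without_image) as sorted name lists.

    Names are compared without their file extensions.
    """
    image_stems = {p.stem for p in Path(images_dir).iterdir() if p.is_file()}
    label_stems = {p.stem for p in Path(labels_dir).iterdir() if p.is_file()}
    missing_labels = sorted(image_stems - label_stems)
    missing_images = sorted(label_stems - image_stems)
    return missing_labels, missing_images


if __name__ == "__main__":
    # Directory names follow the layout described in the steps above
    no_label, no_image = find_unpaired("Images", "Labels")
    print("Images without a label file:", no_label)
    print("Label files without an image:", no_image)
```

Both lists being empty suggests the dataset is consistently paired.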

Train your own dataset

  1. Convert your own dataset to an LMDB database (follow the SSD README), and create symlinks in the current directory.
ln -s PATH_TO_YOUR_TRAIN_LMDB trainval_lmdb
ln -s PATH_TO_YOUR_TEST_LMDB test_lmdb
  2. Create the labelmap.prototxt file and put it into the current directory.
  3. Use the provided generator script to generate your own training prototxt.
  4. Download the training weights from the link above and run the training script; after about 30000 iterations, the loss should be 1.5 - 2.5.
  5. Run the test script to evaluate the result.
  6. If necessary, generate your own no-BN caffemodel:
python --model example/MobileNetSSD_deploy.prototxt --weights snapshot/mobilenet_iter_xxxxxx.caffemodel
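
The labelmap.prototxt step can be scripted for a custom class list. The sketch below assumes the item format used by the SSD Caffe label map for VOC (name, label and display_name fields, with label 0 reserved for background). make_labelmap is a hypothetical helper; compare its output against the label map shipped with your SSD checkout before training.

```python
def make_labelmap(class_names):
    """Build labelmap.prototxt text; label 0 is reserved for background."""
    entries = [("none_of_the_above", 0, "background")]
    entries += [(name, i, name) for i, name in enumerate(class_names, start=1)]
    blocks = []
    for name, label, display in entries:
        blocks.append(
            "item {\n"
            f'  name: "{name}"\n'
            f"  label: {label}\n"
            f'  display_name: "{display}"\n'
            "}\n"
        )
    return "".join(blocks)


if __name__ == "__main__":
    # Example class list; replace with your own classes
    print(make_labelmap(["cat", "dog"]))
```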

About some details

There are two primary differences between this model and MobileNet-SSD on TensorFlow:

  1. The ReLU6 layer is replaced by ReLU.
  2. For the conv11_mbox_prior layer, the anchors are [(0.2, 1.0), (0.2, 2.0), (0.2, 0.5)] vs TensorFlow's [(0.1, 1.0), (0.2, 2.0), (0.2, 0.5)].
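
To see what the anchor difference means in pixels, the sketch below converts each pair into a box size. It assumes each tuple is (scale, aspect ratio), the common SSD convention of width = scale * sqrt(ratio) and height = scale / sqrt(ratio), and a 300x300 input as in the code examples. These are assumptions for illustration; anchor_size is a hypothetical helper and the repository's prototxt may express the sizes differently.

```python
import math


def anchor_size(scale, ratio, input_size=300):
    """Return (width, height) in pixels for one (scale, aspect ratio) anchor."""
    width = scale * math.sqrt(ratio) * input_size
    height = scale / math.sqrt(ratio) * input_size
    return width, height


if __name__ == "__main__":
    this_model = [(0.2, 1.0), (0.2, 2.0), (0.2, 0.5)]
    tensorflow = [(0.1, 1.0), (0.2, 2.0), (0.2, 0.5)]
    for name, anchors in (("this model", this_model), ("TensorFlow", tensorflow)):
        sizes = [tuple(round(v, 1) for v in anchor_size(s, r)) for s, r in anchors]
        print(name, sizes)
```

Under these assumptions only the first anchor differs: the square box in this model is twice as wide and tall as the TensorFlow one.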

Reproduce the result

I trained this model from a MobileNet classifier (caffemodel and prototxt) converted from TensorFlow. I first trained the model on MS-COCO and then fine-tuned it on VOC0712. Without MS-COCO pretraining, it only reaches mAP=0.68.

Mobile Platform

You can run it on Android with my other project, rscnn.