Linzaer / Ultra-Light-Fast-Generic-Face-Detector-1MB

💎 1MB lightweight face detection model


Top Related Projects

  • Pytorch_Retinaface: Retinaface gets 80.99% on the widerface hard val set using mobilenet0.25.

  • libfacedetection: An open source library for face detection in images. The face detection speed can reach 1000 FPS.

  • MobileNet-SSD: Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.

  • ncnn: A high-performance neural network inference framework optimized for the mobile platform.

Quick Overview

Ultra-Light-Fast-Generic-Face-Detector-1MB is a lightweight face detection model designed for edge computing devices. It offers a balance between speed and accuracy, with a model size of only 1MB, making it suitable for applications with limited computational resources.

Pros

  • Extremely small model size (1MB), ideal for mobile and embedded devices
  • Fast inference speed, suitable for real-time applications
  • Good accuracy for its size, especially in common scenarios
  • Supports various deep learning frameworks (PyTorch, MNN, NCNN, TensorRT)

Cons

  • May struggle with detecting faces in challenging conditions (e.g., extreme angles, poor lighting)
  • Limited to face detection only, doesn't provide additional facial analysis features
  • Requires some setup and configuration for optimal performance
  • May not be as accurate as larger, more complex models

Code Examples

  1. Loading the model and performing inference with PyTorch:

import torch
from vision.ssd.config.fd_config import define_img_size

# define_img_size must run before the model module is imported,
# because it sets the global prior/anchor configuration
define_img_size(320)
from vision.ssd.mb_tiny_RFB_fd import create_Mb_Tiny_RFB_fd

net = create_Mb_Tiny_RFB_fd(2, is_test=True)  # 2 classes: background, face
net.load("model/pretrained/version-RFB-320.pth")
net.eval()

# Inference on a preprocessed input tensor (see the next example)
with torch.no_grad():
    confidences, boxes = net.forward(input_image)
  2. Preprocessing an image for inference:

import cv2
import numpy as np
import torch

def preprocess(image):
    # BGR (OpenCV default) -> RGB, then resize to the 320x240 input resolution
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (320, 240))
    # Normalize with the repo's defaults (mean 127, std 128)
    image = (image.astype(np.float32) - 127.0) / 128.0
    # HWC -> CHW, then add a batch dimension
    return torch.from_numpy(image.transpose(2, 0, 1)).unsqueeze(0)
  1. Post-processing detection results:
def post_process(confidences, boxes, orig_image, threshold=0.7):
    height, width = orig_image.shape[:2]
    boxes = boxes[0]
    confidences = confidences[0]
    
    for i in range(boxes.shape[0]):
        box = boxes[i, :]
        conf = confidences[i]
        if conf < threshold:
            continue
        x1, y1, x2, y2 = box
        x1 = int(x1 * width)
        y1 = int(y1 * height)
        x2 = int(x2 * width)
        y2 = int(y2 * height)
        cv2.rectangle(orig_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    
    return orig_image
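
Putting the three snippets together on a single image (a minimal sketch under the assumptions above; test.jpg is a placeholder, and no non-maximum suppression is applied, so overlapping detections are possible; the repo's predictor classes handle NMS):

orig_image = cv2.imread("test.jpg")
input_image = preprocess(orig_image)
with torch.no_grad():
    confidences, boxes = net.forward(input_image)
result = post_process(confidences, boxes, orig_image)
cv2.imwrite("result.jpg", result)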

Getting Started

  1. Clone the repository:

    git clone https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB.git
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download pre-trained models from the repository's release page.

  4. Run the demo script:

    python run_video_face_detect.py
    

This will start face detection on your default webcam using the pre-trained model.
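
For single images without a webcam, the helper predictor used by the repo's demo scripts wraps preprocessing, inference, and non-maximum suppression. A minimal sketch, assuming the checkpoint path convention from the examples above:

import cv2
from vision.ssd.config.fd_config import define_img_size

define_img_size(320)  # must run before importing the model builders
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd, create_mb_tiny_fd_predictor

net = create_mb_tiny_fd(2, is_test=True)
predictor = create_mb_tiny_fd_predictor(net, candidate_size=1500)
net.load("model/pretrained/version-slim-320.pth")

# The predictor expects an RGB image and returns boxes, labels, and scores
image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
boxes, labels, probs = predictor.predict(image, 750, 0.6)  # top-k, score threshold
print(f"found {boxes.size(0)} face(s)")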

Competitor Comparisons

Retinaface gets 80.99% on the widerface hard val set using mobilenet0.25.

Pros of Pytorch_Retinaface

  • Higher accuracy in face detection, especially for small faces
  • More robust feature extraction using ResNet50 backbone
  • Supports both CPU and GPU inference

Cons of Pytorch_Retinaface

  • Larger model size, requiring more computational resources
  • Slower inference speed compared to Ultra-Light-Fast-Generic-Face-Detector-1MB
  • More complex implementation and setup process

Code Comparison

Ultra-Light-Fast-Generic-Face-Detector-1MB:

from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)

Pytorch_Retinaface:

from models.retinaface import RetinaFace
from data import cfg_re50  # ResNet50 backbone configuration

cfg = cfg_re50
net = RetinaFace(cfg=cfg, phase='test')
# load_model is a helper defined in the repo's detect.py
net = load_model(net, args.trained_model, args.cpu)
net.eval()

Both repositories provide face detection solutions, but they cater to different use cases. Ultra-Light-Fast-Generic-Face-Detector-1MB focuses on lightweight deployment and fast inference, making it suitable for mobile and edge devices. Pytorch_Retinaface, on the other hand, prioritizes accuracy and robustness, making it more appropriate for scenarios where computational resources are less constrained and high precision is required.

An open source library for face detection in images. The face detection speed can reach 1000 FPS.

Pros of libfacedetection

  • More mature project with a longer development history
  • Supports multiple programming languages (C++, Python, Java, etc.)
  • Offers both CPU and GPU acceleration

Cons of libfacedetection

  • Larger model size compared to Ultra-Light-Fast-Generic-Face-Detector-1MB
  • May have slightly slower inference speed on some devices
  • Less focus on mobile deployment

Code Comparison

Ultra-Light-Fast-Generic-Face-Detector-1MB (Python):

from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)

libfacedetection (C++):

#include "facedetectcnn.h"
#define DETECT_BUFFER_SIZE 0x20000
unsigned char * pBuffer = (unsigned char *)malloc(DETECT_BUFFER_SIZE);
int * pResults = facedetect_cnn(pBuffer, (unsigned char*)(rgbImageData), width, height, stride);

Both libraries offer efficient face detection capabilities, but Ultra-Light-Fast-Generic-Face-Detector-1MB is more focused on lightweight models for mobile devices, while libfacedetection provides a broader range of features and language support. The code examples demonstrate the initialization process for each library, highlighting their different approaches to face detection implementation.

Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.

Pros of MobileNet-SSD

  • More versatile object detection (not limited to faces)
  • Larger community and wider adoption
  • Better documentation and examples

Cons of MobileNet-SSD

  • Larger model size (>5MB vs 1MB for Ultra-Light-Fast-Generic-Face-Detector)
  • Potentially slower inference time on mobile devices
  • Less optimized for face detection specifically

Code Comparison

Ultra-Light-Fast-Generic-Face-Detector:

from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)

MobileNet-SSD:

from vision.ssd.mobilenet_v2_ssd_lite import create_mobilenetv2_ssd_lite
net = create_mobilenetv2_ssd_lite(21, is_test=True)

Both repositories provide implementations of lightweight object detection models, but they have different focuses. Ultra-Light-Fast-Generic-Face-Detector is specifically optimized for face detection with a very small model size, making it ideal for mobile and embedded devices. MobileNet-SSD, on the other hand, offers a more general-purpose object detection solution with support for multiple object classes. The choice between the two depends on the specific use case and requirements of the project.

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Pros of ncnn

  • More versatile: supports a wide range of neural network operations and models
  • Better performance: optimized for mobile and embedded devices
  • Larger community and more frequent updates

Cons of ncnn

  • Steeper learning curve: requires more expertise to use effectively
  • Less focused: not specifically designed for face detection

Code Comparison

Ultra-Light-Fast-Generic-Face-Detector-1MB:

from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)

ncnn:

ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in = ncnn::Mat::from_pixels_resize(image_data, ncnn::Mat::PIXEL_BGR, width, height, target_width, target_height);

Both repositories offer efficient solutions for mobile and embedded devices, but they serve different purposes. Ultra-Light-Fast-Generic-Face-Detector-1MB is specifically designed for face detection with a small model size, while ncnn is a more general-purpose neural network inference framework. The code examples show the simplicity of using Ultra-Light-Fast-Generic-Face-Detector-1MB for face detection, compared to the more flexible but potentially complex setup required for ncnn.

README

English | 中文简体

Ultra-Light-Fast-Generic-Face-Detector-1MB

Ultra-lightweight face detection model

This model is a lightweight face detection model designed for edge computing devices.

Tested working environments

  • Ubuntu 16.04, Ubuntu 18.04, Windows 10 (for inference)
  • Python 3.6
  • PyTorch 1.2
  • CUDA 10.0 + cuDNN 7.6

Accuracy, speed, model size comparison

The training set is a VOC-format dataset generated by combining the widerface dataset with the cleaned widerface labels provided by Retinaface (PS: the following test results were measured by myself and may be partially inconsistent with official results).

Widerface test

  • Test accuracy in the WIDER FACE val set (single-scale input resolution: 320*240 or scaling by the maximum side length of 320)

Model                               Easy Set   Medium Set   Hard Set
libfacedetection v1 (caffe)         0.65       0.5          0.233
libfacedetection v2 (caffe)         0.714      0.585        0.306
Retinaface-Mobilenet-0.25 (Mxnet)   0.745      0.553        0.232
version-slim                        0.77       0.671        0.395
version-RFB                         0.787      0.698        0.438
  • Test accuracy in the WIDER FACE val set (single-scale input resolution: VGA 640*480, or scaling by the maximum side length of 640)

Model                               Easy Set   Medium Set   Hard Set
libfacedetection v1 (caffe)         0.741      0.683        0.421
libfacedetection v2 (caffe)         0.773      0.718        0.485
Retinaface-Mobilenet-0.25 (Mxnet)   0.879      0.807        0.481
version-slim                        0.853      0.819        0.539
version-RFB                         0.855      0.822        0.579
  • This part mainly tests the models at medium and small input resolutions.
  • RetinaFace-mnet (Retinaface-Mobilenet-0.25) comes from the excellent insightface project. When testing this network, the original image is scaled so that its maximum side length is 320 or 640, so faces are not deformed; the other networks use a fixed-size resize. For reference, RetinaFace-mnet's best single-scale result on the val set at 1600 is 0.887 (Easy) / 0.87 (Medium) / 0.791 (Hard). The max-side-length scaling is sketched below.
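
A minimal sketch of the max-side-length scaling described above (the function name and the 320 default are illustrative):

import cv2

def resize_by_max_side(image, max_side=320):
    # Scale so the longer side equals max_side, preserving the aspect ratio
    h, w = image.shape[:2]
    scale = max_side / max(h, w)
    resized = cv2.resize(image, (round(w * scale), round(h * scale)))
    return resized, scale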

Terminal device inference speed

  • Raspberry Pi 4B MNN Inference Latency (unit: ms) (ARM/A72x4/1.5GHz/input resolution: 320x240 /int8 quantization)
Model                                         1 core   2 core   3 core   4 core
libfacedetection v1                           28       16       12       9.7
Official Retinaface-Mobilenet-0.25 (Mxnet)    46       25       18.5     15
version-slim                                  29       16       12       9.5
version-RFB                                   35       19.6     14.8     11

Model      Inference Latency (ms)
slim-320   6.33
RFB-320    7.8

Model      Inference Latency (ms)
slim-320   65.6
RFB-320    164.8
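
Latency figures like these can be approximated with a simple timing loop. A rough sketch for the PyTorch model (actual numbers depend heavily on hardware, thread count, quantization, and inference framework):

import time
import torch

def measure_latency_ms(net, shape=(1, 3, 240, 320), warmup=10, runs=100):
    x = torch.randn(*shape)
    with torch.no_grad():
        for _ in range(warmup):  # warm up caches and the allocator
            net.forward(x)
        start = time.perf_counter()
        for _ in range(runs):
            net.forward(x)
    return (time.perf_counter() - start) * 1000 / runs  # ms per inference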

Model size comparison

  • Comparison of several open source lightweight face detection models:
Model                                         Model file size (MB)
libfacedetection v1 (caffe)                   2.58
libfacedetection v2 (caffe)                   3.34
Official Retinaface-Mobilenet-0.25 (Mxnet)    1.68
version-slim                                  1.04
version-RFB                                   1.11
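
A checkpoint's on-disk size and parameter count can be verified directly. A small sketch, assuming the .pth files store a plain state dict and reusing the placeholder path from the examples above:

import os
import torch

path = "model/pretrained/version-RFB-320.pth"
print(f"file size: {os.path.getsize(path) / 1e6:.2f} MB")

state = torch.load(path, map_location="cpu")
n_params = sum(t.numel() for t in state.values())
print(f"parameters: {n_params / 1e6:.2f} M")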

Generating the VOC format training set and training process

  1. Download the widerface official dataset, or download the training set I provide, and extract it into the ./data folder:

(1) The cleaned widerface data package with faces smaller than 10px*10px filtered out: Baidu cloud disk (extraction code: cbiu), Google Drive

(2) The complete widerface data package without small-face filtering: Baidu cloud disk (extraction code: ievk), Google Drive

  2. (PS: if you downloaded the filtered package in (1) above, you can skip this step.) Because widerface contains many small, unclear faces that hinder the convergence of efficient models, they need to be filtered out before training; by default, faces smaller than 10 pixels by 10 pixels are filtered (the rule is sketched after these steps). Run ./data/wider_face_2_voc_add_landmark.py:
 python3 ./data/wider_face_2_voc_add_landmark.py

After the program finishes, the wider_face_add_lm_10_10 folder is generated in the ./data directory. Its contents are identical to data package (1) after decompression. The complete directory structure is as follows:

  data/
    retinaface_labels/
      test/
      train/
      val/
    wider_face/
      WIDER_test/
      WIDER_train/
      WIDER_val/
    wider_face_add_lm_10_10/
      Annotations/
      ImageSets/
      JPEGImages/
    wider_face_2_voc_add_landmark.py
  3. At this point, the VOC training set is ready. There are two scripts in the project root: train-version-slim.sh and train-version-RFB.sh. The former trains the slim version of the model and the latter trains the RFB version, with default parameters already set. If you need to change any parameters, see the description of each training option in ./train.py.

  4. Run train-version-slim.sh or train-version-RFB.sh:

sh train-version-slim.sh or sh train-version-RFB.sh
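
The small-face filter applied in step 2 is straightforward. A minimal sketch, assuming widerface-style boxes stored as (x, y, width, height) in pixels (MIN_FACE and keep_face are illustrative names):

MIN_FACE = 10  # default threshold: faces under 10x10 px are dropped

def keep_face(box):
    x, y, w, h = box
    return w >= MIN_FACE and h >= MIN_FACE

boxes = [(12, 30, 48, 52), (5, 8, 6, 7)]
kept = [b for b in boxes if keep_face(b)]  # the 6x7 px face is filtered out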

Detecting image effects (input resolution: 640x480)

(sample detection images)

PS

  • If the actual production scene involves medium distances, large faces, and few faces, it is recommended to train at input_size 320 (320x240) and to use 320x240, 160x120, or 128x96 inputs for inference, e.g. with the provided pre-trained model version-slim-320.pth or version-RFB-320.pth.
  • If the actual production scene involves medium or long distances, medium or small faces, and many faces, it is recommended to adopt:

(1) Optimal: train at input_size 640 (640x480) and use the same or a larger input size for inference, e.g. with the provided pre-trained model version-slim-640.pth or version-RFB-640.pth; this lowers false positives.

(2) Sub-optimal: train at input_size 320 (320x240) and use 480x360 or 640x480 inputs for inference; this is more sensitive to small faces, but false positives will increase.

  • The best results for each scene require tuning the input resolution to strike a balance between speed and accuracy.
  • An overly large input resolution improves recall for small faces, but also raises the false positive rate for large, close-range faces, and inference time grows sharply.
  • An overly small input resolution speeds up inference considerably, but greatly reduces recall for small faces.
  • The input resolution in production should match the training input resolution as closely as possible, without large deviations in either direction.

TODO LIST

  • Add some test data

Completed list

Third-party related projects

Reference