Ultra-Light-Fast-Generic-Face-Detector-1MB
💎1MB lightweight face detection model (1MB轻量级人脸检测模型)
Top Related Projects
Pytorch_Retinaface: Retinaface get 80.99% in widerface hard val using mobilenet0.25.
libfacedetection: An open source library for face detection in images. The face detection speed can reach 1000FPS.
MobileNet-SSD: Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.
ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform
Quick Overview
Ultra-Light-Fast-Generic-Face-Detector-1MB is a lightweight face detection model designed for edge computing devices. It offers a balance between speed and accuracy, with a model size of only 1MB, making it suitable for applications with limited computational resources.
Pros
- Extremely small model size (1MB), ideal for mobile and embedded devices
- Fast inference speed, suitable for real-time applications
- Good accuracy for its size, especially in common scenarios
- Supports various deep learning frameworks (PyTorch, MNN, NCNN, TensorRT)
Cons
- May struggle with detecting faces in challenging conditions (e.g., extreme angles, poor lighting)
- Limited to face detection only, doesn't provide additional facial analysis features
- Requires some setup and configuration for optimal performance
- May not be as accurate as larger, more complex models
Code Examples
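- Loading the model and performing inference using PyTorch:
import torch
from vision.ssd.config.fd_config import define_img_size
# Set the input size before importing the model module
# (the order used by the repository's demo scripts)
define_img_size(320)
from vision.ssd.mb_tiny_RFB_fd import create_Mb_Tiny_RFB_fd
# Build the RFB variant with 2 classes (background, face) and load the weights
net = create_Mb_Tiny_RFB_fd(2, is_test=True)
net.load("models/pretrained/version-RFB-320.pth")
net.eval()
# Perform inference; input_image is a preprocessed float tensor of shape (1, 3, 240, 320)
with torch.no_grad():
    confidences, boxes = net(input_image)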
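- Preprocessing an image for inference:
import cv2
import numpy as np
def preprocess(image):
    # OpenCV loads images as BGR; convert to RGB and resize to 320x240 (width x height)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (320, 240))
    image = image.astype(np.float32)
    # Normalize with mean 127 and std 128, the values set in the repository's fd_config
    image = (image - 127.0) / 128.0
    # HWC -> CHW, then add a batch dimension: (1, 3, 240, 320)
    image = image.transpose(2, 0, 1)
    # Wrap the result with torch.from_numpy() before passing it to the network
    return np.expand_dims(image, axis=0)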
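- Post-processing detection results:
def post_process(confidences, boxes, orig_image, threshold=0.7):
    # boxes are normalized [x1, y1, x2, y2]; confidences hold (background, face) scores
    # Note: no non-maximum suppression is applied here, so one face may get several boxes
    height, width = orig_image.shape[:2]
    boxes = boxes[0]
    confidences = confidences[0]
    for i in range(boxes.shape[0]):
        box = boxes[i, :]
        conf = confidences[i, 1]  # score of the face class
        if conf < threshold:
            continue
        x1, y1, x2, y2 = box
        # Scale normalized coordinates back to the original image size
        x1 = int(x1 * width)
        y1 = int(y1 * height)
        x2 = int(x2 * width)
        y2 = int(y2 * height)
        cv2.rectangle(orig_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    return orig_image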
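SSD-style detectors emit many overlapping candidate boxes per face, so score thresholding is usually followed by non-maximum suppression (NMS). The following is a minimal NumPy sketch of greedy hard NMS. The function name `hard_nms` and the default `iou_threshold` of 0.3 are illustrative assumptions, not the repository's API.

```python
import numpy as np

def hard_nms(boxes, scores, iou_threshold=0.3):
    """Return indices of boxes kept after greedy non-maximum suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) array.
    iou_threshold is an assumed default, chosen for illustration.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    order = np.argsort(scores)[::-1]  # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter + 1e-9)
        # Drop boxes that overlap the best box too strongly
        order = rest[iou <= iou_threshold]
    return keep

# Two near-duplicate boxes on one face plus one separate face
boxes = np.array([[0.10, 0.10, 0.50, 0.50],
                  [0.12, 0.11, 0.52, 0.49],
                  [0.60, 0.60, 0.90, 0.90]])
scores = np.array([0.95, 0.80, 0.90])
kept = hard_nms(boxes, scores)
print(kept)
```

Running the NMS step on the face-class scores that pass the threshold, and drawing only the kept indices, avoids stacked rectangles around a single face.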
Getting Started
- Clone the repository:
  git clone https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB.git
- Install dependencies:
  pip install -r requirements.txt
- Download pre-trained models from the repository's release page.
- Run the demo script:
  python run_video_face_detect.py
  This will start face detection on your default webcam using the pre-trained model.
Competitor Comparisons
Retinaface get 80.99% in widerface hard val using mobilenet0.25.
Pros of Pytorch_Retinaface
- Higher accuracy in face detection, especially for small faces
- More robust feature extraction with the optional ResNet50 backbone (a MobileNet-0.25 backbone is also available)
- Supports both CPU and GPU inference
Cons of Pytorch_Retinaface
- Larger model size, requiring more computational resources
- Slower inference speed compared to Ultra-Light-Fast-Generic-Face-Detector-1MB
- More complex implementation and setup process
Code Comparison
Ultra-Light-Fast-Generic-Face-Detector-1MB:
from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)
Pytorch_Retinaface:
from data import cfg_re50
from models.retinaface import RetinaFace
# load_model and args are defined in the repository's detect.py script
cfg = cfg_re50
net = RetinaFace(cfg=cfg, phase='test')
net = load_model(net, args.trained_model, args.cpu)
net.eval()
Both repositories provide face detection solutions, but they cater to different use cases. Ultra-Light-Fast-Generic-Face-Detector-1MB focuses on lightweight deployment and fast inference, making it suitable for mobile and edge devices. Pytorch_Retinaface, on the other hand, prioritizes accuracy and robustness, making it more appropriate for scenarios where computational resources are less constrained and high precision is required.
An open source library for face detection in images. The face detection speed can reach 1000FPS.
Pros of libfacedetection
- More mature project with a longer development history
- Supports multiple programming languages (C++, Python, Java, etc.)
- Runs on CPU with SIMD acceleration (e.g., AVX2 on x86, NEON on ARM)
Cons of libfacedetection
- Larger model size compared to Ultra-Light-Fast-Generic-Face-Detector-1MB
- May have slightly slower inference speed on some devices
- Less focus on mobile deployment
Code Comparison
Ultra-Light-Fast-Generic-Face-Detector-1MB (Python):
from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)
libfacedetection (C++):
#include "facedetectcnn.h"
#define DETECT_BUFFER_SIZE 0x20000
unsigned char * pBuffer = (unsigned char *)malloc(DETECT_BUFFER_SIZE);
int * pResults = facedetect_cnn(pBuffer, (unsigned char*)(rgbImageData), width, height, stride);
Both libraries offer efficient face detection capabilities, but Ultra-Light-Fast-Generic-Face-Detector-1MB is more focused on lightweight models for mobile devices, while libfacedetection provides a broader range of features and language support. The code examples demonstrate the initialization process for each library, highlighting their different approaches to face detection implementation.
Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.
Pros of MobileNet-SSD
- More versatile object detection (not limited to faces)
- Larger community and wider adoption
- Better documentation and examples
Cons of MobileNet-SSD
- Larger model size (>5MB vs 1MB for Ultra-Light-Fast-Generic-Face-Detector)
- Potentially slower inference time on mobile devices
- Less optimized for face detection specifically
Code Comparison
Ultra-Light-Fast-Generic-Face-Detector:
from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)
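MobileNet-SSD (PyTorch equivalent from the pytorch-ssd project; the MobileNet-SSD repository itself is Caffe-based):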
from vision.ssd.mobilenet_v2_ssd_lite import create_mobilenetv2_ssd_lite
net = create_mobilenetv2_ssd_lite(21, is_test=True)
Both repositories provide implementations of lightweight object detection models, but they have different focuses. Ultra-Light-Fast-Generic-Face-Detector is specifically optimized for face detection with a very small model size, making it ideal for mobile and embedded devices. MobileNet-SSD, on the other hand, offers a more general-purpose object detection solution with support for multiple object classes. The choice between the two depends on the specific use case and requirements of the project.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Pros of ncnn
- More versatile: supports a wide range of neural network operations and models
- Better performance: optimized for mobile and embedded devices
- Larger community and more frequent updates
Cons of ncnn
- Steeper learning curve: requires more expertise to use effectively
- Less focused: not specifically designed for face detection
Code Comparison
Ultra-Light-Fast-Generic-Face-Detector-1MB:
from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)
ncnn:
ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in = ncnn::Mat::from_pixels_resize(image_data, ncnn::Mat::PIXEL_BGR, width, height, target_width, target_height);
Both repositories offer efficient solutions for mobile and embedded devices, but they serve different purposes. Ultra-Light-Fast-Generic-Face-Detector-1MB is specifically designed for face detection with a small model size, while ncnn is a more general-purpose neural network inference framework. The code examples show the simplicity of using Ultra-Light-Fast-Generic-Face-Detector-1MB for face detection, compared to the more flexible but potentially complex setup required for ncnn. The two are complementary rather than strict alternatives: the project's README lists NCNN C++ inference code among its supported deployment paths.
README
Ultra-Light-Fast-Generic-Face-Detector-1MB
Ultra-lightweight face detection model
This model is a lightweight face detection model designed for edge computing devices.
- Model size: the default FP32 (.pth) file is 1.04~1.1MB, and the int8-quantized model for inference frameworks is about 300KB.
- Computation: about 90~109 MFlops at an input resolution of 320x240.
- Two versions of the model are available: version-slim (simplified network backbone, slightly faster) and version-RFB (with a modified RFB module, higher precision).
- Pre-trained models trained on Widerface are provided for input resolutions of 320x240 and 640x480, to suit different application scenarios.
- Supports ONNX export for ease of migration and inference.
- Provides NCNN C++ inference code.
- Provides MNN C++ inference code, MNN Python inference code, and FP32/INT8 quantized models.
- Provides a Caffe model and onnx2caffe conversion code.
- Provides Caffe Python inference code and OpenCV DNN inference code.
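As a rough sanity check on the size figures above, file size can be related to weight count by dividing by the bytes stored per weight. The sketch below is back-of-the-envelope arithmetic only: it ignores file-format overhead and non-weight tensors, and the helper names are hypothetical.

```python
def approx_param_count(file_size_mb, bytes_per_weight):
    """Estimate the number of weights from a file size, ignoring container overhead."""
    return int(file_size_mb * 1024 * 1024 / bytes_per_weight)

def approx_size_kb(param_count, bytes_per_weight):
    """Estimate the storage needed for param_count weights, in KB."""
    return param_count * bytes_per_weight / 1024

# FP32 stores 4 bytes per weight; the file sizes are the ones quoted in the list above
slim_params = approx_param_count(1.04, 4)
rfb_params = approx_param_count(1.1, 4)

# int8 stores 1 byte per weight
slim_int8_kb = approx_size_kb(slim_params, 1)
rfb_int8_kb = approx_size_kb(rfb_params, 1)

print(f"version-slim: ~{slim_params} weights, ~{slim_int8_kb:.0f} KB at int8")
print(f"version-RFB:  ~{rfb_params} weights, ~{rfb_int8_kb:.0f} KB at int8")
```

Under these assumptions both variants come out below 300 thousand weights, and the int8 estimates land slightly under the quoted "about 300KB", which is plausible once per-layer quantization parameters and format overhead are added.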
Tested environments
- Ubuntu 16.04, Ubuntu 18.04, Windows 10 (for inference)
- Python 3.6
- PyTorch 1.2
- CUDA 10.0 + cuDNN 7.6
Accuracy, speed, model size comparison
The training set is a VOC-format dataset generated from the widerface dataset using the cleaned widerface labels provided by Retinaface. (PS: the following test results were obtained by the author, so they may not fully match results reported elsewhere.)
Widerface test
- Test accuracy in the WIDER FACE val set (single-scale input resolution: 320*240 or scaling by the maximum side length of 320)
Model | Easy Set | Medium Set | Hard Set |
---|---|---|---|
libfacedetection v1 (caffe) | 0.65 | 0.5 | 0.233 |
libfacedetection v2 (caffe) | 0.714 | 0.585 | 0.306 |
Retinaface-Mobilenet-0.25 (Mxnet) | 0.745 | 0.553 | 0.232 |
version-slim | 0.77 | 0.671 | 0.395 |
version-RFB | 0.787 | 0.698 | 0.438 |
- Test accuracy in the WIDER FACE val set (single-scale input resolution: VGA 640*480 or scaling by the maximum side length of 640)
Model | Easy Set | Medium Set | Hard Set |
---|---|---|---|
libfacedetection v1 (caffe) | 0.741 | 0.683 | 0.421 |
libfacedetection v2 (caffe) | 0.773 | 0.718 | 0.485 |
Retinaface-Mobilenet-0.25 (Mxnet) | 0.879 | 0.807 | 0.481 |
version-slim | 0.853 | 0.819 | 0.539 |
version-RFB | 0.855 | 0.822 | 0.579 |
- This part mainly tests performance on the test set at medium and small input resolutions.
- RetinaFace-mnet (Retinaface-Mobilenet-0.25) comes from the excellent insightface project. When testing this network, the original image is scaled so that its maximum side length is 320 or 640, so faces are not deformed; the other networks use a fixed-size resize. For reference, the best single-scale result of RetinaFace-mnet on the val set, at a scale of 1600, was 0.887 (Easy) / 0.87 (Medium) / 0.791 (Hard).
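The difference between the two resizing strategies mentioned above can be made concrete with a little arithmetic. This is an illustrative sketch; the function names are hypothetical and are not taken from either project's test code.

```python
def scale_by_max_side(width, height, max_side):
    """Aspect-preserving target size: the longer side becomes max_side."""
    scale = max_side / max(width, height)
    return round(width * scale), round(height * scale)

def fixed_resize_distortion(width, height, target_w, target_h):
    """Ratio of horizontal to vertical scale factors; 1.0 means no deformation."""
    return (target_w / width) / (target_h / height)

# A 4:3 image keeps its shape under both strategies
print(scale_by_max_side(1024, 768, 320))
print(fixed_resize_distortion(1024, 768, 320, 240))

# A square image is stretched horizontally by a fixed 320x240 resize
print(scale_by_max_side(1024, 1024, 320))
print(fixed_resize_distortion(1024, 1024, 320, 240))
```

A distortion ratio other than 1.0 means faces are stretched or squeezed by the fixed resize, which is one reason the accuracy numbers of the two groups of networks are not strictly comparable.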
Terminal device inference speed
- Raspberry Pi 4B MNN inference latency (unit: ms) (ARM / A72x4 / 1.5GHz / input resolution: 320x240 / int8 quantization)
Model | 1 core | 2 core | 3 core | 4 core |
---|---|---|---|---|
libfacedetection v1 | 28 | 16 | 12 | 9.7 |
Official Retinaface-Mobilenet-0.25 (Mxnet) | 46 | 25 | 18.5 | 15 |
version-slim | 29 | 16 | 12 | 9.5 |
version-RFB | 35 | 19.6 | 14.8 | 11 |
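The latencies in the table above can be converted to throughput and multi-core speedup with simple arithmetic. The sketch below copies the table values; the FPS figures it prints cover the inference call only and ignore pre- and post-processing, so they are upper bounds rather than measured end-to-end rates.

```python
# Latencies in ms for 1, 2, 3, and 4 cores, copied from the Raspberry Pi 4B table
LATENCY_MS = {
    "libfacedetection v1": [28, 16, 12, 9.7],
    "Official Retinaface-Mobilenet-0.25 (Mxnet)": [46, 25, 18.5, 15],
    "version-slim": [29, 16, 12, 9.5],
    "version-RFB": [35, 19.6, 14.8, 11],
}

def fps(latency_ms):
    """Frames per second for back-to-back inference at the given latency."""
    return 1000.0 / latency_ms

def speedup(latencies):
    """Speedup of the last (4-core) entry relative to the first (1-core) entry."""
    return latencies[0] / latencies[-1]

for name, latencies in LATENCY_MS.items():
    print(f"{name}: {fps(latencies[-1]):.0f} FPS on 4 cores, "
          f"{speedup(latencies):.2f}x over 1 core")
```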
- iPhone 6s Plus MNN (version tag: 0.2.1.5) inference latency (input resolution: 320x240). Data comes from the official MNN project.
Model | Inference Latency(ms) |
---|---|
slim-320 | 6.33 |
RFB-320 | 7.8 |
- Kendryte K210 NNCase inference latency (RISC-V / 400MHz / input resolution: 320x240 / int8 quantization). Data comes from NNCase.
Model | Inference Latency(ms) |
---|---|
slim-320 | 65.6 |
RFB-320 | 164.8 |
Model size comparison
- Comparison of several open source lightweight face detection models:
Model | Model file size (MB) |
---|---|
libfacedetection v1 (caffe) | 2.58 |
libfacedetection v2 (caffe) | 3.34 |
Official Retinaface-Mobilenet-0.25 (Mxnet) | 1.68 |
version-slim | 1.04 |
version-RFB | 1.11 |
Generate VOC format training data set and training process
- Download the dataset from the widerface official website, or download the training set I provided, and extract it into the ./data folder:
(1) The cleaned widerface data package with faces smaller than 10px*10px filtered out: Baidu cloud disk (extraction code: cbiu), Google Drive
(2) The complete widerface data package without small-face filtering: Baidu cloud disk (extraction code: ievk), Google Drive
- (PS: If you downloaded the filtered package in (1) above, you can skip this step.) Because widerface contains many small and unclear faces, which hinders the convergence of efficient models, the data needs to be filtered for training. By default, faces smaller than 10 pixels by 10 pixels are filtered out. Run ./data/wider_face_2_voc_add_landmark.py:
python3 ./data/wider_face_2_voc_add_landmark.py
After the program finishes, the wider_face_add_lm_10_10 folder will be generated in the ./data directory. Its contents are the same as data package (1) after decompression. The complete directory structure is as follows:
data/
    retinaface_labels/
        test/
        train/
        val/
    wider_face/
        WIDER_test/
        WIDER_train/
        WIDER_val/
    wider_face_add_lm_10_10/
        Annotations/
        ImageSets/
        JPEGImages/
    wider_face_2_voc_add_landmark.py
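The small-face filtering described above can be sketched as follows. This is an illustration of the idea, not the code of wider_face_2_voc_add_landmark.py: the function name is hypothetical, boxes are assumed to be in the WIDER FACE (x, y, w, h) layout, and a face is assumed to be dropped when either side is below the minimum.

```python
def filter_small_faces(boxes, min_size=10):
    """Keep boxes given as (x, y, w, h) whose width and height are both >= min_size pixels.

    Assumption: a face is discarded if either side is smaller than min_size.
    """
    return [box for box in boxes if box[2] >= min_size and box[3] >= min_size]

annotations = [
    (10, 10, 50, 60),  # large face: kept
    (5, 5, 8, 30),     # too narrow: dropped
    (0, 0, 10, 10),    # exactly at the limit: kept
    (3, 3, 9, 9),      # too small: dropped
]
kept_faces = filter_small_faces(annotations)
print(kept_faces)
```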
- At this point, the VOC training set is ready. There are two scripts in the root directory of the project: train-version-slim.sh and train-version-RFB.sh. The former trains the slim version of the model, and the latter trains the RFB version. The default parameters have been set; if you need to change them, refer to the description of each training parameter in ./train.py.
- Run train-version-slim.sh or train-version-RFB.sh:
sh train-version-slim.sh or sh train-version-RFB.sh
Detection results on images (input resolution: 640x480)
PS
- If the production scene has medium-distance, large faces and only a few faces per image, it is recommended to train with input_size: 320 (320x240) and to run inference with 320x240, 160x120, or 128x96 input, for example with the provided pre-trained model version-slim-320.pth or version-RFB-320.pth.
- If the production scene has medium or long distances, medium or small faces, and many faces per image, it is recommended to adopt:
(1) Optimal: train with input_size: 640 (640x480) and use the same or a larger input size for inference, for example with the provided pre-trained model version-slim-640.pth or version-RFB-640.pth. This gives fewer false positives.
(2) Sub-optimal: train with input_size: 320 (320x240) and use 480x360 or 640x480 input for inference. This is more sensitive to small faces, but false positives will increase.
- The best results for each scene require adjusting the input resolution to strike a balance between speed and accuracy.
- An overly large input resolution improves the recall of small faces, but it also increases the false positive rate on large, close-range faces, and inference time grows sharply (roughly in proportion to the number of input pixels).
- An overly small input resolution significantly speeds up inference, but it greatly reduces the recall of small faces.
- The input resolution in production should be as consistent as possible with the input resolution used for training; it should not deviate too far in either direction.
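To get a feel for the speed side of this trade-off, the README's figure of about 90~109 MFlops at 320x240 can be scaled by pixel count. This is only an approximation, assuming the cost of a convolutional network grows roughly linearly with the number of input pixels; the function name is hypothetical and the results are estimates, not measurements.

```python
def scaled_mflops(base_mflops, base_size, new_size):
    """Scale a FLOP estimate by the ratio of input pixel counts."""
    base_w, base_h = base_size
    new_w, new_h = new_size
    return base_mflops * (new_w * new_h) / (base_w * base_h)

BASE_SIZE = (320, 240)
LOW_MFLOPS, HIGH_MFLOPS = 90, 109  # range quoted in the README for 320x240

for size in [(128, 96), (160, 120), (320, 240), (480, 360), (640, 480)]:
    low = scaled_mflops(LOW_MFLOPS, BASE_SIZE, size)
    high = scaled_mflops(HIGH_MFLOPS, BASE_SIZE, size)
    print(f"{size[0]}x{size[1]}: ~{low:.0f} to {high:.0f} MFlops")
```

Under this assumption, doubling both sides of the input to 640x480 costs about four times the computation of 320x240, while halving them to 160x120 costs about a quarter.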
TODO LIST
- Add some test data
Completed list
- Widerface test code
- NCNN C++ inference code (vealocia)
- MNN C++ inference code, MNN Python inference code
- Caffe model and onnx2caffe conversion code
- Caffe python inference code and OpencvDNN inference code
Third-party related projects
- NNCase C++ inference code
- UltraFaceDotNet (C#)
- faceDetect-ios
- Android-FaceDetection-UltraNet-MNN
- Ultra-Tensorflow-Model-Converter
- UltraFace TNN C++ Demo