Linzaer / Ultra-Light-Fast-Generic-Face-Detector-1MB

💎 1MB lightweight face detection model


Top Related Projects

  • Pytorch_Retinaface: Retinaface gets 80.99% on the widerface hard val set using mobilenet0.25.

  • libfacedetection: An open source library for face detection in images. The face detection speed can reach 1000 FPS.

  • MobileNet-SSD: Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.

  • ncnn: A high-performance neural network inference framework optimized for the mobile platform.

Quick Overview

Ultra-Light-Fast-Generic-Face-Detector-1MB is a lightweight face detection model designed for edge computing devices. It offers a balance between speed and accuracy, with a model size of only 1MB, making it suitable for applications with limited computational resources.

Pros

  • Extremely small model size (1MB), ideal for mobile and embedded devices
  • Fast inference speed, suitable for real-time applications
  • Good accuracy for its size, especially in common scenarios
  • Supports various deep learning frameworks (PyTorch, MNN, NCNN, TensorRT)

Cons

  • May struggle with detecting faces in challenging conditions (e.g., extreme angles, poor lighting)
  • Limited to face detection only, doesn't provide additional facial analysis features
  • Requires some setup and configuration for optimal performance
  • May not be as accurate as larger, more complex models

Code Examples

  1. Loading the model and performing inference with PyTorch:

import torch
from vision.ssd.config.fd_config import define_img_size

# define_img_size must run before the model module is imported,
# because it sets the global prior/anchor configuration
define_img_size(320)
from vision.ssd.mb_tiny_RFB_fd import create_Mb_Tiny_RFB_fd

net = create_Mb_Tiny_RFB_fd(2, is_test=True)  # 2 classes: background, face
net.load("model/pretrained/version-RFB-320.pth")
net.eval()

# Inference on a preprocessed input tensor (see the next example)
with torch.no_grad():
    confidences, boxes = net.forward(input_image)
  2. Preprocessing an image for inference:

import cv2
import numpy as np
import torch

def preprocess(image):
    # BGR (OpenCV default) -> RGB, then resize to the 320x240 input resolution
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (320, 240))
    # Normalize with the repo's defaults (mean 127, std 128)
    image = (image.astype(np.float32) - 127.0) / 128.0
    # HWC -> CHW, then add a batch dimension
    return torch.from_numpy(image.transpose(2, 0, 1)).unsqueeze(0)
  1. Post-processing detection results:
def post_process(confidences, boxes, orig_image, threshold=0.7):
    height, width = orig_image.shape[:2]
    boxes = boxes[0]
    confidences = confidences[0]
    
    for i in range(boxes.shape[0]):
        box = boxes[i, :]
        conf = confidences[i]
        if conf < threshold:
            continue
        x1, y1, x2, y2 = box
        x1 = int(x1 * width)
        y1 = int(y1 * height)
        x2 = int(x2 * width)
        y2 = int(y2 * height)
        cv2.rectangle(orig_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    
    return orig_image
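
Putting the three snippets together on a single image (a minimal sketch under the assumptions above; test.jpg is a placeholder, and no non-maximum suppression is applied, so overlapping detections are possible; the repo's predictor classes handle NMS):

orig_image = cv2.imread("test.jpg")
input_image = preprocess(orig_image)
with torch.no_grad():
    confidences, boxes = net.forward(input_image)
result = post_process(confidences, boxes, orig_image)
cv2.imwrite("result.jpg", result)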

Getting Started

  1. Clone the repository:

    git clone https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB.git
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download pre-trained models from the repository's release page.

  4. Run the demo script:

    python run_video_face_detect.py
    

This will start face detection on your default webcam using the pre-trained model.
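
For single images without a webcam, the helper predictor used by the repo's demo scripts wraps preprocessing, inference, and non-maximum suppression. A minimal sketch, assuming the checkpoint path convention from the examples above:

import cv2
from vision.ssd.config.fd_config import define_img_size

define_img_size(320)  # must run before importing the model builders
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd, create_mb_tiny_fd_predictor

net = create_mb_tiny_fd(2, is_test=True)
predictor = create_mb_tiny_fd_predictor(net, candidate_size=1500)
net.load("model/pretrained/version-slim-320.pth")

# The predictor expects an RGB image and returns boxes, labels, and scores
image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
boxes, labels, probs = predictor.predict(image, 750, 0.6)  # top-k, score threshold
print(f"found {boxes.size(0)} face(s)")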

Competitor Comparisons

Retinaface gets 80.99% on the widerface hard val set using mobilenet0.25.

Pros of Pytorch_Retinaface

  • Higher accuracy in face detection, especially for small faces
  • More robust feature extraction using ResNet50 backbone
  • Supports both CPU and GPU inference

Cons of Pytorch_Retinaface

  • Larger model size, requiring more computational resources
  • Slower inference speed compared to Ultra-Light-Fast-Generic-Face-Detector-1MB
  • More complex implementation and setup process

Code Comparison

Ultra-Light-Fast-Generic-Face-Detector-1MB:

from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)

Pytorch_Retinaface:

from models.retinaface import RetinaFace
from data import cfg_re50  # ResNet50 backbone configuration

cfg = cfg_re50
net = RetinaFace(cfg=cfg, phase='test')
# load_model is a helper defined in the repo's detect.py
net = load_model(net, args.trained_model, args.cpu)
net.eval()

Both repositories provide face detection solutions, but they cater to different use cases. Ultra-Light-Fast-Generic-Face-Detector-1MB focuses on lightweight deployment and fast inference, making it suitable for mobile and edge devices. Pytorch_Retinaface, on the other hand, prioritizes accuracy and robustness, making it more appropriate for scenarios where computational resources are less constrained and high precision is required.

An open source library for face detection in images. The face detection speed can reach 1000 FPS.

Pros of libfacedetection

  • More mature project with a longer development history
  • Supports multiple programming languages (C++, Python, Java, etc.)
  • Offers both CPU and GPU acceleration

Cons of libfacedetection

  • Larger model size compared to Ultra-Light-Fast-Generic-Face-Detector-1MB
  • May have slightly slower inference speed on some devices
  • Less focus on mobile deployment

Code Comparison

Ultra-Light-Fast-Generic-Face-Detector-1MB (Python):

from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)

libfacedetection (C++):

#include "facedetectcnn.h"
#define DETECT_BUFFER_SIZE 0x20000
unsigned char * pBuffer = (unsigned char *)malloc(DETECT_BUFFER_SIZE);
int * pResults = facedetect_cnn(pBuffer, (unsigned char*)(rgbImageData), width, height, stride);

Both libraries offer efficient face detection capabilities, but Ultra-Light-Fast-Generic-Face-Detector-1MB is more focused on lightweight models for mobile devices, while libfacedetection provides a broader range of features and language support. The code examples demonstrate the initialization process for each library, highlighting their different approaches to face detection implementation.

Caffe implementation of Google MobileNet SSD detection network, with pretrained weights on VOC0712 and mAP=0.727.

Pros of MobileNet-SSD

  • More versatile object detection (not limited to faces)
  • Larger community and wider adoption
  • Better documentation and examples

Cons of MobileNet-SSD

  • Larger model size (>5MB vs 1MB for Ultra-Light-Fast-Generic-Face-Detector)
  • Potentially slower inference time on mobile devices
  • Less optimized for face detection specifically

Code Comparison

Ultra-Light-Fast-Generic-Face-Detector:

from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)

MobileNet-SSD:

from vision.ssd.mobilenet_v2_ssd_lite import create_mobilenetv2_ssd_lite
net = create_mobilenetv2_ssd_lite(21, is_test=True)

Both repositories provide implementations of lightweight object detection models, but they have different focuses. Ultra-Light-Fast-Generic-Face-Detector is specifically optimized for face detection with a very small model size, making it ideal for mobile and embedded devices. MobileNet-SSD, on the other hand, offers a more general-purpose object detection solution with support for multiple object classes. The choice between the two depends on the specific use case and requirements of the project.

ncnn is a high-performance neural network inference framework optimized for the mobile platform

Pros of ncnn

  • More versatile: supports a wide range of neural network operations and models
  • Better performance: optimized for mobile and embedded devices
  • Larger community and more frequent updates

Cons of ncnn

  • Steeper learning curve: requires more expertise to use effectively
  • Less focused: not specifically designed for face detection

Code Comparison

Ultra-Light-Fast-Generic-Face-Detector-1MB:

from vision.ssd.config.fd_config import define_img_size
input_size = 320
define_img_size(input_size)
from vision.ssd.mb_tiny_fd import create_mb_tiny_fd
net = create_mb_tiny_fd(2, is_test=True)

ncnn:

ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in = ncnn::Mat::from_pixels_resize(image_data, ncnn::Mat::PIXEL_BGR, width, height, target_width, target_height);

Both repositories offer efficient solutions for mobile and embedded devices, but they serve different purposes. Ultra-Light-Fast-Generic-Face-Detector-1MB is specifically designed for face detection with a small model size, while ncnn is a more general-purpose neural network inference framework. The code examples show the simplicity of using Ultra-Light-Fast-Generic-Face-Detector-1MB for face detection, compared to the more flexible but potentially complex setup required for ncnn.

README

English | 中文简体

Ultra-Light-Fast-Generic-Face-Detector-1MB

Ultra-lightweight face detection model

This model is a lightweight face detection model designed for edge computing devices.

Tested working environments

  • Ubuntu 16.04, Ubuntu 18.04, Windows 10 (for inference)
  • Python 3.6
  • PyTorch 1.2
  • CUDA 10.0 + cuDNN 7.6

Accuracy, speed, model size comparison

The training set is a VOC-format dataset generated by combining the widerface dataset with the cleaned widerface labels provided by Retinaface (PS: the following test results were measured by myself and may be partially inconsistent with official results).

Widerface test

  • Test accuracy in the WIDER FACE val set (single-scale input resolution: 320*240 or scaling by the maximum side length of 320)

Model                               Easy Set   Medium Set   Hard Set
libfacedetection v1 (caffe)         0.65       0.5          0.233
libfacedetection v2 (caffe)         0.714      0.585        0.306
Retinaface-Mobilenet-0.25 (Mxnet)   0.745      0.553        0.232
version-slim                        0.77       0.671        0.395
version-RFB                         0.787      0.698        0.438
  • Test accuracy in the WIDER FACE val set (single-scale input resolution: VGA 640*480, or scaling by the maximum side length of 640)

Model                               Easy Set   Medium Set   Hard Set
libfacedetection v1 (caffe)         0.741      0.683        0.421
libfacedetection v2 (caffe)         0.773      0.718        0.485
Retinaface-Mobilenet-0.25 (Mxnet)   0.879      0.807        0.481
version-slim                        0.853      0.819        0.539
version-RFB                         0.855      0.822        0.579
  • This part mainly tests the models at medium and small input resolutions.
  • RetinaFace-mnet (Retinaface-Mobilenet-0.25) comes from the excellent insightface project. When testing this network, the original image is scaled so that its maximum side length is 320 or 640, so faces are not deformed; the other networks use a fixed-size resize. For reference, RetinaFace-mnet's best single-scale result on the val set at 1600 is 0.887 (Easy) / 0.87 (Medium) / 0.791 (Hard). The max-side-length scaling is sketched below.
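
A minimal sketch of the max-side-length scaling described above (the function name and the 320 default are illustrative):

import cv2

def resize_by_max_side(image, max_side=320):
    # Scale so the longer side equals max_side, preserving the aspect ratio
    h, w = image.shape[:2]
    scale = max_side / max(h, w)
    resized = cv2.resize(image, (round(w * scale), round(h * scale)))
    return resized, scale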

Terminal device inference speed

  • Raspberry Pi 4B MNN Inference Latency (unit: ms) (ARM/A72x4/1.5GHz/input resolution: 320x240 /int8 quantization)
Model                                         1 core   2 core   3 core   4 core
libfacedetection v1                           28       16       12       9.7
Official Retinaface-Mobilenet-0.25 (Mxnet)    46       25       18.5     15
version-slim                                  29       16       12       9.5
version-RFB                                   35       19.6     14.8     11

Model      Inference Latency (ms)
slim-320   6.33
RFB-320    7.8

Model      Inference Latency (ms)
slim-320   65.6
RFB-320    164.8
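
Latency figures like these can be approximated with a simple timing loop. A rough sketch for the PyTorch model (actual numbers depend heavily on hardware, thread count, quantization, and inference framework):

import time
import torch

def measure_latency_ms(net, shape=(1, 3, 240, 320), warmup=10, runs=100):
    x = torch.randn(*shape)
    with torch.no_grad():
        for _ in range(warmup):  # warm up caches and the allocator
            net.forward(x)
        start = time.perf_counter()
        for _ in range(runs):
            net.forward(x)
    return (time.perf_counter() - start) * 1000 / runs  # ms per inference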

Model size comparison

  • Comparison of several open source lightweight face detection models:
Model                                         Model file size (MB)
libfacedetection v1 (caffe)                   2.58
libfacedetection v2 (caffe)                   3.34
Official Retinaface-Mobilenet-0.25 (Mxnet)    1.68
version-slim                                  1.04
version-RFB                                   1.11
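
A checkpoint's on-disk size and parameter count can be verified directly. A small sketch, assuming the .pth files store a plain state dict and reusing the placeholder path from the examples above:

import os
import torch

path = "model/pretrained/version-RFB-320.pth"
print(f"file size: {os.path.getsize(path) / 1e6:.2f} MB")

state = torch.load(path, map_location="cpu")
n_params = sum(t.numel() for t in state.values())
print(f"parameters: {n_params / 1e6:.2f} M")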

Generating the VOC format training set and training process

  1. Download the widerface official dataset, or download the training set I provide, and extract it into the ./data folder:

(1) The cleaned widerface data package with faces smaller than 10px*10px filtered out: Baidu cloud disk (extraction code: cbiu), Google Drive

(2) The complete widerface data package without small-face filtering: Baidu cloud disk (extraction code: ievk), Google Drive

  2. (PS: if you downloaded the filtered package in (1) above, you can skip this step.) Because widerface contains many small, unclear faces that hinder the convergence of efficient models, they need to be filtered out before training; by default, faces smaller than 10 pixels by 10 pixels are filtered (the rule is sketched after these steps). Run ./data/wider_face_2_voc_add_landmark.py:
 python3 ./data/wider_face_2_voc_add_landmark.py

After the program finishes, the wider_face_add_lm_10_10 folder is generated in the ./data directory. Its contents are identical to data package (1) after decompression. The complete directory structure is as follows:

  data/
    retinaface_labels/
      test/
      train/
      val/
    wider_face/
      WIDER_test/
      WIDER_train/
      WIDER_val/
    wider_face_add_lm_10_10/
      Annotations/
      ImageSets/
      JPEGImages/
    wider_face_2_voc_add_landmark.py
  3. At this point, the VOC training set is ready. There are two scripts in the project root: train-version-slim.sh and train-version-RFB.sh. The former trains the slim version of the model and the latter trains the RFB version, with default parameters already set. If you need to change any parameters, see the description of each training option in ./train.py.

  4. Run train-version-slim.sh or train-version-RFB.sh:

sh train-version-slim.sh or sh train-version-RFB.sh
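
The small-face filter applied in step 2 is straightforward. A minimal sketch, assuming widerface-style boxes stored as (x, y, width, height) in pixels (MIN_FACE and keep_face are illustrative names):

MIN_FACE = 10  # default threshold: faces under 10x10 px are dropped

def keep_face(box):
    x, y, w, h = box
    return w >= MIN_FACE and h >= MIN_FACE

boxes = [(12, 30, 48, 52), (5, 8, 6, 7)]
kept = [b for b in boxes if keep_face(b)]  # the 6x7 px face is filtered out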

Detecting image effects (input resolution: 640x480)

(sample detection images)

PS

  • If the actual production scene involves medium distances, large faces, and few faces, it is recommended to train at input_size 320 (320x240) and to use 320x240, 160x120, or 128x96 inputs for inference, e.g. with the provided pre-trained model version-slim-320.pth or version-RFB-320.pth.
  • If the actual production scene involves medium or long distances, medium or small faces, and many faces, it is recommended to adopt:

(1) Optimal: train at input_size 640 (640x480) and use the same or a larger input size for inference, e.g. with the provided pre-trained model version-slim-640.pth or version-RFB-640.pth; this lowers false positives.

(2) Sub-optimal: train at input_size 320 (320x240) and use 480x360 or 640x480 inputs for inference; this is more sensitive to small faces, but false positives will increase.

  • The best results for each scene require tuning the input resolution to strike a balance between speed and accuracy.
  • An overly large input resolution improves recall for small faces, but also raises the false positive rate for large, close-range faces, and inference time grows sharply.
  • An overly small input resolution speeds up inference considerably, but greatly reduces recall for small faces.
  • The input resolution in production should match the training input resolution as closely as possible, without large deviations in either direction.

TODO LIST

  • Add some test data

Completed list

Third-party related projects

Reference