ZheC/Realtime_Multi-Person_Pose_Estimation

Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)

Top Related Projects

  • openpose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
  • Detectron2: A platform for object detection, segmentation and other visual recognition tasks
  • Mask_RCNN: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
  • tfjs-models: Pretrained models for TensorFlow.js
  • openpifpaf: Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch
  • human-pose-estimation.pytorch: Official implementation of the ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208)

Quick Overview

Realtime_Multi-Person_Pose_Estimation is a GitHub repository that implements a real-time approach for multi-person pose estimation. It uses Convolutional Neural Networks (CNNs) to detect human body parts and associate them with individuals in images or video streams. This project is based on the paper "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" and provides both training and testing code.
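To make the association idea concrete, the following is a minimal NumPy sketch of how a Part Affinity Field can score a candidate limb between two detected keypoints: sample points along the segment and average the dot product between the field and the segment's unit direction. The function name `paf_score`, the nearest-pixel sampling, and the default of 10 samples are illustrative assumptions, not code from the repository.

```python
import numpy as np

def paf_score(paf_x, paf_y, p_a, p_b, num_samples=10):
    """Average alignment between a 2D vector field and the segment p_a -> p_b.

    paf_x, paf_y: (H, W) arrays holding the x and y components of the field.
    p_a, p_b: (x, y) keypoint locations in pixel coordinates.
    """
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    vec = p_b - p_a
    norm = np.linalg.norm(vec)
    if norm == 0:
        return 0.0  # degenerate segment: no direction to compare against
    unit = vec / norm

    # Sample evenly spaced points along the segment (nearest-pixel lookup).
    xs = np.linspace(p_a[0], p_b[0], num_samples)
    ys = np.linspace(p_a[1], p_b[1], num_samples)
    xi = np.clip(np.rint(xs).astype(int), 0, paf_x.shape[1] - 1)
    yi = np.clip(np.rint(ys).astype(int), 0, paf_x.shape[0] - 1)

    # Dot product of the field with the segment direction at each sample.
    dots = paf_x[yi, xi] * unit[0] + paf_y[yi, xi] * unit[1]
    return float(dots.mean())
```

A score near 1 means the field points along the candidate limb, a score near 0 means it is perpendicular or absent, and a negative score means it points the opposite way. A full pipeline would compute this score for every candidate pair of a limb type and then match pairs based on the scores.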

Pros

  • High accuracy in detecting and associating body parts in real-time
  • Supports multi-person pose estimation in complex scenes
  • Provides pre-trained models for quick implementation
  • Provides Caffe-based training code plus Matlab and Python testing paths, and acknowledges community reimplementations

Cons

  • Requires significant computational resources for real-time performance
  • Limited documentation for customization and fine-tuning
  • Dependency on specific versions of libraries may cause compatibility issues
  • Primarily focused on 2D pose estimation, lacking 3D capabilities

Code Examples

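  1. Loading the model and performing inference (this and the following examples assume a PyTorch port that provides src/body.py and src/util.py; the README below documents only Caffe-based testing):
import cv2
from src import util
from src.body import Body

# Load the pretrained body-pose weights (the path is an assumption)
body_estimation = Body('model/body_pose_model.pth')
test_image = cv2.imread('sample_image.jpg')  # BGR image as a NumPy array
# candidate: detected keypoints; subset: per-person indices into candidate
candidate, subset = body_estimation(test_image)
canvas = util.draw_bodypose(test_image, candidate, subset)
cv2.imwrite('result.jpg', canvas)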
  2. Processing video input:
import cv2
from src import util
from src.body import Body

body_estimation = Body('model/body_pose_model.pth')
cap = cv2.VideoCapture('sample_video.mp4')

while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of stream or read error
    # Run pose estimation on the frame and draw the skeletons
    candidate, subset = body_estimation(frame)
    canvas = util.draw_bodypose(frame, candidate, subset)
    cv2.imshow('Pose Estimation', canvas)
    # Press 'q' to stop early
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
  3. Extracting specific body part coordinates:
import cv2
import numpy as np
from src.body import Body

body_estimation = Body('model/body_pose_model.pth')
image = cv2.imread('sample_image.jpg')
candidate, subset = body_estimation(image)

# Right wrist is keypoint index 4; subset stores -1 when a keypoint is missing
if len(subset) > 0 and int(subset[0][4]) >= 0:
    right_wrist = np.array(candidate[int(subset[0][4])][0:2])
    print(f"Right wrist coordinates: {right_wrist}")
else:
    print("Right wrist not detected for the first person")
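The same lookup can be generalized to every detected person. The helper below is a sketch under the assumption that `candidate` rows start with `(x, y)` and that each `subset` row holds one index into `candidate` per keypoint, with -1 marking a keypoint that was not found. The name `keypoints_per_person` and the default of 18 keypoints are illustrative assumptions.

```python
import numpy as np

def keypoints_per_person(candidate, subset, num_keypoints=18):
    """Return one dict per person mapping keypoint index -> (x, y).

    Keypoints whose subset entry is negative (not detected) are skipped.
    """
    people = []
    for person in subset:
        points = {}
        for k in range(num_keypoints):
            idx = int(person[k])
            if idx < 0:
                continue  # this keypoint was not detected for this person
            x, y = candidate[idx][:2]
            points[k] = (float(x), float(y))
        people.append(points)
    return people
```

With this layout, `keypoints_per_person(candidate, subset)[0].get(4)` would return the first person's right wrist, or `None` if it was not detected.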

Getting Started

  1. Clone the repository:

    git clone https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation.git
    cd Realtime_Multi-Person_Pose_Estimation
    
  2. Retrieve the pre-trained MSCOCO model, as described in the README's testing instructions:

    cd testing
    bash get_model.sh

  3. Run the Python demo notebook:

    cd python
    ipython notebook

    Then open demo.ipynb and execute the code.


Competitor Comparisons

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

Pros of openpose

  • More comprehensive documentation and installation guides
  • Wider range of supported platforms (Windows, Linux, Mac, Python, C++, Unity)
  • Active development and regular updates

Cons of openpose

  • Higher system requirements and potentially slower performance
  • More complex setup process due to additional dependencies
  • Larger codebase, which may be harder to customize or integrate

Code Comparison

Realtime_Multi-Person_Pose_Estimation:

from config_reader import config_reader
from model import get_testing_model

# Load model
model = get_testing_model()
model.load_weights('model.h5')

openpose:

import pyopenpose as op

# Custom Params
params = dict()
params["model_folder"] = "models/"

# Starting OpenPose
opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

Both repositories focus on real-time multi-person pose estimation, but openpose offers a more comprehensive and actively maintained solution. It supports a wider range of platforms and provides better documentation, making it more accessible for beginners. However, this comes at the cost of higher system requirements and a more complex setup process.

Realtime_Multi-Person_Pose_Estimation may be a lighter-weight alternative with potentially faster performance, but it lacks the extensive features and support of openpose. The code comparison shows that openpose uses a dedicated Python wrapper, while Realtime_Multi-Person_Pose_Estimation relies on a more straightforward model loading approach.

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

  • More comprehensive and versatile, supporting a wide range of computer vision tasks beyond pose estimation
  • Actively maintained by Facebook AI Research, with frequent updates and improvements
  • Extensive documentation and community support

Cons of Detectron2

  • Higher computational requirements, potentially slower for real-time applications
  • Steeper learning curve due to its broader scope and more complex architecture

Code Comparison

Realtime_Multi-Person_Pose_Estimation:

from model import get_testing_model
model = get_testing_model()
model.load_weights('model.h5')

Detectron2:

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)

Realtime_Multi-Person_Pose_Estimation focuses specifically on multi-person pose estimation, while Detectron2 offers a more general-purpose computer vision framework. The former may be more suitable for projects requiring fast, real-time pose estimation, while the latter provides a broader range of capabilities and is better suited for complex, multi-task computer vision projects.

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Pros of Mask_RCNN

  • Provides instance segmentation in addition to object detection and classification
  • Supports training on custom datasets with flexible configuration options
  • Includes pre-trained models for quick start and transfer learning

Cons of Mask_RCNN

  • Generally slower inference speed compared to Realtime_Multi-Person_Pose_Estimation
  • Requires more computational resources for training and inference
  • May have lower accuracy in real-time scenarios due to its focus on instance segmentation

Code Comparison

Mask_RCNN:

import mrcnn.model as modellib
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_MODEL_PATH, by_name=True)
results = model.detect([image], verbose=1)

Realtime_Multi-Person_Pose_Estimation:

from scipy.ndimage import gaussian_filter
from model import get_testing_model
model = get_testing_model()
model.load_weights('model/keras/model.h5')
heatmaps, pafs = model.predict(input_image)

Both repositories offer powerful computer vision solutions, but they focus on different aspects. Mask_RCNN provides more comprehensive instance segmentation capabilities, while Realtime_Multi-Person_Pose_Estimation specializes in real-time pose estimation for multiple people. The choice between them depends on the specific requirements of your project, considering factors such as speed, accuracy, and the type of analysis needed.

Pretrained models for TensorFlow.js

Pros of tfjs-models

  • Browser-based implementation, enabling client-side pose estimation
  • Supports multiple pre-trained models for various tasks (pose estimation, object detection, etc.)
  • Easier integration with web applications and JavaScript frameworks

Cons of tfjs-models

  • Generally slower performance compared to native implementations
  • Limited customization options for the underlying models
  • May have lower accuracy in complex scenarios or with multiple people

Code Comparison

Realtime_Multi-Person_Pose_Estimation:

from model import get_testing_model
model = get_testing_model()
model.load_weights('model.h5')
heatmaps, pafs = model.predict(input_image)

tfjs-models:

const net = await posenet.load();
const pose = await net.estimateSinglePose(imageElement);
const keypoints = pose.keypoints;

The Realtime_Multi-Person_Pose_Estimation example shows loading a custom model and making predictions, while the tfjs-models code demonstrates the simplicity of using a pre-trained PoseNet model in JavaScript.

Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch.

Pros of openpifpaf

  • More actively maintained with regular updates and contributions
  • Supports a wider range of pose estimation tasks, including animal pose estimation
  • Better documentation and examples for ease of use

Cons of openpifpaf

  • May have slightly lower real-time performance compared to Realtime_Multi-Person_Pose_Estimation
  • Requires more setup and configuration for specific use cases

Code Comparison

Realtime_Multi-Person_Pose_Estimation:

from config_reader import config_reader
from model import get_testing_model

model = get_testing_model()
model.load_weights('model.h5')

openpifpaf:

import openpifpaf

predictor = openpifpaf.Predictor(checkpoint='shufflenetv2k16')
predictions, gt_anns, image_meta = predictor.numpy_image(image)

Both repositories provide implementations for multi-person pose estimation, but openpifpaf offers a more user-friendly API and broader functionality. Realtime_Multi-Person_Pose_Estimation may have an edge in real-time performance for specific scenarios. openpifpaf's code is more modular and easier to integrate into existing projects, while Realtime_Multi-Person_Pose_Estimation requires more manual setup and configuration.

Official implementation of the ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208)

Pros of human-pose-estimation.pytorch

  • Implemented in PyTorch, offering better flexibility and ease of use for deep learning researchers
  • Provides pre-trained models and evaluation scripts for quick deployment
  • Includes data augmentation techniques for improved model performance

Cons of human-pose-estimation.pytorch

  • Top-down approach that estimates one person per image crop and relies on a separate person detector for multi-person scenes, unlike the bottom-up multi-person approach of Realtime_Multi-Person_Pose_Estimation
  • May have slower inference speed compared to the real-time performance of Realtime_Multi-Person_Pose_Estimation
  • Less extensive documentation and community support

Code Comparison

Realtime_Multi-Person_Pose_Estimation (C++, via OpenPose, which the README recommends for the realtime version):

#include <openpose/pose/poseExtractorCaffe.hpp>
// Initialize OpenPose
op::PoseExtractorCaffe poseExtractorCaffe{poseModel, modelFolder, 0, heatMapTypes, scaleMode, 1};

human-pose-estimation.pytorch (Python):

from models.pose_resnet import get_pose_net
model = get_pose_net(cfg, is_train=False)
model.load_state_dict(torch.load(model_file))

Both repositories focus on human pose estimation but differ in implementation languages, supported features, and ease of use for different applications.

README

Realtime Multi-Person Pose Estimation

By Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh.

Introduction

Code repo for winning 2016 MSCOCO Keypoints Challenge, 2016 ECCV Best Demo Award, and 2017 CVPR Oral paper.

Watch our video result on YouTube or on our website.

We present a bottom-up approach for realtime multi-person pose estimation, without using any person detector. For more details, refer to our CVPR'17 paper, our oral presentation video recording at CVPR 2017 or our presentation slides at ILSVRC and COCO workshop 2016.
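In a bottom-up pipeline, body-part candidates are found first and grouped into people afterwards. As an illustration of the first stage only, the sketch below extracts local maxima from a single part confidence map with NumPy. The function name `find_peaks`, the 4-neighbour comparison, and the 0.1 threshold are assumptions for illustration, not the repository's code.

```python
import numpy as np

def find_peaks(heatmap, threshold=0.1):
    """Return (x, y, score) for each local maximum above threshold.

    A pixel counts as a peak if it is at least as large as its four
    direct neighbours and strictly greater than the threshold.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    # Pad with -inf so border pixels can be compared like interior ones.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = (
        (center > threshold)
        & (center >= padded[:-2, 1:-1])   # neighbour above
        & (center >= padded[2:, 1:-1])    # neighbour below
        & (center >= padded[1:-1, :-2])   # neighbour to the left
        & (center >= padded[1:-1, 2:])    # neighbour to the right
    )
    ys, xs = np.nonzero(is_peak)
    return [(int(x), int(y), float(heatmap[y, x])) for y, x in zip(ys, xs)]
```

Each returned peak is a candidate location for one body part; no person detector is involved at this stage, which is what distinguishes the bottom-up approach from top-down methods.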

This project is licensed under the terms of the license.

Other Implementations

Thank you all for your reimplementation efforts! If you have a new implementation and want to share it with others, feel free to make a pull request or email me!

Contents

  1. Testing
  2. Training
  3. Citation

Testing

C++ (realtime version, for demo purpose)

  • Please use OpenPose; it now runs on CPU/GPU and on Windows/Ubuntu.
  • Three input options: images, video, webcam

Matlab (slower, for COCO evaluation)

  • Compatible with general Caffe. Compile matcaffe.
  • Run cd testing; get_model.sh to retrieve our latest MSCOCO model from our web server.
  • Change the caffepath in the config.m and run demo.m for an example usage.

Python

  • cd testing/python
  • ipython notebook
  • Open demo.ipynb and execute the code

Training

Network Architecture

(Figure: network architecture diagram.)

Training Steps

  • Run cd training; bash getData.sh to obtain the COCO images in dataset/COCO/images/, keypoint annotations in dataset/COCO/annotations/ and the official COCO toolbox in dataset/COCO/coco/.
  • Run getANNO.m in Matlab to convert the annotation format from json to mat in dataset/COCO/mat/.
  • Run genCOCOMask.m in Matlab to obtain the mask images for unlabeled persons. You can use 'parfor' in Matlab to speed up the code.
  • Run genJSON('COCO') to generate a json file in the dataset/COCO/json/ folder. The json files contain the raw information needed for training.
  • Run python genLMDB.py to generate your LMDB. (You can also download our LMDB for the COCO dataset (189GB file) by: bash get_lmdb.sh)
  • Download our modified caffe: caffe_train. Compile pycaffe. It will be merged with caffe_rtpose (for testing) soon.
  • Run python setLayers.py --exp 1 to generate the prototxt and shell file for training.
  • Download the VGG-19 model; we use it to initialize the first 10 layers for training.
  • Run bash train_pose.sh 0,1 (generated by setLayers.py) to start the training with two GPUs.
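Because each step above writes into a specific folder under dataset/COCO/, a quick check before generating the LMDB can catch a skipped step. The sketch below only tests that the folders named in the steps exist. The helper name `missing_coco_dirs` is hypothetical, and the root path passed in depends on where the scripts were run from.

```python
from pathlib import Path

# Sub-folders of dataset/COCO/ that the training steps are described as producing.
EXPECTED_SUBDIRS = ("images", "annotations", "coco", "mat", "json")

def missing_coco_dirs(coco_root):
    """Return the expected sub-folder names that do not exist under coco_root."""
    root = Path(coco_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

For example, `missing_coco_dirs("dataset/COCO")` returning `["mat", "json"]` would suggest that getANNO.m and genJSON('COCO') have not been run yet. The check says nothing about whether the folders' contents are complete.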

Citation

Please cite the paper in your publications if it helps your research:

@inproceedings{cao2017realtime,
  author = {Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
  booktitle = {CVPR},
  title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
  year = {2017}
  }
  
@inproceedings{wei2016cpm,
  author = {Shih-En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh},
  booktitle = {CVPR},
  title = {Convolutional pose machines},
  year = {2016}
  }