Realtime_Multi-Person_Pose_Estimation

Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)

Top Related Projects

openpose

32,828

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

detectron2

32,239

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Mask_RCNN

25,251

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

tfjs-models

14,585

Pretrained models for TensorFlow.js

openpifpaf

1,214

Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch.

human-pose-estimation.pytorch

2,994

The project is an official implement of our ECCV2018 paper "Simple Baselines for Human Pose Estimation and Tracking(https://arxiv.org/abs/1804.06208)"

Quick Overview

Realtime_Multi-Person_Pose_Estimation is a GitHub repository that implements a real-time approach for multi-person pose estimation. It uses Convolutional Neural Networks (CNNs) to detect human body parts and associate them with individuals in images or video streams. This project is based on the paper "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" and provides both training and testing code.
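To make the "detect human body parts" step more concrete, here is a minimal sketch of how part candidates are commonly read off a confidence map: keep every pixel that exceeds a threshold and is at least as large as its four neighbours. The function name find_peaks, the threshold value, and the toy map are assumptions for illustration, not code taken from the repository.

```python
def find_peaks(heatmap, threshold=0.1):
    """Return (x, y, score) for each local maximum above threshold.

    heatmap is a 2D list of floats (rows of columns). Out-of-bounds
    neighbours are treated as 0 so border pixels can still be peaks.
    """
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0

    def value(y, x):
        if 0 <= y < height and 0 <= x < width:
            return heatmap[y][x]
        return 0.0

    peaks = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y][x]
            if score <= threshold:
                continue
            neighbours = (value(y - 1, x), value(y + 1, x),
                          value(y, x - 1), value(y, x + 1))
            if all(score >= n for n in neighbours):
                peaks.append((x, y, score))
    return peaks


# Toy confidence map with two blobs (hypothetical values).
toy_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.3, 0.8, 0.1],
]
peaks = find_peaks(toy_map, threshold=0.5)
print(peaks)
```

In practice the map is usually smoothed first so that a flat plateau does not produce several adjacent peaks; this sketch omits that step.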

Pros

High accuracy in detecting and associating body parts in real-time
Supports multi-person pose estimation in complex scenes
Provides pre-trained models for quick implementation
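Original code is Caffe-based (Matlab and Python testing, Caffe training); the README links third-party ports to PyTorch, TensorFlow, and other frameworks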

Cons

Requires significant computational resources for real-time performance
Limited documentation for customization and fine-tuning
Dependency on specific versions of libraries may cause compatibility issues
Primarily focused on 2D pose estimation, lacking 3D capabilities

Code Examples

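Loading the model and performing inference (this and the following examples appear to use the Body wrapper of a third-party PyTorch port rather than the Caffe code in this repository; adjust module paths to the port you install):

import cv2
from src import util
from src.body import Body

# Load the pre-trained body model (path as used by the port)
body_estimation = Body('model/body_pose_model.pth')

test_image = cv2.imread('sample_image.jpg')
if test_image is None:
    raise FileNotFoundError('sample_image.jpg could not be read')

# candidate: detected keypoints; subset: per-person indices into candidate
candidate, subset = body_estimation(test_image)
canvas = util.draw_bodypose(test_image.copy(), candidate, subset)
cv2.imwrite('result.jpg', canvas)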

Processing video input:

import cv2
from src import util
from src.body import Body

body_estimation = Body('model/body_pose_model.pth')
cap = cv2.VideoCapture('sample_video.mp4')

while True:
    ret, frame = cap.read()
    if not ret:
        break
    candidate, subset = body_estimation(frame)
    canvas = util.draw_bodypose(frame, candidate, subset)
    cv2.imshow('Pose Estimation', canvas)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Extracting specific body part coordinates:

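import cv2
import numpy as np
from src.body import Body

body_estimation = Body('model/body_pose_model.pth')
image = cv2.imread('sample_image.jpg')
candidate, subset = body_estimation(image)

# Get coordinates of the first person's right wrist (keypoint index 4).
# subset holds indices into candidate; -1 means the part was not detected.
RIGHT_WRIST = 4
if len(subset) > 0 and int(subset[0][RIGHT_WRIST]) >= 0:
    right_wrist = np.array(candidate[int(subset[0][RIGHT_WRIST])][0:2])
    print(f"Right wrist coordinates: {right_wrist}")
else:
    print("Right wrist not detected")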
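The same lookup generalises to every person and every body part. A minimal sketch, assuming the layout implied by the example above: each row of subset holds one person's indices into candidate (with -1 marking a missing part), and each candidate row starts with x and y. The part names follow the common 18-keypoint ordering, truncated to the upper body for brevity, and are an assumption here; the toy candidate and subset values are made up.

```python
PART_NAMES = ["nose", "neck", "right_shoulder", "right_elbow", "right_wrist",
              "left_shoulder", "left_elbow", "left_wrist"]


def keypoints_by_person(candidate, subset, part_names=PART_NAMES):
    """Map each person to {part_name: (x, y)}, skipping undetected parts."""
    people = []
    for person in subset:
        parts = {}
        for part_id, name in enumerate(part_names):
            index = int(person[part_id])
            if index < 0:  # -1 means this part was not found for this person
                continue
            x, y = candidate[index][0:2]
            parts[name] = (float(x), float(y))
        people.append(parts)
    return people


# Toy data: candidate rows are [x, y, score, id].
candidate = [
    [120.0, 40.0, 0.95, 0],
    [118.0, 70.0, 0.90, 1],
    [150.0, 130.0, 0.80, 2],
]
# One person with nose, neck and right wrist; the other parts are missing.
subset = [[0, 1, -1, -1, 2, -1, -1, -1]]

people = keypoints_by_person(candidate, subset)
print(people[0])
```

The function only reads the first len(part_names) columns of each subset row, so it also works when rows carry extra trailing columns such as a per-person score, and it accepts NumPy arrays as well as plain lists.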

Getting Started

Clone the repository:

git clone https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation.git
cd Realtime_Multi-Person_Pose_Estimation

Set up Caffe (the README does not mention a requirements file): the testing code is compatible with general Caffe. Compile matcaffe for the Matlab demo, or the Python bindings for the notebook demo.

Download the pre-trained MSCOCO model with the script in the testing directory:

cd testing
bash get_model.sh

Run the Python demo notebook:

cd python
ipython notebook

Then open demo.ipynb and execute the cells.

Competitor Comparisons

openpose

Pros of openpose

More comprehensive documentation and installation guides
Wider range of supported platforms (Windows, Linux, Mac, Python, C++, Unity)
Active development and regular updates

Cons of openpose

Higher system requirements and potentially slower performance
More complex setup process due to additional dependencies
Larger codebase, which may be harder to customize or integrate

Code Comparison

Realtime_Multi-Person_Pose_Estimation (via a third-party Keras port; the original repository uses Caffe):

from config_reader import config_reader
from model import get_testing_model

# Load model
model = get_testing_model()
model.load_weights('model.h5')

openpose:

import pyopenpose as op

# Custom Params
params = dict()
params["model_folder"] = "models/"

# Starting OpenPose
opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

Both repositories focus on real-time multi-person pose estimation, but openpose offers a more comprehensive and actively maintained solution. It supports a wider range of platforms and provides better documentation, making it more accessible for beginners. However, this comes at the cost of higher system requirements and a more complex setup process.

Realtime_Multi-Person_Pose_Estimation is the smaller research codebase, but its own README points to OpenPose for the realtime C++ version and describes the Matlab path as slower and intended for COCO evaluation, so it should not be assumed to be the faster option. The code comparison shows that openpose uses a dedicated Python wrapper, while the Keras-style snippet above loads a model and its weights directly.

detectron2

Pros of Detectron2

More comprehensive and versatile, supporting a wide range of computer vision tasks beyond pose estimation
Actively maintained by Facebook AI Research, with frequent updates and improvements
Extensive documentation and community support

Cons of Detectron2

Higher computational requirements, potentially slower for real-time applications
Steeper learning curve due to its broader scope and more complex architecture

Code Comparison

Realtime_Multi-Person_Pose_Estimation (via a third-party Keras port; the original repository uses Caffe):

from model import get_testing_model
model = get_testing_model()
model.load_weights('model.h5')

Detectron2:

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)

Realtime_Multi-Person_Pose_Estimation focuses specifically on multi-person pose estimation, while Detectron2 offers a more general-purpose computer vision framework. The former may be more suitable for projects requiring fast, real-time pose estimation, while the latter provides a broader range of capabilities and is better suited for complex, multi-task computer vision projects.

Mask_RCNN

Pros of Mask_RCNN

Provides instance segmentation in addition to object detection and classification
Supports training on custom datasets with flexible configuration options
Includes pre-trained models for quick start and transfer learning

Cons of Mask_RCNN

Generally slower inference speed compared to Realtime_Multi-Person_Pose_Estimation
Requires more computational resources for training and inference
May have lower accuracy in real-time scenarios due to its focus on instance segmentation

Code Comparison

Mask_RCNN:

import mrcnn.model as modellib
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_MODEL_PATH, by_name=True)
results = model.detect([image], verbose=1)

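Realtime_Multi-Person_Pose_Estimation (via a third-party Keras port; the original repository uses Caffe):

from scipy.ndimage import gaussian_filter  # typically used to smooth heatmaps before peak detection
from model import get_testing_model
model = get_testing_model()
model.load_weights('model/keras/model.h5')
# Assumed output order for this port: part affinity fields first, then heatmaps
pafs, heatmaps = model.predict(input_image)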

Both repositories offer powerful computer vision solutions, but they focus on different aspects. Mask_RCNN provides more comprehensive instance segmentation capabilities, while Realtime_Multi-Person_Pose_Estimation specializes in real-time pose estimation for multiple people. The choice between them depends on the specific requirements of your project, considering factors such as speed, accuracy, and the type of analysis needed.

tfjs-models

Pros of tfjs-models

Browser-based implementation, enabling client-side pose estimation
Supports multiple pre-trained models for various tasks (pose estimation, object detection, etc.)
Easier integration with web applications and JavaScript frameworks

Cons of tfjs-models

Generally slower performance compared to native implementations
Limited customization options for the underlying models
May have lower accuracy in complex scenarios or with multiple people

Code Comparison

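Realtime_Multi-Person_Pose_Estimation (via a third-party Keras port; the original repository uses Caffe):

from model import get_testing_model
model = get_testing_model()
model.load_weights('model.h5')
# Assumed output order for this port: part affinity fields first, then heatmaps
pafs, heatmaps = model.predict(input_image)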

tfjs-models:

const net = await posenet.load();
const pose = await net.estimateSinglePose(imageElement);
const keypoints = pose.keypoints;

The Realtime_Multi-Person_Pose_Estimation example shows loading a custom model and making predictions, while the tfjs-models code demonstrates the simplicity of using a pre-trained PoseNet model in JavaScript.

openpifpaf

Pros of openpifpaf

More actively maintained with regular updates and contributions
Supports a wider range of pose estimation tasks, including animal pose estimation
Better documentation and examples for ease of use

Cons of openpifpaf

May have slightly lower real-time performance compared to Realtime_Multi-Person_Pose_Estimation
Requires more setup and configuration for specific use cases

Code Comparison

Realtime_Multi-Person_Pose_Estimation (via a third-party Keras port; the original repository uses Caffe):

from config_reader import config_reader
from model import get_testing_model

model = get_testing_model()
model.load_weights('model.h5')

openpifpaf:

import openpifpaf

predictor = openpifpaf.Predictor(checkpoint='shufflenetv2k16')
predictions, gt_anns, image_meta = predictor.numpy_image(image)

Both repositories provide implementations for multi-person pose estimation, but openpifpaf offers a more user-friendly API and broader functionality. Realtime_Multi-Person_Pose_Estimation may have an edge in real-time performance for specific scenarios. openpifpaf's code is more modular and easier to integrate into existing projects, while Realtime_Multi-Person_Pose_Estimation requires more manual setup and configuration.

human-pose-estimation.pytorch

Pros of human-pose-estimation.pytorch

Implemented in PyTorch, offering better flexibility and ease of use for deep learning researchers
Provides pre-trained models and evaluation scripts for quick deployment
Includes data augmentation techniques for improved model performance

Cons of human-pose-estimation.pytorch

Top-down design: estimates one pose per cropped person, so multi-person use depends on a separate person detector, unlike the detector-free bottom-up approach of Realtime_Multi-Person_Pose_Estimation
May have slower inference speed compared to the real-time performance of Realtime_Multi-Person_Pose_Estimation
Less extensive documentation and community support

Code Comparison

Realtime_Multi-Person_Pose_Estimation (C++, via OpenPose, which the README recommends for the realtime version):

#include <openpose/pose/poseExtractorCaffe.hpp>
// Initialize OpenPose
op::PoseExtractorCaffe poseExtractorCaffe{poseModel, modelFolder, 0, heatMapTypes, scaleMode, 1};

human-pose-estimation.pytorch (Python):

from models.pose_resnet import get_pose_net
model = get_pose_net(cfg, is_train=False)
model.load_state_dict(torch.load(model_file))

Both repositories focus on human pose estimation but differ in implementation languages, supported features, and ease of use for different applications.

README

Realtime Multi-Person Pose Estimation

By Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh.

Introduction

Code repo for winning 2016 MSCOCO Keypoints Challenge, 2016 ECCV Best Demo Award, and 2017 CVPR Oral paper.

Watch our video result on YouTube or on our website.

We present a bottom-up approach for realtime multi-person pose estimation, without using any person detector. For more details, refer to our CVPR'17 paper, our oral presentation video recording at CVPR 2017 or our presentation slides at ILSVRC and COCO workshop 2016.
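The paper's title names the association mechanism: Part Affinity Fields, 2D vector fields that point along each limb. A minimal sketch of how two candidate parts can be scored against such a field, by sampling points along the segment between them and averaging the dot product of the field with the segment's unit direction. The function name, the sample count, and the toy field are assumptions for illustration; this is not the repository's implementation.

```python
import math


def paf_score(paf_x, paf_y, start, end, num_samples=10):
    """Average alignment between a vector field and the segment start-to-end.

    paf_x and paf_y are 2D lists holding the field's x and y components;
    start and end are (x, y) pixel coordinates inside the field.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb

    total = 0.0
    for i in range(num_samples):
        t = i / max(num_samples - 1, 1)
        # Nearest pixel to the sample point on the segment.
        x = int(round(start[0] + t * dx))
        y = int(round(start[1] + t * dy))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples


# Toy 5x5 field: vectors point in +x along row 2, zero elsewhere.
toy_paf_x = [[1.0 if row == 2 else 0.0 for _ in range(5)] for row in range(5)]
toy_paf_y = [[0.0] * 5 for _ in range(5)]

aligned = paf_score(toy_paf_x, toy_paf_y, (0, 2), (4, 2))
reversed_limb = paf_score(toy_paf_x, toy_paf_y, (4, 2), (0, 2))
perpendicular = paf_score(toy_paf_x, toy_paf_y, (2, 0), (2, 4))
print(aligned, reversed_limb, perpendicular)
```

A pair whose connecting segment runs along the field scores close to 1, a reversed pair scores close to -1, and an unrelated pair scores near 0, which is what lets the bottom-up approach assemble people without a person detector.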

This project is licensed under the terms of the license.

Other Implementations

Thank you all for your reimplementation efforts! If you have a new implementation and want to share it with others, feel free to make a pull request or email me!

Our new C++ library OpenPose (testing only)
Tensorflow [version 1] | [version 2] | [version 3] | [version 4] | [version 5] | [version 6] | [version 7 - TF2.1]
Pytorch [version 1] | [version 2] | [version 3]
Caffe2 [version 1]
Chainer [version 1]
MXnet [version 1]
MatConvnet [version 1]
CNTK [version 1]

Contents

Testing
Training
Citation

Testing

C++ (realtime version, for demo purpose)

Please use OpenPose; it now runs on CPU or GPU, and on Windows or Ubuntu.
Three input options: images, video, webcam

Matlab (slower, for COCO evaluation)

Compatible with general Caffe. Compile matcaffe.
Run cd testing; get_model.sh to retrieve our latest MSCOCO model from our web server.
Change the caffepath in the config.m and run demo.m for an example usage.

Python

cd testing/python
ipython notebook
Open demo.ipynb and execute the code

Training

Network Architecture

[Figure: network architecture diagram]

Training Steps

Run cd training; bash getData.sh to obtain the COCO images in dataset/COCO/images/, keypoints annotations in dataset/COCO/annotations/ and COCO official toolbox in dataset/COCO/coco/.
Run getANNO.m in matlab to convert the annotation format from json to mat in dataset/COCO/mat/.
Run genCOCOMask.m in matlab to obtain the mask images for unlabeled persons. You can use 'parfor' in matlab to speed up the code.
Run genJSON('COCO') to generate a json file in dataset/COCO/json/ folder. The json files contain the raw information needed for training.
Run python genLMDB.py to generate your LMDB. (You can also download our LMDB for the COCO dataset (189GB file) by: bash get_lmdb.sh)
Download our modified caffe: caffe_train. Compile pycaffe. It will be merged with caffe_rtpose (for testing) soon.
Run python setLayers.py --exp 1 to generate the prototxt and shell file for training.
Download the VGG-19 model; we use it to initialize the first 10 layers for training.
Run bash train_pose.sh 0,1 (generated by setLayers.py) to start the training with two gpus.
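The steps above leave files in several directories under dataset/COCO/. A minimal sketch of a pre-flight check that reports which of those directories are still missing before training starts. The directory names are taken from the steps above; the function name, the choice of root folder, and the idea of running such a check are assumptions, not part of the repository.

```python
from pathlib import Path

# Directories named in the training steps, relative to the folder the
# scripts were run from (assumed here to be the current directory).
EXPECTED_DIRS = [
    "dataset/COCO/images",
    "dataset/COCO/annotations",
    "dataset/COCO/coco",
    "dataset/COCO/mat",
    "dataset/COCO/json",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_dirs("."):
        print(f"missing: {rel}")
```

Running it after each step shows how far the data preparation has progressed, for example before committing to the LMDB generation.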

Citation

Please cite the paper in your publications if it helps your research:

@inproceedings{cao2017realtime,
  author = {Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
  booktitle = {CVPR},
  title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
  year = {2017}
}

@inproceedings{wei2016cpm,
  author = {Shih-En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh},
  booktitle = {CVPR},
  title = {Convolutional pose machines},
  year = {2016}
}
