Realtime_Multi-Person_Pose_Estimation
Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)
Top Related Projects
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Pretrained models for TensorFlow.js
Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch.
The project is an official implementation of our ECCV2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208)
Quick Overview
Realtime_Multi-Person_Pose_Estimation is a GitHub repository that implements a bottom-up approach to real-time multi-person pose estimation. It uses Convolutional Neural Networks (CNNs) to detect human body parts in images or video streams. It then associates those parts with individual people using Part Affinity Fields (PAFs), which are 2D vector fields that encode the location and orientation of limbs. The project is based on the paper "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" and provides both training and testing code.
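A PAF scores a candidate limb between two detected keypoints roughly as follows. The paper defines the score as a line integral of the field along the segment. The minimal numpy sketch below approximates it by sampling points on the segment and averaging the dot product between the field and the segment's unit vector. The function name paf_score and the default sample count are hypothetical. Full implementations may add further checks, such as a limb-length prior or a minimum fraction of positive samples, which this sketch leaves out.

```python
import numpy as np


def paf_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Approximate the PAF line integral between keypoints p1 and p2.

    paf_x, paf_y: (H, W) arrays with the x and y components of the field
    for one limb type (hypothetical layout). p1, p2: (x, y) pixel coords.
    Returns the mean alignment between the field and the unit vector p1->p2.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    vec = p2 - p1
    norm = np.linalg.norm(vec)
    if norm == 0:
        return 0.0  # degenerate limb: both endpoints coincide
    unit = vec / norm
    # Evenly spaced sample positions u in [0, 1] along the segment.
    us = np.linspace(0.0, 1.0, num_samples)
    points = p1[None, :] + us[:, None] * vec[None, :]
    # Round to the nearest pixel and keep samples inside the field.
    h, w = paf_x.shape
    xs = np.clip(np.rint(points[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.rint(points[:, 1]).astype(int), 0, h - 1)
    # Dot product of the field vector with the limb direction at each sample.
    dots = paf_x[ys, xs] * unit[0] + paf_y[ys, xs] * unit[1]
    return float(dots.mean())
```

Take a field that points in +x everywhere:
- A horizontal limb drawn left to right scores 1.0.
- The same limb drawn right to left scores -1.0.
- A vertical limb scores 0.0.

Because of this, the score rewards candidate pairs whose direction agrees with the predicted field.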
Pros
- High accuracy in detecting and associating body parts in real-time
- Supports multi-person pose estimation in complex scenes
- Provides pre-trained models for quick implementation
- Original Caffe implementation, with links to community reimplementations in TensorFlow, PyTorch, and other frameworks
Cons
- Requires significant computational resources for real-time performance
- Limited documentation for customization and fine-tuning
- Dependency on specific versions of libraries may cause compatibility issues
- Primarily focused on 2D pose estimation, lacking 3D capabilities
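Code Examples
The examples below use the Body interface (src.body) of a community PyTorch reimplementation of this method, not the original Caffe and Matlab code. Model paths follow that reimplementation's layout.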
- Loading the model and performing inference:
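import cv2
from src import util
from src.body import Body
# Load the body pose model (path follows the PyTorch reimplementation's layout)
body_estimation = Body('model/body_pose_model.pth')
test_image = cv2.imread('sample_image.jpg')  # BGR image as loaded by OpenCV
# candidate: detected keypoints; subset: grouping of keypoints into people
candidate, subset = body_estimation(test_image)
# Draw the skeletons on the image and save the result
canvas = util.draw_bodypose(test_image, candidate, subset)
cv2.imwrite('result.jpg', canvas)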
- Processing video input:
import cv2
from src import util
from src.body import Body
body_estimation = Body('model/body_pose_model.pth')
cap = cv2.VideoCapture('sample_video.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # end of the video or a read error
        break
    candidate, subset = body_estimation(frame)
    canvas = util.draw_bodypose(frame, candidate, subset)
    cv2.imshow('Pose Estimation', canvas)
    # Press 'q' to stop early
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
- Extracting specific body part coordinates:
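import cv2
import numpy as np
from src.body import Body
body_estimation = Body('model/body_pose_model.pth')
image = cv2.imread('sample_image.jpg')
candidate, subset = body_estimation(image)
# subset has one row per person; column 4 holds the candidate index of the
# right wrist (COCO-18 order), or -1 if that part was not detected
if len(subset) > 0 and subset[0][4] >= 0:
    right_wrist = np.array(candidate[int(subset[0][4])][0:2])
    print(f"Right wrist coordinates: {right_wrist}")
else:
    print("Right wrist not detected for the first person")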
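The examples above index candidate and subset by position. A small helper can convert them into one dictionary per person. The sketch below assumes the following layout used by the PyTorch reimplementation:
- Each candidate row is [x, y, score, id].
- The first 18 columns of each subset row hold candidate indices in COCO-18 order, with -1 for missing parts.

The function name people_from_subset and the KEYPOINT_NAMES labels are hypothetical.

```python
import numpy as np

# COCO-18 keypoint order used by OpenPose-style models (assumed here).
KEYPOINT_NAMES = [
    "nose", "neck", "r_shoulder", "r_elbow", "r_wrist",
    "l_shoulder", "l_elbow", "l_wrist", "r_hip", "r_knee",
    "r_ankle", "l_hip", "l_knee", "l_ankle", "r_eye",
    "l_eye", "r_ear", "l_ear",
]


def people_from_subset(candidate, subset):
    """Convert (candidate, subset) arrays into one dict per detected person.

    Assumed layout (hypothetical, matching the examples above):
      candidate[k] = [x, y, score, id]
      subset[n][:18] = index into candidate, or -1 if the part is missing
    Missing parts are left out of that person's dict.
    """
    candidate = np.asarray(candidate, dtype=float)
    people = []
    for row in np.asarray(subset):
        person = {}
        for part, name in enumerate(KEYPOINT_NAMES):
            idx = int(row[part])
            if idx < 0:
                continue  # part not detected for this person
            x, y = candidate[idx][:2]
            person[name] = (float(x), float(y))
        people.append(person)
    return people
```

With this helper, the right-wrist lookup becomes people[0].get("r_wrist"). That call returns None instead of a wrong coordinate when the part is missing.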
Getting Started
- Clone the repository:
  git clone https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation.git
  cd Realtime_Multi-Person_Pose_Estimation
- Install dependencies:
  pip install -r requirements.txt
- Download pre-trained models:
  wget https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation/releases/download/v1.0/body_pose_model.pth -O model/body_pose_model.pth
- Run the demo:
  python demo.py --image sample_images/demo.jpg
Competitor Comparisons
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Pros of openpose
- More comprehensive documentation and installation guides
- Wider range of supported platforms (Windows, Linux, Mac, Python, C++, Unity)
- Active development and regular updates
Cons of openpose
- Higher system requirements and potentially slower performance
- More complex setup process due to additional dependencies
- Larger codebase, which may be harder to customize or integrate
Code Comparison
Realtime_Multi-Person_Pose_Estimation (Keras reimplementation):
from config_reader import config_reader
from model import get_testing_model
# Load model
model = get_testing_model()
model.load_weights('model.h5')
openpose:
import pyopenpose as op
# Custom Params
params = dict()
params["model_folder"] = "models/"
# Starting OpenPose
opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()
Both repositories focus on real-time multi-person pose estimation, but openpose offers a more comprehensive and actively maintained solution. It supports a wider range of platforms and provides better documentation, making it more accessible for beginners. However, this comes at the cost of higher system requirements and a more complex setup process.
Realtime_Multi-Person_Pose_Estimation may be a lighter-weight alternative with potentially faster performance, but it lacks the extensive features and support of openpose. The code comparison shows that openpose uses a dedicated Python wrapper, while Realtime_Multi-Person_Pose_Estimation relies on a more straightforward model loading approach.
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- More comprehensive and versatile, supporting a wide range of computer vision tasks beyond pose estimation
- Actively maintained by Facebook AI Research, with frequent updates and improvements
- Extensive documentation and community support
Cons of Detectron2
- Higher computational requirements, potentially slower for real-time applications
- Steeper learning curve due to its broader scope and more complex architecture
Code Comparison
Realtime_Multi-Person_Pose_Estimation (Keras reimplementation):
from model import get_testing_model
model = get_testing_model()
model.load_weights('model.h5')
Detectron2:
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)
Realtime_Multi-Person_Pose_Estimation focuses specifically on multi-person pose estimation, while Detectron2 offers a more general-purpose computer vision framework. The former may be more suitable for projects requiring fast, real-time pose estimation, while the latter provides a broader range of capabilities and is better suited for complex, multi-task computer vision projects.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Pros of Mask_RCNN
- Provides instance segmentation in addition to object detection and classification
- Supports training on custom datasets with flexible configuration options
- Includes pre-trained models for quick start and transfer learning
Cons of Mask_RCNN
- Generally slower inference speed compared to Realtime_Multi-Person_Pose_Estimation
- Requires more computational resources for training and inference
- May have lower accuracy in real-time scenarios due to its focus on instance segmentation
Code Comparison
Mask_RCNN:
import mrcnn.model as modellib
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(COCO_MODEL_PATH, by_name=True)
results = model.detect([image], verbose=1)
Realtime_Multi-Person_Pose_Estimation (Keras reimplementation):
from scipy.ndimage import gaussian_filter  # used afterwards to smooth heatmaps
from model import get_testing_model
model = get_testing_model()
model.load_weights('model/keras/model.h5')
# The Keras port returns PAFs first, then heatmaps
pafs, heatmaps = model.predict(input_image)
Both repositories offer powerful computer vision solutions, but they focus on different aspects. Mask_RCNN provides more comprehensive instance segmentation capabilities, while Realtime_Multi-Person_Pose_Estimation specializes in real-time pose estimation for multiple people. The choice between them depends on the specific requirements of your project, considering factors such as speed, accuracy, and the type of analysis needed.
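The Realtime_Multi-Person_Pose_Estimation snippet imports gaussian_filter but stops at model.predict. In this kind of pipeline, the next step is to find keypoint candidates as local maxima in each part heatmap. The numpy-only sketch below makes the following assumptions:
- It works on a single 2D heatmap per part.
- The heatmap is usually smoothed first (for example with gaussian_filter); that step is omitted here to avoid a SciPy dependency.

The function name find_peaks and the default threshold are hypothetical.

```python
import numpy as np


def find_peaks(heatmap, threshold=0.1):
    """Return (x, y, score) for local maxima of a 2D heatmap above threshold.

    A pixel counts as a peak if it is >= each of its 4 neighbours and
    strictly greater than the threshold.
    """
    h = np.asarray(heatmap, dtype=float)
    # Pad with -inf so border pixels compare only against real neighbours.
    padded = np.pad(h, 1, mode="constant", constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = (
        (center >= padded[:-2, 1:-1])    # neighbour above
        & (center >= padded[2:, 1:-1])   # neighbour below
        & (center >= padded[1:-1, :-2])  # neighbour to the left
        & (center >= padded[1:-1, 2:])   # neighbour to the right
        & (center > threshold)
    )
    ys, xs = np.nonzero(is_peak)
    return [(int(x), int(y), float(h[y, x])) for x, y in zip(xs, ys)]
```

Each returned tuple is one candidate keypoint. The PAF stage then decides which candidates belong to the same person.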
Pretrained models for TensorFlow.js
Pros of tfjs-models
- Browser-based implementation, enabling client-side pose estimation
- Supports multiple pre-trained models for various tasks (pose estimation, object detection, etc.)
- Easier integration with web applications and JavaScript frameworks
Cons of tfjs-models
- Generally slower performance compared to native implementations
- Limited customization options for the underlying models
- May have lower accuracy in complex scenarios or with multiple people
Code Comparison
Realtime_Multi-Person_Pose_Estimation (Keras reimplementation):
from model import get_testing_model
model = get_testing_model()
model.load_weights('model.h5')
# The Keras port returns PAFs first, then heatmaps
pafs, heatmaps = model.predict(input_image)
tfjs-models:
const net = await posenet.load();
const pose = await net.estimateSinglePose(imageElement);
const keypoints = pose.keypoints;
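The Realtime_Multi-Person_Pose_Estimation example shows how to load a custom model and make predictions. The tfjs-models code shows how simple it is to use a pre-trained PoseNet model in JavaScript. Note that estimateSinglePose returns a single pose; for images containing several people, PoseNet provides estimateMultiplePoses.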
Official implementation of "OpenPifPaf: Composite Fields for Semantic Keypoint Detection and Spatio-Temporal Association" in PyTorch.
Pros of openpifpaf
- More actively maintained with regular updates and contributions
- Supports a wider range of pose estimation tasks, including animal pose estimation
- Better documentation and examples for ease of use
Cons of openpifpaf
- May have slightly lower real-time performance compared to Realtime_Multi-Person_Pose_Estimation
- Requires more setup and configuration for specific use cases
Code Comparison
Realtime_Multi-Person_Pose_Estimation (Keras reimplementation):
from config_reader import config_reader
from model import get_testing_model
model = get_testing_model()
model.load_weights('model.h5')
openpifpaf:
import openpifpaf
predictor = openpifpaf.Predictor(checkpoint='shufflenetv2k16')
predictions, gt_anns, image_meta = predictor.numpy_image(image)
Both repositories provide implementations for multi-person pose estimation, but openpifpaf offers a more user-friendly API and broader functionality. Realtime_Multi-Person_Pose_Estimation may have an edge in real-time performance for specific scenarios. openpifpaf's code is more modular and easier to integrate into existing projects, while Realtime_Multi-Person_Pose_Estimation requires more manual setup and configuration.
The project is an official implementation of our ECCV2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208)
Pros of human-pose-estimation.pytorch
- Implemented in PyTorch, offering better flexibility and ease of use for deep learning researchers
- Provides pre-trained models and evaluation scripts for quick deployment
- Includes data augmentation techniques for improved model performance
Cons of human-pose-estimation.pytorch
- Top-down design: the network estimates one person's pose per cropped region, so multi-person estimation depends on a separate person detector, unlike the bottom-up Realtime_Multi-Person_Pose_Estimation
- May have slower inference speed compared to the real-time performance of Realtime_Multi-Person_Pose_Estimation
- Less extensive documentation and community support
Code Comparison
Realtime_Multi-Person_Pose_Estimation (C++, via OpenPose, which the README recommends for the realtime version):
#include <openpose/pose/poseExtractorCaffe.hpp>
// Initialize OpenPose
op::PoseExtractorCaffe poseExtractorCaffe{poseModel, modelFolder, 0, heatMapTypes, scaleMode, 1};
human-pose-estimation.pytorch (Python):
import torch
from models.pose_resnet import get_pose_net
# cfg: experiment config; model_file: path to a trained checkpoint
model = get_pose_net(cfg, is_train=False)
model.load_state_dict(torch.load(model_file))
Both repositories focus on human pose estimation but differ in implementation languages, supported features, and ease of use for different applications.
README
Realtime Multi-Person Pose Estimation
By Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh.
Introduction
Code repo for winning 2016 MSCOCO Keypoints Challenge, 2016 ECCV Best Demo Award, and 2017 CVPR Oral paper.
Watch our video results on YouTube or on our website.
We present a bottom-up approach for realtime multi-person pose estimation, without using any person detector. For more details, refer to our CVPR'17 paper, our oral presentation video recording at CVPR 2017 or our presentation slides at ILSVRC and COCO workshop 2016.
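In the bottom-up approach, every body part is detected first and the parts are then grouped into people. For each limb type, this grouping is a bipartite matching between candidates of two part types, scored with the PAF. Implementations commonly approximate it greedily by taking connections in descending score order. The sketch below shows that greedy variant. The function name greedy_limb_matching and the min_score cutoff are hypothetical, and the scores could come from a PAF line integral.

```python
def greedy_limb_matching(scores, min_score=0.0):
    """Greedily pair candidates of part A (rows) with part B (columns).

    scores[i][j] is the connection score between A-candidate i and
    B-candidate j. Pairs are accepted in descending score order; each
    candidate is used at most once, and scores <= min_score are ignored.
    Returns a list of (i, j, score) tuples.
    """
    pairs = [
        (s, i, j)
        for i, row in enumerate(scores)
        for j, s in enumerate(row)
        if s > min_score
    ]
    pairs.sort(reverse=True)  # highest-scoring connections first
    used_a, used_b, matches = set(), set(), []
    for s, i, j in pairs:
        if i in used_a or j in used_b:
            continue  # one of the endpoints is already connected
        used_a.add(i)
        used_b.add(j)
        matches.append((i, j, s))
    return matches
```

Consider the scores [[0.9, 0.8], [0.85, 0.1]]. The greedy pass keeps (0, 0) and then (1, 1), for a total of 1.0, even though pairing (0, 1) with (1, 0) would total 1.65. An exact solver such as the Hungarian algorithm would find that optimum. The greedy version trades it for simplicity and speed.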
This project is licensed under the terms of the license.
Other Implementations
Thank you all for your reimplementation efforts! If you have a new implementation and want to share it with others, feel free to make a pull request or email me!
- Our new C++ library OpenPose (testing only)
- Tensorflow [version 1] | [version 2] | [version 3] | [version 4] | [version 5] | [version 6] | [version 7 - TF2.1]
- Pytorch [version 1] | [version 2] | [version 3]
- Caffe2 [version 1]
- Chainer [version 1]
- MXnet [version 1]
- MatConvnet [version 1]
- CNTK [version 1]
Contents
Testing
C++ (realtime version, for demo purpose)
- Please use OpenPose, which can now run on CPU or GPU, on Windows or Ubuntu.
- Three input options: images, video, webcam
Matlab (slower, for COCO evaluation)
- Compatible with general Caffe. Compile matcaffe.
- Run cd testing; get_model.sh to retrieve our latest MSCOCO model from our web server.
- Change the caffepath in config.m and run demo.m for an example usage.
Python
- cd testing/python
- ipython notebook
- Open demo.ipynb and execute the code
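Before running the CNN, OpenPose-style testing code typically resizes the input so its height matches a target box size. It then pads the image on the bottom and right so both sides are multiples of the network stride, assumed here to be the factor by which the network downsamples. The numpy sketch below covers both steps. The defaults boxsize=368, stride=8 and pad_value=128 are assumptions to check against the demo's configuration. The function names scale_for and pad_right_down are hypothetical.

```python
import numpy as np


def scale_for(image_height, boxsize=368, scale=1.0):
    """Resize factor that maps the image height to scale * boxsize."""
    return scale * boxsize / image_height


def pad_right_down(image, stride=8, pad_value=128):
    """Pad an (H, W, C) image on the bottom/right to multiples of stride.

    Returns the padded image and (pad_down, pad_right), so the padding can
    be cropped off the upsampled network output afterwards.
    """
    h, w = image.shape[:2]
    pad_down = (stride - h % stride) % stride
    pad_right = (stride - w % stride) % stride
    padded = np.pad(
        image,
        ((0, pad_down), (0, pad_right), (0, 0)),  # no padding on channels
        mode="constant",
        constant_values=pad_value,
    )
    return padded, (pad_down, pad_right)
```

A 13x20 image, for example, is padded to 16x24. The returned offsets tell the caller how many rows and columns to drop from the heatmaps and PAFs before mapping keypoints back to the original image.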
Training
Network Architecture
Training Steps
- Run cd training; bash getData.sh to obtain the COCO images in dataset/COCO/images/, keypoint annotations in dataset/COCO/annotations/ and the COCO official toolbox in dataset/COCO/coco/.
- Run getANNO.m in Matlab to convert the annotation format from json to mat in dataset/COCO/mat/.
- Run genCOCOMask.m in Matlab to obtain the mask images for unlabeled persons. You can use 'parfor' in Matlab to speed up the code.
- Run genJSON('COCO') to generate a json file in the dataset/COCO/json/ folder. The json files contain the raw information needed for training.
- Run python genLMDB.py to generate your LMDB. (You can also download our LMDB for the COCO dataset (189GB file) by running bash get_lmdb.sh.)
- Download our modified caffe: caffe_train. Compile pycaffe. It will be merged with caffe_rtpose (for testing) soon.
- Run python setLayers.py --exp 1 to generate the prototxt and shell file for training.
- Download the VGG-19 model; we use it to initialize the first 10 layers for training.
- Run bash train_pose.sh 0,1 (generated by setLayers.py) to start the training with two GPUs.
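The training data prepared above supervises two kinds of outputs described in the CVPR'17 paper:
- Part confidence maps: a Gaussian around each annotated keypoint, taking the maximum where people overlap.
- PAFs: the unit vector along a limb on pixels near that limb, averaged where people overlap.

Training uses an L2 loss that is zeroed wherever a mask, like those from genCOCOMask.m, marks unlabeled people. The numpy sketch below builds those targets and the masked loss. The function names, sigma and limb_width are hypothetical, and this is not the code in caffe_train.

```python
import numpy as np


def confidence_map(shape, keypoints, sigma=7.0):
    """Ground-truth heatmap for one part: max over per-person Gaussians.

    shape: (H, W); keypoints: (x, y) of this part for each annotated person.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape, dtype=float)
    for x, y in keypoints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / sigma ** 2)
        heat = np.maximum(heat, g)  # max, not sum, keeps nearby peaks distinct
    return heat


def paf_map(shape, limbs, limb_width=1.0):
    """Ground-truth PAF for one limb type, averaged over overlapping people.

    limbs: list of ((x1, y1), (x2, y2)) per person. Returns an (H, W, 2) array.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    total = np.zeros((h, w, 2))
    count = np.zeros((h, w))
    for (x1, y1), (x2, y2) in limbs:
        vec = np.array([x2 - x1, y2 - y1], dtype=float)
        length = np.linalg.norm(vec)
        if length == 0:
            continue
        v = vec / length
        dx, dy = xs - x1, ys - y1
        along = v[0] * dx + v[1] * dy            # position along the limb
        across = np.abs(-v[1] * dx + v[0] * dy)  # distance from the limb axis
        on_limb = (along >= 0) & (along <= length) & (across <= limb_width)
        total[on_limb] += v
        count[on_limb] += 1
    nonzero = count > 0
    total[nonzero] /= count[nonzero][:, None]  # average where limbs overlap
    return total


def masked_l2_loss(pred, target, mask):
    """Sum of squared errors, ignoring pixels where mask == 0.

    mask is (H, W); pred and target are (H, W) or (H, W, C).
    """
    diff = (pred - target) ** 2
    if diff.ndim == mask.ndim + 1:
        mask = mask[..., None]  # broadcast the mask over channels
    return float((mask * diff).sum())
```

Masking, rather than treating unlabeled people as background, avoids penalizing the network for correctly detecting someone the annotators skipped.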
Citation
Please cite the paper in your publications if it helps your research:
@inproceedings{cao2017realtime,
author = {Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
booktitle = {CVPR},
title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
year = {2017}
}
@inproceedings{wei2016cpm,
author = {Shih-En Wei and Varun Ramakrishna and Takeo Kanade and Yaser Sheikh},
booktitle = {CVPR},
title = {Convolutional pose machines},
year = {2016}
}