human-pose-estimation.pytorch

The project is an official implementation of our ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking" (https://arxiv.org/abs/1804.06208).


Top Related Projects

detectron2

29,935

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

openpose

30,786

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

mmpose

5,562

OpenMMLab Pose Estimation Toolbox and Benchmark.

tfjs-models

13,990

Pretrained models for TensorFlow.js

Quick Overview

Microsoft's human-pose-estimation.pytorch is an open-source project for human pose estimation using PyTorch. It implements state-of-the-art deep learning models for detecting and tracking human body keypoints in images and videos. The repository provides pre-trained models, training scripts, and evaluation tools for researchers and developers working on human pose estimation tasks.

Pros

High accuracy and performance on standard pose estimation benchmarks
Supports single- and multi-person 2D pose estimation on MPII and COCO
Includes pre-trained models for quick deployment
Comprehensive documentation and example usage

Cons

Requires significant computational resources for training
Limited to human pose estimation (not suitable for other object types)
Dependency on specific versions of PyTorch and other libraries
May require fine-tuning for specific use cases or datasets

Code Examples

Loading a pre-trained model:

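import torch
from models.pose_resnet import get_pose_net
# cfg is the experiment config, already loaded from a YAML file under experiments/
model = get_pose_net(cfg, is_train=False)
model.load_state_dict(torch.load('path/to/pretrained_model.pth', map_location='cpu'))
model.eval()  # inference mode: batch-norm uses its stored statistics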

Performing inference on an image:

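import cv2
import torch
from utils.transforms import get_affine_transform
image = cv2.cvtColor(cv2.imread('path/to/image.jpg'), cv2.COLOR_BGR2RGB)
# get_affine_transform returns a 2x3 matrix; center, scale, and rotation describe the person box
trans = get_affine_transform(center, scale, rotation, cfg.MODEL.IMAGE_SIZE)
width, height = (int(v) for v in cfg.MODEL.IMAGE_SIZE)
image = cv2.warpAffine(image, trans, (width, height), flags=cv2.INTER_LINEAR)
# HWC uint8 -> NCHW float; input normalization is omitted here
input = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float()
output = model(input)  # per-joint heatmaps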
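The `center` and `scale` arguments above describe the person box that is cropped and resized to the network input. A minimal sketch of how they could be derived from an `(x, y, w, h)` detection box follows. The helper name `box_to_center_scale`, the `pixel_std = 200` normalisation, and the 1.25 padding factor are assumptions made for illustration, so check the repository's dataset code for the values it actually uses.

```python
def box_to_center_scale(x, y, w, h, input_w=192, input_h=256,
                        pixel_std=200.0, padding=1.25):
    """Convert an (x, y, w, h) box to a (center, scale) pair.

    pixel_std and padding are illustrative assumptions, not confirmed
    values from the repository.
    """
    center = (x + 0.5 * w, y + 0.5 * h)
    aspect = input_w / input_h
    # Grow the shorter side so the box matches the input aspect ratio.
    if w > aspect * h:
        h = w / aspect
    else:
        w = h * aspect
    scale = (padding * w / pixel_std, padding * h / pixel_std)
    return center, scale


center, scale = box_to_center_scale(100, 50, 96, 256)
print(center, scale)
```

Matching the box to the input aspect ratio before cropping avoids distorting the person when the crop is resized.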

Visualizing the detected keypoints:

from utils.vis import save_batch_heatmaps
# The model returns per-joint heatmaps; this saves them next to the image with each peak marked
save_batch_heatmaps(input, output, 'output_image.jpg')
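The network predicts one heatmap per joint rather than coordinates directly. A minimal pure-Python sketch of the usual decoding step, taking the peak of each heatmap and mapping it back to input pixels, is shown below. The function name `decode_heatmaps` and the stride of 4 between input and heatmap resolution are assumptions for illustration.

```python
def decode_heatmaps(heatmaps, stride=4):
    """Return [(x, y, score), ...], one entry per joint.

    heatmaps: list of 2D lists (rows of floats), one per joint.
    stride: assumed ratio between input and heatmap resolution.
    """
    keypoints = []
    for hm in heatmaps:
        best_score, best_x, best_y = float("-inf"), 0, 0
        for y, row in enumerate(hm):
            for x, value in enumerate(row):
                if value > best_score:
                    best_score, best_x, best_y = value, x, y
        keypoints.append((best_x * stride, best_y * stride, best_score))
    return keypoints


toy = [
    [[0.0, 0.1, 0.0],
     [0.0, 0.9, 0.2]],   # peak at x=1, y=1
    [[0.7, 0.0, 0.0],
     [0.0, 0.0, 0.1]],   # peak at x=0, y=0
]
print(decode_heatmaps(toy))
```

The peak value doubles as a confidence score, which is useful for hiding joints the network is unsure about.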

Getting Started

Clone the repository:

git clone https://github.com/microsoft/human-pose-estimation.pytorch.git
cd human-pose-estimation.pytorch

Install dependencies:
```
pip install -r requirements.txt
```

Download the ImageNet-pretrained backbone:

mkdir -p models/pytorch/imagenet
wget https://download.pytorch.org/models/resnet50-19c8e357.pth -O models/pytorch/imagenet/resnet50-19c8e357.pth

Evaluate a pre-trained model on COCO val2017:

python pose_estimation/valid.py --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml --flip-test --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar

Competitor Comparisons

detectron2

29,935

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

Broader scope: Supports multiple computer vision tasks beyond pose estimation
More active development: Frequent updates and larger community
Modular architecture: Easier to extend and customize

Cons of Detectron2

Steeper learning curve: More complex due to its broader scope
Higher resource requirements: May need more powerful hardware

Code Comparison

Human-pose-estimation.pytorch:

import torch
from models.pose_resnet import get_pose_net

model = get_pose_net(cfg, is_train=False)
model.load_state_dict(torch.load(model_path))

Detectron2:

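from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

config_name = "COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_name))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)  # pre-trained weights
predictor = DefaultPredictor(cfg)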

Human-pose-estimation.pytorch focuses specifically on pose estimation, making it more straightforward for this task. Detectron2 offers a more comprehensive framework for various computer vision tasks, including pose estimation, object detection, and instance segmentation. The code comparison shows that Detectron2 requires more setup but provides a more flexible configuration system.

openpose

30,786

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

Pros of openpose

More comprehensive body keypoint detection, including face and hand keypoints
Real-time performance on CPU and GPU
Extensive documentation and community support

Cons of openpose

Larger model size and higher computational requirements
Less flexibility in terms of customization and fine-tuning

Code comparison

openpose:

#include <openpose/flags.hpp>
#include <openpose/headers.hpp>

op::Wrapper opWrapper{op::ThreadManagerMode::Asynchronous};
opWrapper.start();

human-pose-estimation.pytorch:

from models.pose_resnet import get_pose_net
model = get_pose_net(cfg, is_train=False)
model.eval()

The openpose example shows C++ code for initializing the wrapper, while human-pose-estimation.pytorch uses Python to create and evaluate the model. openpose offers a higher-level API, while human-pose-estimation.pytorch provides more direct access to the underlying model.

mmpose

5,562

OpenMMLab Pose Estimation Toolbox and Benchmark.

Pros of mmpose

More comprehensive, supporting a wider range of pose estimation tasks and models
Actively maintained with frequent updates and new features
Modular design allowing for easy customization and extension

Cons of mmpose

Steeper learning curve due to its more complex architecture
Potentially higher computational requirements for some models

Code Comparison

mmpose:

from mmpose.apis import inference_top_down_pose_model, init_pose_model

model = init_pose_model(config, checkpoint)
results = inference_top_down_pose_model(model, image, person_results)

human-pose-estimation.pytorch:

import torch
from models.pose_resnet import get_pose_net

model = get_pose_net(cfg, is_train=False)
model.load_state_dict(torch.load(model_path))
output = model(input_image)

Both repositories provide implementations for human pose estimation, but mmpose offers a more extensive framework with additional features and flexibility. While human-pose-estimation.pytorch is simpler and easier to get started with, mmpose provides a more robust solution for complex pose estimation tasks and research. The code comparison shows that mmpose has a more structured API, while human-pose-estimation.pytorch offers a more straightforward approach to model initialization and inference.

tfjs-models

13,990

Pretrained models for TensorFlow.js

Pros of tfjs-models

Runs in web browsers, enabling client-side inference
Supports multiple model types beyond pose estimation
Easier integration with web applications

Cons of tfjs-models

Generally slower performance compared to PyTorch implementation
May have lower accuracy for complex pose estimation tasks
Limited customization options for advanced users

Code Comparison

human-pose-estimation.pytorch:

import torch
from models.pose_resnet import get_pose_net
model = get_pose_net(cfg, is_train=False)
model.load_state_dict(torch.load(model_path))

tfjs-models:

const net = await posenet.load();
const pose = await net.estimateSinglePose(imageElement);

The PyTorch implementation offers more flexibility in model architecture and training, while the TensorFlow.js version provides a simpler API for quick integration in web applications. The PyTorch code allows for more customization, whereas the TensorFlow.js code is more straightforward for basic pose estimation tasks.


README

Simple Baselines for Human Pose Estimation and Tracking

News

Our new work High-Resolution Representations for Labeling Pixels and Regions is available at HRNet. HRNet has been applied to a wide range of vision tasks, such as image classification, object detection, semantic segmentation, and facial landmark detection.
Our new work Deep High-Resolution Representation Learning for Human Pose Estimation has been released at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch. The best single HRNet obtains an AP of 77.0 on the COCO test-dev2017 dataset and a PCKh@0.5 of 92.3% on the MPII test set. The new repository also supports the SimpleBaseline method, and you are welcome to try it.
Our entry using this repo won the PoseTrack2018 Multi-person Pose Tracking Challenge!
Our entry using this repo ranked 2nd place in the keypoint detection task of COCO 2018!

Introduction

This is an official PyTorch implementation of Simple Baselines for Human Pose Estimation and Tracking. This work provides baseline methods that are surprisingly simple and effective, and thus helpful for inspiring and evaluating new ideas in the field. State-of-the-art results are achieved on challenging benchmarks. On the COCO keypoints validation set, our best single model achieves an mAP of 74.3. You can reproduce our results using this repo. All models are provided for research purposes.

Main Results

Results on MPII val

Arch	Head	Shoulder	Elbow	Wrist	Hip	Knee	Ankle	Mean	Mean@0.1
256x256_pose_resnet_50_d256d256d256	96.351	95.329	88.989	83.176	88.420	83.960	79.594	88.532	33.911
384x384_pose_resnet_50_d256d256d256	96.658	95.754	89.790	84.614	88.523	84.666	79.287	89.066	38.046
256x256_pose_resnet_101_d256d256d256	96.862	95.873	89.518	84.376	88.437	84.486	80.703	89.131	34.020
384x384_pose_resnet_101_d256d256d256	96.965	95.907	90.268	85.780	89.597	85.935	82.098	90.003	38.860
256x256_pose_resnet_152_d256d256d256	97.033	95.941	90.046	84.976	89.164	85.311	81.271	89.620	35.025
384x384_pose_resnet_152_d256d256d256	96.794	95.618	90.080	86.225	89.700	86.862	82.853	90.200	39.433

Note:

Flip test is used.
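The MPII numbers above are PCKh scores: a predicted joint counts as correct when its distance to the ground truth is within a fraction of the head size (0.5 for the Mean column and, as the column name suggests, 0.1 for Mean@0.1). The sketch below illustrates that definition on toy data. The function name `pckh` and the use of a single head size per person are simplifying assumptions, and the repository's evaluation code may normalise head size differently.

```python
import math


def pckh(preds, gts, head_sizes, threshold=0.5):
    """Percentage of joints within threshold * head_size of ground truth.

    preds, gts: per-person lists of (x, y) joints.
    head_sizes: one head length per person (simplifying assumption).
    """
    correct = total = 0
    for person_pred, person_gt, head in zip(preds, gts, head_sizes):
        for (px, py), (gx, gy) in zip(person_pred, person_gt):
            dist = math.hypot(px - gx, py - gy)
            correct += dist <= threshold * head
            total += 1
    return 100.0 * correct / total if total else 0.0


# Toy data: one person, three joints, with errors of 2, 4, and 10 pixels.
preds = [[(10, 10), (20, 24), (40, 40)]]
gts = [[(10, 12), (20, 20), (30, 40)]]
heads = [10.0]
print(pckh(preds, gts, heads, 0.5))
print(pckh(preds, gts, heads, 0.1))
```

The much lower Mean@0.1 column reflects how strict the tighter threshold is: the same predictions are scored against a tolerance five times smaller.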

Results on COCO val2017, using a person detector with an AP of 56.4 on COCO val2017

Arch	AP	AP .5	AP .75	AP (M)	AP (L)	AR	AR .5	AR .75	AR (M)	AR (L)
256x192_pose_resnet_50_d256d256d256	0.704	0.886	0.783	0.671	0.772	0.763	0.929	0.834	0.721	0.824
384x288_pose_resnet_50_d256d256d256	0.722	0.893	0.789	0.681	0.797	0.776	0.932	0.838	0.728	0.846
256x192_pose_resnet_101_d256d256d256	0.714	0.893	0.793	0.681	0.781	0.771	0.934	0.840	0.730	0.832
384x288_pose_resnet_101_d256d256d256	0.736	0.896	0.803	0.699	0.811	0.791	0.936	0.851	0.745	0.858
256x192_pose_resnet_152_d256d256d256	0.720	0.893	0.798	0.687	0.789	0.778	0.934	0.846	0.736	0.839
384x288_pose_resnet_152_d256d256d256	0.743	0.896	0.811	0.705	0.816	0.797	0.937	0.858	0.751	0.863

Results on Caffe-style ResNet

Arch	AP	AP .5	AP .75	AP (M)	AP (L)	AR	AR .5	AR .75	AR (M)	AR (L)
256x192_pose_resnet_50_caffe_d256d256d256	0.704	0.914	0.782	0.677	0.744	0.735	0.921	0.805	0.704	0.783
256x192_pose_resnet_101_caffe_d256d256d256	0.720	0.915	0.803	0.693	0.764	0.753	0.928	0.821	0.720	0.802
256x192_pose_resnet_152_caffe_d256d256d256	0.728	0.925	0.804	0.702	0.766	0.760	0.931	0.828	0.729	0.806

Note:

Flip test is used.
Person detector has person AP of 56.4 on COCO val2017 dataset.
The difference between PyTorch-style and Caffe-style ResNet is the position of the stride=2 convolution.
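Flip testing averages the prediction for an image with the prediction for its horizontal mirror. A minimal sketch of the idea follows, using nested lists in place of tensors. `flip_test`, `toy_model`, and the left/right joint pairing are hypothetical names and values chosen for illustration, not the repository's API.

```python
def hflip(grid):
    """Mirror a 2D list left to right."""
    return [row[::-1] for row in grid]


def flip_test(model, image, flip_pairs):
    """Average heatmaps from an image and its mirrored copy.

    model: callable mapping a 2D image to a list of per-joint heatmaps.
    flip_pairs: (left, right) joint index pairs that swap under mirroring.
    """
    heatmaps = model(image)
    flipped = [hflip(hm) for hm in model(hflip(image))]
    # After un-mirroring, "left" channels hold right joints, so swap them.
    for left, right in flip_pairs:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return [
        [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(h1, h2)]
        for h1, h2 in zip(heatmaps, flipped)
    ]


def toy_model(image):
    """Stand-in network: joint 0 sees the left column, joint 1 the right."""
    left = [[row[0]] + [0.0] * (len(row) - 1) for row in image]
    right = [[0.0] * (len(row) - 1) + [row[-1]] for row in image]
    return [left, right]


image = [[1.0, 3.0], [2.0, 4.0]]
averaged = flip_test(toy_model, image, flip_pairs=[(0, 1)])
print(averaged)
```

For this perfectly mirror-consistent toy model the averaged heatmaps equal the unflipped prediction. With a real network the two passes differ slightly, and averaging them smooths out that disagreement.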

Environment

The code is developed using Python 3.6 on Ubuntu 16.04. NVIDIA GPUs are required. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms and GPU cards are not fully tested.

Quick start

Installation

Install PyTorch >= v0.4.0 following the official instructions.

Disable cudnn for batch_norm:

# PYTORCH=/path/to/pytorch
# for pytorch v0.4.0
sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py
# for pytorch v0.4.1
sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py

Note that instructions like # PYTORCH=/path/to/pytorch indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (PYTORCH in this case) accordingly.

Clone this repo; we'll refer to the cloned directory as ${POSE_ROOT}.
Install dependencies:
```
pip install -r requirements.txt
```
Make libs:
```
cd ${POSE_ROOT}/lib
make
```

Install COCOAPI:

# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python3 setup.py install --user

Note that instructions like # COCOAPI=/path/to/clone/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

Download pytorch imagenet pretrained models from pytorch model zoo and caffe-style pretrained models from GoogleDrive.

Download mpii and coco pretrained models from OneDrive or GoogleDrive. Please download them under ${POSE_ROOT}/models/pytorch, and make them look like this:

${POSE_ROOT}
 `-- models
     `-- pytorch
         |-- imagenet
         |   |-- resnet50-19c8e357.pth
         |   |-- resnet50-caffe.pth.tar
         |   |-- resnet101-5d3b4d8f.pth
         |   |-- resnet101-caffe.pth.tar
         |   |-- resnet152-b121ed2d.pth
         |   `-- resnet152-caffe.pth.tar
         |-- pose_coco
         |   |-- pose_resnet_101_256x192.pth.tar
         |   |-- pose_resnet_101_384x288.pth.tar
         |   |-- pose_resnet_152_256x192.pth.tar
         |   |-- pose_resnet_152_384x288.pth.tar
         |   |-- pose_resnet_50_256x192.pth.tar
         |   `-- pose_resnet_50_384x288.pth.tar
         `-- pose_mpii
             |-- pose_resnet_101_256x256.pth.tar
             |-- pose_resnet_101_384x384.pth.tar
             |-- pose_resnet_152_256x256.pth.tar
             |-- pose_resnet_152_384x384.pth.tar
             |-- pose_resnet_50_256x256.pth.tar
             `-- pose_resnet_50_384x384.pth.tar
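Before running validation it can help to confirm that the checkpoints are where the commands expect them. The following sketch checks a few of the paths from the tree above. The helper name `missing_checkpoints` and the choice of files are assumptions for illustration.

```python
from pathlib import Path

# A subset of the files shown in the tree above.
EXPECTED = [
    "models/pytorch/imagenet/resnet50-19c8e357.pth",
    "models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar",
    "models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar",
]


def missing_checkpoints(pose_root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under pose_root."""
    root = Path(pose_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from ${POSE_ROOT}; prints nothing when every file is present.
    for rel in missing_checkpoints("."):
        print("missing:", rel)
```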

Create the output (training model output) and log (TensorBoard log) directories:

mkdir output 
mkdir log

Your directory tree should look like this:

${POSE_ROOT}
├── data
├── experiments
├── lib
├── log
├── models
├── output
├── pose_estimation
├── README.md
└── requirements.txt

Data preparation

For MPII data, please download from MPII Human Pose Dataset. The original annotation files are in MATLAB format. We have converted them into JSON format; you also need to download them from OneDrive or GoogleDrive. Extract them under ${POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg

For COCO data, please download from COCO download; 2017 Train/Val is needed for COCO keypoints training and validation. We also provide the person detection results on COCO val2017 to reproduce our multi-person pose estimation results. Please download them from OneDrive or GoogleDrive. Download and extract them under ${POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ...

Validation on MPII using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar

Training on MPII

python pose_estimation/train.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml

Validation on COCO val2017 using pretrained models

python pose_estimation/valid.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar

Training on COCO train2017

python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml

Other Implementations

TensorFlow [Version1]
PaddlePaddle [Version1]
Gluon [Version1]

Citation

If you use our code or models in your research, please cite with:

@inproceedings{xiao2018simple,
    author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
    title={Simple Baselines for Human Pose Estimation and Tracking},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}
}
