This project is the official implementation of our ECCV 2018 paper "Simple Baselines for Human Pose Estimation and Tracking".


Quick Overview

Microsoft's human-pose-estimation.pytorch is an open-source project for human pose estimation using PyTorch. It implements state-of-the-art deep learning models for detecting and tracking human body keypoints in images and videos. The repository provides pre-trained models, training scripts, and evaluation tools for researchers and developers working on human pose estimation tasks.


Pros

  • High accuracy and performance on standard pose estimation benchmarks
  • Supports 2D human pose estimation on the MPII and COCO benchmarks
  • Includes pre-trained models for quick deployment
  • Comprehensive documentation and example usage


Cons

  • Requires significant computational resources for training
  • Limited to human pose estimation (not suitable for other object types)
  • Dependency on specific versions of PyTorch and other libraries
  • May require fine-tuning for specific use cases or datasets

Code Examples

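  1. Loading a pre-trained model (cfg is the experiment configuration object, and ${POSE_ROOT}/lib is assumed to be on the Python path):

    import torch
    from models.pose_resnet import get_pose_net
    model = get_pose_net(cfg, is_train=False)
    model.load_state_dict(torch.load('models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar'))
    model.eval()
  2. Performing inference on an image (center, scale, and rotation describe the person box and must be supplied by the caller):

    import cv2
    from utils.transforms import get_affine_transform
    image = cv2.imread('path/to/image.jpg')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Assumes get_affine_transform(center, scale, rot, output_size) returns a 2x3 matrix
    trans = get_affine_transform(center, scale, rotation, cfg.MODEL.IMAGE_SIZE)
    image = cv2.warpAffine(image, trans, (int(cfg.MODEL.IMAGE_SIZE[0]), int(cfg.MODEL.IMAGE_SIZE[1])))
    # HWC uint8 -> NCHW float in [0, 1]; mean/std normalization is omitted for brevity
    batch = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float() / 255.0
    output = model(batch)  # per-joint heatmaps
  3. Visualizing the detected keypoints:

    from core.inference import get_max_preds
    from utils.vis import save_batch_image_with_joints
    # Assumes heatmaps are 1/4 of the input resolution, hence the factor of 4
    preds, maxvals = get_max_preds(output.detach().cpu().numpy())
    save_batch_image_with_joints(batch, preds * 4, maxvals, 'output_image.jpg')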

Getting Started

  1. Clone the repository:

    git clone
    cd human-pose-estimation.pytorch
  2. Install dependencies:

    pip install -r requirements.txt
  3. Download pre-trained models:

    mkdir models
    wget -O models/resnet50-19c8e357.pth
  4. Run inference on an image:

    python tools/ --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml --checkpoint models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar --image examples/demo.jpg

Simple Baselines for Human Pose Estimation and Tracking



This is an official PyTorch implementation of Simple Baselines for Human Pose Estimation and Tracking. This work provides baseline methods that are surprisingly simple and effective, and thus helpful for inspiring and evaluating new ideas in the field. State-of-the-art results are achieved on challenging benchmarks. On the COCO keypoints validation set, our best single model achieves an mAP of 74.3. You can reproduce our results using this repo. All models are provided for research purposes.

Main Results

Results on MPII val



  • Flip test is used.
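
Flip testing is a common test-time augmentation for heatmap-based pose models: the network is also run on the horizontally mirrored image, the resulting heatmaps are mirrored back with left/right joint channels swapped, and the two predictions are averaged. The pure-Python sketch below illustrates that averaging step on nested lists. The names `flip_back` and `average_with_flip` and the toy two-joint layout are assumptions for illustration, not necessarily how this repository implements it.

```python
def flip_back(heatmaps, flip_pairs):
    """Undo a horizontal flip on per-joint heatmaps.

    heatmaps: list of joints, each a list of rows, each a list of floats.
    flip_pairs: (left, right) joint index pairs whose channels are swapped.
    """
    # Mirror every row so columns return to the original image orientation.
    restored = [[row[::-1] for row in joint] for joint in heatmaps]
    # A left wrist in the mirrored image is a right wrist in the original.
    for left, right in flip_pairs:
        restored[left], restored[right] = restored[right], restored[left]
    return restored


def average_with_flip(original, flipped, flip_pairs):
    """Average heatmaps from the original image and its mirrored copy."""
    restored = flip_back(flipped, flip_pairs)
    return [
        [[(a + b) * 0.5 for a, b in zip(row_o, row_f)]
         for row_o, row_f in zip(joint_o, joint_f)]
        for joint_o, joint_f in zip(original, restored)
    ]


# Toy example: 2 joints forming one left/right pair, each with a 1x3 heatmap.
original = [[[0.0, 0.2, 0.8]], [[0.6, 0.4, 0.0]]]
# What a perfectly symmetric model would predict for the mirrored image.
flipped = [[[0.0, 0.4, 0.6]], [[0.8, 0.2, 0.0]]]
averaged = average_with_flip(original, flipped, flip_pairs=[(0, 1)])
print(averaged)
```

In practice the two predictions differ slightly, and averaging them tends to smooth out errors that depend on image orientation.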

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L)

Results on Caffe-style ResNet

Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L)


  • Flip test is used.
  • Person detector has person AP of 56.4 on COCO val2017 dataset.
  • The difference between PyTorch-style and Caffe-style ResNet is the position of the stride=2 convolution.
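
To make the last note concrete: a ResNet bottleneck is a 1x1, 3x3, 1x1 convolution stack, and the two styles differ in which of the first two layers carries the stride of 2 (the first 1x1 convolution in Caffe-style, the 3x3 convolution in PyTorch-style). The sketch below traces the feature-map size through one downsampling bottleneck with the standard convolution output-size formula. The function names and the 56-pixel input are illustrative assumptions.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (floor division, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def bottleneck_sizes(size, style):
    """Trace feature-map size through a downsampling 1x1 -> 3x3 -> 1x1 bottleneck."""
    if style == "caffe":
        strides = (2, 1, 1)   # stride 2 on the first 1x1 convolution
    elif style == "pytorch":
        strides = (1, 2, 1)   # stride 2 on the 3x3 convolution
    else:
        raise ValueError("style must be 'caffe' or 'pytorch'")
    layers = ((1, 0), (3, 1), (1, 0))  # (kernel, padding) for each layer
    sizes = []
    for (kernel, padding), stride in zip(layers, strides):
        size = conv_out_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


print(bottleneck_sizes(56, "caffe"))    # [28, 28, 28]
print(bottleneck_sizes(56, "pytorch"))  # [56, 28, 28]
```

Both variants end at the same resolution; the PyTorch-style block simply keeps the full-resolution features one layer longer, which is why the two need separate pretrained weights.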


The code was developed using Python 3.6 on Ubuntu 16.04 and requires NVIDIA GPUs. It was developed and tested on 4 NVIDIA P100 GPU cards; other platforms and GPU cards are not fully tested.

Quick start


  1. Install PyTorch >= v0.4.0 following the official instructions.

  2. Disable cuDNN for batch_norm:

    # PYTORCH=/path/to/pytorch
    # for pytorch v0.4.0
    sed -i "1194s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/
    # for pytorch v0.4.1
    sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/

    Note that instructions like # PYTORCH=/path/to/pytorch indicate that you should pick a path where you'd like to have pytorch installed and then set an environment variable (PYTORCH in this case) accordingly.

  3. Clone this repo. We will refer to the cloned directory as ${POSE_ROOT}.

  4. Install dependencies:

    pip install -r requirements.txt
  5. Make libs:

    cd ${POSE_ROOT}/lib
    make
  6. Install COCOAPI:

    # COCOAPI=/path/to/clone/cocoapi
    git clone $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python3 install --user

    Note that instructions like # COCOAPI=/path/to/clone/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

  7. Download the PyTorch ImageNet pretrained models from the PyTorch model zoo and the Caffe-style pretrained models from GoogleDrive.

  8. Download the MPII and COCO pretrained models from OneDrive or GoogleDrive. Place them under ${POSE_ROOT}/models/pytorch so that the directory looks like this:

     `-- models
         `-- pytorch
             |-- imagenet
             |   |-- resnet50-19c8e357.pth
             |   |-- resnet50-caffe.pth.tar
             |   |-- resnet101-5d3b4d8f.pth
             |   |-- resnet101-caffe.pth.tar
             |   |-- resnet152-b121ed2d.pth
             |   `-- resnet152-caffe.pth.tar
             |-- pose_coco
             |   |-- pose_resnet_101_256x192.pth.tar
             |   |-- pose_resnet_101_384x288.pth.tar
             |   |-- pose_resnet_152_256x192.pth.tar
             |   |-- pose_resnet_152_384x288.pth.tar
             |   |-- pose_resnet_50_256x192.pth.tar
             |   `-- pose_resnet_50_384x288.pth.tar
             `-- pose_mpii
                 |-- pose_resnet_101_256x256.pth.tar
                 |-- pose_resnet_101_384x384.pth.tar
                 |-- pose_resnet_152_256x256.pth.tar
                 |-- pose_resnet_152_384x384.pth.tar
                 |-- pose_resnet_50_256x256.pth.tar
                 `-- pose_resnet_50_384x384.pth.tar
  9. Create the output (training model output) and log (TensorBoard log) directories:

    mkdir output 
    mkdir log

    Your directory tree should look like this:

    ├── data
    ├── experiments
    ├── lib
    ├── log
    ├── models
    ├── output
    ├── pose_estimation
    └── requirements.txt
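
Before training or validating, it can help to confirm that steps 8 and 9 above produced the expected layout. The following sketch checks a few of the listed paths relative to ${POSE_ROOT}. The helper name `missing_paths` and the choice of which entries to check are assumptions for illustration; extend `EXPECTED` with whichever models you actually use.

```python
from pathlib import Path

# A subset of the layout described above, relative to ${POSE_ROOT}.
EXPECTED = [
    "models/pytorch/imagenet/resnet50-19c8e357.pth",
    "models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar",
    "models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar",
    "output",
    "log",
]


def missing_paths(pose_root, expected=EXPECTED):
    """Return the expected entries that do not exist under pose_root."""
    root = Path(pose_root)
    return [rel for rel in expected if not (root / rel).exists()]


# Run from ${POSE_ROOT}; an empty list means every checked path is present.
print(missing_paths("."))
```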

Data preparation

For MPII data, please download the images from the MPII Human Pose Dataset. The original annotation files are in MATLAB format; we have converted them to JSON format, which you also need to download from OneDrive or GoogleDrive. Extract them under ${POSE_ROOT}/data so that the directory looks like this:

|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg
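
Once extracted, the converted annotations can be inspected with the standard library. The sketch below assumes that each JSON file holds a list of person records with an `image` field naming a file under `images`. That schema is an assumption made here for illustration, so adjust `image_key` if the files differ.

```python
import json
from collections import Counter


def load_records(path):
    """Load one annotation file, e.g. data/mpii/annot/valid.json."""
    with open(path) as handle:
        return json.load(handle)


def people_per_image(records, image_key="image"):
    """Count annotated people per image in a list of annotation records."""
    return Counter(record[image_key] for record in records)


# Synthetic records standing in for real annotations (assumed schema).
sample = [
    {"image": "000001163.jpg"},
    {"image": "000001163.jpg"},
    {"image": "000003072.jpg"},
]
counts = people_per_image(sample)
print(len(sample), "people in", len(counts), "images")
```

With the real data, replace `sample` with `load_records("data/mpii/annot/valid.json")`.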

For COCO data, please download from COCO download; the 2017 Train/Val sets are needed for COCO keypoints training and validation. We also provide the person detection results on COCO val2017 to reproduce our multi-person pose estimation results; please download them from OneDrive or GoogleDrive. Download and extract everything under ${POSE_ROOT}/data so that the directory looks like this:

|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 
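
The COCO keypoint files follow the standard COCO annotation format, in which each person annotation carries a flat `keypoints` list of (x, y, visibility) triplets and a `num_keypoints` count. As a quick sanity check after extraction, the sketch below counts the person instances that have at least one labeled keypoint. The function names and the synthetic sample are illustrative and not part of this repository.

```python
import json


def load_coco(path):
    """Load e.g. data/coco/annotations/person_keypoints_val2017.json."""
    with open(path) as handle:
        return json.load(handle)


def count_labeled_people(coco):
    """Count annotations with at least one labeled keypoint in a COCO dict."""
    return sum(1 for ann in coco["annotations"] if ann.get("num_keypoints", 0) > 0)


def split_keypoints(flat):
    """Turn [x1, y1, v1, x2, y2, v2, ...] into a list of (x, y, v) triplets."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# Tiny synthetic stand-in with two joints per person instead of COCO's 17.
sample = {
    "annotations": [
        {"image_id": 139, "num_keypoints": 1, "keypoints": [10, 20, 2, 0, 0, 0]},
        {"image_id": 285, "num_keypoints": 0, "keypoints": [0, 0, 0, 0, 0, 0]},
    ]
}
print(count_labeled_people(sample))  # 1
print(split_keypoints(sample["annotations"][0]["keypoints"]))  # [(10, 20, 2), (0, 0, 0)]
```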

Validate on MPII using pretrained models

python pose_estimation/ \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_mpii/pose_resnet_50_256x256.pth.tar

Training on MPII

python pose_estimation/ \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml

Validate on COCO val2017 using pretrained models

python pose_estimation/ \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml \
    --flip-test \
    --model-file models/pytorch/pose_coco/pose_resnet_50_256x192.pth.tar

Training on COCO train2017

python pose_estimation/ \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml

If you use our code or models in your research, please cite with:

    author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
    title={Simple Baselines for Human Pose Estimation and Tracking},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}