YOLOX
YOLOX is a high-performance anchor-free YOLO detector that exceeds YOLOv3–YOLOv5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO deployment supported. Documentation: https://yolox.readthedocs.io/
Top Related Projects
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
OpenMMLab Detection Toolbox and Benchmark
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
Quick Overview
YOLOX is a high-performance object detection framework based on YOLO (You Only Look Once). It introduces several improvements over previous YOLO versions, including an anchor-free design, a decoupled head, and SimOTA label assignment, making it more accurate and efficient for real-time object detection tasks.
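To make the decoupled-head idea concrete, here is a minimal, hypothetical PyTorch sketch of a head with separate classification and regression branches; it only illustrates the structure and is not YOLOX's actual YOLOXHead implementation:

import torch
import torch.nn as nn

class DecoupledHeadSketch(nn.Module):
    # Illustrative only: one shared stem, then separate cls / reg branches.
    def __init__(self, num_classes, in_channels=256):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, in_channels, 1)
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, 1),  # per-location class logits
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, 4 + 1, 1),  # 4 box offsets + 1 objectness score
        )

    def forward(self, feat):
        feat = self.stem(feat)
        return self.cls_branch(feat), self.reg_branch(feat)

Keeping classification and regression in separate branches, rather than one shared convolution as in earlier YOLO heads, is what the YOLOX report credits for faster convergence and better accuracy.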
Pros
- Excellent performance-speed trade-off, suitable for real-time applications
- Flexible architecture allowing easy customization and deployment
- Supports various backbones and model sizes for different use cases
- Well-documented and actively maintained
Cons
- Requires significant computational resources for training
- May be overkill for simpler object detection tasks
- Learning curve can be steep for beginners in deep learning
- Limited to object detection, not suitable for other computer vision tasks without modification
Code Examples
- Loading a pre-trained YOLOX model:
import torch
from yolox.exp import get_exp
from yolox.utils import postprocess
from yolox.utils.model_utils import get_model_info

# Build the YOLOX-s experiment by name and load its released checkpoint
exp = get_exp(None, "yolox-s")
model = exp.get_model()
print(get_model_info(model, exp.test_size))  # params / FLOPs summary
ckpt = torch.load("yolox_s.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()
- Performing inference on an image:
import cv2
import torch
from yolox.data.data_augment import preproc

image = cv2.imread("test_image.jpg")
# Letterbox-resize to the network input size; keep the ratio for rescaling boxes later
input_size = (640, 640)
img, ratio = preproc(image, input_size)
img = torch.from_numpy(img).unsqueeze(0).float()
with torch.no_grad():
    outputs = model(img)
# Note: YOLOX's postprocess uses the keyword names conf_thre / nms_thre
outputs = postprocess(outputs, exp.num_classes, conf_thre=0.7, nms_thre=0.45)
- Visualizing detection results:
from yolox.data.datasets import COCO_CLASSES
from yolox.utils import vis

# Each per-image result is an (n, 7) tensor: x1, y1, x2, y2, obj_conf, class_conf, class_id
if outputs[0] is not None:
    output = outputs[0].cpu()
    bboxes = output[:, 0:4] / ratio  # rescale boxes to the original image size
    cls = output[:, 6]
    scores = output[:, 4] * output[:, 5]
    vis_res = vis(image, bboxes, scores, cls, conf=0.7, class_names=COCO_CLASSES)
    cv2.imwrite("result.jpg", vis_res)
Getting Started
- Install YOLOX:
git clone https://github.com/Megvii-BaseDetection/YOLOX.git
cd YOLOX
pip install -v -e .
- Download a pre-trained model:
wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth
- Run inference on an image:
from yolox.exp import get_exp
from yolox.utils import postprocess, vis
from yolox.data.data_augment import preproc
from yolox.data.datasets import COCO_CLASSES
import torch
import cv2

# Build YOLOX-s by name and load the downloaded checkpoint
exp = get_exp(None, "yolox-s")
model = exp.get_model()
ckpt = torch.load("yolox_s.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()

image = cv2.imread("test_image.jpg")
img, ratio = preproc(image, (640, 640))
img = torch.from_numpy(img).unsqueeze(0).float()
with torch.no_grad():
    outputs = model(img)
outputs = postprocess(outputs, 80, conf_thre=0.7, nms_thre=0.45)  # 80 COCO classes

# Visualize results
if outputs[0] is not None:
    output = outputs[0].cpu()
    bboxes = output[:, 0:4] / ratio  # rescale boxes to the original image
    vis_res = vis(image, bboxes, output[:, 4] * output[:, 5], output[:, 6], conf=0.7, class_names=COCO_CLASSES)
    cv2.imwrite("result.jpg", vis_res)
Competitor Comparisons
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Pros of YOLOv5
- More extensive documentation and tutorials
- Larger community support and frequent updates
- Easier integration with various deployment platforms
Cons of YOLOv5
- Slightly lower accuracy compared to YOLOX in some benchmarks
- More complex architecture, potentially leading to longer training times
Code Comparison
YOLOX example:
import os
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33   # depth multiplier for YOLOX-s
        self.width = 0.50   # width multiplier for YOLOX-s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
YOLOv5 example:
import torch
from models.yolo import Model
from utils.torch_utils import intersect_dicts

model = Model(cfg='models/yolov5s.yaml')
ckpt = torch.load('yolov5s.pt', map_location='cpu')
csd = ckpt['model'].float().state_dict()
model.load_state_dict(intersect_dicts(csd, model.state_dict(), exclude=['anchor']), strict=False)
Both repositories offer powerful object detection capabilities, with YOLOX focusing on improved accuracy and YOLOv5 providing a more user-friendly experience and broader ecosystem support.
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Pros of yolov7
- Higher accuracy and faster inference speed on various datasets
- Includes additional features like instance segmentation and pose estimation
- More active development and frequent updates
Cons of yolov7
- More complex architecture, potentially harder to understand and modify
- Requires more computational resources for training and inference
- Less extensive documentation compared to YOLOX
Code Comparison
YOLOX example:
import os
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33
        self.width = 0.50
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
yolov7 example:
import torch
from models.yolo import Model
from utils.torch_utils import intersect_dicts

# cfg, nc, hyp, weights, and device come from the surrounding training setup
model = Model(cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)
state_dict = torch.load(weights, map_location=device)['model']
state_dict = intersect_dicts(state_dict, model.state_dict(), exclude=['anchor'])
model.load_state_dict(state_dict, strict=False)
Both repositories provide powerful object detection frameworks, but yolov7 offers more advanced features and potentially better performance at the cost of increased complexity and resource requirements. YOLOX may be more suitable for simpler use cases or when working with limited computational resources.
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- Extensive library with a wide range of object detection and segmentation models
- Well-documented and actively maintained by Facebook AI Research
- Modular design allows for easy customization and extension
Cons of Detectron2
- Steeper learning curve due to its comprehensive nature
- Heavier and more resource-intensive compared to YOLOX
- May be overkill for simpler object detection tasks
Code Comparison
YOLOX example:
from yolox.exp import get_exp
from yolox.utils import postprocess

exp = get_exp(None, "yolox-s")
model = exp.get_model()
outputs = model(img)
results = postprocess(outputs, exp.num_classes, conf_thre=0.5, nms_thre=0.45)
Detectron2 example:
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(img)
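For reference, DefaultPredictor returns a dict whose "instances" entry is an Instances object; detections are read from its fields (standard Detectron2 API):

instances = outputs["instances"].to("cpu")
boxes = instances.pred_boxes      # detected box coordinates
scores = instances.scores         # per-detection confidence scores
classes = instances.pred_classes  # predicted class indices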
Both repositories offer powerful object detection capabilities, but YOLOX is more focused on YOLO-based models, while Detectron2 provides a broader range of algorithms and features for various computer vision tasks.
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
Pros of YOLOv6
- Higher accuracy and faster inference speed on various datasets
- More efficient network architecture with RepOpt and Network Reparameterization
- Better support for deployment on different hardware platforms
Cons of YOLOv6
- Less extensive documentation compared to YOLOX
- Fewer pre-trained models available for different tasks
- Limited community contributions and third-party implementations
Code Comparison
YOLOv6:
from yolov6.core.evaler import Evaler
from yolov6.utils.config import Config
cfg = Config.fromfile('configs/yolov6s.py')
evaler = Evaler(cfg, img_size=640)
evaler.eval()
YOLOX:
from yolox.exp import get_exp
from yolox.utils import get_model_info

exp = get_exp("exps/default/yolox_s.py")  # or get_exp(None, "yolox-s")
model = exp.get_model()
print(get_model_info(model, exp.test_size))
Both repositories provide easy-to-use interfaces for model evaluation and inference. YOLOv6 uses a configuration-based approach, while YOLOX employs an experiment-based system. YOLOX offers more flexibility in customizing model architectures, while YOLOv6 focuses on optimized performance out-of-the-box.
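As an illustration of that flexibility, a custom YOLOX experiment can override dataset- and architecture-level attributes in one small class; the values below are hypothetical placeholders for a small 3-class dataset:

from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super().__init__()
        # Hypothetical settings for a small 3-class custom dataset
        self.num_classes = 3
        self.input_size = (416, 416)
        self.depth = 0.33
        self.width = 0.50
        self.exp_name = "yolox_s_custom"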
OpenMMLab Detection Toolbox and Benchmark
Pros of mmdetection
- Extensive model zoo with a wide variety of pre-trained models
- Highly modular and flexible architecture for easy customization
- Comprehensive documentation and tutorials
Cons of mmdetection
- Steeper learning curve due to its complexity and extensive features
- Potentially slower inference speed compared to YOLOX
Code Comparison
YOLOX example:
from yolox.exp import get_exp
from yolox.utils import postprocess

exp = get_exp(None, "yolox-s")
model = exp.get_model()
mmdetection example:
from mmdet.apis import init_detector, inference_detector

config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'test_image.jpg')
Both repositories offer powerful object detection frameworks, but mmdetection provides a more comprehensive toolkit with a wider range of models and customization options. YOLOX, on the other hand, focuses on a specific architecture and may offer better performance in certain scenarios.
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
Pros of darknet
- Supports a wide range of YOLO versions (YOLOv2, YOLOv3, YOLOv4, etc.)
- Includes pre-trained models for various tasks
- Offers both CPU and GPU support
Cons of darknet
- Written in C, which may be less accessible for some developers
- Requires manual compilation and setup
- Less modular architecture compared to YOLOX
Code Comparison
darknet:
layer make_yolo_layer(int batch, int w, int h, int n, int total, int *mask, int classes)
{
    int i;
    layer l = {0};
    l.type = YOLO;
YOLOX:
class YOLOXHead(nn.Module):
    def __init__(self, num_classes, width=1.0, in_channels=[256, 512, 1024], act="silu", depthwise=False):
        super().__init__()
        self.n_anchors = 1
        self.num_classes = num_classes
The darknet code is in C and focuses on low-level layer creation, while YOLOX uses Python and PyTorch for a more high-level, object-oriented approach. YOLOX's implementation is generally more readable and easier to modify for most developers familiar with modern deep learning frameworks.
README
Introduction
YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities. For more details, please refer to our report on Arxiv.
This repo is an implementation of PyTorch version YOLOX, there is also a MegEngine implementation.
Updates!!
- 【2023/02/28】 We support an assignment visualization tool; see the doc here.
- 【2022/04/14】 We support jit compile op.
- 【2021/08/19】 We optimize the training process with 2x faster training and ~1% higher performance! See notes for more details.
- 【2021/08/05】 We release the MegEngine version of YOLOX.
- 【2021/07/28】 We fix a fatal memory-leak error.
- 【2021/07/26】 We now support MegEngine deployment.
- 【2021/07/20】 We have released our technical report on Arxiv.
Benchmark
Standard Models.
| Model | size | mAP (val) 0.5:0.95 | mAP (test) 0.5:0.95 | Speed V100 (ms) | Params (M) | FLOPs (G) | weights |
|---|---|---|---|---|---|---|---|
| YOLOX-s | 640 | 40.5 | 40.5 | 9.8 | 9.0 | 26.8 | github |
| YOLOX-m | 640 | 46.9 | 47.2 | 12.3 | 25.3 | 73.8 | github |
| YOLOX-l | 640 | 49.7 | 50.1 | 14.5 | 54.2 | 155.6 | github |
| YOLOX-x | 640 | 51.1 | 51.5 | 17.3 | 99.1 | 281.9 | github |
| YOLOX-Darknet53 | 640 | 47.7 | 48.0 | 11.1 | 63.7 | 185.3 | github |
Legacy models
| Model | size | mAP (test) 0.5:0.95 | Speed V100 (ms) | Params (M) | FLOPs (G) | weights |
|---|---|---|---|---|---|---|
| YOLOX-s | 640 | 39.6 | 9.8 | 9.0 | 26.8 | onedrive/github |
| YOLOX-m | 640 | 46.4 | 12.3 | 25.3 | 73.8 | onedrive/github |
| YOLOX-l | 640 | 50.0 | 14.5 | 54.2 | 155.6 | onedrive/github |
| YOLOX-x | 640 | 51.2 | 17.3 | 99.1 | 281.9 | onedrive/github |
| YOLOX-Darknet53 | 640 | 47.4 | 11.1 | 63.7 | 185.3 | onedrive/github |
Light Models.
| Model | size | mAP (val) 0.5:0.95 | Params (M) | FLOPs (G) | weights |
|---|---|---|---|---|---|
| YOLOX-Nano | 416 | 25.8 | 0.91 | 1.08 | github |
| YOLOX-Tiny | 416 | 32.8 | 5.06 | 6.45 | github |
Legacy models
| Model | size | mAP (val) 0.5:0.95 | Params (M) | FLOPs (G) | weights |
|---|---|---|---|---|---|
| YOLOX-Nano | 416 | 25.3 | 0.91 | 1.08 | github |
| YOLOX-Tiny | 416 | 32.8 | 5.06 | 6.45 | github |
Quick Start
Installation
Step1. Install YOLOX from source.
git clone git@github.com:Megvii-BaseDetection/YOLOX.git
cd YOLOX
pip3 install -v -e . # or python3 setup.py develop
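To sanity-check the editable install (assuming the package exposes a __version__ attribute, as recent releases do):

python3 -c "import yolox; print(yolox.__version__)"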
Demo
Step1. Download a pretrained model from the benchmark table.
Step2. Use either -n or -f to specify your detector's config. For example:
python tools/demo.py image -n yolox-s -c /path/to/your/yolox_s.pth --path assets/dog.jpg --conf 0.25 --nms 0.45 --tsize 640 --save_result --device [cpu/gpu]
or
python tools/demo.py image -f exps/default/yolox_s.py -c /path/to/your/yolox_s.pth --path assets/dog.jpg --conf 0.25 --nms 0.45 --tsize 640 --save_result --device [cpu/gpu]
Demo for video:
python tools/demo.py video -n yolox-s -c /path/to/your/yolox_s.pth --path /path/to/your/video --conf 0.25 --nms 0.45 --tsize 640 --save_result --device [cpu/gpu]
Reproduce our results on COCO
Step1. Prepare COCO dataset
cd <YOLOX_HOME>
ln -s /path/to/your/COCO ./datasets/COCO
Step2. Reproduce our results on COCO by specifying -n:
python -m yolox.tools.train -n yolox-s -d 8 -b 64 --fp16 -o [--cache]
                               yolox-m
                               yolox-l
                               yolox-x
- -d: number of GPU devices
- -b: total batch size; the recommended value for -b is num-gpu * 8
- --fp16: mixed precision training
- --cache: cache images into RAM to accelerate training (requires a large amount of system RAM)
When using -f, the above commands are equivalent to:
python -m yolox.tools.train -f exps/default/yolox_s.py -d 8 -b 64 --fp16 -o [--cache]
                               exps/default/yolox_m.py
                               exps/default/yolox_l.py
                               exps/default/yolox_x.py
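For example, a hypothetical single-GPU run that follows the num-gpu * 8 batch-size rule of thumb:

python -m yolox.tools.train -n yolox-s -d 1 -b 8 --fp16 -o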
Multi Machine Training
We also support multi-nodes training. Just add the following args:
- --num_machines: num of your total training nodes
- --machine_rank: specify the rank of each node
Suppose you want to train YOLOX on 2 machines, and your master machine's IP is 123.123.123.123, using port 12312 over TCP.
On master machine, run
python tools/train.py -n yolox-s -b 128 --dist-url tcp://123.123.123.123:12312 --num_machines 2 --machine_rank 0
On the second machine, run
python tools/train.py -n yolox-s -b 128 --dist-url tcp://123.123.123.123:12312 --num_machines 2 --machine_rank 1
Logging to Weights & Biases
To log metrics, predictions, and model checkpoints to W&B, use the command-line argument --logger wandb and the prefix "wandb-" to specify arguments for initializing the wandb run.
python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o [--cache] --logger wandb wandb-project <project name>
                         yolox-m
                         yolox-l
                         yolox-x
An example wandb dashboard is available here
Others
See more information with the following command:
python -m yolox.tools.train --help
Evaluation
We support batch testing for fast evaluation:
python -m yolox.tools.eval -n yolox-s -c yolox_s.pth -b 64 -d 8 --conf 0.001 [--fp16] [--fuse]
                              yolox-m
                              yolox-l
                              yolox-x
- --fuse: fuse conv and bn layers
- -d: number of GPUs used for evaluation (defaults to all available GPUs)
- -b: total batch size across all GPUs
To reproduce speed test, we use the following command:
python -m yolox.tools.eval -n yolox-s -c yolox_s.pth -b 1 -d 1 --conf 0.001 --fp16 --fuse
                              yolox-m
                              yolox-l
                              yolox-x
Tutorials
Deployment
- MegEngine in C++ and Python
- ONNX export and an ONNXRuntime demo (an export command sketch follows this list)
- TensorRT in C++ and Python
- ncnn in C++ and Java
- OpenVINO in C++ and Python
- Accelerate YOLOX inference with nebullvm in Python
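As a concrete starting point for ONNX deployment, the export script shipped under tools/ can be invoked roughly as follows (check the ONNXRuntime demo doc for the exact flags of your version):

python3 tools/export_onnx.py --output-name yolox_s.onnx -n yolox-s -c yolox_s.pth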
Third-party resources
- YOLOX for streaming perception: StreamYOLO (CVPR 2022 Oral)
- YOLOX-s and YOLOX-Nano are integrated into ModelScope. Try out the online demos for YOLOX-s and YOLOX-Nano.
- Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo.
- The ncnn android app with video support: ncnn-android-yolox from FeiGeChuanShu
- YOLOX with Tengine support: Tengine from BUG1989
- YOLOX + ROS2 Foxy: YOLOX-ROS from Ar-Ray
- YOLOX Deploy DeepStream: YOLOX-deepstream from nanmi
- YOLOX MNN/TNN/ONNXRuntime: YOLOX-MNN, YOLOX-TNN, and YOLOX-ONNXRuntime C++ from DefTruth
- Converting darknet or yolov5 datasets to COCO format for YOLOX: YOLO2COCO from Daniel
Cite YOLOX
If you use YOLOX in your research, please cite our work by using the following BibTeX entry:
@article{yolox2021,
  title={YOLOX: Exceeding YOLO Series in 2021},
  author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian},
  journal={arXiv preprint arXiv:2107.08430},
  year={2021}
}
In memory of Dr. Jian Sun
Without the guidance of Dr. Jian Sun, YOLOX would not have been released and open sourced to the community. The passing away of Dr. Sun is a huge loss to the Computer Vision field. We add this section here to express our remembrance and condolences to our captain Dr. Sun. It is hoped that every AI practitioner in the world will stick to the belief of "continuous innovation to expand cognitive boundaries, and extraordinary technology to achieve product value" and move forward all the way.