YOLOX
YOLOX is a high-performance anchor-free YOLO detector that exceeds YOLOv3–YOLOv5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO deployment supported. Documentation: https://yolox.readthedocs.io/
Top Related Projects
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
OpenMMLab Detection Toolbox and Benchmark
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
Quick Overview
YOLOX is a high-performance object detection framework based on YOLO (You Only Look Once). It introduces several improvements over previous YOLO versions, including an anchor-free design, a decoupled head, and SimOTA label assignment, making it more accurate and efficient for real-time object detection tasks.
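To make the decoupled-head idea concrete, here is a minimal, hypothetical PyTorch sketch of a head with separate classification and regression branches; it only illustrates the structure and is not YOLOX's actual YOLOXHead implementation:

import torch
import torch.nn as nn

class DecoupledHeadSketch(nn.Module):
    # Illustrative only: one shared stem, then separate cls / reg branches.
    def __init__(self, num_classes, in_channels=256):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, in_channels, 1)
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, 1),  # per-location class logits
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, 4 + 1, 1),  # 4 box offsets + 1 objectness score
        )

    def forward(self, feat):
        feat = self.stem(feat)
        return self.cls_branch(feat), self.reg_branch(feat)

Keeping classification and regression in separate branches, rather than one shared convolution as in earlier YOLO heads, is what the YOLOX report credits for faster convergence and better accuracy.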
Pros
- Excellent performance-speed trade-off, suitable for real-time applications
- Flexible architecture allowing easy customization and deployment
- Supports various backbones and model sizes for different use cases
- Well-documented and actively maintained
Cons
- Requires significant computational resources for training
- May be overkill for simpler object detection tasks
- Learning curve can be steep for beginners in deep learning
- Limited to object detection, not suitable for other computer vision tasks without modification
Code Examples
- Loading a pre-trained YOLOX model:
import torch
from yolox.exp import get_exp
from yolox.utils import postprocess
from yolox.utils.model_utils import get_model_info

# Build the YOLOX-s experiment by name and load its released checkpoint
exp = get_exp(None, "yolox-s")
model = exp.get_model()
print(get_model_info(model, exp.test_size))  # params / FLOPs summary
ckpt = torch.load("yolox_s.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()
- Performing inference on an image:
import cv2
import torch
from yolox.data.data_augment import preproc

image = cv2.imread("test_image.jpg")
# Letterbox-resize to the network input size; keep the ratio for rescaling boxes later
input_size = (640, 640)
img, ratio = preproc(image, input_size)
img = torch.from_numpy(img).unsqueeze(0).float()
with torch.no_grad():
    outputs = model(img)
# Note: YOLOX's postprocess uses the keyword names conf_thre / nms_thre
outputs = postprocess(outputs, exp.num_classes, conf_thre=0.7, nms_thre=0.45)
- Visualizing detection results:
from yolox.data.datasets import COCO_CLASSES
from yolox.utils import vis

# Each per-image result is an (n, 7) tensor: x1, y1, x2, y2, obj_conf, class_conf, class_id
if outputs[0] is not None:
    output = outputs[0].cpu()
    bboxes = output[:, 0:4] / ratio  # rescale boxes to the original image size
    cls = output[:, 6]
    scores = output[:, 4] * output[:, 5]
    vis_res = vis(image, bboxes, scores, cls, conf=0.7, class_names=COCO_CLASSES)
    cv2.imwrite("result.jpg", vis_res)
Getting Started
- Install YOLOX:
git clone https://github.com/Megvii-BaseDetection/YOLOX.git
cd YOLOX
pip install -v -e .
- Download a pre-trained model:
wget https://github.com/Megvii-BaseDetection/YOLOX/releases/download/0.1.1rc0/yolox_s.pth
- Run inference on an image:
from yolox.exp import get_exp
from yolox.utils import postprocess, vis
from yolox.data.data_augment import preproc
from yolox.data.datasets import COCO_CLASSES
import torch
import cv2

# Build YOLOX-s by name and load the downloaded checkpoint
exp = get_exp(None, "yolox-s")
model = exp.get_model()
ckpt = torch.load("yolox_s.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()

image = cv2.imread("test_image.jpg")
img, ratio = preproc(image, (640, 640))
img = torch.from_numpy(img).unsqueeze(0).float()
with torch.no_grad():
    outputs = model(img)
outputs = postprocess(outputs, 80, conf_thre=0.7, nms_thre=0.45)  # 80 COCO classes

# Visualize results
if outputs[0] is not None:
    output = outputs[0].cpu()
    bboxes = output[:, 0:4] / ratio  # rescale boxes to the original image
    vis_res = vis(image, bboxes, output[:, 4] * output[:, 5], output[:, 6], conf=0.7, class_names=COCO_CLASSES)
    cv2.imwrite("result.jpg", vis_res)
Competitor Comparisons
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Pros of YOLOv5
- More extensive documentation and tutorials
- Larger community support and frequent updates
- Easier integration with various deployment platforms
Cons of YOLOv5
- Slightly lower accuracy compared to YOLOX in some benchmarks
- More complex architecture, potentially leading to longer training times
Code Comparison
YOLOX example:
import os
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33   # depth multiplier for YOLOX-s
        self.width = 0.50   # width multiplier for YOLOX-s
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
YOLOv5 example:
import torch
from models.yolo import Model
from utils.torch_utils import intersect_dicts

model = Model(cfg='models/yolov5s.yaml')
ckpt = torch.load('yolov5s.pt', map_location='cpu')
csd = ckpt['model'].float().state_dict()
model.load_state_dict(intersect_dicts(csd, model.state_dict(), exclude=['anchor']), strict=False)
Both repositories offer powerful object detection capabilities, with YOLOX focusing on improved accuracy and YOLOv5 providing a more user-friendly experience and broader ecosystem support.
Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Pros of yolov7
- Higher accuracy and faster inference speed on various datasets
- Includes additional features like instance segmentation and pose estimation
- More active development and frequent updates
Cons of yolov7
- More complex architecture, potentially harder to understand and modify
- Requires more computational resources for training and inference
- Less extensive documentation compared to YOLOX
Code Comparison
YOLOX example:
import os
from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super(Exp, self).__init__()
        self.depth = 0.33
        self.width = 0.50
        self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]
yolov7 example:
import torch
from models.yolo import Model
from utils.torch_utils import intersect_dicts

# cfg, nc, hyp, weights, and device come from the surrounding training setup
model = Model(cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)
state_dict = torch.load(weights, map_location=device)['model']
state_dict = intersect_dicts(state_dict, model.state_dict(), exclude=['anchor'])
model.load_state_dict(state_dict, strict=False)
Both repositories provide powerful object detection frameworks, but yolov7 offers more advanced features and potentially better performance at the cost of increased complexity and resource requirements. YOLOX may be more suitable for simpler use cases or when working with limited computational resources.
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- Extensive library with a wide range of object detection and segmentation models
- Well-documented and actively maintained by Facebook AI Research
- Modular design allows for easy customization and extension
Cons of Detectron2
- Steeper learning curve due to its comprehensive nature
- Heavier and more resource-intensive compared to YOLOX
- May be overkill for simpler object detection tasks
Code Comparison
YOLOX example:
from yolox.exp import get_exp
from yolox.utils import postprocess

exp = get_exp(None, "yolox-s")
model = exp.get_model()
outputs = model(img)
results = postprocess(outputs, exp.num_classes, conf_thre=0.5, nms_thre=0.45)
Detectron2 example:
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(img)
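For reference, DefaultPredictor returns a dict whose "instances" entry is an Instances object; detections are read from its fields (standard Detectron2 API):

instances = outputs["instances"].to("cpu")
boxes = instances.pred_boxes      # detected box coordinates
scores = instances.scores         # per-detection confidence scores
classes = instances.pred_classes  # predicted class indices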
Both repositories offer powerful object detection capabilities, but YOLOX is more focused on YOLO-based models, while Detectron2 provides a broader range of algorithms and features for various computer vision tasks.
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
Pros of YOLOv6
- Higher accuracy and faster inference speed on various datasets
- More efficient network architecture with RepOpt and Network Reparameterization
- Better support for deployment on different hardware platforms
Cons of YOLOv6
- Less extensive documentation compared to YOLOX
- Fewer pre-trained models available for different tasks
- Limited community contributions and third-party implementations
Code Comparison
YOLOv6:
from yolov6.core.evaler import Evaler
from yolov6.utils.config import Config
cfg = Config.fromfile('configs/yolov6s.py')
evaler = Evaler(cfg, img_size=640)
evaler.eval()
YOLOX:
from yolox.exp import get_exp
from yolox.utils import get_model_info

exp = get_exp("exps/default/yolox_s.py")  # or get_exp(None, "yolox-s")
model = exp.get_model()
print(get_model_info(model, exp.test_size))
Both repositories provide easy-to-use interfaces for model evaluation and inference. YOLOv6 uses a configuration-based approach, while YOLOX employs an experiment-based system. YOLOX offers more flexibility in customizing model architectures, while YOLOv6 focuses on optimized performance out-of-the-box.
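As an illustration of that flexibility, a custom YOLOX experiment can override dataset- and architecture-level attributes in one small class; the values below are hypothetical placeholders for a small 3-class dataset:

from yolox.exp import Exp as MyExp

class Exp(MyExp):
    def __init__(self):
        super().__init__()
        # Hypothetical settings for a small 3-class custom dataset
        self.num_classes = 3
        self.input_size = (416, 416)
        self.depth = 0.33
        self.width = 0.50
        self.exp_name = "yolox_s_custom"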
OpenMMLab Detection Toolbox and Benchmark
Pros of mmdetection
- Extensive model zoo with a wide variety of pre-trained models
- Highly modular and flexible architecture for easy customization
- Comprehensive documentation and tutorials
Cons of mmdetection
- Steeper learning curve due to its complexity and extensive features
- Potentially slower inference speed compared to YOLOX
Code Comparison
YOLOX example:
from yolox.exp import get_exp
from yolox.utils import postprocess

exp = get_exp(None, "yolox-s")
model = exp.get_model()
mmdetection example:
from mmdet.apis import init_detector, inference_detector

config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'
model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, 'test_image.jpg')
Both repositories offer powerful object detection frameworks, but mmdetection provides a more comprehensive toolkit with a wider range of models and customization options. YOLOX, on the other hand, focuses on a specific architecture and may offer better performance in certain scenarios.
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
Pros of darknet
- Supports a wide range of YOLO versions (YOLOv2, YOLOv3, YOLOv4, etc.)
- Includes pre-trained models for various tasks
- Offers both CPU and GPU support
Cons of darknet
- Written in C, which may be less accessible for some developers
- Requires manual compilation and setup
- Less modular architecture compared to YOLOX
Code Comparison
darknet:
layer make_yolo_layer(int batch, int w, int h, int n, int total, int *mask, int classes)
{
    int i;
    layer l = {0};
    l.type = YOLO;
YOLOX:
class YOLOXHead(nn.Module):
    def __init__(self, num_classes, width=1.0, in_channels=[256, 512, 1024], act="silu", depthwise=False):
        super().__init__()
        self.n_anchors = 1
        self.num_classes = num_classes
The darknet code is in C and focuses on low-level layer creation, while YOLOX uses Python and PyTorch for a more high-level, object-oriented approach. YOLOX's implementation is generally more readable and easier to modify for most developers familiar with modern deep learning frameworks.
README
Introduction
YOLOX is an anchor-free version of YOLO, with a simpler design but better performance! It aims to bridge the gap between research and industrial communities. For more details, please refer to our report on Arxiv.
This repo is an implementation of PyTorch version YOLOX, there is also a MegEngine implementation.
Updates!!
- 【2023/02/28】 We support an assignment visualization tool; see the doc here.
- 【2022/04/14】 We support jit compile op.
- 【2021/08/19】 We optimize the training process with 2x faster training and ~1% higher performance! See notes for more details.
- 【2021/08/05】 We release the MegEngine version of YOLOX.
- 【2021/07/28】 We fix a fatal memory-leak error.
- 【2021/07/26】 We now support MegEngine deployment.
- 【2021/07/20】 We have released our technical report on Arxiv.
Benchmark
Standard Models.
| Model | size | mAP (val) 0.5:0.95 | mAP (test) 0.5:0.95 | Speed V100 (ms) | Params (M) | FLOPs (G) | weights |
|---|---|---|---|---|---|---|---|
| YOLOX-s | 640 | 40.5 | 40.5 | 9.8 | 9.0 | 26.8 | github |
| YOLOX-m | 640 | 46.9 | 47.2 | 12.3 | 25.3 | 73.8 | github |
| YOLOX-l | 640 | 49.7 | 50.1 | 14.5 | 54.2 | 155.6 | github |
| YOLOX-x | 640 | 51.1 | 51.5 | 17.3 | 99.1 | 281.9 | github |
| YOLOX-Darknet53 | 640 | 47.7 | 48.0 | 11.1 | 63.7 | 185.3 | github |
Legacy models
| Model | size | mAP (test) 0.5:0.95 | Speed V100 (ms) | Params (M) | FLOPs (G) | weights |
|---|---|---|---|---|---|---|
| YOLOX-s | 640 | 39.6 | 9.8 | 9.0 | 26.8 | onedrive/github |
| YOLOX-m | 640 | 46.4 | 12.3 | 25.3 | 73.8 | onedrive/github |
| YOLOX-l | 640 | 50.0 | 14.5 | 54.2 | 155.6 | onedrive/github |
| YOLOX-x | 640 | 51.2 | 17.3 | 99.1 | 281.9 | onedrive/github |
| YOLOX-Darknet53 | 640 | 47.4 | 11.1 | 63.7 | 185.3 | onedrive/github |
Light Models.
| Model | size | mAP (val) 0.5:0.95 | Params (M) | FLOPs (G) | weights |
|---|---|---|---|---|---|
| YOLOX-Nano | 416 | 25.8 | 0.91 | 1.08 | github |
| YOLOX-Tiny | 416 | 32.8 | 5.06 | 6.45 | github |
Legacy models
| Model | size | mAP (val) 0.5:0.95 | Params (M) | FLOPs (G) | weights |
|---|---|---|---|---|---|
| YOLOX-Nano | 416 | 25.3 | 0.91 | 1.08 | github |
| YOLOX-Tiny | 416 | 32.8 | 5.06 | 6.45 | github |
Quick Start
Installation
Step1. Install YOLOX from source.
git clone git@github.com:Megvii-BaseDetection/YOLOX.git
cd YOLOX
pip3 install -v -e . # or python3 setup.py develop
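To sanity-check the editable install (assuming the package exposes a __version__ attribute, as recent releases do):

python3 -c "import yolox; print(yolox.__version__)"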
Demo
Step1. Download a pretrained model from the benchmark table.
Step2. Use either -n or -f to specify your detector's config. For example:
python tools/demo.py image -n yolox-s -c /path/to/your/yolox_s.pth --path assets/dog.jpg --conf 0.25 --nms 0.45 --tsize 640 --save_result --device [cpu/gpu]
or
python tools/demo.py image -f exps/default/yolox_s.py -c /path/to/your/yolox_s.pth --path assets/dog.jpg --conf 0.25 --nms 0.45 --tsize 640 --save_result --device [cpu/gpu]
Demo for video:
python tools/demo.py video -n yolox-s -c /path/to/your/yolox_s.pth --path /path/to/your/video --conf 0.25 --nms 0.45 --tsize 640 --save_result --device [cpu/gpu]
Reproduce our results on COCO
Step1. Prepare COCO dataset
cd <YOLOX_HOME>
ln -s /path/to/your/COCO ./datasets/COCO
Step2. Reproduce our results on COCO by specifying -n:
python -m yolox.tools.train -n yolox-s -d 8 -b 64 --fp16 -o [--cache]
                               yolox-m
                               yolox-l
                               yolox-x
- -d: number of GPU devices
- -b: total batch size; the recommended value for -b is num-gpu * 8
- --fp16: mixed precision training
- --cache: cache images into RAM to accelerate training (requires a large amount of system RAM)
When using -f, the above commands are equivalent to:
python -m yolox.tools.train -f exps/default/yolox_s.py -d 8 -b 64 --fp16 -o [--cache]
                               exps/default/yolox_m.py
                               exps/default/yolox_l.py
                               exps/default/yolox_x.py
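For example, a hypothetical single-GPU run that follows the num-gpu * 8 batch-size rule of thumb:

python -m yolox.tools.train -n yolox-s -d 1 -b 8 --fp16 -o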
Multi Machine Training
We also support multi-nodes training. Just add the following args:
- --num_machines: num of your total training nodes
- --machine_rank: specify the rank of each node
Suppose you want to train YOLOX on 2 machines, and your master machine's IP is 123.123.123.123, using port 12312 over TCP.
On master machine, run
python tools/train.py -n yolox-s -b 128 --dist-url tcp://123.123.123.123:12312 --num_machines 2 --machine_rank 0
On the second machine, run
python tools/train.py -n yolox-s -b 128 --dist-url tcp://123.123.123.123:12312 --num_machines 2 --machine_rank 1
Logging to Weights & Biases
To log metrics, predictions, and model checkpoints to W&B, use the command-line argument --logger wandb and the prefix "wandb-" to specify arguments for initializing the wandb run.
python tools/train.py -n yolox-s -d 8 -b 64 --fp16 -o [--cache] --logger wandb wandb-project <project name>
                         yolox-m
                         yolox-l
                         yolox-x
An example wandb dashboard is available here
Others
See more information with the following command:
python -m yolox.tools.train --help
Evaluation
We support batch testing for fast evaluation:
python -m yolox.tools.eval -n yolox-s -c yolox_s.pth -b 64 -d 8 --conf 0.001 [--fp16] [--fuse]
                              yolox-m
                              yolox-l
                              yolox-x
- --fuse: fuse conv and bn layers
- -d: number of GPUs used for evaluation (defaults to all available GPUs)
- -b: total batch size across all GPUs
To reproduce speed test, we use the following command:
python -m yolox.tools.eval -n yolox-s -c yolox_s.pth -b 1 -d 1 --conf 0.001 --fp16 --fuse
                              yolox-m
                              yolox-l
                              yolox-x
Tutorials
Deployment
- MegEngine in C++ and Python
- ONNX export and an ONNXRuntime demo (an export command sketch follows this list)
- TensorRT in C++ and Python
- ncnn in C++ and Java
- OpenVINO in C++ and Python
- Accelerate YOLOX inference with nebullvm in Python
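As a concrete starting point for ONNX deployment, the export script shipped under tools/ can be invoked roughly as follows (check the ONNXRuntime demo doc for the exact flags of your version):

python3 tools/export_onnx.py --output-name yolox_s.onnx -n yolox-s -c yolox_s.pth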
Third-party resources
- YOLOX for streaming perception: StreamYOLO (CVPR 2022 Oral)
- YOLOX-s and YOLOX-Nano are integrated into ModelScope. Try out the online demos for YOLOX-s and YOLOX-Nano.
- Integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo.
- The ncnn android app with video support: ncnn-android-yolox from FeiGeChuanShu
- YOLOX with Tengine support: Tengine from BUG1989
- YOLOX + ROS2 Foxy: YOLOX-ROS from Ar-Ray
- YOLOX Deploy DeepStream: YOLOX-deepstream from nanmi
- YOLOX MNN/TNN/ONNXRuntime: YOLOX-MNN, YOLOX-TNN, and YOLOX-ONNXRuntime C++ from DefTruth
- Converting darknet or yolov5 datasets to COCO format for YOLOX: YOLO2COCO from Daniel
Cite YOLOX
If you use YOLOX in your research, please cite our work by using the following BibTeX entry:
@article{yolox2021,
  title={YOLOX: Exceeding YOLO Series in 2021},
  author={Ge, Zheng and Liu, Songtao and Wang, Feng and Li, Zeming and Sun, Jian},
  journal={arXiv preprint arXiv:2107.08430},
  year={2021}
}
In memory of Dr. Jian Sun
Without the guidance of Dr. Jian Sun, YOLOX would not have been released and open sourced to the community. The passing away of Dr. Sun is a huge loss to the Computer Vision field. We add this section here to express our remembrance and condolences to our captain Dr. Sun. It is hoped that every AI practitioner in the world will stick to the belief of "continuous innovation to expand cognitive boundaries, and extraordinary technology to achieve product value" and move forward all the way.