yolor
implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)
Top Related Projects
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Models and examples built with TensorFlow
OpenMMLab Detection Toolbox and Benchmark
Quick Overview
The WongKinYiu/yolor repository is a PyTorch implementation of YOLOR (You Only Learn One Representation), the unified network for multiple tasks described in the paper linked above. In this repository it is used as a real-time object detection system that aims for high accuracy and efficiency.
Pros
- High Accuracy: The README reports COCO test AP ranging from 52.8% (YOLOR-CSP at 640 px) to 58.2% (YOLOR-D6 at 1280 px).
- Real-Time Performance: Reported batch-1 throughput ranges from 106 fps (YOLOR-CSP) to 34 fps (YOLOR-D6), which makes the smaller models suitable for real-time applications such as video surveillance and autonomous vehicles.
- Versatility: YOLOR can be applied to a variety of object detection tasks and can be easily fine-tuned on different datasets.
- Open-Source: The PyTorch implementation is available on GitHub, allowing for easy customization and contribution to the project.
Cons
- Complexity: The YOLOR model is relatively complex, with a large number of parameters and a sophisticated architecture, which may make it more challenging to understand and modify.
- Hardware Requirements: Running YOLOR efficiently may require powerful hardware, such as high-end GPUs, which can be a barrier for some users.
- Limited Documentation: The project's documentation could be more comprehensive, making it harder for new users to get started with the library.
- Ongoing Development: As an active project, YOLOR may undergo frequent updates and changes, which could make it more difficult to maintain a stable integration in production environments.
Code Examples
Here are a few code examples demonstrating the usage of the YOLOR PyTorch implementation:
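- Loading a Pre-Trained Model (run from the repository root; the code is not packaged as an installable `yolor` module):
import torch
from models.models import Darknet
# Build the network from its .cfg file, then load the checkpoint weights
device = torch.device('cpu')
model = Darknet('cfg/yolor_p6.cfg', 1280).to(device)
model.load_state_dict(torch.load('yolor_p6.pt', map_location=device)['model'])
model.eval()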
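- Performing Object Detection:
import cv2
import numpy as np
import torch
from utils.datasets import letterbox
from utils.general import non_max_suppression, scale_coords
# Load an image (BGR, HxWxC) and pad it to a square network input
img0 = cv2.imread('image.jpg')
img = letterbox(img0, new_shape=(1280, 1280), auto=False)[0]
# Convert BGR to RGB and HWC to CHW, scale to [0, 1], and add a batch dimension
img = np.ascontiguousarray(img[:, :, ::-1].transpose(2, 0, 1))
img = torch.from_numpy(img).float().div(255.0).unsqueeze(0)
with torch.no_grad():
    pred = model(img)[0]
pred = non_max_suppression(pred, 0.4, 0.5)  # confidence threshold, IoU threshold
for det in pred:
    if det is not None and len(det):
        # Map boxes from the padded input back to the original image size
        det[:, :4] = scale_coords(img.shape[2:], det[:, :4], img0.shape).round()
# Visualize the detected objects
# ...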
- Evaluating Model Performance:
from utils.metrics import ap_per_class
# ap_per_class expects per-detection arrays (true-positive flags, confidences,
# predicted classes) plus the ground-truth classes. load_eval_stats() is a
# hypothetical placeholder for the loop that collects them; it is not part of
# the repository. For a full COCO evaluation, use test.py as shown in the README.
tp, conf, pred_cls, target_cls = load_eval_stats()
p, r, ap, f1, ap_class = ap_per_class(tp, conf, pred_cls, target_cls)
# Print the results
# ...
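The detection example passes a confidence threshold of 0.4 and an IoU threshold of 0.5 to `non_max_suppression`. As a rough illustration of what those two numbers control, here is a minimal pure-Python sketch of greedy, class-agnostic NMS. The names `iou` and `greedy_nms` and the sample boxes are invented for this sketch; the repository's own function works on tensors and handles classes, so treat this as a simplification rather than its implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               conf_thres: float = 0.4, iou_thres: float = 0.5) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    # Drop low-confidence candidates, then sort the rest by score.
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


# Made-up boxes: two heavy overlaps, one distant box, one low-confidence box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = greedy_nms(boxes, scores)
print(kept)
```

Raising `iou_thres` keeps more overlapping boxes; raising `conf_thres` discards more candidates before overlap is considered.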
Getting Started
To get started with the YOLOR PyTorch implementation, follow these steps:
- Clone the repository:
git clone https://github.com/WongKinYiu/yolor.git
- Install the required dependencies:
cd yolor
pip install -r requirements.txt
- Download the pre-trained weights using the script referenced in the README:
bash scripts/get_pretrain.sh
- Run the object detection example:
python detect.py --source inference/images/horses.jpg --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --conf 0.25 --img-size 1280 --device 0
Competitor Comparisons
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Pros of YOLOv5
- More extensive documentation and tutorials
- Larger community and more frequent updates
- Better integration with PyTorch ecosystem
Cons of YOLOv5
- Slightly lower performance on some benchmarks
- Less focus on novel architectural improvements
Code Comparison
YOLOv5:
from models.yolo import Model
from utils.torch_utils import select_device
device = select_device('0')
model = Model('yolov5s.yaml', ch=3, nc=80).to(device)
YOLOR:
from models.models import Darknet
from utils.torch_utils import select_device
device = select_device('0')
model = Darknet('cfg/yolor_p6.cfg', 1280).to(device)
The two snippets follow the same pattern, but YOLOR builds its network from a Darknet-style configuration file (.cfg), whereas YOLOv5 uses a YAML-based configuration.
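To make the configuration difference concrete, the following is a minimal sketch of reading a Darknet-style `.cfg` file. Such files consist of bracketed section headers that may repeat (one per layer), followed by `key=value` options, which is why the result is an ordered list rather than a dictionary keyed by section name. The function name `parse_darknet_cfg` and the sample text are made up for illustration; this is not the repository's parser, and the sample is not an excerpt from a real YOLOR config.

```python
from typing import Dict, List


def parse_darknet_cfg(text: str) -> List[Dict[str, str]]:
    """Parse Darknet-style cfg text into an ordered list of sections."""
    sections: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # Section names repeat, so append instead of keying by name.
            sections.append({"type": line[1:-1].strip()})
            continue
        key, sep, value = line.partition("=")
        if not sep or not sections:
            raise ValueError(f"unexpected line: {line!r}")
        sections[-1][key.strip()] = value.strip()
    return sections


# Hypothetical sample in the Darknet cfg style.
SAMPLE = """
[net]
width=1280
height=1280

# first layer
[convolutional]
filters=64
size=3

[convolutional]
filters=128
size=3
"""

layers = parse_darknet_cfg(SAMPLE)
print([section["type"] for section in layers])
print(layers[2]["filters"])
```

All values are kept as strings here; a real loader would convert them to integers, floats, or lists as each layer type requires.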
YOLOv5 offers a more standardized approach with better documentation, making it easier for beginners to get started. However, YOLOR introduces some architectural innovations that can lead to improved performance in certain scenarios.
Both projects are actively maintained and offer state-of-the-art object detection capabilities. The choice between them often depends on specific project requirements and user familiarity with each framework.
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
Pros of darknet
- More established and widely used in the computer vision community
- Supports a broader range of YOLO versions (v2, v3, v4) and other architectures
- Extensive documentation and community support
Cons of darknet
- Less focus on recent YOLO improvements and optimizations
- May have slower inference speed compared to YOLOR's optimized implementation
- Requires more manual configuration for advanced features
Code Comparison
darknet:
layer make_yolo_layer(int batch, int w, int h, int n, int total, int *mask, int classes)
{
    int i;
    layer l = {0};
    l.type = YOLO;
    /* ... remainder of the function omitted ... */
}
YOLOR (illustrative sketch of a detection module in PyTorch):
import torch.nn as nn
class YOLOR(nn.Module):
    def __init__(self, nc=80, anchors=()):
        super(YOLOR, self).__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # number of outputs per anchor
The code snippets show the different implementation languages and approaches. darknet uses C for lower-level control, while YOLOR utilizes Python with PyTorch for easier integration and development.
YOLOR focuses on recent YOLO improvements and optimizations, potentially offering better performance in certain scenarios. However, darknet provides a more comprehensive set of features and broader architecture support, making it suitable for a wider range of applications.
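The `nc + 5` in the Python snippet above is worth unpacking. In YOLO-style heads each anchor usually predicts four box values, one objectness score, and one score per class, which gives 85 outputs for the 80 COCO classes. The sketch below assumes the common `(x, y, w, h, objectness, classes...)` ordering; the helper `split_prediction` and the sample vector are hypothetical and only illustrate the layout.

```python
from typing import List, Tuple


def split_prediction(vec: List[float], nc: int) -> Tuple[List[float], float, List[float]]:
    """Split one per-anchor output vector into box, objectness, class scores."""
    no = nc + 5  # 4 box values + 1 objectness + nc class scores
    if len(vec) != no:
        raise ValueError(f"expected {no} values, got {len(vec)}")
    return vec[:4], vec[4], vec[5:]


nc = 80
no = nc + 5

# Made-up prediction: objectness 0.9, class 17 scored at 0.8.
vec = [0.5, 0.5, 0.2, 0.3, 0.9] + [0.0] * nc
vec[5 + 17] = 0.8

box, objectness, class_scores = split_prediction(vec, nc)
best_class = max(range(nc), key=lambda c: class_scores[c])
confidence = objectness * class_scores[best_class]
print(no, best_class, round(confidence, 2))
```

Multiplying objectness by the class score is one common way to obtain the final detection confidence that is later compared against the confidence threshold.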
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- More comprehensive and feature-rich, supporting a wider range of computer vision tasks
- Better documentation and community support, making it easier to use and extend
- Modular architecture allows for easier customization and experimentation
Cons of Detectron2
- Steeper learning curve due to its complexity and extensive features
- Potentially slower inference time compared to YOLOR's optimized architecture
- Requires more computational resources for training and inference
Code Comparison
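YOLOR (simplified detection code):
import torch
from models.models import *
from utils.datasets import *
from utils.general import *
img_size = 1280  # input resolution used for YOLOR-P6
model = Darknet('cfg/yolor_p6.cfg', img_size)
# The checkpoint is a dict; the weights are stored under the 'model' key
model.load_state_dict(torch.load('yolor_p6.pt', map_location='cpu')['model'])
model.eval()
img = torch.zeros((1, 3, img_size, img_size))  # dummy input batch
output = model(img)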
Detectron2 (simplified detection code):
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(image)
Both repositories focus on object detection, but Detectron2 offers a more comprehensive toolkit for various computer vision tasks. YOLOR is more specialized and optimized for real-time object detection, while Detectron2 provides a flexible framework for a broader range of applications.
Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Pros of Mask_RCNN
- Provides instance segmentation in addition to object detection
- Well-documented and easier to understand for beginners
- Built on the widely used Keras and TensorFlow stack
Cons of Mask_RCNN
- Generally slower inference speed compared to YOLOR
- May require more computational resources for training and inference
- Less suitable for real-time applications
Code Comparison
Mask_RCNN:
import mrcnn.model as modellib
from mrcnn import utils
model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(WEIGHTS_PATH, by_name=True)
results = model.detect([image], verbose=1)
YOLOR:
from models.models import *
from utils.datasets import *
from utils.general import *
model = Darknet(cfg, img_size)
model.load_state_dict(torch.load(weights, map_location=device)['model'])
pred = model(img.to(device))[0]
The code snippets show the basic setup and inference process for both models. Mask_RCNN uses a more straightforward approach with built-in functions, while YOLOR requires more manual setup but offers more flexibility in terms of customization.
Models and examples built with TensorFlow
Pros of TensorFlow Models
- Comprehensive collection of models and examples across various domains
- Backed by Google, with extensive documentation and community support
- Supports multiple deep learning tasks beyond object detection
Cons of TensorFlow Models
- Steeper learning curve due to the breadth of the repository
- May require more setup and configuration for specific tasks
- Potentially slower inference compared to YOLOR's optimized architecture
Code Comparison
YOLOR (PyTorch):
import torch
from models.models import *
from utils.general import *
imgsize = 1280  # input resolution used for YOLOR-P6
model = Darknet('cfg/yolor_p6.cfg', imgsize)
model.load_state_dict(torch.load('yolor_p6.pt', map_location='cpu')['model'])
TensorFlow Models:
import tensorflow as tf
from object_detection import model_lib_v2
model_dir = 'path/to/model'
pipeline_config = 'path/to/pipeline.config'
model_lib_v2.train_loop(pipeline_config_path=pipeline_config, model_dir=model_dir)
Summary
YOLOR focuses on a single, highly optimized object detection model, while TensorFlow Models offers a diverse range of pre-trained models and examples. YOLOR may be easier to get started with for specific object detection tasks, while TensorFlow Models provides more flexibility and options for various deep learning applications.
OpenMMLab Detection Toolbox and Benchmark
Pros of mmdetection
- Comprehensive framework with support for multiple object detection algorithms
- Extensive documentation and community support
- Modular design allowing easy customization and extension
Cons of mmdetection
- Steeper learning curve due to its complexity
- Potentially slower inference time for some models compared to YOLOR
Code Comparison
YOLOR (illustrative model definition; CSPDarknet and YOLOHead are placeholder names):
class YOLOR(nn.Module):
    def __init__(self, nc=80, anchors=(), ch=()):
        super(YOLOR, self).__init__()
        self.backbone = CSPDarknet(ch)
        self.head = YOLOHead(nc, anchors)
mmdetection (model configuration):
model = dict(
    type='YOLOV3',
    backbone=dict(type='Darknet', depth=53),
    neck=dict(type='YOLOV3Neck'),
    bbox_head=dict(type='YOLOV3Head', num_classes=80)
)
YOLOR focuses on a single, highly optimized architecture, while mmdetection provides a flexible framework for implementing various object detection algorithms. YOLOR may offer better performance for specific use cases, while mmdetection provides more options and easier experimentation with different models and techniques.
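The configuration-driven style shown for mmdetection relies on mapping a `type` string to a class and passing the remaining keys as constructor arguments. The toy registry below sketches that idea in plain Python. `Registry`, `HEADS`, and `ToyHead` are names invented for this example; it is a simplified illustration of the pattern and not mmdetection's actual implementation.

```python
from typing import Any, Dict


class Registry:
    """Maps class names to classes so objects can be built from config dicts."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._classes: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        # Used as a decorator; the class is stored under its own name.
        self._classes[cls.__name__] = cls
        return cls

    def build(self, cfg: Dict[str, Any]) -> Any:
        args = dict(cfg)  # copy so the caller's config is not modified
        type_name = args.pop("type")
        if type_name not in self._classes:
            raise KeyError(f"{type_name!r} is not registered in {self.name!r}")
        return self._classes[type_name](**args)


HEADS = Registry("head")


@HEADS.register
class ToyHead:
    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes


config = dict(type="ToyHead", num_classes=80)
head = HEADS.build(config)
print(type(head).__name__, head.num_classes)
```

Swapping one component for another then only requires changing the `type` string in the config, which is the flexibility the comparison above refers to.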
README
YOLOR
implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks
To get the results on the table, please use this branch.
Model | Test Size | AP (test) | AP50 (test) | AP75 (test) | batch 1 throughput | batch 32 inference |
---|---|---|---|---|---|---|
YOLOR-CSP | 640 | 52.8% | 71.2% | 57.6% | 106 fps | 3.2 ms |
YOLOR-CSP-X | 640 | 54.8% | 73.1% | 59.7% | 87 fps | 5.5 ms |
YOLOR-P6 | 1280 | 55.7% | 73.3% | 61.0% | 76 fps | 8.3 ms |
YOLOR-W6 | 1280 | 56.9% | 74.4% | 62.2% | 66 fps | 10.7 ms |
YOLOR-E6 | 1280 | 57.6% | 75.2% | 63.0% | 45 fps | 17.1 ms |
YOLOR-D6 | 1280 | 58.2% | 75.8% | 63.8% | 34 fps | 21.8 ms |
YOLOv4-P5 | 896 | 51.8% | 70.3% | 56.6% | 41 fps (old) | - |
YOLOv4-P6 | 1280 | 54.5% | 72.6% | 59.8% | 30 fps (old) | - |
YOLOv4-P7 | 1536 | 55.5% | 73.4% | 60.8% | 16 fps (old) | - |
- Fixed the speed bottleneck on our NFS; many thanks to the NCHC, TWCC, and NARLabs support teams.
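The first table mixes two units: frames per second for batch 1 and milliseconds for batch 32. The small sketch below converts between them so the columns can be compared. It assumes, without confirmation from the README, that the batch 32 figure is a per-image time; the three rows are copied from the table, and the helper names are made up.

```python
def fps_to_ms(fps: float) -> float:
    """Time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


def ms_to_fps(ms: float) -> float:
    """Throughput in images per second for a given per-image time."""
    return 1000.0 / ms


# (batch 1 throughput in fps, batch 32 inference in ms), taken from the table.
rows = {
    "YOLOR-CSP": (106, 3.2),
    "YOLOR-P6": (76, 8.3),
    "YOLOR-D6": (34, 21.8),
}

for name, (fps, ms32) in rows.items():
    print(f"{name}: batch 1 ~ {fps_to_ms(fps):.1f} ms/frame, "
          f"batch 32 ~ {ms_to_fps(ms32):.0f} img/s (if the ms figure is per image)")
```

Under that assumption, batching roughly triples the effective throughput of YOLOR-CSP, but the interpretation of the batch 32 column should be checked against the repository before relying on it.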
Model | Test Size | AP (val) | AP50 (val) | AP75 (val) | APS (val) | APM (val) | APL (val) | weights |
---|---|---|---|---|---|---|---|---|
YOLOv4-CSP | 640 | 49.1% | 67.7% | 53.8% | 32.1% | 54.4% | 63.2% | - |
YOLOR-CSP | 640 | 49.2% | 67.6% | 53.7% | 32.9% | 54.4% | 63.0% | weights |
YOLOR-CSP* | 640 | 50.0% | 68.7% | 54.3% | 34.2% | 55.1% | 64.3% | weights |
YOLOv4-CSP-X | 640 | 50.9% | 69.3% | 55.4% | 35.3% | 55.8% | 64.8% | - |
YOLOR-CSP-X | 640 | 51.1% | 69.6% | 55.7% | 35.7% | 56.0% | 65.2% | weights |
YOLOR-CSP-X* | 640 | 51.5% | 69.9% | 56.1% | 35.8% | 56.8% | 66.1% | weights |
Developing...
Model | Test Size | AP (test) | AP50 (test) | AP75 (test) | APS (test) | APM (test) | APL (test) |
---|---|---|---|---|---|---|---|
YOLOR-CSP | 640 | 51.1% | 69.6% | 55.7% | 31.7% | 55.3% | 64.7% |
YOLOR-CSP-X | 640 | 53.0% | 71.4% | 57.9% | 33.7% | 57.1% | 66.8% |
Train from scratch for 300 epochs...
Model | Info | Test Size | AP |
---|---|---|---|
YOLOR-CSP | evolution | 640 | 48.0% |
YOLOR-CSP | strategy | 640 | 50.0% |
YOLOR-CSP | strategy + simOTA | 640 | 51.1% |
YOLOR-CSP-X | strategy | 640 | 51.5% |
YOLOR-CSP-X | strategy + simOTA | 640 | 53.0% |
Installation
Docker environment (recommended)
# create the docker container; you can increase the shared memory size if you have more memory available.
nvidia-docker run --name yolor -it -v your_coco_path/:/coco/ -v your_code_path/:/yolor --shm-size=64g nvcr.io/nvidia/pytorch:20.11-py3
# apt install required packages
apt update
apt install -y zip htop screen libgl1-mesa-glx
# pip install required packages
pip install seaborn thop
# install mish-cuda if you want to use mish activation
# https://github.com/thomasbrandon/mish-cuda
# https://github.com/JunnYu/mish-cuda
cd /
git clone https://github.com/JunnYu/mish-cuda
cd mish-cuda
python setup.py build install
# install pytorch_wavelets if you want to use dwt down-sampling module
# https://github.com/fbcotter/pytorch_wavelets
cd /
git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .
# go to code folder
cd /yolor
Colab environment
git clone https://github.com/WongKinYiu/yolor
cd yolor
# pip install required packages
pip install -qr requirements.txt
# install mish-cuda if you want to use mish activation
# https://github.com/thomasbrandon/mish-cuda
# https://github.com/JunnYu/mish-cuda
git clone https://github.com/JunnYu/mish-cuda
cd mish-cuda
python setup.py build install
cd ..
# install pytorch_wavelets if you want to use dwt down-sampling module
# https://github.com/fbcotter/pytorch_wavelets
git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .
cd ..
Prepare COCO dataset
cd /yolor
bash scripts/get_coco.sh
Prepare pretrained weight
cd /yolor
bash scripts/get_pretrain.sh
Testing
python test.py --data data/coco.yaml --img 1280 --batch 32 --conf 0.001 --iou 0.65 --device 0 --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --name yolor_p6_val
You will get the results:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.52510
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.70718
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.57520
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.37058
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.56878
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66102
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.39181
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.65229
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.71441
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.57755
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.75337
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.84013
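The first line of the output above, `IoU=0.50:0.95`, denotes the COCO convention of averaging AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only that averaging step. The per-threshold values are invented for illustration and are not results from this repository; a full COCO evaluation also averages over categories.

```python
from typing import Dict, List


def coco_iou_thresholds() -> List[float]:
    """The ten IoU thresholds 0.50, 0.55, ..., 0.95."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def mean_ap_over_iou(ap_by_iou: Dict[float, float]) -> float:
    """Average AP over the COCO IoU thresholds."""
    thresholds = coco_iou_thresholds()
    return sum(ap_by_iou[t] for t in thresholds) / len(thresholds)


# Hypothetical AP at each threshold; AP typically drops as IoU gets stricter.
made_up_ap = [0.70, 0.68, 0.65, 0.62, 0.58, 0.53, 0.47, 0.39, 0.27, 0.11]
ap_by_iou = dict(zip(coco_iou_thresholds(), made_up_ap))

print(coco_iou_thresholds())
print(round(mean_ap_over_iou(ap_by_iou), 3))
```

The second and third lines of the output (`IoU=0.50` and `IoU=0.75`) correspond to single entries of such a table rather than the average.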
Training
Single GPU training:
python train.py --batch-size 8 --img 1280 1280 --data coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0 --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300
Multiple GPU training:
python -m torch.distributed.launch --nproc_per_node 2 --master_port 9527 train.py --batch-size 16 --img 1280 1280 --data coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0,1 --sync-bn --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300
Training schedule in the paper:
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 tune.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights 'runs/train/yolor_p6/weights/last_298.pt' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6-tune --hyp hyp.finetune.1280.yaml --epochs 450
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights 'runs/train/yolor_p6-tune/weights/epoch_424.pt' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6-fine --hyp hyp.finetune.1280.yaml --epochs 450
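The training commands above use `--batch-size` values of 8, 16, and 64 with 1, 2, and 8 GPUs. That pattern suggests the flag is the total batch size split evenly across processes, which would keep 8 images per GPU in every case. This is an assumption inferred from the commands, not something the README states. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def per_gpu_batch(total_batch_size: int, num_gpus: int) -> int:
    """Images per GPU when a total batch is split evenly across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if total_batch_size % num_gpus != 0:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total_batch_size // num_gpus


# (--batch-size, number of GPUs) pairs from the commands above.
commands = [(8, 1), (16, 2), (64, 8)]
per_gpu = [per_gpu_batch(total, gpus) for total, gpus in commands]
print(per_gpu)
```

If the assumption holds, scaling to a different GPU count means multiplying the per-GPU batch that fits in memory by the number of GPUs.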
Inference
python detect.py --source inference/images/horses.jpg --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --conf 0.25 --img-size 1280 --device 0
Citation
@article{wang2023you,
title={You Only Learn One Representation: Unified Network for Multiple Tasks},
author={Wang, Chien-Yao and Yeh, I-Hau and Liao, Hong-Yuan Mark},
journal={Journal of Information Science and Engineering},
year={2023}
}
Acknowledgements