yolor

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)

Top Related Projects

yolov5

54,362

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

darknet

22,101

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )

detectron2

32,239

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Mask_RCNN

25,251

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

models

77,618

Models and examples built with TensorFlow

mmdetection

31,487

OpenMMLab Detection Toolbox and Benchmark

Quick Overview

The WongKinYiu/yolor repository is a PyTorch implementation of YOLOR (You Only Learn One Representation), the object detection model from the paper "You Only Learn One Representation: Unified Network for Multiple Tasks". YOLOR targets real-time object detection with high accuracy on COCO.

Pros

High Accuracy: The README reports COCO test AP from 52.8% (YOLOR-CSP, 640 input) to 58.2% (YOLOR-D6, 1280 input), ahead of the YOLOv4-P5, P6, and P7 entries in the same table.
Real-Time Performance: The README lists batch-1 throughput from 106 fps (YOLOR-CSP) down to 34 fps (YOLOR-D6), which suits real-time applications such as video surveillance and autonomous vehicles.
Versatility: YOLOR can be applied to a variety of object detection tasks and can be easily fine-tuned on different datasets.
Open-Source: The PyTorch implementation is available on GitHub, allowing for easy customization and contribution to the project.

Cons

Complexity: The YOLOR model is relatively complex, with a large number of parameters and a sophisticated architecture, which may make it more challenging to understand and modify.
Hardware Requirements: Running YOLOR efficiently may require powerful hardware, such as high-end GPUs, which can be a barrier for some users.
Limited Documentation: The project's documentation could be more comprehensive, making it harder for new users to get started with the library.
Ongoing Development: As an active project, YOLOR may undergo frequent updates and changes, which could make it more difficult to maintain a stable integration in production environments.

Code Examples

Here are a few code examples demonstrating the usage of the YOLOR PyTorch implementation:

Loading a Pre-Trained Model:

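import torch
from models.models import Darknet  # run from the repository root

# Build the network from its cfg file, then load the pre-trained weights
device = torch.device('cpu')
model = Darknet('cfg/yolor_p6.cfg', 1280)
model.load_state_dict(torch.load('yolor_p6.pt', map_location=device)['model'])
model.to(device).eval()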

Performing Object Detection:

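import cv2
import numpy as np
import torch
from utils.datasets import letterbox  # run from the repository root
from utils.general import non_max_suppression, scale_coords

# Load an image (BGR, HxWxC) and letterbox it to a network-friendly size
img0 = cv2.imread('image.jpg')
img = letterbox(img0, auto=True)[0]

# BGR -> RGB, HWC -> CHW, scale to [0, 1], add a batch dimension
img = np.ascontiguousarray(img[:, :, ::-1].transpose(2, 0, 1))
img = torch.from_numpy(img).float().unsqueeze(0) / 255.0

# `model` is the network loaded in the previous example
with torch.no_grad():
    pred = model(img)[0]
pred = non_max_suppression(pred, conf_thres=0.4, iou_thres=0.5)

for det in pred:  # one entry per image in the batch
    if det is not None and len(det):
        # Map boxes from the letterboxed size back to the original image
        det[:, :4] = scale_coords(img.shape[2:], det[:, :4], img0.shape).round()
        # Visualize the detected objects
        # ...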
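The two numeric arguments passed to non_max_suppression above are a confidence threshold and an IoU threshold. As a rough illustration of what they control, here is a minimal pure-Python sketch of greedy non-maximum suppression. It is not the repository's implementation (which works on batched tensors); the helper names `iou` and `greedy_nms` and the sample boxes are made up for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, conf_thres=0.4, iou_thres=0.5):
    """Return indices of kept boxes, highest score first."""
    # Drop low-confidence candidates, then visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already-kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


# Hypothetical sample data: two near-duplicates, one separate box, one weak box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = greedy_nms(boxes, scores)
print(kept)
```

Raising the confidence threshold discards more weak candidates before suppression; lowering the IoU threshold suppresses overlapping boxes more aggressively.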

Evaluating Model Performance:

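from utils.metrics import ap_per_class  # run from the repository root

# ap_per_class works on per-detection arrays rather than raw predictions:
#   tp         (n, n_iou) true-positive flags per IoU threshold
#   conf       (n,) confidence scores
#   pred_cls   (n,) predicted class ids
#   target_cls (m,) ground-truth class ids
# load_coco_stats() is a hypothetical placeholder for code that builds them;
# test.py in the repository computes these for the COCO validation set.
tp, conf, pred_cls, target_cls = load_coco_stats()
p, r, ap, f1, ap_class = ap_per_class(tp, conf, pred_cls, target_cls)
# Print the results
# ...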

Getting Started

To get started with the YOLOR PyTorch implementation, follow these steps:

Clone the repository:

git clone https://github.com/WongKinYiu/yolor.git

Install the required dependencies:

cd yolor
pip install -r requirements.txt

Download the pre-trained weights with the script provided in the repository:

bash scripts/get_pretrain.sh

Run the object detection example:

python detect.py --source inference/images/horses.jpg --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --conf 0.25 --img-size 1280 --device 0

Competitor Comparisons

yolov5

Pros of YOLOv5

More extensive documentation and tutorials
Larger community and more frequent updates
Better integration with PyTorch ecosystem

Cons of YOLOv5

Slightly lower performance on some benchmarks
Less focus on novel architectural improvements

Code Comparison

YOLOv5:

from models.yolo import Model
from utils.torch_utils import select_device

device = select_device('0')
model = Model('yolov5s.yaml', ch=3, nc=80).to(device)

YOLOR:

from models.models import Darknet
from utils.torch_utils import select_device

device = select_device('0')
model = Darknet('cfg/yolor_p6.cfg').to(device)

The code structure is similar, but YOLOR builds its network with the Darknet class from a Darknet-style configuration file (.cfg), whereas YOLOv5 uses a YAML-based configuration.
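To make the difference in configuration formats more concrete: Darknet-style .cfg files are INI-like, but section names such as [convolutional] repeat, once per layer, so order matters and a standard INI parser does not fit directly. The sketch below assumes that convention. The function name `parse_darknet_cfg` and the sample text are made up for illustration; the repository ships its own parser.

```python
def parse_darknet_cfg(text):
    """Parse Darknet-style cfg text into an ordered list of layer dicts."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith('[') and line.endswith(']'):
            # A new section starts a new block; section names may repeat.
            blocks.append({'type': line[1:-1].strip()})
        elif '=' in line and blocks:
            key, value = line.split('=', 1)
            blocks[-1][key.strip()] = value.strip()
    return blocks


# Made-up sample in the Darknet cfg style (not copied from cfg/yolor_p6.cfg).
sample = """
[net]
width=1280
height=1280

[convolutional]
filters=64
size=3
stride=2
activation=silu

[convolutional]
filters=128
size=3
"""

layers = parse_darknet_cfg(sample)
print([block['type'] for block in layers])
```

Values are kept as strings here; a real parser would convert numeric fields and validate layer types.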

YOLOv5 offers a more standardized approach with better documentation, making it easier for beginners to get started. However, YOLOR introduces some architectural innovations that can lead to improved performance in certain scenarios.

Both projects are actively maintained and offer state-of-the-art object detection capabilities. The choice between them often depends on specific project requirements and user familiarity with each framework.

darknet

Pros of darknet

More established and widely used in the computer vision community
Supports a broader range of YOLO versions (v2, v3, v4) and other architectures
Extensive documentation and community support

Cons of darknet

Less focus on recent YOLO improvements and optimizations
May have slower inference speed compared to YOLOR's optimized implementation
Requires more manual configuration for advanced features

Code Comparison

darknet:

layer make_yolo_layer(int batch, int w, int h, int n, int total, int *mask, int classes)
{
    int i;
    layer l = {0};
    l.type = YOLO;

YOLOR (simplified, illustrative):

import torch.nn as nn

class YOLOR(nn.Module):
    def __init__(self, nc=80, anchors=()):
        super().__init__()
        self.nc = nc  # number of classes
        self.no = nc + 5  # outputs per anchor: 4 box coordinates + 1 objectness + nc class scores

The code snippets show the different implementation languages and approaches. darknet uses C for lower-level control, while YOLOR utilizes Python with PyTorch for easier integration and development.

YOLOR focuses on recent YOLO improvements and optimizations, potentially offering better performance in certain scenarios. However, darknet provides a more comprehensive set of features and broader architecture support, making it suitable for a wider range of applications.

detectron2

Pros of Detectron2

More comprehensive and feature-rich, supporting a wider range of computer vision tasks
Better documentation and community support, making it easier to use and extend
Modular architecture allows for easier customization and experimentation

Cons of Detectron2

Steeper learning curve due to its complexity and extensive features
Potentially slower inference time compared to YOLOR's optimized architecture
Requires more computational resources for training and inference

Code Comparison

YOLOR (simplified detection code):

import torch
from models.models import Darknet

img_size = 1280
model = Darknet('cfg/yolor_p6.cfg', img_size)
model.load_state_dict(torch.load('yolor_p6.pt', map_location='cpu')['model'])
model.eval()
img = torch.zeros((1, 3, img_size, img_size))  # dummy input batch
output = model(img)

Detectron2 (simplified detection code):

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(image)

Both repositories focus on object detection, but Detectron2 offers a more comprehensive toolkit for various computer vision tasks. YOLOR is more specialized and optimized for real-time object detection, while Detectron2 provides a flexible framework for a broader range of applications.

Mask_RCNN

Pros of Mask_RCNN

Provides instance segmentation in addition to object detection
Well-documented and easier to understand for beginners
Supports both TensorFlow and Keras backends

Cons of Mask_RCNN

Generally slower inference speed compared to YOLOR
May require more computational resources for training and inference
Less suitable for real-time applications

Code Comparison

Mask_RCNN:

import mrcnn.model as modellib
from mrcnn import utils

model = modellib.MaskRCNN(mode="inference", config=config, model_dir=MODEL_DIR)
model.load_weights(WEIGHTS_PATH, by_name=True)
results = model.detect([image], verbose=1)

YOLOR:

import torch
from models.models import Darknet

model = Darknet(cfg, img_size)
model.load_state_dict(torch.load(weights, map_location=device)['model'])
pred = model(img.to(device))[0]

The code snippets show the basic setup and inference process for both models. Mask_RCNN uses a more straightforward approach with built-in functions, while YOLOR requires more manual setup but offers more flexibility in terms of customization.

models

Pros of TensorFlow Models

Comprehensive collection of models and examples across various domains
Backed by Google, with extensive documentation and community support
Supports multiple deep learning tasks beyond object detection

Cons of TensorFlow Models

Steeper learning curve due to the breadth of the repository
May require more setup and configuration for specific tasks
Potentially slower inference compared to YOLOR's optimized architecture

Code Comparison

YOLOR (PyTorch):

import torch
from models.models import Darknet

img_size = 1280
model = Darknet('cfg/yolor_p6.cfg', img_size)
model.load_state_dict(torch.load('yolor_p6.pt', map_location='cpu')['model'])

TensorFlow Models:

import tensorflow as tf
from object_detection import model_lib_v2

model_dir = 'path/to/model'
pipeline_config = 'path/to/pipeline.config'
model_lib_v2.train_loop(pipeline_config_path=pipeline_config, model_dir=model_dir)

Summary

YOLOR focuses on a single, highly optimized object detection model, while TensorFlow Models offers a diverse range of pre-trained models and examples. YOLOR may be easier to get started with for specific object detection tasks, while TensorFlow Models provides more flexibility and options for various deep learning applications.

mmdetection

Pros of mmdetection

Comprehensive framework with support for multiple object detection algorithms
Extensive documentation and community support
Modular design allowing easy customization and extension

Cons of mmdetection

Steeper learning curve due to its complexity
Potentially slower inference time for some models compared to YOLOR

Code Comparison

YOLOR (model definition, simplified sketch; CSPDarknet and YOLOHead are illustrative names):

class YOLOR(nn.Module):
    def __init__(self, nc=80, anchors=(), ch=()):
        super(YOLOR, self).__init__()
        self.backbone = CSPDarknet(ch)
        self.head = YOLOHead(nc, anchors)

mmdetection (model configuration):

model = dict(
    type='YOLOV3',
    backbone=dict(type='Darknet', depth=53),
    neck=dict(type='YOLOV3Neck'),
    bbox_head=dict(type='YOLOV3Head', num_classes=80)
)

YOLOR focuses on a single, highly optimized architecture, while mmdetection provides a flexible framework for implementing various object detection algorithms. YOLOR may offer better performance for specific use cases, while mmdetection provides more options and easier experimentation with different models and techniques.
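In the mmdetection snippet, each nested dict names a component through its `type` key. One common way to turn such a configuration into objects is a registry that maps type names to classes. The sketch below shows that pattern only; it is not mmdetection's actual code, and `REGISTRY`, `register`, `build`, and the three toy classes are hypothetical.

```python
REGISTRY = {}


def register(cls):
    """Class decorator: make `cls` buildable by its class name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg):
    """Instantiate the class named by cfg['type'], building nested dicts first."""
    kwargs = {
        key: build(value) if isinstance(value, dict) and 'type' in value else value
        for key, value in cfg.items()
        if key != 'type'
    }
    return REGISTRY[cfg['type']](**kwargs)


@register
class Backbone:
    def __init__(self, depth):
        self.depth = depth


@register
class Head:
    def __init__(self, num_classes):
        self.num_classes = num_classes


@register
class Detector:
    def __init__(self, backbone, head):
        self.backbone = backbone
        self.head = head


detector = build(dict(
    type='Detector',
    backbone=dict(type='Backbone', depth=53),
    head=dict(type='Head', num_classes=80),
))
print(detector.backbone.depth, detector.head.num_classes)
```

With this pattern, swapping a component means changing one `type` string in the configuration rather than editing model code, which is the experimentation benefit described above.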

README

YOLOR

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks

Unified Network

To get the results on the table, please use this branch.

Model	Test Size	AP^test	AP₅₀^test	AP₇₅^test	batch1 throughput	batch32 inference
YOLOR-CSP	640	52.8%	71.2%	57.6%	106 fps	3.2 ms
YOLOR-CSP-X	640	54.8%	73.1%	59.7%	87 fps	5.5 ms
YOLOR-P6	1280	55.7%	73.3%	61.0%	76 fps	8.3 ms
YOLOR-W6	1280	56.9%	74.4%	62.2%	66 fps	10.7 ms
YOLOR-E6	1280	57.6%	75.2%	63.0%	45 fps	17.1 ms
YOLOR-D6	1280	58.2%	75.8%	63.8%	34 fps	21.8 ms

YOLOv4-P5	896	51.8%	70.3%	56.6%	41 fps (old)	-
YOLOv4-P6	1280	54.5%	72.6%	59.8%	30 fps (old)	-
YOLOv4-P7	1536	55.5%	73.4%	60.8%	16 fps (old)	-
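The batch-1 throughput column is given in frames per second, which converts to a per-image latency of 1000 / fps milliseconds. The short sketch below does that conversion for the YOLOR rows, using values copied from the table. The batch-32 column is printed as reported, without assuming how it was measured; the helper name `latency_ms` is made up.

```python
# (model, batch-1 throughput in fps, batch-32 inference time in ms), from the table
rows = [
    ('YOLOR-CSP', 106, 3.2),
    ('YOLOR-CSP-X', 87, 5.5),
    ('YOLOR-P6', 76, 8.3),
    ('YOLOR-W6', 66, 10.7),
    ('YOLOR-E6', 45, 17.1),
    ('YOLOR-D6', 34, 21.8),
]


def latency_ms(fps):
    """Per-image latency in milliseconds implied by a throughput in fps."""
    return 1000.0 / fps


for name, fps, batch32_ms in rows:
    print(f'{name:12s} batch1: {latency_ms(fps):5.1f} ms/img   batch32: {batch32_ms:5.1f} ms')
```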

Fixed the speed bottleneck on our NFS; many thanks to the NCHC, TWCC, and NARLabs support teams.

Model	Test Size	AP^val	AP₅₀^val	AP₇₅^val	AP_S^val	AP_M^val	AP_L^val	weights
YOLOv4-CSP	640	49.1%	67.7%	53.8%	32.1%	54.4%	63.2%	-
YOLOR-CSP	640	49.2%	67.6%	53.7%	32.9%	54.4%	63.0%	weights
YOLOR-CSP*	640	50.0%	68.7%	54.3%	34.2%	55.1%	64.3%	weights

YOLOv4-CSP-X	640	50.9%	69.3%	55.4%	35.3%	55.8%	64.8%	-
YOLOR-CSP-X	640	51.1%	69.6%	55.7%	35.7%	56.0%	65.2%	weights
YOLOR-CSP-X*	640	51.5%	69.9%	56.1%	35.8%	56.8%	66.1%	weights

Developing...

Model	Test Size	AP^test	AP₅₀^test	AP₇₅^test	AP_S^test	AP_M^test	AP_L^test
YOLOR-CSP	640	51.1%	69.6%	55.7%	31.7%	55.3%	64.7%
YOLOR-CSP-X	640	53.0%	71.4%	57.9%	33.7%	57.1%	66.8%

Train from scratch for 300 epochs...

Model	Info	Test Size	AP
YOLOR-CSP	evolution	640	48.0%
YOLOR-CSP	strategy	640	50.0%
YOLOR-CSP	strategy + simOTA	640	51.1%

YOLOR-CSP-X	strategy	640	51.5%
YOLOR-CSP-X	strategy + simOTA	640	53.0%
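To read the table above as an ablation, it can help to compute the AP gain of each recipe over the previous row for the same model. The sketch below does this with values copied from the table; the helper name `gains` is made up.

```python
# (model, training recipe, AP in %), copied from the table above
results = [
    ('YOLOR-CSP', 'evolution', 48.0),
    ('YOLOR-CSP', 'strategy', 50.0),
    ('YOLOR-CSP', 'strategy + simOTA', 51.1),
    ('YOLOR-CSP-X', 'strategy', 51.5),
    ('YOLOR-CSP-X', 'strategy + simOTA', 53.0),
]


def gains(rows):
    """AP gain of each row over the previous row of the same model."""
    out = []
    prev = {}
    for model, recipe, ap in rows:
        if model in prev:
            out.append((model, recipe, round(ap - prev[model], 1)))
        prev[model] = ap
    return out


for model, recipe, delta in gains(results):
    print(f'{model}: {recipe} -> +{delta} AP')
```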

Installation

Docker environment (recommended)

# create the docker container, you can change the share memory size if you have more.
nvidia-docker run --name yolor -it -v your_coco_path/:/coco/ -v your_code_path/:/yolor --shm-size=64g nvcr.io/nvidia/pytorch:20.11-py3

# apt install required packages
apt update
apt install -y zip htop screen libgl1-mesa-glx

# pip install required packages
pip install seaborn thop

# install mish-cuda if you want to use mish activation
# https://github.com/thomasbrandon/mish-cuda
# https://github.com/JunnYu/mish-cuda
cd /
git clone https://github.com/JunnYu/mish-cuda
cd mish-cuda
python setup.py build install

# install pytorch_wavelets if you want to use dwt down-sampling module
# https://github.com/fbcotter/pytorch_wavelets
cd /
git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .

# go to code folder
cd /yolor

Colab environment

git clone https://github.com/WongKinYiu/yolor
cd yolor

# pip install required packages
pip install -qr requirements.txt

# install mish-cuda if you want to use mish activation
# https://github.com/thomasbrandon/mish-cuda
# https://github.com/JunnYu/mish-cuda
git clone https://github.com/JunnYu/mish-cuda
cd mish-cuda
python setup.py build install
cd ..

# install pytorch_wavelets if you want to use dwt down-sampling module
# https://github.com/fbcotter/pytorch_wavelets
git clone https://github.com/fbcotter/pytorch_wavelets
cd pytorch_wavelets
pip install .
cd ..

Prepare COCO dataset

cd /yolor
bash scripts/get_coco.sh

Prepare pretrained weight

cd /yolor
bash scripts/get_pretrain.sh

Testing

yolor_p6.pt

python test.py --data data/coco.yaml --img 1280 --batch 32 --conf 0.001 --iou 0.65 --device 0 --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --name yolor_p6_val

You will get the results:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.52510
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.70718
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.57520
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.37058
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.56878
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66102
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.39181
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.65229
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.71441
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.57755
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.75337
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.84013
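The first line of this output, AP @[ IoU=0.50:0.95 ], follows the COCO convention of averaging AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05, while the IoU=0.50 and IoU=0.75 lines use a single threshold each. A small sketch of those thresholds, and of how one detection fares against them, follows; the helper names are made up for illustration.

```python
def coco_iou_thresholds():
    """The ten IoU thresholds 0.50, 0.55, ..., 0.95 used by COCO-style AP."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(iou):
    """Thresholds at which a detection with this IoU counts as a match."""
    return [t for t in coco_iou_thresholds() if iou >= t]


thresholds = coco_iou_thresholds()
print(thresholds)

# A box overlapping its ground truth with IoU 0.72 matches at 0.50 to 0.70 only,
# so it can count toward AP50 but not toward AP75.
print(matched_thresholds(0.72))
```

This is why the averaged AP (0.525 here) sits well below AP at IoU=0.50 (0.707): loosely localized boxes stop counting as the threshold rises.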

Training

Single GPU training:

python train.py --batch-size 8 --img 1280 1280 --data coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0 --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300

Multiple GPU training:

python -m torch.distributed.launch --nproc_per_node 2 --master_port 9527 train.py --batch-size 16 --img 1280 1280 --data coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0,1 --sync-bn --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300

Training schedule in the paper:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights '' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6 --hyp hyp.scratch.1280.yaml --epochs 300
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 tune.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights 'runs/train/yolor_p6/weights/last_298.pt' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6-tune --hyp hyp.finetune.1280.yaml --epochs 450
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data data/coco.yaml --cfg cfg/yolor_p6.cfg --weights 'runs/train/yolor_p6-tune/weights/epoch_424.pt' --device 0,1,2,3,4,5,6,7 --sync-bn --name yolor_p6-fine --hyp hyp.finetune.1280.yaml --epochs 450
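The commands above scale --batch-size with the number of GPUs (8 on one GPU, 16 on two, 64 on eight). Assuming --batch-size is the total batch that the training script splits evenly across processes, as is common in YOLOv5-style scripts (check train.py to confirm), every configuration works out to the same per-GPU batch. The helper `per_gpu_batch` is hypothetical.

```python
def per_gpu_batch(total_batch_size, device_arg):
    """Split a total batch size evenly over the GPUs listed in a --device value."""
    gpus = [d for d in device_arg.split(',') if d.strip()]
    if total_batch_size % len(gpus) != 0:
        raise ValueError('total batch size must be divisible by the GPU count')
    return total_batch_size // len(gpus)


# (--batch-size, --device) pairs taken from the commands above
for total, devices in [(8, '0'), (16, '0,1'), (64, '0,1,2,3,4,5,6,7')]:
    print(f'--batch-size {total} --device {devices}: {per_gpu_batch(total, devices)} per GPU')
```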

Inference

yolor_p6.pt

python detect.py --source inference/images/horses.jpg --cfg cfg/yolor_p6.cfg --weights yolor_p6.pt --conf 0.25 --img-size 1280 --device 0
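When inference needs to be launched from another Python program, one option is to assemble the same command line programmatically. The sketch below only uses the flags shown in the command above; the wrapper `detect_command` is a hypothetical helper, not part of the repository.

```python
import shlex
import sys


def detect_command(source, cfg, weights, conf=0.25, img_size=1280, device='0'):
    """Build the argument list for detect.py using the flags shown above."""
    return [
        sys.executable, 'detect.py',
        '--source', source,
        '--cfg', cfg,
        '--weights', weights,
        '--conf', str(conf),
        '--img-size', str(img_size),
        '--device', device,
    ]


cmd = detect_command('inference/images/horses.jpg', 'cfg/yolor_p6.cfg', 'yolor_p6.pt')
print(shlex.join(cmd[1:]))
# To actually run it from the repository root:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a source path contains spaces.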

You will get the results:

[Image: detection results on horses.jpg]

Citation

@article{wang2023you,
  title={You Only Learn One Representation: Unified Network for Multiple Tasks},
  author={Wang, Chien-Yao and Yeh, I-Hau and Liao, Hong-Yuan Mark},
  journal={Journal of Information Science and Engineering},
  year={2023}
}
