DingXiaoH/RepVGG

RepVGG: Making VGG-style ConvNets Great Again

Top Related Projects

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

Codebase for Image Classification Research, written in PyTorch.

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

OpenMMLab Pre-training Toolbox and Benchmark

Official DeiT repository

Quick Overview

RepVGG is a simple but powerful architecture of convolutional neural networks. It has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. This unique design allows for high accuracy and efficient inference.
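
The core trick (structural re-parameterization) can be summarized in a few lines. The following is a minimal, self-contained sketch of the branch merge (illustrative only; the actual implementation in repvgg.py also folds the BatchNorm of each branch before merging, see get_equivalent_kernel_bias):

import torch
import torch.nn.functional as F

def merge_branches(k3x3, b3x3, k1x1, b1x1, channels):
    # Pad the 1x1 kernel to 3x3 so it can be added to the 3x3 kernel.
    k1x1_padded = F.pad(k1x1, [1, 1, 1, 1])
    # The identity branch is equivalent to a 3x3 kernel with a 1 at the center
    # of the matching input/output channel.
    k_id = torch.zeros(channels, channels, 3, 3)
    for i in range(channels):
        k_id[i, i, 1, 1] = 1.0
    return k3x3 + k1x1_padded + k_id, b3x3 + b1x1

# The merged kernel reproduces the sum of the three branches:
x = torch.randn(1, 4, 8, 8)
k3, b3 = torch.randn(4, 4, 3, 3), torch.randn(4)
k1, b1 = torch.randn(4, 4, 1, 1), torch.randn(4)
y_branches = F.conv2d(x, k3, b3, padding=1) + F.conv2d(x, k1, b1) + x
k, b = merge_branches(k3, b3, k1, b1, channels=4)
y_merged = F.conv2d(x, k, b, padding=1)
assert torch.allclose(y_branches, y_merged, atol=1e-4)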

Pros

  • Simple and efficient architecture for inference
  • High accuracy comparable to more complex models
  • Easy to deploy on various hardware platforms
  • Supports model quantization for further optimization

Cons

  • Requires a specific training process due to its multi-branch topology
  • May have larger model size compared to some other efficient architectures
  • Limited flexibility in terms of architectural modifications
  • Relatively new, so it may have less community support compared to more established architectures

Code Examples

  1. Creating a RepVGG model:
from repvgg import create_RepVGG_A0
model = create_RepVGG_A0(deploy=False)
  2. Converting a trained model to deployment mode:
from repvgg import repvgg_model_convert
deploy_model = repvgg_model_convert(model, save_path='repvgg_deploy.pth')
  3. Loading a pre-trained RepVGG model:
import torch
from repvgg import create_RepVGG_A0
model = create_RepVGG_A0(deploy=True)
model.load_state_dict(torch.load('repvgg_a0_deploy.pth'))

Getting Started

To get started with RepVGG, follow these steps:

  1. Install the required dependencies:
pip install torch torchvision
  2. Clone the RepVGG repository:
git clone https://github.com/DingXiaoH/RepVGG.git
cd RepVGG
  3. Use RepVGG in your project:
from repvgg import create_RepVGG_A0
model = create_RepVGG_A0(deploy=False)
# Train your model
# ...
# Convert to deployment mode
from repvgg import repvgg_model_convert
deploy_model = repvgg_model_convert(model, save_path='repvgg_deploy.pth')

Competitor Comparisons

Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.

Pros of Efficient-AI-Backbones

  • Offers a wider range of efficient backbone architectures (e.g., GhostNet, TNT, TinyNet)
  • Provides implementations in multiple frameworks (PyTorch, TensorFlow, MindSpore)
  • Includes pre-trained models and benchmarks for various tasks

Cons of Efficient-AI-Backbones

  • May have a steeper learning curve due to multiple architectures and frameworks
  • Less focused on a single, specific architecture optimization technique
  • Potentially more complex to integrate into existing projects

Code Comparison

RepVGG:

class RepVGGBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False):
        super(RepVGGBlock, self).__init__()
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels

Efficient-AI-Backbones (GhostNet):

class GhostModule(nn.Module):
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super(GhostModule, self).__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)
        new_channels = init_channels*(ratio-1)

Both repositories focus on efficient neural network architectures, but Efficient-AI-Backbones offers a broader range of options and implementations across multiple frameworks. RepVGG, on the other hand, concentrates on a specific architecture optimization technique, which may be easier to understand and integrate for some users.

Codebase for Image Classification Research, written in PyTorch.

Pros of pycls

  • Comprehensive library for image classification research
  • Supports multiple architectures and datasets
  • Highly configurable and extensible

Cons of pycls

  • More complex setup and usage
  • Larger codebase, potentially harder to understand
  • Less focused on specific architecture optimizations

Code Comparison

RepVGG:

class RepVGGBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False, use_se=False):
        super(RepVGGBlock, self).__init__()
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels

pycls:

class AnyNet(nn.Module):
    def __init__(self, stem_type, stem_w, block_type, ds, ws, ss, se_r, nc):
        super(AnyNet, self).__init__()
        self._construct_net(stem_type, stem_w, block_type, ds, ws, ss, se_r)
        self.head = nn.Linear(ws[-1], nc, bias=True)

RepVGG focuses on a specific block implementation for efficient inference, while pycls provides a more general framework for constructing various network architectures. RepVGG's code is more specialized, whereas pycls offers greater flexibility for different models and configurations.

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Pros of pytorch-image-models

  • Extensive collection of pre-trained models and architectures
  • Regular updates and active community support
  • Comprehensive documentation and examples

Cons of pytorch-image-models

  • Larger codebase, potentially more complex to navigate
  • May include unnecessary features for specific use cases
  • Higher memory footprint due to diverse model support

Code Comparison

RepVGG:

class RepVGGBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False):
        super(RepVGGBlock, self).__init__()
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels

pytorch-image-models:

class ConvBnAct(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding='', dilation=1, groups=1, bias=False, apply_act=True, norm_layer=nn.BatchNorm2d, act_layer=nn.ReLU, aa_layer=None, drop_block=None):
        super(ConvBnAct, self).__init__()
        self.conv = create_conv2d(in_channels, out_channels, kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias=bias)

RepVGG focuses on a specific architecture with simplified blocks, while pytorch-image-models offers a more versatile approach with customizable components. The code snippets highlight the difference in complexity and flexibility between the two repositories.

OpenMMLab Pre-training Toolbox and Benchmark

Pros of mmpretrain

  • Comprehensive library with support for multiple models and tasks
  • Extensive documentation and community support
  • Modular design for easy customization and extension

Cons of mmpretrain

  • Steeper learning curve due to its complexity
  • Potentially heavier resource requirements

Code comparison

RepVGG:

from repvgg import create_RepVGG_A0
model = create_RepVGG_A0(deploy=False)

mmpretrain:

from mmpretrain import get_model
model = get_model('resnet50', pretrained=True)

RepVGG focuses on a specific architecture, while mmpretrain offers a wide range of models and tasks. RepVGG's implementation is more straightforward, but mmpretrain provides a unified interface for various models.

mmpretrain's modular design allows for easier customization and extension, but it may require more time to understand and utilize effectively. RepVGG, being more focused, has a simpler API but is limited to its specific architecture.

Both repositories have their strengths, with RepVGG offering a specialized solution and mmpretrain providing a comprehensive toolkit for various computer vision tasks.

Official DeiT repository

Pros of DeiT

  • Focuses on vision transformers, offering a more specialized approach for image classification tasks
  • Provides data-efficient training methods, reducing the need for large datasets
  • Includes distillation techniques to improve model performance and efficiency

Cons of DeiT

  • Limited to vision tasks, whereas RepVGG is more versatile for various computer vision applications
  • May require more computational resources due to the transformer architecture
  • Less straightforward to deploy on edge devices compared to RepVGG's simplified structure

Code Comparison

DeiT:

class Attention(nn.Module):
    def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0.):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = head_dim ** -0.5

RepVGG:

class RepVGGBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', deploy=False):
        super(RepVGGBlock, self).__init__()
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels

The code snippets highlight the different architectural approaches: DeiT uses attention mechanisms typical of transformers, while RepVGG employs a more traditional convolutional structure with additional optimizations for deployment.

README

RepVGG: Making VGG-style ConvNets Great Again (CVPR-2021) (PyTorch)

Highlights (Sep. 1st, 2022)

RepVGG and the methodology of re-parameterization have been used in YOLOv6 (paper, code) and YOLOv7 (paper, code).

I have re-organized this repository and released the RepVGGplus-L2pse model with 84.06% ImageNet accuracy. I will release more RepVGGplus models this month.

Introduction

This is a super simple ConvNet architecture that achieves over 84% top-1 accuracy on ImageNet with a VGG-like architecture! This repo contains the pretrained models, the code for building the model, training it, and converting the training-time model into the inference-time structure, plus an example of using RepVGG for semantic segmentation.

The MegEngine version

TensorRT implementation with C++ API by @upczww. Great work!

Another PyTorch implementation by @zjykzj. He also presented detailed benchmarks here. Nice work!

Included in a famous PyTorch model zoo https://github.com/rwightman/pytorch-image-models.

Objax implementation and models by @benjaminjellis. Great work!

Included in the MegEngine Basecls model zoo.

Citation:

@inproceedings{ding2021repvgg,
title={Repvgg: Making vgg-style convnets great again},
author={Ding, Xiaohan and Zhang, Xiangyu and Ma, Ningning and Han, Jungong and Ding, Guiguang and Sun, Jian},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={13733--13742},
year={2021}
}

From RepVGG to RepVGGplus

We have released an improved architecture named RepVGGplus on top of the original version presented in the CVPR-2021 paper.

  1. RepVGGplus is deeper

  2. RepVGGplus has auxiliary classifiers during training, which can also be removed for inference

  3. (Optional) RepVGGplus uses Squeeze-and-Excitation blocks to further improve the performance.

RepVGGplus outperformed several recent vision transformers with a top-1 accuracy of 84.06% and higher throughput. Our training script is based on the codebase of Swin Transformer, and the throughput is tested with the Swin codebase as well. We would like to thank the authors of Swin for their clean and well-structured code.

| Model | Train image size | Test size | ImageNet top-1 | Throughput (examples/second, 320, batch size 128, 2080Ti) |
|---|---|---|---|---|
| RepVGGplus-L2pse | 256 | 320 | 84.06% | 147 |
| Swin Transformer | 320 | 320 | 84.0% | 102 |

("pse" means Squeeze-and-Excitation blocks after ReLU.)

Download this model: Google Drive or Baidu Cloud.

To train or finetune it, slightly change your training code like this:

        #   Build model and data loader as usual
        for samples, targets in train_data_loader:
            #   ......
            outputs = model(samples)                        #   Your original code
            if type(outputs) is dict:                       
                #   A training-time RepVGGplus outputs a dict. The items are:
                    #   'main':     the output of the final layer
                    #   '*aux*':    the output of auxiliary classifiers
                loss = 0
                for name, pred in outputs.items():
                    if 'aux' in name:
                        loss += 0.1 * criterion(pred, targets)          #  Assume "criterion" is cross-entropy for classification
                    else:
                        loss += criterion(pred, targets)
            else:
                loss = criterion(outputs, targets)          #   Your original code
            #   Backward as usual
            #   ......

To use it for downstream tasks like semantic segmentation, just discard the aux classifiers and the final FC layer.

Please note that the custom weight decay trick I described last year turned out to be insignificant in our recent experiments (84.16% ImageNet acc and negligible improvements on other tasks), so I decided to stop using it as a feature of RepVGGplus. You may still try it optionally on your task. Please refer to the last part of this page for details.

Use our pretrained model

You may download all of the ImageNet-pretrained models reported in the paper from Google Drive (https://drive.google.com/drive/folders/1Avome4KvNp0Lqh2QwhXO6L5URQjzCjUq?usp=sharing) or Baidu Cloud (https://pan.baidu.com/s/1nCsZlMynnJwbUBKn0ch7dQ, the access code is "rvgg"). For ease of transfer learning on other tasks, they are all training-time models (with identity and 1x1 branches). You may test the accuracy by running

python -m torch.distributed.launch --nproc_per_node 1 --master_port 12349 main.py --arch [model name] --data-path [/path/to/imagenet] --batch-size 32 --tag test --eval --resume [/path/to/weights/file] --opts DATA.DATASET imagenet DATA.IMG_SIZE [224 or 320]

The valid model names include

RepVGGplus-L2pse, RepVGG-A0, RepVGG-A1, RepVGG-A2, RepVGG-B0, RepVGG-B1, RepVGG-B1g2, RepVGG-B1g4, RepVGG-B2, RepVGG-B2g2, RepVGG-B2g4, RepVGG-B3, RepVGG-B3g2, RepVGG-B3g4

Convert a training-time RepVGG into the inference-time structure

For a RepVGG model or a model with RepVGG as one of its components (e.g., the backbone), you can convert the whole model by simply calling switch_to_deploy of every RepVGG block. This is the recommended way. Examples are shown in tools/convert.py and example_pspnet.py.

    for module in model.modules():
        if hasattr(module, 'switch_to_deploy'):
            module.switch_to_deploy()

We have also released a script for the conversion. For example,

python convert.py RepVGGplus-L2pse-train256-acc84.06.pth RepVGGplus-L2pse-deploy.pth -a RepVGGplus-L2pse

Then you may build the inference-time model with --deploy, load the converted weights and test

python -m torch.distributed.launch --nproc_per_node 1 --master_port 12349 main.py --arch RepVGGplus-L2pse --data-path [/path/to/imagenet] --batch-size 32 --tag test --eval --resume RepVGGplus-L2pse-deploy.pth --deploy --opts DATA.DATASET imagenet DATA.IMG_SIZE [224 or 320]

Except for the final conversion after training, you may want to get the equivalent kernel and bias during training in a differentiable way at any time (get_equivalent_kernel_bias in repvgg.py). This may help training-based pruning or quantization.
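
For instance, here is a hedged sketch of using it for a sparsity penalty on the fused kernels (assuming only what is stated above: each RepVGG block exposes a differentiable get_equivalent_kernel_bias() returning the fused 3x3 kernel and bias):

def equivalent_kernel_penalty(model, coeff=1e-5):
    # L1 penalty over the equivalent (fused) kernels of all RepVGG blocks,
    # which could be added to the task loss for training-based pruning.
    penalty = 0.0
    for module in model.modules():
        if hasattr(module, 'get_equivalent_kernel_bias'):
            kernel, bias = module.get_equivalent_kernel_bias()
            penalty = penalty + kernel.abs().sum()
    return coeff * penalty

# loss = criterion(model(samples), targets) + equivalent_kernel_penalty(model)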

Train from scratch

Reproduce RepVGGplus-L2pse (not presented in the paper)

To train the recently released RepVGGplus-L2pse from scratch, activate mixup and use --AUG.PRESET raug15 for RandAug.

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12349 main.py --arch RepVGGplus-L2pse --data-path [/path/to/imagenet] --batch-size 32 --tag train_from_scratch --output-dir /path/to/save/the/log/and/checkpoints --opts TRAIN.EPOCHS 300 TRAIN.BASE_LR 0.1 TRAIN.WEIGHT_DECAY 4e-5 TRAIN.WARMUP_EPOCHS 5 MODEL.LABEL_SMOOTHING 0.1 AUG.PRESET raug15 AUG.MIXUP 0.2 DATA.DATASET imagenet DATA.IMG_SIZE 256 DATA.TEST_SIZE 320

Reproduce original RepVGG results reported in the paper

To reproduce the models reported in the CVPR-2021 paper, do not use mixup or RandAug.

python -m torch.distributed.launch --nproc_per_node 8 --master_port 12349 main.py --arch [model name] --data-path [/path/to/imagenet] --batch-size 32 --tag train_from_scratch --output-dir /path/to/save/the/log/and/checkpoints --opts TRAIN.EPOCHS 300 TRAIN.BASE_LR 0.1 TRAIN.WEIGHT_DECAY 1e-4 TRAIN.WARMUP_EPOCHS 5 MODEL.LABEL_SMOOTHING 0.1 AUG.PRESET weak AUG.MIXUP 0.0 DATA.DATASET imagenet DATA.IMG_SIZE 224

The original RepVGG models were trained for 120 epochs with cosine learning rate decay from 0.1 to 0. We used 8 GPUs, a global batch size of 256, a weight decay of 1e-4 (no weight decay on fc.bias, bn.bias, rbr_dense.bn.weight and rbr_1x1.bn.weight; weight decay on rbr_identity.weight makes little difference, and it is better to use it in most cases), and the same simple data preprocessing as the PyTorch official example (a sketch of this weight-decay grouping follows the transform code below):

            trans = transforms.Compose([
                transforms.RandomResizedCrop(224),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
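
Here is a hedged sketch of the weight-decay grouping mentioned above (a simplified version that skips weight decay for all biases and BN affine parameters by name; adapt the name checks to your model):

import torch

def build_sgd_optimizer(model, lr=0.1, weight_decay=1e-4):
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # No weight decay for biases and BN parameters
        # (e.g., fc.bias, bn.bias, rbr_dense.bn.weight, rbr_1x1.bn.weight).
        if name.endswith('.bias') or '.bn.' in name:
            no_decay.append(param)
        else:
            decay.append(param)
    return torch.optim.SGD(
        [{'params': decay, 'weight_decay': weight_decay},
         {'params': no_decay, 'weight_decay': 0.0}],
        lr=lr, momentum=0.9)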

Other released models not presented in the paper

Apr 25, 2021 A deeper RepVGG model achieves 83.55% top-1 accuracy on ImageNet with SE blocks and an input resolution of 320x320 (and a wider version achieves 83.67% accuracy without SE). Note that it is trained with 224x224 but tested with 320x320, so that it is still trainable with a global batch size of 256 on a single machine with 8 1080Ti GPUs. If you test it with 224x224, the top-1 accuracy will be 81.82%. It has 1, 8, 14, 24, 1 layers in the 5 stages respectively. The width multipliers are a=2.5 and b=5 (the same as RepVGG-B2). The model name is "RepVGG-D2se". The code for building the model (repvgg.py) and testing with 320x320 (the testing example below) has been updated and the weights have been released at Google Drive and Baidu Cloud. Please check the links below.

Example 1: use Structural Re-parameterization like this in your own code

import torch
from repvgg import repvgg_model_convert, create_RepVGG_A0
train_model = create_RepVGG_A0(deploy=False)
train_model.load_state_dict(torch.load('RepVGG-A0-train.pth'))          # or train from scratch
# do whatever you want with train_model
deploy_model = repvgg_model_convert(train_model, save_path='RepVGG-A0-deploy.pth')
# do whatever you want with deploy_model

or

deploy_model = create_RepVGG_A0(deploy=True)
deploy_model.load_state_dict(torch.load('RepVGG-A0-deploy.pth'))
# do whatever you want with deploy_model

If you use RepVGG as a component of another model, the conversion is as simple as calling switch_to_deploy of every RepVGG block.

Example 2: use RepVGG as the backbone for downstream tasks

I would suggest you use popular frameworks like MMDetection and MMSegmentation. The features from any stage or layer of RepVGG can be fed into the task-specific heads. If you are not familiar with such frameworks and just would like to see a simple example, please check example_pspnet.py, which shows how to use RepVGG as the backbone of PSPNet for semantic segmentation: 1) build a PSPNet with RepVGG backbone, 2) load the ImageNet-pretrained weights, 3) convert the whole model with switch_to_deploy, 4) save and use the converted model for inference.

Quantization

RepVGG works fine with FP16, but the accuracy may decrease when it is directly quantized to INT8. If INT8 quantization is essential to your application, we suggest three practical solutions.

Solution A: RepOptimizer

I strongly recommend trying RepOptimizer if quantization is essential to your application. RepOptimizer directly trains a VGG-like model via Gradient Re-parameterization without any structural conversions. Quantizing a VGG-like model trained with RepOptimizer is as easy as quantizing a regular model. RepOptimizer has already been used in YOLOv6.

Paper: https://arxiv.org/abs/2205.15242

Code: https://github.com/DingXiaoH/RepOptimizers

Tutorial provided by the authors of YOLOv6: https://github.com/meituan/YOLOv6/blob/main/docs/tutorial_repopt.md. Great work! Many thanks!

Solution B: custom quantization-aware training

Another choice is to constrain the equivalent kernel (get_equivalent_kernel_bias() in repvgg.py) to be low-bit (e.g., make every param take a value in {-127, -126, ..., 126, 127} for int8), instead of constraining the params of every kernel separately as for an ordinary model.
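
As a rough illustration of this idea (not code from this repo), one could fake-quantize the equivalent kernel with a symmetric per-tensor int8 scheme and a straight-through estimator, and train against that projection:

import torch

def fake_quant_int8(kernel):
    # Symmetric per-tensor int8 fake quantization with a straight-through
    # estimator: the forward pass uses the quantized values, the backward
    # pass lets gradients flow through unchanged.
    scale = kernel.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(kernel / scale), -127, 127)
    dequant = q * scale
    return kernel + (dequant - kernel).detach()

# During training, one could penalize the gap between each block's equivalent
# kernel (from get_equivalent_kernel_bias()) and its int8 projection, or run a
# custom forward pass with the projected kernel, so that the fused kernel stays
# quantization-friendly.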

Solution C: use the off-the-shelf toolboxes

(TODO: check and refactor the code of this example)

For simplicity, we can also use off-the-shelf quantization toolboxes to quantize RepVGG. We use the simple QAT (quantization-aware training) tool in torch.quantization as an example.

  1. Given the base model converted into the inference-time structure, we insert BN after the converted 3x3 conv layers because QAT with torch.quantization requires BN. Specifically, we run the model on the ImageNet training set, record the mean/std statistics, use them to initialize the BN layers, and initialize BN.gamma/beta accordingly so that the saved model has the same outputs as the inference-time model.
python quantization/convert.py RepVGG-A0.pth RepVGG-A0_base.pth -a RepVGG-A0 
python quantization/insert_bn.py [imagenet-folder] RepVGG-A0_base.pth RepVGG-A0_withBN.pth -a RepVGG-A0 -b 32 -n 40000
  2. Build the model, prepare it for QAT (torch.quantization.prepare_qat), and conduct QAT. This is only an example and the hyper-parameters may not be optimal.
python quantization/quant_qat_train.py [imagenet-folder] -j 32 --epochs 20 -b 256 --lr 1e-3 --weight-decay 4e-5 --base-weights RepVGG-A0_withBN.pth --tag quanttest
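
For reference, a hedged sketch of the eager-mode QAT flow in torch.quantization (assuming the model has already been made quantization-ready, e.g., QuantStub/DeQuantStub inserted and conv-bn(-relu) modules fused, as the eager-mode API requires; quantization/quant_qat_train.py is the actual script):

import torch
import torch.quantization as tq

def prepare_for_qat(model):
    # Attach a default QAT config and insert fake-quantization observers.
    model.train()
    model.qconfig = tq.get_default_qat_qconfig('fbgemm')
    tq.prepare_qat(model, inplace=True)
    return model

def convert_after_qat(model):
    # After fine-tuning, convert fake-quantized modules to real int8 ones.
    model.eval()
    return tq.convert(model, inplace=False)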

FAQs

Q: Is the inference-time model's output the same as the training-time model?

A: Yes. You can verify that by

python tools/verify.py

Q: How to use the pretrained RepVGG models for other tasks?

A: It is better to finetune the training-time RepVGG models on your datasets, then do the conversion after finetuning and before you deploy the models. For example, if you want to use PSPNet for semantic segmentation, you should build a PSPNet with a training-time RepVGG model as the backbone, load pre-trained weights into the backbone, and finetune the PSPNet on your segmentation dataset. Then you should convert the backbone following the code provided in this repo and keep the other task-specific structures (the PSPNet parts, in this case). The pseudo code will be like

#   train_backbone = create_RepVGG_B2(deploy=False)
#   train_backbone.load_state_dict(torch.load('RepVGG-B2-train.pth'))
#   train_pspnet = build_pspnet(backbone=train_backbone)
#   segmentation_train(train_pspnet)
#   deploy_pspnet = repvgg_model_convert(train_pspnet)
#   segmentation_test(deploy_pspnet)

There is an example in example_pspnet.py.

Finetuning with a converted RepVGG also makes sense if you insert a BN after each conv (please see the quantization example), but the performance may be slightly lower.

Q: I tried to finetune your model with multiple GPUs but got an error. Why are the names of params like "stage1.0.rbr_dense.conv.weight" in the downloaded weight file but sometimes like "module.stage1.0.rbr_dense.conv.weight" (shown by nn.Module.named_parameters()) in my model?

A: DistributedDataParallel may prefix "module." to the name of params and cause a mismatch when loading weights by name. The simplest solution is to load the weights (model.load_state_dict(...)) before DistributedDataParallel(model). Otherwise, you may insert "module." before the names like this

checkpoint = torch.load(...)    # This is just a name-value dict
ckpt = {('module.' + k) : v for k, v in checkpoint.items()}
model.load_state_dict(ckpt)

Likewise, if the param names in the checkpoint file start with "module." but those in your model do not, you may strip the names like line 50 in test.py.

ckpt = {k.replace('module.', ''):v for k,v in checkpoint.items()}   # strip the names
model.load_state_dict(ckpt)

Q: So a RepVGG model derives the equivalent 3x3 kernels before each forward pass to save computation?

A: No! More precisely, we do the conversion only once right after training. Then the training-time model can be discarded, and the resultant model only has 3x3 kernels. We only save and use the resultant model.
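
A quick sanity check in the spirit of tools/verify.py, using the functions shown above (assuming repvgg_model_convert can be called without save_path; otherwise pass one):

import torch
from repvgg import create_RepVGG_A0, repvgg_model_convert

train_model = create_RepVGG_A0(deploy=False)
train_model.eval()                                   # use BN running statistics
deploy_model = repvgg_model_convert(train_model)
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    diff = (train_model(x) - deploy_model(x)).abs().max()
print(diff)                                          # should be around 1e-5 or smaller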

An optional trick with a custom weight decay (deprecated)

This is deprecated. Please check repvggplus_custom_L2.py. The intuition is to add regularization on the equivalent kernel. It may work in some cases.

The trained model can be downloaded at Google Drive or Baidu Cloud

The training code should be changed like this:

        #   Build model and data loader as usual
        for samples, targets in train_data_loader:
            #   ......
            outputs = model(samples)                        #   Your original code
            if type(outputs) is dict:                       
                #   A training-time RepVGGplus outputs a dict. The items are:
                    #   'main':     the output of the final layer
                    #   '*aux*':    the output of auxiliary classifiers
                    #   'L2':       the custom L2 regularization term
                loss = WEIGHT_DECAY * 0.5 * outputs['L2']
                for name, pred in outputs.items():
                    if name == 'L2':
                        pass
                    elif 'aux' in name:
                        loss += 0.1 * criterion(pred, targets)          #  Assume "criterion" is cross-entropy for classification
                    else:
                        loss += criterion(pred, targets)
            else:
                loss = criterion(outputs, targets)          #   Your original code
            #   Backward as usual
            #   ......

Contact

xiaohding@gmail.com (The original Tsinghua mailbox dxh17@mails.tsinghua.edu.cn will expire in several months)

Google Scholar Profile: https://scholar.google.com/citations?user=CIjw0KoAAAAJ&hl=en

Homepage: https://dingxiaohan.xyz/

My open-sourced papers and repos:

The Structural Re-parameterization Universe:

  1. RepLKNet (CVPR 2022) Powerful efficient architecture with very large kernels (31x31) and guidelines for using large kernels in modern CNNs
    Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
    code.

  2. RepOptimizer (ICLR 2023) uses Gradient Re-parameterization to train powerful models efficiently. The training-time RepOpt-VGG is as simple as the inference-time. It also addresses the problem of quantization.
    Re-parameterizing Your Optimizers rather than Architectures
    code.

  3. RepVGG (CVPR 2021) A super simple and powerful VGG-style ConvNet architecture. Up to 84.16% ImageNet top-1 accuracy!
    RepVGG: Making VGG-style ConvNets Great Again
    code.

  4. RepMLP (CVPR 2022) MLP-style building block and Architecture
    RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
    code.

  5. ResRep (ICCV 2021) State-of-the-art channel pruning (Res50, 55% FLOPs reduction, 76.15% acc)
    ResRep: Lossless CNN Pruning via Decoupling Remembering and Forgetting
    code.

  6. ACB (ICCV 2019) is a CNN component without any inference-time costs. The first work of our Structural Re-parameterization Universe.
    ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks.
    code.

  7. DBB (CVPR 2021) is a CNN component with higher performance than ACB and still no inference-time costs. Sometimes I call it ACNet v2 because "DBB" is 2 bits larger than "ACB" in ASCII (lol).
    Diverse Branch Block: Building a Convolution as an Inception-like Unit
    code.

Model compression and acceleration:

  1. (CVPR 2019) Channel pruning: Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
    code

  2. (ICML 2019) Channel pruning: Approximated Oracle Filter Pruning for Destructive CNN Width Optimization
    code

  3. (NeurIPS 2019) Unstructured pruning: Global Sparse Momentum SGD for Pruning Very Deep Neural Networks
    code