Top Related Projects
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Quick Overview
The open-mmlab/mmpretrain repository (MMPreTrain) is a comprehensive deep learning pre-training toolbox developed by the OpenMMLab team. It provides a unified interface for training and evaluating a wide range of computer vision models, covering image classification, self-supervised pre-training, and multimodal tasks such as image captioning and visual question answering.
Pros
- Extensive Model Zoo: The library offers a diverse collection of pre-trained models across various computer vision tasks, allowing users to easily leverage state-of-the-art performance without the need for extensive training.
- Modular and Extensible: The codebase is highly modular, making it easy to register new models, datasets, and training pipelines (a registration sketch follows this list).
- Efficient Training and Evaluation: The library leverages the powerful PyTorch framework and provides optimized training and evaluation workflows, enabling efficient model development and deployment.
- Comprehensive Documentation: The project maintains detailed documentation, including installation guides, tutorials, and API references, making it accessible for both beginners and experienced users.
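As an illustration of that extensibility, here is a minimal sketch of registering a custom backbone with MMPreTrain's MODELS registry so it can be referenced from a config by name. The class name and layer sizes are made up for the example.
from mmengine.model import BaseModule
from mmpretrain.registry import MODELS
import torch.nn as nn
@MODELS.register_module()
class TinyBackbone(BaseModule):  # hypothetical backbone, for illustration only
    def __init__(self, out_channels=256, init_cfg=None):
        super().__init__(init_cfg=init_cfg)
        self.stem = nn.Sequential(
            nn.Conv2d(3, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        # MMPreTrain backbones return a tuple of feature maps
        return (self.stem(x),)
# In a config file, the backbone is then referenced as dict(type='TinyBackbone').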
Cons
- Steep Learning Curve: The library's extensive features and flexibility can be overwhelming for newcomers, requiring a significant investment in understanding the project's structure and conventions.
- Limited Community Support: Compared to more popular deep learning libraries like TensorFlow or PyTorch, the open-mmlab/mmpretrain project may have a smaller user community, which could impact the availability of resources and troubleshooting support.
- Potential Compatibility Issues: As the library is actively developed, there may be occasional compatibility issues with newer versions of PyTorch or other dependencies, which could require additional effort to resolve.
- Limited Deployment Options: While the library provides tools for model evaluation and deployment, the focus is primarily on the training and development aspects, and the deployment options may be less comprehensive than dedicated serving frameworks.
Code Examples
Here are a few code examples demonstrating typical usage of the open-mmlab/mmpretrain library (note that object detection and semantic segmentation are handled by the sibling MMDetection and MMSegmentation toolboxes rather than by mmpretrain):
- Image Classification:
from mmpretrain import ImageClassificationInferencer
# Load a pre-trained model
inferencer = ImageClassificationInferencer('resnet50_8xb32_in1k')
# Perform inference on an image (the inferencer is called directly)
result = inferencer('path/to/image.jpg')
print(result)
- Image Captioning (multimodal):
from mmpretrain import ImageCaptionInferencer
# Load a pre-trained captioning model; the model name is an example,
# use list_models() to see which caption models your installation provides
inferencer = ImageCaptionInferencer('blip-base_3rdparty_caption')
# Perform inference on an image
result = inferencer('path/to/image.jpg')
print(result)
- Feature Extraction:
from mmpretrain import get_model
import torch
# Load a pre-trained backbone and extract features from a dummy batch
model = get_model('resnet50_8xb32_in1k', pretrained=True)
feats = model.extract_feat(torch.rand(1, 3, 224, 224))
print([f.shape for f in feats])
- Training a Model: Training is driven by config files and MMEngine's Runner (a sketch; it assumes the dataset paths in the chosen config are set up):
from mmengine.config import Config
from mmengine.runner import Runner
# Load one of the provided configs and point it at a working directory
cfg = Config.fromfile('configs/resnet/resnet50_8xb32_in1k.py')
cfg.work_dir = 'path/to/work_dir'
runner = Runner.from_cfg(cfg)
runner.train()
# Equivalent command line:
#   python tools/train.py configs/resnet/resnet50_8xb32_in1k.py --work-dir path/to/work_dir
Getting Started
To get started with the open-mmlab/mmpretrain library, follow these steps:
- Install the Library: You can install the library using pip:
pip install mmpretrain
- Explore the Model Zoo: Browse the available pre-trained models in the Model Zoo to find the ones that suit your needs (a programmatic way to do this is sketched after these steps).
- Perform Inference: Use the provided inferencer classes to load and run inference on your images:
from mmpretrain import ImageClassificationInferencer
inferencer = ImageClassificationInferencer('resnet50_8xb32_in1k')
result = inferencer('path/to/image.jpg')
print(result)
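For the model zoo step, the available models can also be listed programmatically. A minimal sketch (the wildcard pattern is just an example):
from mmpretrain import list_models
# List model names matching a pattern; list_models() with no arguments lists everything
print(list_models('resnet*'))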
Competitor Comparisons
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
Pros of VISSL
- VISSL provides a wide range of self-supervised learning algorithms, including SimCLR, MoCo, and BYOL, allowing for more flexibility in model training.
- The library is well-documented and includes detailed tutorials, making it easier for researchers and developers to get started.
- VISSL supports distributed training, which can significantly speed up the training process on large datasets.
Cons of VISSL
- VISSL is primarily focused on self-supervised learning, while MMPreTrain offers a broader range of pre-training and fine-tuning capabilities.
- The installation process for VISSL can be more complex, as it requires setting up a specific PyTorch environment.
Code Comparison
MMPreTrain:
from mmpretrain import get_model
# Build a classifier from the model zoo; pretrained=True downloads the
# official checkpoint, and extra kwargs override fields of the model config
model = get_model(
    'resnet50_8xb32_in1k',
    pretrained=True,
    head=dict(num_classes=1000),
)
VISSL:
from vissl.config import AttrDict
from vissl.models import build_model
cfg = AttrDict({
"MODEL": {
"FEATURE_EVAL_SETTINGS": {
"EVAL_MODE_ON": True,
"FREEZE_TRUNK_ONLY": True,
},
"TRUNK": {
"NAME": "resnet50",
},
},
})
model = build_model(cfg)
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Pros of Swin-Transformer
- Swin-Transformer is a state-of-the-art vision transformer model that has achieved impressive results on various computer vision tasks.
- The repository provides a well-documented and easy-to-use implementation of the Swin-Transformer model, making it accessible to researchers and developers.
- The model has been pre-trained on large-scale datasets, allowing users to fine-tune it for their specific tasks, which can save time and resources.
Cons of Swin-Transformer
- The Swin-Transformer repository is primarily focused on the Swin-Transformer model and does not provide a comprehensive set of tools and utilities like mmpretrain.
- The documentation and examples in the Swin-Transformer repository may not be as extensive as those in mmpretrain, which can make it more challenging for beginners to get started.
- The Swin-Transformer repository may not have the same level of community support and active development as mmpretrain.
Code Comparison
Swin-Transformer:
import torch.nn as nn  # needed for norm_layer below
from models.swin_transformer import SwinTransformer  # module path inside the Swin-Transformer repo
model = SwinTransformer(
img_size=224,
patch_size=4,
in_chans=3,
num_classes=1000,
embed_dim=96,
depths=[2, 2, 6, 2],
num_heads=[3, 6, 12, 24],
window_size=7,
mlp_ratio=4.,
qkv_bias=True,
qk_scale=None,
drop_rate=0.,
attn_drop_rate=0.,
drop_path_rate=0.1,
norm_layer=nn.LayerNorm,
ape=False,
patch_norm=True,
use_checkpoint=False
)
mmpretrain:
from mmpretrain import get_model
# pretrained=True downloads the official checkpoint automatically;
# Swin/ViT variants can be found with list_models('swin*') or list_models('vit*')
model = get_model(
    'resnet50_8xb32_in1k',
    pretrained=True,
)
Pros of Vision Transformer
- Vision Transformer provides a more flexible and scalable architecture compared to traditional convolutional neural networks (CNNs), allowing for better performance on various computer vision tasks.
- The Vision Transformer model is pre-trained on a large dataset, which can be fine-tuned for specific tasks, reducing the need for extensive training from scratch.
- The Vision Transformer implementation includes comprehensive documentation and examples, making it easier for researchers and developers to get started.
Cons of Vision Transformer
- The Vision Transformer model can be computationally more expensive and require more memory compared to some CNN-based models, especially for smaller-scale tasks.
- The Vision Transformer may not perform as well as specialized CNN architectures on certain computer vision tasks, such as image classification on small datasets.
Code Comparison
Vision Transformer:
from vit_keras import vit
# Create a ViT-B/16 model with pre-trained ImageNet-21k weights
# (the vit-keras helper API is assumed here; check the package docs for details)
model = vit.vit_b16(
    image_size=224,
    activation='softmax',
    pretrained=True,
    include_top=True,
    pretrained_top=False,
    classes=10,
)
MMPreTrain:
from mmpretrain import get_model
# Build a model from the MMPreTrain zoo; use list_models('vit*') to find
# Vision Transformer variants in particular
model = get_model(
    'resnet50_8xb32_in1k',
    pretrained=True,
)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Pros of Transformers
- Transformers provides a wide range of pre-trained models for various NLP tasks, making it easy to fine-tune and use in your own projects.
- The library has extensive documentation and a large community, providing ample support and resources for users.
- Transformers integrates well with popular deep learning frameworks like PyTorch and TensorFlow, allowing for seamless integration into your existing codebase.
Cons of Transformers
- Transformers has historically focused on NLP tasks, while MMPreTrain covers a broad range of computer vision and multimodal pre-training tasks.
- The library can be more complex to set up and configure compared to MMPreTrain, which has a more streamlined and user-friendly setup process.
- Transformers may have a steeper learning curve for users who are new to natural language processing.
Code Comparison
Transformers:
from transformers import pipeline
# Load a pre-trained model for sentiment analysis
sentiment_analyzer = pipeline('sentiment-analysis')
# Perform sentiment analysis on a given text
result = sentiment_analyzer('This movie was amazing!')
print(result)
MMPreTrain:
from mmpretrain import ImageClassificationInferencer
# Load a pre-trained model for image classification
classifier = ImageClassificationInferencer('resnet50_8xb32_in1k')
# Classify an input image
result = classifier('path/to/image.jpg')
print(result)
README

📘 Documentation | 🛠️ Installation | 👀 Model Zoo | 🆕 Update News | 🤔 Reporting Issues
English | 简体中文
Introduction
MMPreTrain is an open source pre-training toolbox based on PyTorch. It is a part of the OpenMMLab project.
The main branch works with PyTorch 1.8+.
Major features
- Various backbones and pretrained models
- Rich training strategies (supervised learning, self-supervised learning, multi-modality learning etc.)
- Bag of training tricks
- Large-scale training configs
- High efficiency and extensibility
- Powerful toolkits for model analysis and experiments
- Various out-of-the-box inference tasks (a usage sketch follows this list):
- Image Classification
- Image Caption
- Visual Question Answering
- Visual Grounding
- Retrieval (Image-To-Image, Text-To-Image, Image-To-Text)
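As a minimal sketch of this unified inference interface (the model name and image path are only examples):
from mmpretrain import inference_model
# Run inference without choosing an inferencer class by hand; the appropriate
# inferencer is selected based on the model's task
result = inference_model('resnet50_8xb32_in1k', 'path/to/image.jpg')
print(result)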
https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351-fbc74a04e904
What's new
🌟 v1.2.0 was released on 04/01/2024
- Support LLaVA 1.5.
- Implement RAM with a Gradio interface.
🌟 v1.1.0 was released on 12/10/2023
- Support Mini-GPT4 training and provide a Chinese model (based on Baichuan-7B)
- Support zero-shot classification based on CLIP.
🌟 v1.0.0 was released on 04/07/2023
- Support inference of more multi-modal algorithms, such as LLaVA, MiniGPT-4, Otter, etc.
- Support around 10 multi-modal datasets!
- Add iTPN, SparK self-supervised learning algorithms.
- Provide examples of the New Config style and of DeepSpeed/FSDP with FlexibleRunner; see the corresponding documentation pages for details.
🌟 Upgrade from MMClassification to MMPreTrain
- Integrated self-supervised learning algorithms from MMSelfSup, such as MAE, BEiT, etc. (a loading sketch follows this list).
- Support RIFormer, a simple but effective vision backbone obtained by removing the token mixer.
- Refactor dataset pipeline visualization.
- Support LeViT, XCiT, ViG, ConvNeXt-V2, EVA, RevViT, EfficientNetV2, CLIP, TinyViT and MixMIM backbones.
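A minimal sketch of finding and loading one of these integrated self-supervised checkpoints (the name filter is an assumption; adjust it to what list_models() actually returns in your version):
from mmpretrain import list_models, get_model
# Filter the full model list for MAE pre-trained checkpoints
mae_models = [name for name in list_models() if name.startswith('mae')]
print(mae_models)
model = get_model(mae_models[0], pretrained=True)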
This release introduced a brand new and flexible training and test engine, which is still a work in progress. You are welcome to try it out following the documentation.
And there are some BC-breaking changes. Please check the migration tutorial.
Please refer to changelog for more details and other release history.
Installation
Below are quick steps for installation:
conda create -n open-mmlab python=3.8 pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -y
conda activate open-mmlab
pip install openmim
git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
mim install -e .
Please refer to installation documentation for more detailed installation and dataset preparation.
For multi-modality models support, please install the extra dependencies by:
mim install -e ".[multimodal]"
User Guides
We provide a series of tutorials covering the basic usage of MMPreTrain for new users.
For more information, please refer to our documentation.
Model zoo
Results and models are available in the model zoo.
Contributing
We appreciate all contributions to improve MMPreTrain. Please refer to CONTRIBUTING for the contributing guidelines.
Acknowledgement
MMPreTrain is an open source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback. We hope the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and support new academic research.
Citation
If you find this project useful in your research, please consider citing:
@misc{2023mmpretrain,
title={OpenMMLab's Pre-training Toolbox and Benchmark},
author={MMPreTrain Contributors},
howpublished = {\url{https://github.com/open-mmlab/mmpretrain}},
year={2023}
}
License
This project is released under the Apache 2.0 license.
Projects in OpenMMLab
- MMEngine: OpenMMLab foundational library for training deep learning models.
- MMCV: OpenMMLab foundational library for computer vision.
- MIM: MIM installs OpenMMLab packages.
- MMEval: A unified evaluation library for multiple machine learning libraries.
- MMPreTrain: OpenMMLab pre-training toolbox and benchmark.
- MMDetection: OpenMMLab detection toolbox and benchmark.
- MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
- MMRotate: OpenMMLab rotated object detection toolbox and benchmark.
- MMYOLO: OpenMMLab YOLO series toolbox and benchmark.
- MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
- MMOCR: OpenMMLab text detection, recognition, and understanding toolbox.
- MMPose: OpenMMLab pose estimation toolbox and benchmark.
- MMHuman3D: OpenMMLab 3D human parametric model toolbox and benchmark.
- MMSelfSup: OpenMMLab self-supervised learning toolbox and benchmark.
- MMRazor: OpenMMLab model compression toolbox and benchmark.
- MMFewShot: OpenMMLab fewshot learning toolbox and benchmark.
- MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
- MMTracking: OpenMMLab video perception toolbox and benchmark.
- MMFlow: OpenMMLab optical flow toolbox and benchmark.
- MMagic: OpenMMLab Advanced, Generative and Intelligent Creation toolbox.
- MMGeneration: OpenMMLab image and video generative models toolbox.
- MMDeploy: OpenMMLab model deployment framework.
- Playground: A central hub for gathering and showcasing amazing projects built upon OpenMMLab.