Top Related Projects
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Quick Overview
The open-mmlab/mmpretrain repository (MMPreTrain) is a comprehensive deep learning pre-training toolbox developed by the OpenMMLab team. It provides a unified interface for training and evaluating a wide range of computer vision models, covering image classification, self-supervised pre-training, and multimodal tasks such as image captioning and visual question answering.
Pros
- Extensive Model Zoo: The library offers a diverse collection of pre-trained models across various computer vision tasks, allowing users to easily leverage state-of-the-art performance without the need for extensive training.
- Modular and Extensible: The codebase is highly modular, making it easy to register new models, datasets, and training pipelines (a registration sketch follows this list).
- Efficient Training and Evaluation: The library leverages the powerful PyTorch framework and provides optimized training and evaluation workflows, enabling efficient model development and deployment.
- Comprehensive Documentation: The project maintains detailed documentation, including installation guides, tutorials, and API references, making it accessible for both beginners and experienced users.
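As an illustration of that extensibility, here is a minimal sketch of registering a custom backbone with MMPreTrain's MODELS registry so it can be referenced from a config by name. The class name and layer sizes are made up for the example.
from mmengine.model import BaseModule
from mmpretrain.registry import MODELS
import torch.nn as nn
@MODELS.register_module()
class TinyBackbone(BaseModule):  # hypothetical backbone, for illustration only
    def __init__(self, out_channels=256, init_cfg=None):
        super().__init__(init_cfg=init_cfg)
        self.stem = nn.Sequential(
            nn.Conv2d(3, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        # MMPreTrain backbones return a tuple of feature maps
        return (self.stem(x),)
# In a config file, the backbone is then referenced as dict(type='TinyBackbone').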
Cons
- Steep Learning Curve: The library's extensive features and flexibility can be overwhelming for newcomers, requiring a significant investment in understanding the project's structure and conventions.
- Limited Community Support: Compared to more popular deep learning libraries like TensorFlow or PyTorch, the open-mmlab/mmpretrain project may have a smaller user community, which could impact the availability of resources and troubleshooting support.
- Potential Compatibility Issues: As the library is actively developed, there may be occasional compatibility issues with newer versions of PyTorch or other dependencies, which could require additional effort to resolve.
- Limited Deployment Options: While the library provides tools for model evaluation and deployment, the focus is primarily on the training and development aspects, and the deployment options may be less comprehensive than dedicated serving frameworks.
Code Examples
Here are a few code examples demonstrating typical usage of the open-mmlab/mmpretrain library (note that object detection and semantic segmentation are handled by the sibling MMDetection and MMSegmentation toolboxes rather than by mmpretrain):
- Image Classification:
from mmpretrain import ImageClassificationInferencer
# Load a pre-trained model
inferencer = ImageClassificationInferencer('resnet50_8xb32_in1k')
# Perform inference on an image (the inferencer is called directly)
result = inferencer('path/to/image.jpg')
print(result)
- Image Captioning (multimodal):
from mmpretrain import ImageCaptionInferencer
# Load a pre-trained captioning model; the model name is an example,
# use list_models() to see which caption models your installation provides
inferencer = ImageCaptionInferencer('blip-base_3rdparty_caption')
# Perform inference on an image
result = inferencer('path/to/image.jpg')
print(result)
- Feature Extraction:
from mmpretrain import get_model
import torch
# Load a pre-trained backbone and extract features from a dummy batch
model = get_model('resnet50_8xb32_in1k', pretrained=True)
feats = model.extract_feat(torch.rand(1, 3, 224, 224))
print([f.shape for f in feats])
- Training a Model: Training is driven by config files and MMEngine's Runner (a sketch; it assumes the dataset paths in the chosen config are set up):
from mmengine.config import Config
from mmengine.runner import Runner
# Load one of the provided configs and point it at a working directory
cfg = Config.fromfile('configs/resnet/resnet50_8xb32_in1k.py')
cfg.work_dir = 'path/to/work_dir'
runner = Runner.from_cfg(cfg)
runner.train()
# Equivalent command line:
#   python tools/train.py configs/resnet/resnet50_8xb32_in1k.py --work-dir path/to/work_dir
Getting Started
To get started with the open-mmlab/mmpretrain library, follow these steps:
- Install the Library: You can install the library using pip:
pip install mmpretrain
- Explore the Model Zoo: Browse the available pre-trained models in the Model Zoo to find the ones that suit your needs (a programmatic way to do this is sketched after these steps).
- Perform Inference: Use the provided inferencer classes to load and run inference on your images:
from mmpretrain import ImageClassificationInferencer
inferencer = ImageClassificationInferencer('resnet50_8xb32_in1k')
result = inferencer('path/to/image.jpg')
print(result)
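For the model zoo step, the available models can also be listed programmatically. A minimal sketch (the wildcard pattern is just an example):
from mmpretrain import list_models
# List model names matching a pattern; list_models() with no arguments lists everything
print(list_models('resnet*'))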
Competitor Comparisons
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
Pros of VISSL
- VISSL provides a wide range of self-supervised learning algorithms, including SimCLR, MoCo, and BYOL, allowing for more flexibility in model training.
- The library is well-documented and includes detailed tutorials, making it easier for researchers and developers to get started.
- VISSL supports distributed training, which can significantly speed up the training process on large datasets.
Cons of VISSL
- VISSL is primarily focused on self-supervised learning, while MMPreTrain offers a broader range of pre-training and fine-tuning capabilities.
- The installation process for VISSL can be more complex, as it requires setting up a specific PyTorch environment.
Code Comparison
MMPreTrain:
from mmpretrain import get_model
# Build a classifier from the model zoo; pretrained=True downloads the
# official checkpoint, and extra kwargs override fields of the model config
model = get_model(
    'resnet50_8xb32_in1k',
    pretrained=True,
    head=dict(num_classes=1000),
)
VISSL:
from vissl.config import AttrDict
from vissl.models import build_model
cfg = AttrDict({
"MODEL": {
"FEATURE_EVAL_SETTINGS": {
"EVAL_MODE_ON": True,
"FREEZE_TRUNK_ONLY": True,
},
"TRUNK": {
"NAME": "resnet50",
},
},
})
model = build_model(cfg)
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Pros of Swin-Transformer
- Swin-Transformer is a state-of-the-art vision transformer model that has achieved impressive results on various computer vision tasks.
- The repository provides a well-documented and easy-to-use implementation of the Swin-Transformer model, making it accessible to researchers and developers.
- The model has been pre-trained on large-scale datasets, allowing users to fine-tune it for their specific tasks, which can save time and resources.
Cons of Swin-Transformer
- The Swin-Transformer repository is primarily focused on the Swin-Transformer model and does not provide a comprehensive set of tools and utilities like mmpretrain.
- The documentation and examples in the Swin-Transformer repository may not be as extensive as those in mmpretrain, which can make it more challenging for beginners to get started.
- The Swin-Transformer repository may not have the same level of community support and active development as mmpretrain.
Code Comparison
Swin-Transformer:
import torch.nn as nn  # needed for norm_layer below
from models.swin_transformer import SwinTransformer  # module path inside the Swin-Transformer repo
model = SwinTransformer(
img_size=224,
patch_size=4,
in_chans=3,
num_classes=1000,
embed_dim=96,
depths=[2, 2, 6, 2],
num_heads=[3, 6, 12, 24],
window_size=7,
mlp_ratio=4.,
qkv_bias=True,
qk_scale=None,
drop_rate=0.,
attn_drop_rate=0.,
drop_path_rate=0.1,
norm_layer=nn.LayerNorm,
ape=False,
patch_norm=True,
use_checkpoint=False
)
mmpretrain:
from mmpretrain import get_model
# pretrained=True downloads the official checkpoint automatically;
# Swin/ViT variants can be found with list_models('swin*') or list_models('vit*')
model = get_model(
    'resnet50_8xb32_in1k',
    pretrained=True,
)
Pros of Vision Transformer
- Vision Transformer provides a more flexible and scalable architecture compared to traditional convolutional neural networks (CNNs), allowing for better performance on various computer vision tasks.
- The Vision Transformer model is pre-trained on a large dataset, which can be fine-tuned for specific tasks, reducing the need for extensive training from scratch.
- The Vision Transformer implementation includes comprehensive documentation and examples, making it easier for researchers and developers to get started.
Cons of Vision Transformer
- The Vision Transformer model can be computationally more expensive and require more memory compared to some CNN-based models, especially for smaller-scale tasks.
- The Vision Transformer may not perform as well as specialized CNN architectures on certain computer vision tasks, such as image classification on small datasets.
Code Comparison
Vision Transformer:
from vit_keras import vit
# Create a ViT-B/16 model with pre-trained ImageNet-21k weights
# (the vit-keras helper API is assumed here; check the package docs for details)
model = vit.vit_b16(
    image_size=224,
    activation='softmax',
    pretrained=True,
    include_top=True,
    pretrained_top=False,
    classes=10,
)
MMPreTrain:
from mmpretrain import get_model
# Build a model from the MMPreTrain zoo; use list_models('vit*') to find
# Vision Transformer variants in particular
model = get_model(
    'resnet50_8xb32_in1k',
    pretrained=True,
)
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Pros of Transformers
- Transformers provides a wide range of pre-trained models for various NLP tasks, making it easy to fine-tune and use in your own projects.
- The library has extensive documentation and a large community, providing ample support and resources for users.
- Transformers integrates well with popular deep learning frameworks like PyTorch and TensorFlow, allowing for seamless integration into your existing codebase.
Cons of Transformers
- Transformers has historically focused on NLP tasks, while MMPreTrain covers a broad range of computer vision and multimodal pre-training tasks.
- The library can be more complex to set up and configure compared to MMPreTrain, which has a more streamlined and user-friendly setup process.
- Transformers may have a steeper learning curve for users who are new to natural language processing.
Code Comparison
Transformers:
from transformers import pipeline
# Load a pre-trained model for sentiment analysis
sentiment_analyzer = pipeline('sentiment-analysis')
# Perform sentiment analysis on a given text
result = sentiment_analyzer('This movie was amazing!')
print(result)
MMPreTrain:
from mmpretrain import ImageClassificationInferencer
# Load a pre-trained model for image classification
classifier = ImageClassificationInferencer('resnet50_8xb32_in1k')
# Classify an input image
result = classifier('path/to/image.jpg')
print(result)
README

📘 Documentation | 🛠️ Installation | 👀 Model Zoo | 🆕 Update News | 🤔 Reporting Issues
English | 简体中文
Introduction
MMPreTrain is an open source pre-training toolbox based on PyTorch. It is a part of the OpenMMLab project.
The main branch works with PyTorch 1.8+.
Major features
- Various backbones and pretrained models
- Rich training strategies (supervised learning, self-supervised learning, multi-modality learning etc.)
- Bag of training tricks
- Large-scale training configs
- High efficiency and extensibility
- Powerful toolkits for model analysis and experiments
- Various out-of-the-box inference tasks (a usage sketch follows this list):
- Image Classification
- Image Caption
- Visual Question Answering
- Visual Grounding
- Retrieval (Image-To-Image, Text-To-Image, Image-To-Text)
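As a minimal sketch of this unified inference interface (the model name and image path are only examples):
from mmpretrain import inference_model
# Run inference without choosing an inferencer class by hand; the appropriate
# inferencer is selected based on the model's task
result = inference_model('resnet50_8xb32_in1k', 'path/to/image.jpg')
print(result)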
https://github.com/open-mmlab/mmpretrain/assets/26739999/e4dcd3a2-f895-4d1b-a351-fbc74a04e904
What's new
🌟 v1.2.0 was released on 04/01/2024
- Support LLaVA 1.5.
- Implement RAM with a Gradio interface.
🌟 v1.1.0 was released on 12/10/2023
- Support Mini-GPT4 training and provide a Chinese model (based on Baichuan-7B)
- Support zero-shot classification based on CLIP.
🌟 v1.0.0 was released on 04/07/2023
- Support inference of more multi-modal algorithms, such as LLaVA, MiniGPT-4, Otter, etc.
- Support around 10 multi-modal datasets!
- Add iTPN, SparK self-supervised learning algorithms.
- Provide examples of the New Config style and of DeepSpeed/FSDP with FlexibleRunner; see the corresponding documentation pages for details.
🌟 Upgrade from MMClassification to MMPreTrain
- Integrated self-supervised learning algorithms from MMSelfSup, such as MAE, BEiT, etc. (a loading sketch follows this list).
- Support RIFormer, a simple but effective vision backbone obtained by removing the token mixer.
- Refactor dataset pipeline visualization.
- Support LeViT, XCiT, ViG, ConvNeXt-V2, EVA, RevViT, EfficientNetV2, CLIP, TinyViT and MixMIM backbones.
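A minimal sketch of finding and loading one of these integrated self-supervised checkpoints (the name filter is an assumption; adjust it to what list_models() actually returns in your version):
from mmpretrain import list_models, get_model
# Filter the full model list for MAE pre-trained checkpoints
mae_models = [name for name in list_models() if name.startswith('mae')]
print(mae_models)
model = get_model(mae_models[0], pretrained=True)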
This release introduced a brand new and flexible training and test engine, which is still a work in progress. You are welcome to try it out following the documentation.
And there are some BC-breaking changes. Please check the migration tutorial.
Please refer to changelog for more details and other release history.
Installation
Below are quick steps for installation:
conda create -n open-mmlab python=3.8 pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -y
conda activate open-mmlab
pip install openmim
git clone https://github.com/open-mmlab/mmpretrain.git
cd mmpretrain
mim install -e .
Please refer to installation documentation for more detailed installation and dataset preparation.
For multi-modality models support, please install the extra dependencies by:
mim install -e ".[multimodal]"
User Guides
We provide a series of tutorials covering the basic usage of MMPreTrain for new users.
For more information, please refer to our documentation.
Model zoo
Results and models are available in the model zoo.
Contributing
We appreciate all contributions to improve MMPreTrain. Please refer to CONTRIBUTING for the contributing guidelines.
Acknowledgement
MMPreTrain is an open source project contributed to by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback. We hope the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and support new academic research.
Citation
If you find this project useful in your research, please consider citing:
@misc{2023mmpretrain,
title={OpenMMLab's Pre-training Toolbox and Benchmark},
author={MMPreTrain Contributors},
howpublished = {\url{https://github.com/open-mmlab/mmpretrain}},
year={2023}
}
License
This project is released under the Apache 2.0 license.
Projects in OpenMMLab
- MMEngine: OpenMMLab foundational library for training deep learning models.
- MMCV: OpenMMLab foundational library for computer vision.
- MIM: MIM installs OpenMMLab packages.
- MMEval: A unified evaluation library for multiple machine learning libraries.
- MMPreTrain: OpenMMLab pre-training toolbox and benchmark.
- MMDetection: OpenMMLab detection toolbox and benchmark.
- MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
- MMRotate: OpenMMLab rotated object detection toolbox and benchmark.
- MMYOLO: OpenMMLab YOLO series toolbox and benchmark.
- MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
- MMOCR: OpenMMLab text detection, recognition, and understanding toolbox.
- MMPose: OpenMMLab pose estimation toolbox and benchmark.
- MMHuman3D: OpenMMLab 3D human parametric model toolbox and benchmark.
- MMSelfSup: OpenMMLab self-supervised learning toolbox and benchmark.
- MMRazor: OpenMMLab model compression toolbox and benchmark.
- MMFewShot: OpenMMLab fewshot learning toolbox and benchmark.
- MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
- MMTracking: OpenMMLab video perception toolbox and benchmark.
- MMFlow: OpenMMLab optical flow toolbox and benchmark.
- MMagic: OpenMMLab Advanced, Generative and Intelligent Creation toolbox.
- MMGeneration: OpenMMLab image and video generative models toolbox.
- MMDeploy: OpenMMLab model deployment framework.
- Playground: A central hub for gathering and showcasing amazing projects built upon OpenMMLab.