
facebookresearch/moco

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722


Top Related Projects

  • SwAV: PyTorch implementation of SwAV (https://arxiv.org/abs/2006.09882)
  • SimCLR: SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
  • SimSiam: PyTorch implementation of SimSiam (https://arxiv.org/abs/2011.10566)

Quick Overview

MoCo (Momentum Contrast) is a self-supervised learning framework for visual representation learning, developed by Facebook AI Research. It aims to learn visual representations from large-scale unlabeled data, which can then be used for various downstream tasks such as image classification, object detection, and segmentation.
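
The "momentum" in the name refers to how the two encoders are kept in sync: a query encoder is trained by backpropagation, while a key encoder is updated as an exponential moving average (EMA) of the query encoder's weights, and a fixed-size queue of past key embeddings supplies a large pool of negatives. A minimal sketch of the EMA update (simplified from the paper; m = 0.999 is the default momentum coefficient):

import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    # The key encoder trails the query encoder as an exponential moving
    # average; it is never updated by gradients.
    for param_q, param_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        param_k.data = param_k.data * m + param_q.data * (1.0 - m)

A large m makes the key encoder evolve slowly, which keeps the keys in the queue consistent with one another even though they were computed at different training steps.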

Pros

  • Achieves state-of-the-art results among self-supervised methods on computer vision benchmarks
  • Decouples the number of negative samples from the batch size via its queue, so it trains with ordinary batch sizes
  • Scalable to large datasets and can leverage distributed training
  • Applicable to various downstream tasks with minimal fine-tuning

Cons

  • Requires significant computational resources for training on large datasets
  • May not perform as well on small-scale datasets compared to supervised learning methods
  • Hyperparameter tuning can be challenging and time-consuming
  • Limited to visual representation learning, not directly applicable to other domains

Code Examples

  1. Loading a pre-trained MoCo model:
import torch
import torchvision.models as models
from moco.builder import MoCo

# MoCo wraps a backbone constructor; mlp=True selects the MoCo v2 projection head
model = MoCo(models.resnet50, dim=128, K=65536, m=0.999, T=0.07, mlp=True)

# The released checkpoints were saved under DistributedDataParallel,
# so every key carries a "module." prefix that must be stripped
checkpoint = torch.load('moco_v2_800ep_pretrain.pth.tar', map_location="cpu")
state_dict = {k.replace('module.', '', 1): v for k, v in checkpoint['state_dict'].items()}
model.load_state_dict(state_dict, strict=False)
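
Note that for downstream use only the query encoder matters: the repo's own main_lincls.py, for instance, keeps just the module.encoder_q.* weights (minus the projection head) and discards the key encoder and the queue.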
  2. Extracting features using a pre-trained MoCo model:
import torch
from torchvision import transforms
from PIL import Image

# Load and preprocess an image with the standard ImageNet pipeline
image = Image.open('example.jpg').convert('RGB')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
input_tensor = preprocess(image).unsqueeze(0)

# Extract features with the query encoder (eval mode freezes batch-norm statistics)
model.eval()
with torch.no_grad():
    features = model.encoder_q(input_tensor)
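
With mlp=True, encoder_q ends in the MoCo v2 projection head, so features here is a 128-dimensional embedding. If you instead want the 2048-dimensional ResNet-50 backbone features that transfer experiments typically use, one option (an assumption about your setup, not repo code) is to swap the head out first:

import torch.nn as nn

# hypothetical: expose the pooled backbone features instead of the projection
model.encoder_q.fc = nn.Identity()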
  3. Fine-tuning MoCo for image classification:
import torch
import torch.nn as nn

# Fine-tune the query encoder: replace its projection head with a
# classification head (ResNet-50 pooled features are 2048-dimensional)
num_classes = 1000
backbone = model.encoder_q
backbone.fc = nn.Linear(2048, num_classes)

# Fine-tune the model (num_epochs and train_loader are assumed to be defined)
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        outputs = backbone(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
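
This sketch updates the entire backbone. The repo's own evaluation protocol (see Linear Classification below) is stricter: main_lincls.py freezes the backbone and trains only a linear classifier on the frozen features, which is what the accuracy numbers in this README report.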

Getting Started

To get started with MoCo:

  1. Clone the repository:

    git clone https://github.com/facebookresearch/moco.git
    cd moco
    
  2. Install dependencies (the repo itself needs only PyTorch and torchvision; there is no requirements.txt):

    pip install torch torchvision
    
  3. Download a pre-trained model:

    wget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_800ep/moco_v2_800ep_pretrain.pth.tar
    
  4. Use the pre-trained model in your project as shown in the code examples above.

Competitor Comparisons

SwAV: PyTorch implementation of SwAV (https://arxiv.org/abs/2006.09882)

Pros of SwAV

  • Achieves better performance on downstream tasks, especially with fewer epochs of pre-training
  • Introduces a novel online clustering approach for self-supervised learning
  • Supports multi-crop augmentation, which can improve representation quality

Cons of SwAV

  • More complex implementation compared to MoCo's simpler contrastive learning approach
  • May require more computational resources due to the clustering step
  • Potentially more sensitive to hyperparameter tuning

Code Comparison

SwAV:

# illustrative pseudocode: cluster-assignment (swapped prediction) loss
loss = swav_loss(output, prototype, args)
optimizer.zero_grad()
loss.backward()
optimizer.step()

MoCo:

# illustrative pseudocode: contrastive loss against queued negatives
loss = moco_loss(q, k, queue)
optimizer.zero_grad()
loss.backward()
optimizer.step()
update_queue(queue, k)  # enqueue the new keys, dequeue the oldest

Both repositories implement self-supervised learning methods for visual representation learning. SwAV introduces a novel online clustering approach, while MoCo focuses on contrastive learning using a momentum encoder. SwAV generally achieves better performance but may be more complex to implement and tune. MoCo offers a simpler approach that can still yield competitive results. The code snippets highlight the different loss calculations and update mechanisms used in each method.
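
To make the update_queue step concrete, here is a minimal sketch of MoCo's fixed-size FIFO queue, patterned after _dequeue_and_enqueue in moco/builder.py (the real method additionally gathers keys across GPUs and assumes the batch size divides K):

import torch

@torch.no_grad()
def update_queue(queue, queue_ptr, keys):
    # queue: C x K buffer of past key embeddings; keys: N x C new keys
    K = queue.shape[1]
    batch_size = keys.shape[0]
    ptr = int(queue_ptr)
    # Overwrite the oldest entries with the newest keys (FIFO),
    # then advance the write pointer modulo the queue size.
    queue[:, ptr:ptr + batch_size] = keys.T
    queue_ptr[0] = (ptr + batch_size) % K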

SimCLR: SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners

Pros of SimCLR

  • Simpler architecture without requiring a memory bank or momentum encoder
  • Achieves state-of-the-art results on several benchmarks
  • Provides a more straightforward implementation of contrastive learning

Cons of SimCLR

  • Requires larger batch sizes and longer training times compared to MoCo
  • May be more computationally expensive due to the need for many negative samples
  • Less memory-efficient than MoCo's queue-based approach

Code Comparison

MoCo:

# Momentum update
self._momentum_update_key_encoder()

# Compute key features
with torch.no_grad():
    k = self.encoder_k(im_k)  # keys: NxC
    k = nn.functional.normalize(k, dim=1)

SimCLR:

# Apply the same transformation twice
x_i = data_transform(x)
x_j = data_transform(x)

# Compute features
h_i = self.encoder(x_i)
h_j = self.encoder(x_j)

# Project features through the MLP head before the contrastive loss
z_i = self.projection_head(h_i)
z_j = self.projection_head(h_j)

Both repositories implement self-supervised contrastive learning methods, but they differ in their approach. MoCo uses a momentum encoder and a queue-based memory bank, while SimCLR relies on large batch sizes and a simpler architecture. SimCLR may be easier to implement but could require more computational resources, whereas MoCo offers better memory efficiency at the cost of a slightly more complex setup.
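
The trade-off is easiest to see in where the negatives come from. Below is a hedged sketch of a SimCLR-style in-batch contrastive (NT-Xent) loss, in which every other example in the batch serves as a negative, so the size of the negative pool is tied to the batch size (names and the temperature value are illustrative):

import torch
import torch.nn.functional as F

def info_nce_in_batch(z_i, z_j, temperature=0.1):
    # z_i, z_j: N x C projections of two augmented views of the same N images
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)  # 2N x C
    sim = z @ z.T / temperature                           # pairwise similarities, 2N x 2N
    sim.fill_diagonal_(float('-inf'))                     # a sample is not its own negative
    n = z_i.shape[0]
    # the positive for row k is the other augmented view of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

MoCo sidesteps this coupling: its queue of K = 65,536 past keys supplies negatives regardless of the batch size.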

SimSiam: PyTorch implementation of SimSiam (https://arxiv.org/abs/2011.10566)

Pros of SimSiam

  • Simpler architecture without requiring a momentum encoder or large batches
  • Achieves competitive performance with less computational resources
  • More flexible and easier to implement in various frameworks

Cons of SimSiam

  • May require careful tuning of hyperparameters for optimal performance
  • Potentially less stable during training compared to MoCo
  • Limited to specific types of augmentations for effective learning

Code Comparison

SimSiam:

# SimSiam loss calculation
p1, p2 = predictor(encoder(x1)), predictor(encoder(x2))
z1, z2 = encoder(x1).detach(), encoder(x2).detach()
loss = -(F.cosine_similarity(p1, z2).mean() + F.cosine_similarity(p2, z1).mean()) * 0.5

MoCo:

# MoCo loss calculation (q and k are assumed L2-normalized)
q = encoder_q(x_q)
k = encoder_k(x_k)
l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1)          # positive logits: Nx1
l_neg = torch.einsum('nc,ck->nk', [q, queue.clone().detach()])  # negative logits: NxK
logits = torch.cat([l_pos, l_neg], dim=1)
# the positive key sits at index 0 of each row, so every label is 0
loss = nn.CrossEntropyLoss()(logits / temperature, torch.zeros(logits.shape[0], dtype=torch.long))

Both repositories implement self-supervised learning methods for visual representation learning. SimSiam offers a simpler approach with competitive performance, while MoCo provides a more established framework with potential benefits in stability and scalability.


README

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

This is a PyTorch implementation of the MoCo paper:

@Article{he2019moco,
  author  = {Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  title   = {Momentum Contrast for Unsupervised Visual Representation Learning},
  journal = {arXiv preprint arXiv:1911.05722},
  year    = {2019},
}

It also includes the implementation of the MoCo v2 paper:

@Article{chen2020mocov2,
  author  = {Xinlei Chen and Haoqi Fan and Ross Girshick and Kaiming He},
  title   = {Improved Baselines with Momentum Contrastive Learning},
  journal = {arXiv preprint arXiv:2003.04297},
  year    = {2020},
}

Preparation

Install PyTorch and download the ImageNet dataset following the official PyTorch ImageNet training code.

This repo aims to make minimal modifications to that code. Check the modifications by:

diff main_moco.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)
diff main_lincls.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)

Unsupervised Training

This implementation only supports multi-gpu, DistributedDataParallel training, which is faster and simpler; single-gpu or DataParallel training is not supported.

To do unsupervised pre-training of a ResNet-50 model on ImageNet in an 8-gpu machine, run:

python main_moco.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

This script uses all the default hyper-parameters as described in the MoCo v1 paper. To run MoCo v2, set --mlp --moco-t 0.2 --aug-plus --cos.
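
For example, the MoCo v2 counterpart of the command above is:

python main_moco.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --mlp --moco-t 0.2 --aug-plus --cos \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]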

Note: for 4-gpu training, we recommend following the linear lr scaling recipe: --lr 0.015 --batch-size 128 with 4 gpus (the learning rate is scaled by the batch-size ratio, 0.03 × 128/256 = 0.015). We got similar results using this setting.

Linear Classification

With a pre-trained model, to train a supervised linear classifier on frozen features/weights in an 8-gpu machine, run:

python main_lincls.py \
  -a resnet50 \
  --lr 30.0 \
  --batch-size 256 \
  --pretrained [your checkpoint path]/checkpoint_0199.pth.tar \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

Linear classification results on ImageNet using this repo with 8 NVIDIA V100 GPUs:

| model | pre-train epochs | pre-train time | MoCo v1 top-1 acc. | MoCo v2 top-1 acc. |
| --------- | --- | -------- | -------- | -------- |
| ResNet-50 | 200 | 53 hours | 60.8±0.2 | 67.5±0.1 |

Here we run 5 trials (of pre-training and linear classification) and report mean±std: the 5 results of MoCo v1 are {60.6, 60.6, 60.7, 60.9, 61.1}, and of MoCo v2 are {67.7, 67.6, 67.4, 67.6, 67.3}.

Models

Our pre-trained ResNet-50 models can be downloaded as follows:

| model | epochs | mlp | aug+ | cos | top-1 acc. | checkpoint | md5 |
| ------- | --- | --- | --- | --- | ---- | -------- | -------- |
| MoCo v1 | 200 |     |     |     | 60.6 | download | b251726a |
| MoCo v2 | 200 | ✓   | ✓   | ✓   | 67.7 | download | 59fd9945 |
| MoCo v2 | 800 | ✓   | ✓   | ✓   | 71.1 | download | a04e12f8 |

Transferring to Object Detection

See ./detection.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
