
facebookresearch/moco

PyTorch implementation of MoCo: https://arxiv.org/abs/1911.05722


Top Related Projects

  • SwAV: PyTorch implementation of SwAV (https://arxiv.org/abs/2006.09882)
  • SimCLR: SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
  • SimSiam: PyTorch implementation of SimSiam (https://arxiv.org/abs/2011.10566)

Quick Overview

MoCo (Momentum Contrast) is a self-supervised learning framework for visual representation learning, developed by Facebook AI Research. It aims to learn visual representations from large-scale unlabeled data, which can then be used for various downstream tasks such as image classification, object detection, and segmentation.
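
The "momentum" in the name refers to how the two encoders are kept in sync: a query encoder is trained by backpropagation, while a key encoder is updated as an exponential moving average (EMA) of the query encoder's weights, and a fixed-size queue of past key embeddings supplies a large pool of negatives. A minimal sketch of the EMA update (simplified from the paper; m = 0.999 is the default momentum coefficient):

import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    # The key encoder trails the query encoder as an exponential moving
    # average; it is never updated by gradients.
    for param_q, param_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        param_k.data = param_k.data * m + param_q.data * (1.0 - m)

A large m makes the key encoder evolve slowly, which keeps the keys in the queue consistent with one another even though they were computed at different training steps.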

Pros

  • Achieves state-of-the-art results among self-supervised methods on computer vision benchmarks
  • Decouples the number of negative samples from the batch size via its queue, so it trains with ordinary batch sizes
  • Scalable to large datasets and can leverage distributed training
  • Applicable to various downstream tasks with minimal fine-tuning

Cons

  • Requires significant computational resources for training on large datasets
  • May not perform as well on small-scale datasets compared to supervised learning methods
  • Hyperparameter tuning can be challenging and time-consuming
  • Limited to visual representation learning, not directly applicable to other domains

Code Examples

  1. Loading a pre-trained MoCo model:
import torch
import torchvision.models as models
from moco.builder import MoCo

# MoCo wraps a backbone constructor; mlp=True selects the MoCo v2 projection head
model = MoCo(models.resnet50, dim=128, K=65536, m=0.999, T=0.07, mlp=True)

# The released checkpoints were saved under DistributedDataParallel,
# so every key carries a "module." prefix that must be stripped
checkpoint = torch.load('moco_v2_800ep_pretrain.pth.tar', map_location="cpu")
state_dict = {k.replace('module.', '', 1): v for k, v in checkpoint['state_dict'].items()}
model.load_state_dict(state_dict, strict=False)
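
Note that for downstream use only the query encoder matters: the repo's own main_lincls.py, for instance, keeps just the module.encoder_q.* weights (minus the projection head) and discards the key encoder and the queue.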
  2. Extracting features using a pre-trained MoCo model:
import torch
from torchvision import transforms
from PIL import Image

# Load and preprocess an image with the standard ImageNet pipeline
image = Image.open('example.jpg').convert('RGB')
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
input_tensor = preprocess(image).unsqueeze(0)

# Extract features with the query encoder (eval mode freezes batch-norm statistics)
model.eval()
with torch.no_grad():
    features = model.encoder_q(input_tensor)
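
With mlp=True, encoder_q ends in the MoCo v2 projection head, so features here is a 128-dimensional embedding. If you instead want the 2048-dimensional ResNet-50 backbone features that transfer experiments typically use, one option (an assumption about your setup, not repo code) is to swap the head out first:

import torch.nn as nn

# hypothetical: expose the pooled backbone features instead of the projection
model.encoder_q.fc = nn.Identity()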
  3. Fine-tuning MoCo for image classification:
import torch
import torch.nn as nn

# Fine-tune the query encoder: replace its projection head with a
# classification head (ResNet-50 pooled features are 2048-dimensional)
num_classes = 1000
backbone = model.encoder_q
backbone.fc = nn.Linear(2048, num_classes)

# Fine-tune the model (num_epochs and train_loader are assumed to be defined)
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        outputs = backbone(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
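
This sketch updates the entire backbone. The repo's own evaluation protocol (see Linear Classification below) is stricter: main_lincls.py freezes the backbone and trains only a linear classifier on the frozen features, which is what the accuracy numbers in this README report.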

Getting Started

To get started with MoCo:

  1. Clone the repository:

    git clone https://github.com/facebookresearch/moco.git
    cd moco
    
  2. Install dependencies (the repo itself needs only PyTorch and torchvision; there is no requirements.txt):

    pip install torch torchvision
    
  3. Download a pre-trained model:

    wget https://dl.fbaipublicfiles.com/moco/moco_checkpoints/moco_v2_800ep/moco_v2_800ep_pretrain.pth.tar
    
  4. Use the pre-trained model in your project as shown in the code examples above.

Competitor Comparisons

SwAV: PyTorch implementation of SwAV (https://arxiv.org/abs/2006.09882)

Pros of SwAV

  • Achieves better performance on downstream tasks, especially with fewer epochs of pre-training
  • Introduces a novel online clustering approach for self-supervised learning
  • Supports multi-crop augmentation, which can improve representation quality

Cons of SwAV

  • More complex implementation compared to MoCo's simpler contrastive learning approach
  • May require more computational resources due to the clustering step
  • Potentially more sensitive to hyperparameter tuning

Code Comparison

SwAV:

# illustrative pseudocode: cluster-assignment (swapped prediction) loss
loss = swav_loss(output, prototype, args)
optimizer.zero_grad()
loss.backward()
optimizer.step()

MoCo:

# illustrative pseudocode: contrastive loss against queued negatives
loss = moco_loss(q, k, queue)
optimizer.zero_grad()
loss.backward()
optimizer.step()
update_queue(queue, k)  # enqueue the new keys, dequeue the oldest

Both repositories implement self-supervised learning methods for visual representation learning. SwAV introduces a novel online clustering approach, while MoCo focuses on contrastive learning using a momentum encoder. SwAV generally achieves better performance but may be more complex to implement and tune. MoCo offers a simpler approach that can still yield competitive results. The code snippets highlight the different loss calculations and update mechanisms used in each method.
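
To make the update_queue step concrete, here is a minimal sketch of MoCo's fixed-size FIFO queue, patterned after _dequeue_and_enqueue in moco/builder.py (the real method additionally gathers keys across GPUs and assumes the batch size divides K):

import torch

@torch.no_grad()
def update_queue(queue, queue_ptr, keys):
    # queue: C x K buffer of past key embeddings; keys: N x C new keys
    K = queue.shape[1]
    batch_size = keys.shape[0]
    ptr = int(queue_ptr)
    # Overwrite the oldest entries with the newest keys (FIFO),
    # then advance the write pointer modulo the queue size.
    queue[:, ptr:ptr + batch_size] = keys.T
    queue_ptr[0] = (ptr + batch_size) % K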

SimCLR: SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners

Pros of SimCLR

  • Simpler architecture without requiring a memory bank or momentum encoder
  • Achieves state-of-the-art results on several benchmarks
  • Provides a more straightforward implementation of contrastive learning

Cons of SimCLR

  • Requires larger batch sizes and longer training times compared to MoCo
  • May be more computationally expensive due to the need for many negative samples
  • Less memory-efficient than MoCo's queue-based approach

Code Comparison

MoCo:

# Momentum update
self._momentum_update_key_encoder()

# Compute key features
with torch.no_grad():
    k = self.encoder_k(im_k)  # keys: NxC
    k = nn.functional.normalize(k, dim=1)

SimCLR:

# Apply the same transformation twice
x_i = data_transform(x)
x_j = data_transform(x)

# Compute features
h_i = self.encoder(x_i)
h_j = self.encoder(x_j)

# Project features through the MLP head before the contrastive loss
z_i = self.projection_head(h_i)
z_j = self.projection_head(h_j)

Both repositories implement self-supervised contrastive learning methods, but they differ in their approach. MoCo uses a momentum encoder and a queue-based memory bank, while SimCLR relies on large batch sizes and a simpler architecture. SimCLR may be easier to implement but could require more computational resources, whereas MoCo offers better memory efficiency at the cost of a slightly more complex setup.
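
The trade-off is easiest to see in where the negatives come from. Below is a hedged sketch of a SimCLR-style in-batch contrastive (NT-Xent) loss, in which every other example in the batch serves as a negative, so the size of the negative pool is tied to the batch size (names and the temperature value are illustrative):

import torch
import torch.nn.functional as F

def info_nce_in_batch(z_i, z_j, temperature=0.1):
    # z_i, z_j: N x C projections of two augmented views of the same N images
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)  # 2N x C
    sim = z @ z.T / temperature                           # pairwise similarities, 2N x 2N
    sim.fill_diagonal_(float('-inf'))                     # a sample is not its own negative
    n = z_i.shape[0]
    # the positive for row k is the other augmented view of the same image
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

MoCo sidesteps this coupling: its queue of K = 65,536 past keys supplies negatives regardless of the batch size.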

SimSiam: PyTorch implementation of SimSiam (https://arxiv.org/abs/2011.10566)

Pros of SimSiam

  • Simpler architecture without requiring a momentum encoder or large batches
  • Achieves competitive performance with less computational resources
  • More flexible and easier to implement in various frameworks

Cons of SimSiam

  • May require careful tuning of hyperparameters for optimal performance
  • Potentially less stable during training compared to MoCo
  • Limited to specific types of augmentations for effective learning

Code Comparison

SimSiam:

# SimSiam loss calculation
p1, p2 = predictor(encoder(x1)), predictor(encoder(x2))
z1, z2 = encoder(x1).detach(), encoder(x2).detach()
loss = -(F.cosine_similarity(p1, z2).mean() + F.cosine_similarity(p2, z1).mean()) * 0.5

MoCo:

# MoCo loss calculation (q and k are assumed L2-normalized)
q = encoder_q(x_q)
k = encoder_k(x_k)
l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1)          # positive logits: Nx1
l_neg = torch.einsum('nc,ck->nk', [q, queue.clone().detach()])  # negative logits: NxK
logits = torch.cat([l_pos, l_neg], dim=1)
# the positive key sits at index 0 of each row, so every label is 0
loss = nn.CrossEntropyLoss()(logits / temperature, torch.zeros(logits.shape[0], dtype=torch.long))

Both repositories implement self-supervised learning methods for visual representation learning. SimSiam offers a simpler approach with competitive performance, while MoCo provides a more established framework with potential benefits in stability and scalability.


README

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

This is a PyTorch implementation of the MoCo paper:

@Article{he2019moco,
  author  = {Kaiming He and Haoqi Fan and Yuxin Wu and Saining Xie and Ross Girshick},
  title   = {Momentum Contrast for Unsupervised Visual Representation Learning},
  journal = {arXiv preprint arXiv:1911.05722},
  year    = {2019},
}

It also includes the implementation of the MoCo v2 paper:

@Article{chen2020mocov2,
  author  = {Xinlei Chen and Haoqi Fan and Ross Girshick and Kaiming He},
  title   = {Improved Baselines with Momentum Contrastive Learning},
  journal = {arXiv preprint arXiv:2003.04297},
  year    = {2020},
}

Preparation

Install PyTorch and download the ImageNet dataset following the official PyTorch ImageNet training code.

This repo aims to make minimal modifications to that code. Check the modifications by:

diff main_moco.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)
diff main_lincls.py <(curl https://raw.githubusercontent.com/pytorch/examples/master/imagenet/main.py)

Unsupervised Training

This implementation only supports multi-gpu, DistributedDataParallel training, which is faster and simpler; single-gpu or DataParallel training is not supported.

To do unsupervised pre-training of a ResNet-50 model on ImageNet in an 8-gpu machine, run:

python main_moco.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

This script uses all the default hyper-parameters as described in the MoCo v1 paper. To run MoCo v2, set --mlp --moco-t 0.2 --aug-plus --cos.
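
For example, the MoCo v2 counterpart of the command above is:

python main_moco.py \
  -a resnet50 \
  --lr 0.03 \
  --batch-size 256 \
  --mlp --moco-t 0.2 --aug-plus --cos \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]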

Note: for 4-gpu training, we recommend following the linear lr scaling recipe: --lr 0.015 --batch-size 128 with 4 gpus (the learning rate is scaled by the batch-size ratio, 0.03 × 128/256 = 0.015). We got similar results using this setting.

Linear Classification

With a pre-trained model, to train a supervised linear classifier on frozen features/weights in an 8-gpu machine, run:

python main_lincls.py \
  -a resnet50 \
  --lr 30.0 \
  --batch-size 256 \
  --pretrained [your checkpoint path]/checkpoint_0199.pth.tar \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  [your imagenet-folder with train and val folders]

Linear classification results on ImageNet using this repo with 8 NVIDIA V100 GPUs:

| model | pre-train epochs | pre-train time | MoCo v1 top-1 acc. | MoCo v2 top-1 acc. |
| --------- | --- | -------- | -------- | -------- |
| ResNet-50 | 200 | 53 hours | 60.8±0.2 | 67.5±0.1 |

Here we run 5 trials (of pre-training and linear classification) and report mean±std: the 5 results of MoCo v1 are {60.6, 60.6, 60.7, 60.9, 61.1}, and of MoCo v2 are {67.7, 67.6, 67.4, 67.6, 67.3}.

Models

Our pre-trained ResNet-50 models can be downloaded as follows:

| model | epochs | mlp | aug+ | cos | top-1 acc. | checkpoint | md5 |
| ------- | --- | --- | --- | --- | ---- | -------- | -------- |
| MoCo v1 | 200 |     |     |     | 60.6 | download | b251726a |
| MoCo v2 | 200 | ✓   | ✓   | ✓   | 67.7 | download | 59fd9945 |
| MoCo v2 | 800 | ✓   | ✓   | ✓   | 71.1 | download | a04e12f8 |

Transferring to Object Detection

See ./detection.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
