facebookresearch / simsiam

PyTorch implementation of SimSiam (https://arxiv.org/abs/2011.10566)

Top Related Projects

  • MoCo v3 (1,199 stars): PyTorch implementation of MoCo v3 (https://arxiv.org/abs/2104.02057)
  • SimCLR (4,047 stars): SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners
  • Lightly (2,918 stars): A Python library for self-supervised learning on images.
  • SwAV (1,988 stars): PyTorch implementation of SwAV (https://arxiv.org/abs/2006.09882)

Quick Overview

SimSiam is a self-supervised learning framework for visual representation learning, developed by Facebook Research. It simplifies contrastive learning approaches by removing the need for negative sample pairs, large batches, and momentum encoders, achieving competitive performance with a simple Siamese network plus a stop-gradient operation.
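
At its core, the method trains a shared encoder on two augmented views of the same image, with a small predictor head on one branch and a stop-gradient on the other. The following is a minimal sketch of that idea in PyTorch; the class and attribute names here are illustrative, not the repository's API, and the ResNet fc layer stands in for the paper's three-layer projection MLP:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SimSiamSketch(nn.Module):
    def __init__(self, dim=2048, pred_dim=512):
        super().__init__()
        # Shared encoder: a ResNet backbone; its fc layer stands in for the projection MLP
        self.encoder = torchvision.models.resnet50(num_classes=dim)
        # Small bottleneck predictor applied to each branch
        self.predictor = nn.Sequential(
            nn.Linear(dim, pred_dim), nn.BatchNorm1d(pred_dim), nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim))

    def forward(self, x1, x2):
        # Encode both augmented views with the same weights
        z1, z2 = self.encoder(x1), self.encoder(x2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Symmetrized negative cosine similarity with stop-gradient on the targets
        loss = -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                 + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2
        return loss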

Pros

  • Simple and efficient architecture without requiring negative pairs or large batches
  • Achieves competitive performance on various downstream tasks
  • Requires fewer computational resources than many other self-supervised learning methods
  • Provides insights into the working mechanisms of self-supervised learning

Cons

  • May require careful tuning of hyperparameters for optimal performance
  • Limited to visual representation learning tasks
  • Potential sensitivity to data augmentation strategies
  • Relatively new approach, with less extensive validation compared to more established methods

Code Examples

The snippets below are illustrative sketches that assume a PyTorch Lightning-style SimSiam wrapper; the official repository ships training scripts (see the README below) rather than an installable package, so adapt the class and method names to your own wrapper.

  1. Loading a pre-trained SimSiam model:
import torch
from simsiam import SimSiam

# Load pre-trained SimSiam model
model = SimSiam.load_from_checkpoint('path/to/checkpoint.ckpt')
model.eval()
  2. Extracting features from an image:
from torchvision import transforms
from PIL import Image

# Standard ImageNet evaluation transform
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load and transform the image (convert to RGB to handle grayscale/RGBA files)
image = Image.open('path/to/image.jpg').convert('RGB')
img_tensor = transform(image).unsqueeze(0)

# Extract features
with torch.no_grad():
    features = model.backbone(img_tensor)
  3. Training SimSiam on a custom dataset:
from pytorch_lightning import Trainer
from simsiam import SimSiam
from torch.utils.data import DataLoader

# Initialize SimSiam model
model = SimSiam(backbone='resnet50')

# Prepare your custom dataset and dataloader
# (the dataset should yield two augmented views of each image per sample)
train_dataset = YourCustomDataset()
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True)

# Train the model
trainer = Trainer(max_epochs=100, accelerator='gpu', devices=1)
trainer.fit(model, train_loader)

Getting Started

To get started with SimSiam:

  1. Install the required dependencies:
pip install torch torchvision pytorch-lightning
  2. Clone the SimSiam repository:
git clone https://github.com/facebookresearch/simsiam.git
cd simsiam
  3. Train SimSiam on ImageNet or your custom dataset:
from pytorch_lightning import Trainer
from simsiam import SimSiam

# train_dataloader: a DataLoader yielding two augmented views per image (see above)
model = SimSiam(backbone='resnet50')
trainer = Trainer(max_epochs=100, accelerator='gpu', devices=1)
trainer.fit(model, train_dataloader)

For more detailed instructions and advanced usage, refer to the repository's README and documentation.

Competitor Comparisons

MoCo v3 (1,199 stars): PyTorch implementation of MoCo v3 (https://arxiv.org/abs/2104.02057)

Pros of MoCo v3

  • Improved performance on downstream tasks compared to SimSiam
  • More robust to different batch sizes and learning rates
  • Supports multi-GPU training out of the box

Cons of MoCo v3

  • Slightly more complex implementation due to the momentum encoder
  • Requires more memory during training due to the additional encoder
  • May be slower to train compared to SimSiam

Code Comparison

MoCo v3:

# Momentum update
self._momentum_update_key_encoder()

# Compute key features
with torch.no_grad():
    k = self.encoder_k(im_k)

SimSiam:

# No momentum encoder, directly use the main encoder
z1 = self.encoder(x1)
z2 = self.encoder(x2)

# Prediction and stop-gradient
p1, p2 = self.predictor(z1), self.predictor(z2)
z1, z2 = z1.detach(), z2.detach()

The main difference in the code is that MoCo v3 maintains a momentum-updated key encoder and contrasts queries against negatives within the batch (unlike MoCo v1/v2, it drops the memory queue), while SimSiam uses only a stop-gradient operation and a predictor network, with no negatives at all. MoCo v3's approach allows for more stable training and better performance, but at the cost of increased complexity and memory usage for the second encoder.
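
For reference, here is a minimal sketch of the momentum (exponential moving average) update that MoCo-style methods apply to the key encoder; the function and parameter names are illustrative:

import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.99):
    # EMA update: the key encoder slowly tracks the query encoder's weights
    for param_q, param_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        param_k.data.mul_(m).add_(param_q.data, alpha=1.0 - m)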

SimCLR (4,047 stars): SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners

Pros of SimCLR

  • More extensive documentation and examples
  • Wider range of supported architectures and datasets
  • Active community support and regular updates

Cons of SimCLR

  • Higher computational requirements
  • More complex implementation
  • Potentially harder to fine-tune for specific tasks

Code Comparison

SimCLR:

# Data augmentation
transform = transforms.Compose([
    transforms.RandomResizedCrop(size=32),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])
])

SimSiam:

# Data augmentation
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

Both repositories implement self-supervised learning methods for visual representation learning. SimCLR offers more flexibility and extensive documentation, while SimSiam provides a simpler implementation with potentially lower computational requirements. The code comparison shows near-identical augmentation pipelines; the SimCLR snippet is configured for CIFAR-style inputs (32x32 crops with CIFAR-10 normalization statistics), while the SimSiam snippet targets ImageNet (224x224 crops with ImageNet statistics).

Lightly (2,918 stars): A Python library for self-supervised learning on images.

Pros of Lightly

  • More comprehensive self-supervised learning framework with multiple methods
  • Actively maintained with regular updates and community support
  • Includes data curation and dataset management features

Cons of Lightly

  • Potentially more complex to use due to broader feature set
  • May have higher computational requirements for some tasks
  • Less focused on a single, highly optimized method like SimSiam

Code Comparison

SimSiam:

# SimSiam-specific loss calculation
loss = -(z1.detach() * p2).sum(dim=1).mean() / 2 + \
       -(z2.detach() * p1).sum(dim=1).mean() / 2

Lightly:

# Lightly's modular approach
criterion = NTXentLoss()
loss = criterion(out0, out1)

SimSiam focuses on a specific self-supervised learning method, while Lightly offers a more modular and flexible approach to implementing various self-supervised learning techniques. SimSiam's implementation is more streamlined for its particular method, whereas Lightly provides a broader toolkit for different scenarios and use cases in self-supervised learning and data curation.
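
As a concrete illustration of that modularity, here is a sketch of SimSiam-style training assembled from Lightly's building blocks. The class names follow Lightly's documented modules (NegativeCosineSimilarity, SimSiamProjectionHead, SimSiamPredictionHead), but check your installed version, as signatures can change between releases:

import torch
import torchvision
from lightly.loss import NegativeCosineSimilarity
from lightly.models.modules import SimSiamPredictionHead, SimSiamProjectionHead

# ResNet backbone without the classification head
resnet = torchvision.models.resnet18()
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])
projection_head = SimSiamProjectionHead(512, 512, 128)
prediction_head = SimSiamPredictionHead(128, 64, 128)
criterion = NegativeCosineSimilarity()

def forward(x):
    f = backbone(x).flatten(start_dim=1)
    z = projection_head(f)
    p = prediction_head(z)
    return z.detach(), p  # stop-gradient on the projection

# x0, x1: two augmented views of the same batch of images (random here for illustration)
x0 = torch.randn(8, 3, 224, 224)
x1 = torch.randn(8, 3, 224, 224)
z0, p0 = forward(x0)
z1, p1 = forward(x1)
loss = 0.5 * (criterion(z0, p1) + criterion(z1, p0))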

SwAV (1,988 stars): PyTorch implementation of SwAV (https://arxiv.org/abs/2006.09882)

Pros of SwAV

  • Supports multi-crop augmentation, potentially leading to better performance
  • Includes a clustering step, which can help in learning more robust representations
  • Offers more flexibility in terms of batch size and number of prototypes

Cons of SwAV

  • More complex implementation compared to SimSiam
  • May require more computational resources due to the clustering step
  • Potentially more sensitive to hyperparameter tuning

Code Comparison

SwAV:

loss = swav_loss(output, queue, epoch)
loss.backward()
optimizer.step()

SimSiam:

loss = D(p1, z2) / 2 + D(p2, z1) / 2
loss.backward()
optimizer.step()

Both repositories implement self-supervised learning methods for visual representation learning. SwAV (Swapping Assignments between Views) uses a clustering approach and supports multi-crop augmentation, which can lead to improved performance. However, it has a more complex implementation and may require more computational resources.

SimSiam, on the other hand, offers a simpler approach with a straightforward implementation. It uses a Siamese network architecture and doesn't require large batches or momentum encoders, making it potentially easier to train and adapt to different scenarios.

The code comparison shows the difference in loss calculation between the two methods. SwAV uses a specific swav_loss function, while SimSiam calculates the loss using a similarity measure D between the projections and targets of two views.
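
For concreteness, D in the SimSiam paper is the negative cosine similarity, applied with a stop-gradient on the target; a minimal sketch consistent with the paper's pseudocode:

import torch.nn.functional as F

def D(p, z):
    # Negative cosine similarity; detaching z applies the stop-gradient to the target
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

# Symmetrized over the two views, as in the paper:
# loss = D(p1, z2) / 2 + D(p2, z1) / 2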

README

SimSiam: Exploring Simple Siamese Representation Learning

This is a PyTorch implementation of the SimSiam paper:

@Article{chen2020simsiam,
  author  = {Xinlei Chen and Kaiming He},
  title   = {Exploring Simple Siamese Representation Learning},
  journal = {arXiv preprint arXiv:2011.10566},
  year    = {2020},
}

Preparation

Install PyTorch and download the ImageNet dataset following the official PyTorch ImageNet training code. As with MoCo, this code release makes only minimal modifications to that reference code for both unsupervised pre-training and linear classification.

In addition, install apex for the LARS implementation needed for linear classification.

Unsupervised Pre-Training

Only multi-gpu, DistributedDataParallel training is supported; single-gpu or DataParallel training is not supported.

To do unsupervised pre-training of a ResNet-50 model on ImageNet in an 8-gpu machine, run:

python main_simsiam.py \
  -a resnet50 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  --fix-pred-lr \
  [your imagenet-folder with train and val folders]

The script uses all the default hyper-parameters as described in the paper, and uses the default augmentation recipe from MoCo v2.

The above command performs pre-training with a non-decaying predictor learning rate for 100 epochs, corresponding to the last row of Table 1 in the paper.
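
For intuition, here is a sketch of how a non-decaying predictor learning rate can be wired up with optimizer parameter groups. It assumes a model exposing encoder and predictor attributes; the fix_lr flag and adjust_learning_rate helper are illustrative names modeled on the repo's approach rather than quoted from it:

import math
import torch

init_lr = 0.05  # illustrative base learning rate

# Separate parameter groups so the schedule can skip the predictor
optim_params = [
    {'params': model.encoder.parameters(), 'fix_lr': False},
    {'params': model.predictor.parameters(), 'fix_lr': True},
]
optimizer = torch.optim.SGD(optim_params, lr=init_lr, momentum=0.9, weight_decay=1e-4)

def adjust_learning_rate(optimizer, init_lr, epoch, total_epochs):
    # Cosine decay for the encoder; the predictor keeps its initial learning rate
    cur_lr = init_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    for group in optimizer.param_groups:
        group['lr'] = init_lr if group['fix_lr'] else cur_lr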

Linear Classification

With a pre-trained model, to train a supervised linear classifier on frozen features/weights in an 8-gpu machine, run:

python main_lincls.py \
  -a resnet50 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/checkpoint_0099.pth.tar \
  --lars \
  [your imagenet-folder with train and val folders]

The above command uses LARS optimizer and a default batch size of 4096.
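
For reference, a sketch of how the pre-trained encoder weights can be extracted from the pre-training checkpoint for linear evaluation. The module.encoder key prefix is an assumption based on DistributedDataParallel checkpoints; verify it against your own files:

import torch
import torchvision.models as models

# Load pre-trained SimSiam weights into a ResNet-50 for linear evaluation
model = models.resnet50()
checkpoint = torch.load('checkpoint_0099.pth.tar', map_location='cpu')
state_dict = checkpoint['state_dict']
for k in list(state_dict.keys()):
    # Keep only encoder weights, stripping the DDP/encoder prefix; drop the projection fc
    if k.startswith('module.encoder') and not k.startswith('module.encoder.fc'):
        state_dict[k[len('module.encoder.'):]] = state_dict[k]
    del state_dict[k]
msg = model.load_state_dict(state_dict, strict=False)
# Only the final fc layer should be missing; it is trained from scratch
assert set(msg.missing_keys) == {'fc.weight', 'fc.bias'}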

Models and Logs

Our pre-trained ResNet-50 models and logs:

pre-train epochs   batch size   pre-train ckpt   pre-train log   linear cls. ckpt   linear cls. log   top-1 acc.
100                512          link             link             link              link              68.1
100                256          link             link             link              link              68.3

Settings for the above: 8 NVIDIA V100 GPUs, CUDA 10.1/CuDNN 7.6.5, PyTorch 1.7.0.

Transferring to Object Detection

Same as MoCo for object detection transfer, please see moco/detection.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.