
facebookresearch/simsiam

PyTorch implementation of SimSiam https://arxiv.org/abs/2011.10566


Quick Overview

SimSiam is a self-supervised learning framework for visual representation learning, developed by Facebook Research. It simplifies contrastive learning approaches by doing without negative sample pairs, large batches, and momentum encoders, and achieves competitive performance with a simple Siamese network that relies on a predictor head and a stop-gradient operation.

Pros

  • Simple and efficient architecture without requiring negative pairs or large batches
  • Achieves competitive performance on various downstream tasks
  • Requires fewer computational resources than methods that depend on large batches or an additional momentum encoder
  • Provides insights into the working mechanisms of self-supervised learning

Cons

  • May require careful tuning of hyperparameters for optimal performance
  • Limited to visual representation learning tasks
  • Potential sensitivity to data augmentation strategies
  • Relatively new approach, with less extensive validation compared to more established methods

Code Examples

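  1. Loading a pre-trained SimSiam model:
import torch
import torchvision.models as models
import simsiam.builder  # run from the root of the cloned repository

# Build the model the way main_simsiam.py does (ResNet-50 backbone)
model = simsiam.builder.SimSiam(models.resnet50, dim=2048, pred_dim=512)

# Checkpoints saved under DistributedDataParallel prefix every key with 'module.'
checkpoint = torch.load('path/to/checkpoint.pth.tar', map_location='cpu')
state_dict = {k.removeprefix('module.'): v for k, v in checkpoint['state_dict'].items()}
model.load_state_dict(state_dict)
model.eval()
  2. Extracting features from an image:
import torch
from torchvision import transforms
from PIL import Image

# Prepare image transformation
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load and transform image
image = Image.open('path/to/image.jpg').convert('RGB')
img_tensor = transform(image).unsqueeze(0)

# Extract features; model.encoder is the backbone plus the projection MLP
with torch.no_grad():
    features = model.encoder(img_tensor)
  3. Training SimSiam on a custom dataset (a single-process sketch; the repository's own scripts support only multi-GPU DistributedDataParallel training):
import torch
import torch.nn as nn
import torchvision.models as models
import simsiam.builder
from torch.utils.data import DataLoader

# Initialize SimSiam model, loss, and optimizer
model = simsiam.builder.SimSiam(models.resnet50)
criterion = nn.CosineSimilarity(dim=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-4)

# Prepare your custom dataset; each item must yield two augmented views of one image
train_dataset = YourCustomDataset()
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True, drop_last=True)

# Train the model
for epoch in range(100):
    for (x1, x2), _ in train_loader:
        p1, p2, z1, z2 = model(x1=x1, x2=x2)  # z1 and z2 are returned detached
        loss = -(criterion(p1, z2).mean() + criterion(p2, z1).mean()) * 0.5
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()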

Getting Started

To get started with SimSiam:

  1. Install the required dependencies (apex is additionally needed for the LARS optimizer used in linear classification):
pip install torch torchvision
  2. Clone the SimSiam repository:
git clone https://github.com/facebookresearch/simsiam.git
cd simsiam
  3. Pre-train SimSiam on ImageNet with the provided script (multi-GPU DistributedDataParallel training only):
python main_simsiam.py \
  -a resnet50 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  --fix-pred-lr \
  [your imagenet-folder with train and val folders]

For more detailed instructions and advanced usage, refer to the repository's README and documentation.

Competitor Comparisons

PyTorch implementation of MoCo v3 https://arxiv.org/abs/2104.02057

Pros of MoCo v3

  • Improved performance on downstream tasks compared to SimSiam
  • More robust to different batch sizes and learning rates
  • Supports multi-GPU training out of the box

Cons of MoCo v3

  • Slightly more complex implementation due to the momentum encoder
  • Requires more memory during training due to the additional encoder
  • May be slower to train compared to SimSiam

Code Comparison

MoCo v3:

# Momentum update
self._momentum_update_key_encoder()

# Compute key features
with torch.no_grad():
    k = self.encoder_k(im_k)

SimSiam:

# No momentum encoder, directly use the main encoder
z1 = self.encoder(x1)
z2 = self.encoder(x2)

# Prediction and stop-gradient
p1, p2 = self.predictor(z1), self.predictor(z2)
z1, z2 = z1.detach(), z2.detach()

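The main difference in the code is that MoCo-style methods maintain a momentum encoder and contrast each query against negative samples, while SimSiam uses neither and relies on a stop-gradient operation plus a predictor network. Note that the queue of negatives belongs to earlier MoCo versions; MoCo v3 draws its negatives from the current batch instead. The momentum encoder can make training more stable and improve performance, but at the cost of increased complexity and memory usage.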
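To make the contrast concrete, here is a minimal sketch of the exponential-moving-average update that a momentum encoder needs after every optimizer step, and that SimSiam does without. The helper name `momentum_update`, the toy `nn.Linear` encoder, and the coefficient 0.99 are illustrative assumptions, not code from either repository.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def momentum_update(online: nn.Module, target: nn.Module, m: float = 0.99) -> None:
    """EMA update: target <- m * target + (1 - m) * online (hypothetical helper)."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(m).add_(p_online.detach(), alpha=1.0 - m)


online = nn.Linear(4, 2)                 # toy stand-in for the trained encoder
target = copy.deepcopy(online)           # momentum encoder starts as a copy
for p in target.parameters():
    p.requires_grad = False              # it is never updated by gradients

# Pretend an optimizer step moved the online encoder's weights.
with torch.no_grad():
    online.weight.add_(1.0)

momentum_update(online, target, m=0.99)  # target moves 1% of the way
```

The second copy of the encoder is where the extra memory cost mentioned above comes from.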

SimCLRv2 - Big Self-Supervised Models are Strong Semi-Supervised Learners

Pros of SimCLR

  • More extensive documentation and examples
  • Wider range of supported architectures and datasets
  • Active community support and regular updates

Cons of SimCLR

  • Higher computational requirements
  • More complex implementation
  • Potentially harder to fine-tune for specific tasks

Code Comparison

SimCLR:

# Data augmentation
transform = transforms.Compose([
    transforms.RandomResizedCrop(size=32),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
    transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010])
])

SimSiam:

# Data augmentation
transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

Both repositories implement self-supervised learning methods for visual representation learning. SimCLR offers more flexibility and extensive documentation, while SimSiam provides a simpler implementation with potentially lower computational requirements. The code comparison shows similar data augmentation techniques, with minor differences in normalization values and image sizes.
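Both methods apply a stochastic pipeline like the ones above twice to each image, so that every training sample becomes a pair of views. A minimal sketch of such a wrapper follows; the class name `TwoViews` and the toy `random_flip` transform are hypothetical stand-ins chosen so the example runs without image data.

```python
import random


class TwoViews:
    """Apply one stochastic transform twice to get two views of the same input."""

    def __init__(self, base_transform):
        self.base_transform = base_transform

    def __call__(self, x):
        # Two independent calls, so the random augmentations generally differ.
        return [self.base_transform(x), self.base_transform(x)]


def random_flip(image, p=0.5):
    """Toy stand-in for a torchvision pipeline: flip a nested-list image."""
    return [row[::-1] for row in image] if random.random() < p else image


two_views = TwoViews(random_flip)
image = [[1, 2, 3], [4, 5, 6]]
view1, view2 = two_views(image)
```

In practice the wrapper would be passed as the dataset's `transform`, with a `transforms.Compose([...])` pipeline in place of `random_flip`.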

A python library for self-supervised learning on images.

Pros of Lightly

  • More comprehensive self-supervised learning framework with multiple methods
  • Actively maintained with regular updates and community support
  • Includes data curation and dataset management features

Cons of Lightly

  • Potentially more complex to use due to broader feature set
  • May have higher computational requirements for some tasks
  • Less focused on a single, highly optimized method like SimSiam

Code Comparison

SimSiam:

# SimSiam-specific loss: negative cosine similarity with stop-gradient on z
import torch.nn as nn
criterion = nn.CosineSimilarity(dim=1)
loss = -(criterion(p2, z1.detach()).mean() + criterion(p1, z2.detach()).mean()) * 0.5

Lightly:

# Lightly's modular approach
from lightly.loss import NTXentLoss
criterion = NTXentLoss()
loss = criterion(out0, out1)

SimSiam focuses on a specific self-supervised learning method, while Lightly offers a more modular and flexible approach to implementing various self-supervised learning techniques. SimSiam's implementation is more streamlined for its particular method, whereas Lightly provides a broader toolkit for different scenarios and use cases in self-supervised learning and data curation.

PyTorch implementation of SwAV https://arxiv.org/abs/2006.09882

Pros of SwAV

  • Supports multi-crop augmentation, potentially leading to better performance
  • Includes a clustering step, which can help in learning more robust representations
  • Offers more flexibility in terms of batch size and number of prototypes

Cons of SwAV

  • More complex implementation compared to SimSiam
  • May require more computational resources due to the clustering step
  • Potentially more sensitive to hyperparameter tuning

Code Comparison

SwAV:

loss = swav_loss(output, queue, epoch)
loss.backward()
optimizer.step()

SimSiam:

loss = D(p1, z2) / 2 + D(p2, z1) / 2
loss.backward()
optimizer.step()

Both repositories implement self-supervised learning methods for visual representation learning. SwAV (Swapping Assignments between Views) uses a clustering approach and supports multi-crop augmentation, which can lead to improved performance. However, it has a more complex implementation and may require more computational resources.

SimSiam, on the other hand, offers a simpler approach with a straightforward implementation. It uses a Siamese network architecture and doesn't require large batches or momentum encoders, making it potentially easier to train and adapt to different scenarios.

The code comparison shows the difference in loss calculation between the two methods. SwAV uses a specific swav_loss function, while SimSiam computes a symmetrized similarity measure D between the predictor output of one view (p) and the stop-gradient projection of the other view (z).
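A minimal sketch of D as negative cosine similarity, with the stop-gradient applied inside the function, is given below. The tensor shapes and the helper name `simsiam_loss` are illustrative assumptions; random tensors stand in for real network outputs.

```python
import torch
import torch.nn.functional as F


def D(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity; z is treated as a constant (stop-gradient)."""
    z = z.detach()
    return -F.cosine_similarity(p, z, dim=1).mean()


def simsiam_loss(p1, p2, z1, z2):
    """Symmetrized loss over the two views."""
    return D(p1, z2) / 2 + D(p2, z1) / 2


torch.manual_seed(0)
# Random stand-ins for projections (z) and predictions (p) of a batch of 8.
z1 = torch.randn(8, 16, requires_grad=True)
z2 = torch.randn(8, 16, requires_grad=True)
p1 = torch.randn(8, 16, requires_grad=True)
p2 = torch.randn(8, 16, requires_grad=True)

loss = simsiam_loss(p1, p2, z1, z2)
loss.backward()  # gradients reach p1 and p2 but not z1 and z2
```

Because of `detach()`, each view's projection acts only as a target for the other view's prediction, which is the asymmetry that the loss relies on.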


README

SimSiam: Exploring Simple Siamese Representation Learning


This is a PyTorch implementation of the SimSiam paper:

@Article{chen2020simsiam,
  author  = {Xinlei Chen and Kaiming He},
  title   = {Exploring Simple Siamese Representation Learning},
  journal = {arXiv preprint arXiv:2011.10566},
  year    = {2020},
}

Preparation

Install PyTorch and download the ImageNet dataset following the official PyTorch ImageNet training code. Similar to MoCo, the code release contains minimal modifications for both unsupervised pre-training and linear classification to that code.

In addition, install apex for the LARS implementation needed for linear classification.

Unsupervised Pre-Training

Only multi-gpu, DistributedDataParallel training is supported; single-gpu or DataParallel training is not supported.

To do unsupervised pre-training of a ResNet-50 model on ImageNet in an 8-gpu machine, run:

python main_simsiam.py \
  -a resnet50 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  --fix-pred-lr \
  [your imagenet-folder with train and val folders]

The script uses all the default hyper-parameters as described in the paper, and uses the default augmentation recipe from MoCo v2.

The above command performs pre-training with a non-decaying predictor learning rate for 100 epochs, corresponding to the last row of Table 1 in the paper.
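The effect of a non-decaying predictor learning rate can be sketched as follows: the encoder's rate is decayed over training while the predictor's rate stays at its initial value. The half-cycle cosine form, the helper name `learning_rates`, and the value 0.05 are assumptions made for illustration rather than a copy of the script's internals.

```python
import math


def learning_rates(epoch: int, total_epochs: int, init_lr: float) -> dict:
    """Per-group learning rates for one epoch (hypothetical helper)."""
    # Encoder: assumed half-cycle cosine decay from init_lr towards 0.
    decayed = init_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    # Predictor: kept at the initial rate for the whole run.
    return {"encoder": decayed, "predictor": init_lr}


for epoch in (0, 50, 99):
    print(epoch, learning_rates(epoch, total_epochs=100, init_lr=0.05))
```

With a PyTorch optimizer, the same idea maps onto two parameter groups whose `lr` entries are overwritten at the start of each epoch.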

Linear Classification

With a pre-trained model, to train a supervised linear classifier on frozen features/weights in an 8-gpu machine, run:

python main_lincls.py \
  -a resnet50 \
  --dist-url 'tcp://localhost:10001' --multiprocessing-distributed --world-size 1 --rank 0 \
  --pretrained [your checkpoint path]/checkpoint_0099.pth.tar \
  --lars \
  [your imagenet-folder with train and val folders]

The above command uses the LARS optimizer and a default batch size of 4096.
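Training a linear classifier on frozen features means that only the backbone weights of the pre-training checkpoint are reused. The sketch below shows one way to select them from a state dict. The `module.encoder.` key prefix and the `fc` name of the projection head are assumptions about how the checkpoint was saved, and the helper `extract_encoder_weights` is hypothetical.

```python
def extract_encoder_weights(state_dict: dict, prefix: str = "module.encoder.") -> dict:
    """Keep backbone weights only: strip the prefix and drop the projection head."""
    backbone = {}
    for key, value in state_dict.items():
        if key.startswith(prefix) and not key.startswith(prefix + "fc"):
            backbone[key[len(prefix):]] = value
    return backbone


# Toy stand-in for checkpoint['state_dict']; real values would be tensors.
toy_state_dict = {
    "module.encoder.conv1.weight": 1,
    "module.encoder.layer1.0.bn1.bias": 2,
    "module.encoder.fc.0.weight": 3,   # projection MLP, not reused
    "module.predictor.0.weight": 4,    # predictor, not reused
}
backbone_weights = extract_encoder_weights(toy_state_dict)
print(backbone_weights)
```

The result could then be loaded into a plain ResNet-50 with `load_state_dict(..., strict=False)`, leaving the new classification layer to be trained from scratch.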

Models and Logs

Our pre-trained ResNet-50 models and logs:

pre-train epochs | batch size | pre-train ckpt | pre-train log | linear cls. ckpt | linear cls. log | top-1 acc. (%)
100 | 512 | link | link | link | link | 68.1
100 | 256 | link | link | link | link | 68.3

Settings for the above: 8 NVIDIA V100 GPUs, CUDA 10.1/CuDNN 7.6.5, PyTorch 1.7.0.

Transferring to Object Detection

Object detection transfer follows the same procedure as MoCo; please see moco/detection.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.