vissl
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
Top Related Projects
Tensors and Dynamic neural networks in Python with strong GPU acceleration
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
PyTorch package for the discrete VAE used for DALL·E.
Quick Overview
VISSL (VIsion library for Self-Supervised Learning) is a PyTorch-based library for self-supervised learning of visual representations. It provides a flexible and modular framework for training and evaluating self-supervised vision models, covering convolutional backbones such as ResNets and RegNets as well as Vision Transformers (ViTs).
Pros
- Flexible and Modular Design: VISSL offers a highly configurable and extensible framework, allowing researchers and developers to easily experiment with different self-supervised learning approaches, model architectures, and training strategies.
- State-of-the-Art Performance: The library includes implementations of various state-of-the-art self-supervised learning methods, such as DINO, MoCo, and SwAV, which have demonstrated impressive results on a wide range of computer vision tasks.
- Extensive Documentation and Examples: VISSL comes with detailed documentation, tutorials, and example scripts, making it easier for users to get started and understand the library's capabilities.
- Active Development and Community: The project is actively maintained by the Facebook AI Research team and has a growing community of contributors, ensuring ongoing improvements and support.
Cons
- Steep Learning Curve: While the library is well-documented, the complexity of self-supervised learning and the flexibility of VISSL's design can make it challenging for newcomers to get started.
- Resource-Intensive Training: Training self-supervised models, especially on large-scale datasets, can be computationally expensive and require significant hardware resources, such as high-end GPUs.
- Limited Support for Non-Vision Tasks: VISSL is primarily focused on self-supervised learning for computer vision tasks, and its applicability to other domains, such as natural language processing or speech recognition, may be limited.
- Potential Bias in Pretrained Models: As with any machine learning model, the pretrained weights provided by VISSL may inherit biases present in the training data, which can impact downstream applications.
Code Examples
# Example: training a DINO model from Python (the config path is illustrative;
# VISSL ships DINO YAMLs under configs/config/pretrain/dino/)
from vissl.utils.hydra_config import compose_hydra_configuration, convert_to_attrdict
from vissl.utils.distributed_launcher import launch_distributed
from vissl.hooks import default_hook_generator
cfg = compose_hydra_configuration(
    [
        "config=pretrain/dino/dino_16gpus_deits16",
        "config.DISTRIBUTED.NUM_NODES=1",
        "config.DISTRIBUTED.NUM_PROC_PER_NODE=8",
    ]
)
args, config = convert_to_attrdict(cfg)
launch_distributed(
    cfg=config,
    node_id=0,
    engine_name="train",
    hook_generator=default_hook_generator,
)
This sketch composes a DINO (self-DIstillation with NO labels) pretraining config with Hydra overrides and hands it to VISSL's distributed launcher, mirroring what the tools/run_distributed_engines.py entry point does.
# Example: evaluating a pretrained model on a downstream benchmark
# (config and checkpoint paths are illustrative)
from vissl.utils.hydra_config import compose_hydra_configuration, convert_to_attrdict
cfg = compose_hydra_configuration(
    [
        "config=benchmark/linear_image_classification/imagenet1k/eval_resnet_8gpu_transfer_in1k_linear",
        "config.MODEL.WEIGHTS_INIT.PARAMS_FILE=path/to/checkpoint.torch",
    ]
)
args, config = convert_to_attrdict(cfg)
# Launch exactly like a pretraining run (see the example above)
Benchmarks in VISSL are ordinary configs: this sketch plugs pretrained weights into a linear image classification benchmark on ImageNet-1K via MODEL.WEIGHTS_INIT and is launched like any other run; top-1/top-5 accuracies are reported by the benchmark itself.
# Example: extracting features with a pretrained trunk
# (assumes `config` is the AttrDict produced by convert_to_attrdict above;
# the checkpoint path is illustrative)
import torch
from vissl.models import build_model
from classy_vision.generic.util import load_checkpoint
from vissl.utils.checkpoint import init_model_from_consolidated_weights
model = build_model(config.MODEL, config.OPTIMIZER)
weights = load_checkpoint(checkpoint_path="path/to/checkpoint.torch")
init_model_from_consolidated_weights(
    config=config, model=model, state_dict=weights,
    state_dict_key_name="classy_state_dict", skip_layers=[],
)
model.eval()
with torch.no_grad():
    features = model(torch.randn(1, 3, 224, 224))
This sketch follows VISSL's inference tutorial: build the model from the config, restore consolidated checkpoint weights, and forward images through the trunk to obtain features.
Competitor Comparisons
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- PyTorch is a widely-used and well-established deep learning framework, with a large and active community.
- PyTorch provides a flexible and intuitive API, making it easy for developers to build and experiment with complex models.
- PyTorch has extensive documentation and a wealth of pre-built models and utilities, which can save developers a significant amount of time.
Cons of PyTorch
- PyTorch can be more resource-intensive than some other deep learning frameworks, particularly for large-scale deployments.
- PyTorch's dynamic computational graph can make it more difficult to optimize for production environments, where a static graph may be preferred.
Code Comparison
PyTorch:
import torch
import torch.nn as nn
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(100, 50)
        self.fc2 = nn.Linear(50, 10)
    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))
VISSL:
from vissl.config.attr_dict import AttrDict
from vissl.models import build_model
# Sketch: VISSL builds models from config sub-trees; full configs come from
# YAML and carry many more fields than shown here
config = AttrDict({
    "MODEL": {
        "TRUNK": {"NAME": "resnet50"},
    },
    "OPTIMIZER": {},
})
model = build_model(config.MODEL, config.OPTIMIZER)
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
Pros of CLIP
- CLIP is a state-of-the-art multimodal model that can perform a wide range of zero-shot tasks, making it highly versatile.
- The model is pre-trained on a large and diverse dataset, which gives it strong performance on a variety of tasks.
- CLIP is open-sourced and available for use by the research community, which can lead to further advancements and applications.
Cons of CLIP
- CLIP is a large and complex model, which can make it computationally expensive to use, especially on resource-constrained devices.
- The model's performance can be sensitive to the specific task and dataset, and may not always outperform specialized models.
- The training process for CLIP is not as well-documented as some other models, which can make it more challenging to understand and extend.
Code Comparison
VISSL:
from vissl.utils.hydra_config import compose_hydra_configuration, convert_to_attrdict
cfg = compose_hydra_configuration(["config=pretrain/simclr/simclr_8node_resnet"])
args, config = convert_to_attrdict(cfg)
CLIP:
import torch
import clip
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
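To ground the zero-shot claim above, a brief sketch using CLIP's published API; the image path and candidate captions are illustrative:
from PIL import Image
image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)
with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)  # relevance of each caption to the image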
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Pros of JAX
- More flexible and general-purpose ML framework
- Better performance on TPUs and multi-GPU setups
- Simpler API with functional programming paradigm
Cons of JAX
- Less focused on computer vision tasks specifically
- Fewer pre-built models and datasets for vision
- Steeper learning curve for those new to functional programming
Code Comparison
VISSL (PyTorch-based):
from vissl.models import build_model
from vissl.data import build_dataset
# Assumes `cfg` is a composed VISSL config (see earlier examples)
model = build_model(cfg.MODEL, cfg.OPTIMIZER)
dataset = build_dataset(cfg, split="TRAIN")  # split name illustrative
JAX:
import jax
import jax.numpy as jnp
# A minimal linear model in JAX's functional style
def model(params, inputs):
    return inputs @ params["w"] + params["b"]
params = {"w": jnp.zeros((100, 10)), "b": jnp.zeros(10)}
loss = lambda p, x, y: jnp.mean((model(p, x) - y) ** 2)
grad_fn = jax.jit(jax.grad(loss))  # composable transforms: autodiff + JIT
grads = grad_fn(params, jnp.ones((4, 100)), jnp.zeros((4, 10)))
# Data loading is left to the user (e.g., a custom iterator)
VISSL is more specialized for computer vision tasks, providing ready-to-use components for self-supervised learning. JAX offers a more flexible approach, allowing for custom implementations of models and datasets using its functional programming paradigm.
VISSL includes more vision-specific features out-of-the-box, while JAX requires more manual implementation but offers greater flexibility across various ML domains.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of Transformers
- Transformers provides a wide range of pre-trained models for various NLP tasks, making it easy to fine-tune and use in your own projects.
- The library has extensive documentation and a large community, providing ample support and resources for users.
- Transformers integrates well with popular deep learning frameworks like PyTorch and TensorFlow, allowing for seamless integration into your existing workflows.
Cons of Transformers
- Transformers is primarily focused on NLP tasks, while VISSL covers a broad range of self-supervised computer vision applications.
- The Transformers library can be more complex to set up and configure, especially for users new to the field of NLP.
Code Comparison
Transformers (Hugging Face):
from transformers import pipeline
# Load a pre-trained model for sentiment analysis
sentiment_analyzer = pipeline('sentiment-analysis')
# Classify the sentiment of a given text
result = sentiment_analyzer('This movie was amazing!')
print(result)
VISSL (Facebook Research):
import torch
from vissl.config.attr_dict import AttrDict
from vissl.models import build_model
# Sketch: build a trunk-only model from a minimal config
# (a real VISSL config carries many more fields; values here are illustrative)
cfg = AttrDict({"MODEL": {"TRUNK": {"NAME": "resnet50"}}, "OPTIMIZER": {}})
model = build_model(cfg.MODEL, cfg.OPTIMIZER)
# Forward pass through the model (VISSL models return a list of outputs)
input_image = torch.randn(1, 3, 224, 224)
output = model(input_image)
print(output[0].shape)
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- DeepSpeed provides efficient memory management and gradient accumulation, allowing for training of larger models with limited GPU memory.
- DeepSpeed supports mixed precision training, which can significantly improve training speed and reduce memory usage.
- DeepSpeed offers advanced features like the Zero Redundancy Optimizer (ZeRO), which partitions optimizer state, gradients, and parameters across devices to further reduce memory use.
Cons of DeepSpeed
- DeepSpeed may have a steeper learning curve compared to VISSL, as it requires more configuration and setup.
- The documentation for DeepSpeed, while comprehensive, may not be as user-friendly as the VISSL documentation.
- DeepSpeed is primarily focused on training large language models, while VISSL is more geared towards computer vision tasks.
Code Comparison
VISSL (PyTorch-based):
from vissl.utils.hydra_config import compose_hydra_configuration, convert_to_attrdict
from vissl.models import build_model
cfg = compose_hydra_configuration(["config=pretrain/swav/swav_8node_resnet"])
args, config = convert_to_attrdict(cfg)
model = build_model(config.MODEL, config.OPTIMIZER)
DeepSpeed (PyTorch-based):
import deepspeed
# deepspeed.initialize wraps an existing torch model and optimizer
# according to a JSON/dict config (see the sketch below)
model = ...
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)
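The ds_config passed above is a plain dict (or JSON file); a minimal sketch enabling the mixed precision and ZeRO features mentioned in the pros, with illustrative values:
ds_config = {
    "train_batch_size": 32,             # global batch size (illustrative)
    "fp16": {"enabled": True},          # mixed precision training
    "zero_optimization": {"stage": 3},  # ZeRO stage 3 partitioning
}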
PyTorch package for the discrete VAE used for DALL·E.
Pros of DALL-E
- DALL-E is a state-of-the-art text-to-image generation model, capable of producing highly realistic and creative images from textual descriptions.
- The model has been trained on a vast dataset, allowing it to generate a wide variety of images across different domains.
- DALL-E has demonstrated impressive generative capabilities, such as combining unrelated concepts and performing zero-shot image-to-image transformations.
Cons of DALL-E
- Only the discrete VAE component of DALL·E has been open-sourced; the full text-to-image model and its training data are not publicly available, limiting the ability to reproduce or extend the research.
- The model's training process and architectural details are not fully transparent, making it difficult to understand the inner workings and potential biases.
- DALL-E's deployment and usage are subject to OpenAI's policies and restrictions, which may limit its accessibility and flexibility for certain applications.
Code Comparison
VISSL (PyTorch):
# Sketch: build a model from a (much-abbreviated) config and restore
# consolidated checkpoint weights; paths and config values are illustrative
from vissl.config.attr_dict import AttrDict
from vissl.models import build_model
from classy_vision.generic.util import load_checkpoint
from vissl.utils.checkpoint import init_model_from_consolidated_weights
cfg = AttrDict({"MODEL": {"TRUNK": {"NAME": "resnet50"}}, "OPTIMIZER": {}})
model = build_model(cfg.MODEL, cfg.OPTIMIZER)
weights = load_checkpoint(checkpoint_path="path/to/checkpoint.torch")
init_model_from_consolidated_weights(
    config=cfg, model=model, state_dict=weights,
    state_dict_key_name="classy_state_dict", skip_layers=[])
DALL-E (discrete VAE only):
# Only the dVAE used by DALL·E is open-sourced (openai/DALL-E);
# the full text-to-image model remains unreleased
import torch
from dall_e import load_model
dev = torch.device("cpu")
enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", dev)
dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", dev)
README
What's New
Below we share, in reverse chronological order, the updates and new releases in VISSL. All VISSL releases are available here.
- [Feb 2022]: Releasing SEER 10B parameters model implementation and model weights.
- [Feb 2022]: Releasing implementation of Fairness Benchmarks for computer vision models proposed in the paper.
- [Jan 2022]: Implementation for Geolocalization test (gps prediction for an image) released in VISSL.
- [Jan 2022]: Added BEiT transformer implementation and ClassyVision ViT.
- [Nov 2021]: VISSL release 0.1.6. Please see our release notes for more information.
- [Oct 2021]: AugLy data augmentations support introduced in this commit.
- [Oct 2021]: XCiT: Cross-Covariance Image Transformers code released in this commit.
- [Sept 2021]: VISSL master branch renamed to main in this PR.
- [August 2021]: Instance Retrieval benchmark implemented and available in VISSL.
- [July 2021]: Fully Sharded Data Parallel integrated in VISSL and announced in blog.
- [May 2021]: DINO: Emerging Properties in Self-Supervised Vision Transformers code released.
- [May 2021]: VISSL relicensed under MIT License.
- [May 2021]: Barlow Twins: Self-Supervised Learning via Redundancy Reduction code released.
- [April 2021]: ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases code released.
- [March 2021]: Added most benchmark datasets used in VTAB and CLIP benchmark tasks.
- [February 2021]: Added Vision Transformers (ViT) backbone and training self-supervision with ViT.
- [January 2021]: VISSL v0.1.5 released.
Introduction
VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research with PyTorch. VISSL aims to accelerate the research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations. Key features include:
- Reproducible implementation of SOTA in Self-Supervision: All existing SOTA in Self-Supervision are implemented - SwAV, SimCLR, MoCo(v2), PIRL, NPID, NPID++, DeepClusterV2, ClusterFit, RotNet, Jigsaw. Also supports supervised trainings.
- Benchmark suite: Variety of benchmark tasks including linear image classification (places205, imagenet1k, voc07, food, CLEVR, dsprites, UCF101, stanford cars and many more), full finetuning, semi-supervised benchmark, nearest neighbor benchmark, object detection (Pascal VOC and COCO).
- Ease of Usability: Easy to use via a Hydra-based yaml configuration system (see the sketch after this list).
- Modular: Easy to design new tasks and reuse the existing components from other tasks (objective functions, model trunk and heads, data transforms, etc.). The modular components are simple drop-in replacements in yaml config files.
- Scalability: Easy to train models on 1 GPU, multi-GPU and multi-node. Several components for large-scale training are provided as simple config-file plugs: activation checkpointing, ZeRO, FP16, LARC, stateful data sampler, a data class to handle invalid images, large model backbones like RegNets, etc.
- Model Zoo: Over 60 pre-trained self-supervised model weights.
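As a taste of this config-driven design, here is a minimal sketch that composes a shipped SwAV recipe and flips large-scale features purely through Hydra overrides; the config path and keys follow VISSL's config schema but should be checked against the YAMLs shipped under configs/config/:
from vissl.utils.hydra_config import compose_hydra_configuration, convert_to_attrdict
cfg = compose_hydra_configuration([
    "config=pretrain/swav/swav_8node_resnet",
    "config.DISTRIBUTED.NUM_NODES=1",
    "config.MODEL.AMP_PARAMS.USE_AMP=true",  # FP16 as a config plug
    "config.MODEL.ACTIVATION_CHECKPOINTING.USE_ACTIVATION_CHECKPOINTING=true",
])
args, config = convert_to_attrdict(cfg)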
Installation
See INSTALL.md.
Getting Started
Install VISSL by following the installation instructions. After installation, please see Getting Started with VISSL and the Colab Notebook to learn about basic usage.
Documentation
Learn more about VISSL at our documentation. And see the projects/ for some projects built on top of VISSL.
Tutorials
Get started with VISSL by trying one of the Colab tutorial notebooks.
- Train SimCLR on 1-gpu
- Extracting Features from a pretrained model
- Benchmark task: Full finetuning on ImageNet-1K
- Benchmark task: Linear image classification on ImageNet-1K
- Large scale training (fp16, LARC, ZeRO)
- Using a pre-trained model in inference mode
Model Zoo and Baselines
We provide a large set of baseline results and trained models available for download in the VISSL Model Zoo.
Contributors
VISSL is written and maintained by Facebook AI Research.
Development
We welcome new contributions to VISSL and we will be actively maintaining this library! Please refer to CONTRIBUTING.md for full instructions on how to run the code, tests and linter, and submit your pull requests.
License
VISSL is released under MIT license.
Citing VISSL
If you find VISSL useful in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
@misc{goyal2021vissl,
author = {Priya Goyal and Quentin Duval and Jeremy Reizenstein and Matthew Leavitt and Min Xu and
Benjamin Lefaudeux and Mannat Singh and Vinicius Reis and Mathilde Caron and Piotr Bojanowski and
Armand Joulin and Ishan Misra},
title = {VISSL},
howpublished = {\url{https://github.com/facebookresearch/vissl}},
year = {2021}
}