
facebookresearch/vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.


Top Related Projects

  • pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • openai/CLIP: CLIP (Contrastive Language-Image Pretraining), predicts the most relevant text snippet given an image
  • google/jax: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
  • huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
  • microsoft/DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • openai/DALL-E: PyTorch package for the discrete VAE used for DALL·E.

Quick Overview

VISSL (VIsion library for Self-Supervised Learning) is a PyTorch-based library for self-supervised learning of visual representations. It provides a flexible and modular framework for training and evaluating self-supervised vision models, covering both convolutional architectures such as ResNets and Vision Transformers (ViTs).

Pros

  • Flexible and Modular Design: VISSL offers a highly configurable and extensible framework, allowing researchers and developers to easily experiment with different self-supervised learning approaches, model architectures, and training strategies.
  • State-of-the-Art Performance: The library includes implementations of various state-of-the-art self-supervised learning methods, such as DINO, MoCo, and SwAV, which have demonstrated impressive results on a wide range of computer vision tasks.
  • Extensive Documentation and Examples: VISSL comes with detailed documentation, tutorials, and example scripts, making it easier for users to get started and understand the library's capabilities.
  • Active Development and Community: The project is actively maintained by the Facebook AI Research team and has a growing community of contributors, ensuring ongoing improvements and support.

Cons

  • Steep Learning Curve: While the library is well-documented, the complexity of self-supervised learning and the flexibility of VISSL's design can make it challenging for newcomers to get started.
  • Resource-Intensive Training: Training self-supervised models, especially on large-scale datasets, can be computationally expensive and require significant hardware resources, such as high-end GPUs.
  • Limited Support for Non-Vision Tasks: VISSL is primarily focused on self-supervised learning for computer vision tasks, and its applicability to other domains, such as natural language processing or speech recognition, may be limited.
  • Potential Bias in Pretrained Models: As with any machine learning model, the pretrained weights provided by VISSL may inherit biases present in the training data, which can impact downstream applications.

Code Examples

# Example: composing a DINO pretraining configuration
# (config names and override keys follow VISSL's Hydra conventions;
#  exact paths depend on the configs shipped with your VISSL version)
from vissl.utils.hydra_config import compose_hydra_configuration, convert_to_attrdict

cfg = compose_hydra_configuration(
    [
        "config=pretrain/dino/dino_16gpus_deits16",
        "config.DISTRIBUTED.NUM_NODES=1",
        "config.DISTRIBUTED.NUM_PROC_PER_NODE=8",
    ]
)
args, config = convert_to_attrdict(cfg)

# Training itself is normally launched through VISSL's distributed engine, e.g.:
# python tools/run_distributed_engines.py config=pretrain/dino/dino_16gpus_deits16

This example composes a Hydra configuration for pretraining a DINO (self-distillation with no labels) model on ImageNet using VISSL.

# Example: loading pretrained weights into a VISSL model
# (follows the pattern from VISSL's tutorials; helper names may vary by version)
from classy_vision.generic.util import load_checkpoint
from vissl.models import build_model
from vissl.utils.checkpoint import init_model_from_consolidated_weights

# `config` is a composed VISSL configuration, as in the previous example
model = build_model(config.MODEL, config.OPTIMIZER)
weights = load_checkpoint("path/to/checkpoint.torch")
init_model_from_consolidated_weights(
    config, model, weights,
    state_dict_key_name="classy_state_dict", skip_layers=[],
)

This example loads pretrained weights into a VISSL model; accuracy on a downstream task such as ImageNet classification is then measured through VISSL's benchmark configurations.
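For instance, a linear-evaluation benchmark is composed the same way as a training run. The config path below is one of the linear image-classification benchmarks shipped with VISSL; verify the exact name against your installed configs:

from vissl.utils.hydra_config import compose_hydra_configuration

eval_cfg = compose_hydra_configuration([
    "config=benchmark/linear_image_classification/imagenet1k/eval_resnet_8gpu_transfer_in1k_linear",
    "config.MODEL.WEIGHTS_INIT.PARAMS_FILE=path/to/checkpoint.torch",
])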

# Example: extracting features with a pretrained trunk
# (a minimal sketch; VISSL also ships a dedicated feature-extraction tool)
import torch

model.eval()  # `model` initialized as in the previous example
with torch.no_grad():
    batch = torch.randn(4, 3, 224, 224)  # stand-in for a real data batch
    features = model.trunk(batch)        # VISSL models expose a trunk and heads
print([f.shape for f in features])

This example runs a batch of images through a pretrained model's trunk to obtain feature representations, skipping the self-supervised heads.

Competitor Comparisons

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • PyTorch is a widely used and well-established deep learning framework, with a large and active community.
  • PyTorch provides a flexible and intuitive API, making it easy for developers to build and experiment with complex models.
  • PyTorch has extensive documentation and a wealth of pre-built models and utilities, which can save developers a significant amount of time.

Cons of PyTorch

  • PyTorch can be more resource-intensive than some other deep learning frameworks, particularly for large-scale deployments.
  • PyTorch's dynamic computational graph can make it more difficult to optimize for production environments, where a static graph may be preferred.

Code Comparison

PyTorch:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(100, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

VISSL:

from vissl.config.attr_dict import AttrDict
from vissl.models import build_model

# A minimal sketch: real VISSL configs are composed via Hydra and carry
# many more required fields; the keys below are illustrative.
config = AttrDict({
    "MODEL": {
        "TRUNK": {"NAME": "resnet50"},
    },
})

model = build_model(config.MODEL, AttrDict({}))
CLIP

CLIP (Contrastive Language-Image Pretraining): predicts the most relevant text snippet given an image

Pros of CLIP

  • CLIP is a state-of-the-art multimodal model that can perform a wide range of zero-shot tasks, making it highly versatile.
  • The model is pre-trained on a large and diverse dataset, which gives it strong performance on a variety of tasks.
  • CLIP is open-sourced and available for use by the research community, which can lead to further advancements and applications.

Cons of CLIP

  • CLIP is a large and complex model, which can make it computationally expensive to use, especially on resource-constrained devices.
  • The model's performance can be sensitive to the specific task and dataset, and may not always outperform specialized models.
  • The training process for CLIP is not as well-documented as some other models, which can make it more challenging to understand and extend.

Code Comparison

VISSL:

from vissl.utils.hydra_config import compose_hydra_configuration

# Config path is illustrative; see VISSL's configs/ directory for exact names
cfg = compose_hydra_configuration(["config=pretrain/simclr/simclr_8node_resnet"])

CLIP:

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
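To make the zero-shot claim above concrete, here is a minimal classification sketch following CLIP's published usage (the image path is a placeholder):

from PIL import Image

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)  # similarity over the text prompts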
JAX

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Pros of JAX

  • More flexible and general-purpose ML framework
  • Better performance on TPUs and multi-GPU setups
  • Simpler API with functional programming paradigm

Cons of JAX

  • Less focused on computer vision tasks specifically
  • Fewer pre-built models and datasets for vision
  • Steeper learning curve for those new to functional programming

Code Comparison

VISSL (PyTorch-based):

from vissl.models import build_model
from vissl.data import build_dataset

# `cfg` is a composed VISSL configuration (see the Hydra examples above)
model = build_model(cfg.MODEL, cfg.OPTIMIZER)
dataset = build_dataset(cfg)

JAX:

import jax
import jax.numpy as jnp

def model(params, inputs):
    # A minimal linear model written as a pure function of its parameters
    return jnp.dot(inputs, params["w"]) + params["b"]

params = {"w": jnp.ones((3, 2)), "b": jnp.zeros(2)}
outputs = model(params, jnp.ones((4, 3)))
# Data loading is left to the user (e.g., NumPy arrays or a PyTorch DataLoader)

VISSL is more specialized for computer vision tasks, providing ready-to-use components for self-supervised learning. JAX offers a more flexible approach, allowing for custom implementations of models and datasets using its functional programming paradigm.

VISSL includes more vision-specific features out-of-the-box, while JAX requires more manual implementation but offers greater flexibility across various ML domains.
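A small sketch of the composable transformations in JAX's tagline, combining differentiation, vectorization, and JIT compilation:

import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.sum((x @ w) ** 2)

# Differentiate wrt w, vectorize over a batch of x, then JIT-compile the result
grad_fn = jax.jit(jax.vmap(jax.grad(loss), in_axes=(None, 0)))

w = jnp.ones(3)
xs = jnp.ones((8, 4, 3))     # batch of 8 inputs, each of shape (4, 3)
print(grad_fn(w, xs).shape)  # (8, 3): one gradient per batch element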

Transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of Transformers

  • Transformers provides a wide range of pre-trained models for various NLP tasks, making it easy to fine-tune and use in your own projects.
  • The library has extensive documentation and a large community, providing ample support and resources for users.
  • Transformers integrates well with popular deep learning frameworks like PyTorch and TensorFlow, allowing for seamless integration into your existing workflows.

Cons of Transformers

  • Transformers is primarily focused on NLP tasks, whereas VISSL targets self-supervised learning for computer vision; for vision-specific SSL research, VISSL offers more out of the box.
  • The Transformers library can be more complex to set up and configure, especially for users new to the field of NLP.

Code Comparison

Transformers (Hugging Face):

from transformers import pipeline

# Load a pre-trained model for sentiment analysis
sentiment_analyzer = pipeline('sentiment-analysis')

# Classify the sentiment of a given text
result = sentiment_analyzer('This movie was amazing!')
print(result)
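The same task can be run against an explicitly chosen checkpoint; the model name below is a standard Hugging Face Hub identifier:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("This movie was amazing!", return_tensors="pt")
logits = model(**inputs).logits  # raw scores for the negative/positive classes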

VISSL (Facebook Research):

import torch
from vissl.config.attr_dict import AttrDict
from vissl.models import build_model

# A minimal sketch; real VISSL configs carry many more required fields
cfg = AttrDict({"MODEL": {"TRUNK": {"NAME": "resnet50"}}})
model = build_model(cfg.MODEL, AttrDict({}))

# Forward pass; VISSL models return a list of output tensors
input_image = torch.randn(1, 3, 224, 224)
output = model(input_image)
print([o.shape for o in output])
DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • DeepSpeed provides efficient memory management and gradient accumulation, allowing for training of larger models with limited GPU memory.
  • DeepSpeed supports mixed precision training, which can significantly improve training speed and reduce memory usage.
  • DeepSpeed offers advanced features like ZeRO (the Zero Redundancy Optimizer), which partitions optimizer state, gradients, and parameters across workers to further reduce per-GPU memory; see the configuration sketch after the code comparison below.

Cons of DeepSpeed

  • DeepSpeed may have a steeper learning curve compared to VISSL, as it requires more configuration and setup.
  • The documentation for DeepSpeed, while comprehensive, may not be as user-friendly as the VISSL documentation.
  • DeepSpeed is primarily focused on training large language models, while VISSL is more geared towards computer vision tasks.

Code Comparison

VISSL (PyTorch-based):

from vissl.models import build_model
from vissl.utils.hydra_config import compose_hydra_configuration, convert_to_attrdict

# Config path is illustrative; see VISSL's configs/ directory for exact names
cfg = compose_hydra_configuration(["config=pretrain/swav/swav_8node_resnet"])
args, config = convert_to_attrdict(cfg)
model = build_model(config.MODEL, config.OPTIMIZER)

DeepSpeed (PyTorch-based):

import deepspeed

model = ...  # any torch.nn.Module
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # ZeRO, fp16, etc. are enabled via this config
)
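To ground the mixed-precision and ZeRO points above, the config can also be passed as a plain dict; the values below are illustrative:

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},          # mixed precision training
    "zero_optimization": {"stage": 3},  # ZeRO stage-3 parameter partitioning
}
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)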
DALL-E

PyTorch package for the discrete VAE used for DALL·E.

Pros of DALL-E

  • DALL-E is a state-of-the-art text-to-image generation model, capable of producing highly realistic and creative images from textual descriptions.
  • The model has been trained on a vast dataset, allowing it to generate a wide variety of images across different domains.
  • DALL-E has demonstrated impressive capabilities in tasks such as image editing, inpainting, and style transfer.

Cons of DALL-E

  • Only the discrete VAE component of DALL-E has been open-sourced; the full model's code and training data are not publicly available, limiting the ability to reproduce or extend the research.
  • The model's training process and architectural details are not fully transparent, making it difficult to understand the inner workings and potential biases.
  • DALL-E's deployment and usage are subject to OpenAI's policies and restrictions, which may limit its accessibility and flexibility for certain applications.

Code Comparison

VISSL (PyTorch):

from vissl.config.attr_dict import AttrDict
from vissl.models import build_model
from classy_vision.generic.util import load_checkpoint
from vissl.utils.checkpoint import init_model_from_consolidated_weights

# Build the model (a sketch; real configs carry many more required fields)
cfg = AttrDict({"MODEL": {"TRUNK": {"NAME": "resnet50"}}})
model = build_model(cfg.MODEL, AttrDict({}))

# Initialize the model from a checkpoint
weights = load_checkpoint("path/to/checkpoint.torch")
init_model_from_consolidated_weights(
    cfg, model, weights, state_dict_key_name="classy_state_dict", skip_layers=[]
)

DALL-E (discrete VAE only):

import torch
from dall_e import load_model

# Only the dVAE encoder/decoder are released; the full text-to-image model
# remains proprietary. The URLs below follow the repository's README.
dev = torch.device("cpu")
enc = load_model("https://cdn.openai.com/dall-e/encoder.pkl", dev)
dec = load_model("https://cdn.openai.com/dall-e/decoder.pkl", dev)


README


What's New

Updates and new releases in VISSL are shared in reverse chronological order; all VISSL releases are available on the GitHub releases page.

Introduction

VISSL is a computer VIsion library for state-of-the-art Self-Supervised Learning research with PyTorch. VISSL aims to accelerate the research cycle in self-supervised learning: from designing a new self-supervised task to evaluating the learned representations.

Installation

See INSTALL.md.

Getting Started

Install VISSL by following the installation instructions. After installation, please see Getting Started with VISSL and the Colab Notebook to learn about basic usage.

Documentation

Learn more about VISSL in our documentation, and see projects/ for some projects built on top of VISSL.

Tutorials

Get started with VISSL by trying one of the Colab tutorial notebooks.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the VISSL Model Zoo.
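Model Zoo checkpoints are plain torch files hosted on dl.fbaipublicfiles.com; a hedged sketch of fetching one (the URL tail is a placeholder; copy the exact link from the Model Zoo page):

import torch

weights = torch.hub.load_state_dict_from_url(
    "https://dl.fbaipublicfiles.com/vissl/model_zoo/...",  # placeholder URL
    map_location="cpu",
)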

Contributors

VISSL is written and maintained by Facebook AI Research.

Development

We welcome new contributions to VISSL and we will be actively maintaining this library! Please refer to CONTRIBUTING.md for full instructions on how to run the code, tests and linter, and submit your pull requests.

License

VISSL is released under MIT license.

Citing VISSL

If you find VISSL useful in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{goyal2021vissl,
  author =       {Priya Goyal and Quentin Duval and Jeremy Reizenstein and Matthew Leavitt and Min Xu and
                  Benjamin Lefaudeux and Mannat Singh and Vinicius Reis and Mathilde Caron and Piotr Bojanowski and
                  Armand Joulin and Ishan Misra},
  title =        {VISSL},
  howpublished = {\url{https://github.com/facebookresearch/vissl}},
  year =         {2021}
}