
pytorch/vision

Datasets, Transforms and Models specific to Computer Vision


Top Related Projects

tensorflow/models: Models and examples built with TensorFlow

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Keras: Deep Learning for humans

scikit-learn: machine learning in Python

OpenCV: Open Source Computer Vision Library

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Quick Overview

PyTorch Vision (torchvision) is a package of popular datasets, model architectures, and common image transformations for computer vision. It's designed to work seamlessly with PyTorch, providing a comprehensive toolkit for computer vision tasks such as image classification, object detection, and segmentation.

Pros

  • Extensive collection of pre-trained models and datasets
  • Seamless integration with PyTorch ecosystem
  • Easy-to-use data loading and transformation utilities
  • Regular updates and community support

Cons

  • Primarily focused on computer vision, limiting its use in other domains
  • Some advanced features require additional dependencies (for example, the video_reader video backend needs ffmpeg and a source build of torchvision)
  • Documentation can be overwhelming for beginners
  • Performance may vary depending on hardware and specific use cases

Code Examples

  1. Loading a pre-trained model:
import torchvision.models as models

# Load a ResNet-50 with ImageNet weights (`weights` replaces the deprecated `pretrained=True`)
resnet50 = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
  2. Applying image transformations:
from torchvision import transforms

# Resize the shorter side to 256 px, crop the central 224x224 region,
# convert to a float tensor in [0, 1], then normalize with ImageNet channel statistics
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
  3. Loading a dataset:
from torchvision import datasets

# Download the CIFAR-10 training split and apply `transform` to each image
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

Getting Started

To get started with torchvision, follow these steps:

  1. Install torchvision:
pip install torchvision
  2. Import the necessary modules:
import torch
import torchvision
import torchvision.transforms as transforms
  3. Load a pre-trained model and dataset:
# Load a ResNet-18 with ImageNet weights (`weights` replaces the deprecated `pretrained=True`)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)

# Load the CIFAR-10 dataset, mapping each RGB channel from [0, 1] to [-1, 1]
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
# num_workers > 0 starts worker processes; on Windows and macOS, run this under `if __name__ == "__main__":`
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

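With these steps, you'll have a pre-trained model and a dataset ready for further experimentation and fine-tuning. Note that the model's classifier predicts the 1,000 ImageNet classes while CIFAR-10 has 10, so fine-tuning on CIFAR-10 starts by replacing the final fully connected layer (model.fc).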

Competitor Comparisons


tensorflow/models: Models and examples built with TensorFlow

Pros of models

  • Larger collection of pre-implemented models and architectures
  • More comprehensive documentation and tutorials
  • Better integration with TensorFlow ecosystem and tools

Cons of models

  • Can be more complex to use and customize
  • Slower development cycle and updates compared to vision
  • Less flexibility in model definition and experimentation

Code Comparison

models:

import tensorflow as tf
from official.vision.image_classification import resnet_model

model = resnet_model.resnet50(num_classes=1000)
model.compile(optimizer='adam', loss='categorical_crossentropy')

vision:

import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)  # replaces the deprecated pretrained=True
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.CrossEntropyLoss()

Both repositories provide high-level APIs for creating and using pre-trained models. models offers a more structured approach with official implementations, while vision provides a more flexible and Pythonic interface. The choice between them often depends on the user's familiarity with the respective frameworks and specific project requirements.

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of Transformers

  • Broader scope, covering various NLP tasks and models
  • Extensive documentation and community support
  • Easier to use pre-trained models and fine-tune for specific tasks

Cons of Transformers

  • Steeper learning curve for beginners
  • Larger library size, potentially slower import times
  • More complex API due to wider range of functionalities

Code Comparison

Transformers:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

Vision:

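import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
transform = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
img = transform(Image.open("image.jpg").convert("RGB"))  # "image.jpg" is a placeholder path
with torch.no_grad():
    output = model(img.unsqueeze(0))  # add a batch dimension: [1, 3, 224, 224] -> [1, 1000] logits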

Both libraries offer powerful tools for their respective domains. Vision focuses on computer vision tasks with a simpler API, while Transformers provides a comprehensive solution for NLP tasks with more flexibility and pre-trained models.


Keras: Deep Learning for humans

Pros of Keras

  • Higher-level API, making it easier for beginners to get started with deep learning
  • Supports multiple backend engines (Keras 3 runs on TensorFlow, JAX, or PyTorch; older multi-backend releases supported TensorFlow, Theano, and CNTK), offering more flexibility
  • Extensive documentation and a large community, providing ample resources for learning and troubleshooting

Cons of Keras

  • Less flexible for advanced users who need fine-grained control over model architecture
  • Its high-level abstractions can add overhead compared with writing the model directly in a lower-level framework like PyTorch
  • Limited support for dynamic computational graphs, which can be restrictive for certain types of models

Code Comparison

Keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax')
])

PyTorch (torch.nn):

import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        # Softmax mirrors the Keras model; return raw logits instead when training with nn.CrossEntropyLoss
        return nn.functional.softmax(self.fc2(x), dim=1)
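A closer one-to-one counterpart of the Keras `Sequential` model is `nn.Sequential`. The sketch below also follows the usual PyTorch convention of returning logits, since `nn.CrossEntropyLoss` applies log-softmax internally; random tensors stand in for MNIST-sized data (an assumption for illustration):

```python
import torch
import torch.nn as nn

# Layer-for-layer counterpart of the Keras Sequential model
model = nn.Sequential(
    nn.Linear(784, 64),
    nn.ReLU(),
    nn.Linear(64, 10),  # no softmax: nn.CrossEntropyLoss expects raw logits
)

x = torch.randn(32, 784)              # a batch of 32 flattened 28x28 inputs
labels = torch.randint(0, 10, (32,))  # integer class labels
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, labels)
probs = torch.softmax(logits, dim=1)  # compute probabilities only where needed, e.g. at inference
print(logits.shape, loss.item())
```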

scikit-learn: machine learning in Python

Pros of scikit-learn

  • Broader range of machine learning algorithms and tools
  • Easier to use for traditional ML tasks and data analysis
  • Better documentation and more extensive examples

Cons of scikit-learn

  • Less suitable for deep learning tasks
  • Not optimized for GPU acceleration
  • Limited support for neural network architectures

Code Comparison

scikit-learn:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4)
clf = RandomForestClassifier()
clf.fit(X, y)

torchvision:

import torchvision.models as models
import torch

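model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()  # eval() puts BatchNorm in inference mode
input_tensor = torch.randn(1, 3, 224, 224)  # a batch of one 3x224x224 image
with torch.no_grad():
    output = model(input_tensor)  # shape [1, 1000]: one score per ImageNet class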

Summary

scikit-learn is a comprehensive library for traditional machine learning tasks, offering a wide range of algorithms and tools. It's user-friendly and well-documented, making it ideal for data analysis and classical ML problems. However, it lacks deep learning capabilities and GPU optimization.

torchvision, part of the PyTorch ecosystem, specializes in computer vision tasks and deep learning. It provides pre-trained models and utilities for image processing, making it more suitable for complex vision tasks and neural network-based solutions.


OpenCV: Open Source Computer Vision Library

Pros of OpenCV

  • Broader scope, covering a wide range of computer vision tasks beyond just deep learning
  • More mature project with a larger community and extensive documentation
  • Better performance for traditional computer vision algorithms

Cons of OpenCV

  • Less integrated with deep learning frameworks
  • Steeper learning curve for beginners
  • Slower adoption of cutting-edge deep learning techniques

Code Comparison

OpenCV:

import cv2

img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)

PyTorch Vision:

import torchvision.transforms as T
from PIL import Image

img = Image.open('image.jpg')
transform = T.Compose([T.Grayscale(), T.ToTensor()])
tensor = transform(img)

OpenCV focuses on direct image processing, while PyTorch Vision is designed for deep learning workflows. OpenCV provides lower-level access to image data and algorithms, whereas PyTorch Vision integrates seamlessly with PyTorch's tensor operations and neural network modules.

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

  • More comprehensive and specialized for object detection and segmentation tasks
  • Includes pre-trained models and advanced features like panoptic segmentation
  • Faster training and inference due to optimized CUDA implementations

Cons of Detectron2

  • Steeper learning curve and more complex API compared to torchvision
  • Less flexibility for general computer vision tasks outside of detection/segmentation
  • Requires more computational resources for training and inference

Code Comparison

Detectron2:

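import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("config.yaml")  # placeholder config; it should also point MODEL.WEIGHTS at a trained checkpoint
predictor = DefaultPredictor(cfg)
image = cv2.imread("image.jpg")  # DefaultPredictor expects an HxWxC BGR array by default
outputs = predictor(image)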

torchvision:

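import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
transform = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
image = Image.open("image.jpg").convert("RGB")  # "image.jpg" is a placeholder path
with torch.no_grad():
    output = model(transform(image).unsqueeze(0))  # add a batch dimension before the forward pass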

Detectron2 focuses on configuring and running object detection models, while torchvision provides a more general-purpose approach to image classification and other vision tasks. Detectron2's code is more specialized and requires more setup, whereas torchvision offers a simpler interface for basic tasks but may require additional code for more advanced use cases.


README

torchvision


The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Installation

Please refer to the official instructions to install the stable versions of torch and torchvision on your system.

To build from source, refer to our contributing page.

The following table lists the torchvision version corresponding to each torch release, along with the supported Python versions.

torch | torchvision | Python
main / nightly | main / nightly | >=3.9, <=3.12
2.4 | 0.19 | >=3.8, <=3.12
2.3 | 0.18 | >=3.8, <=3.12
2.2 | 0.17 | >=3.8, <=3.11
2.1 | 0.16 | >=3.8, <=3.11
2.0 | 0.15 | >=3.8, <=3.11

older versions

torch | torchvision | Python
1.13 | 0.14 | >=3.7.2, <=3.10
1.12 | 0.13 | >=3.7, <=3.10
1.11 | 0.12 | >=3.7, <=3.10
1.10 | 0.11 | >=3.6, <=3.9
1.9 | 0.10 | >=3.6, <=3.9
1.8 | 0.9 | >=3.6, <=3.9
1.7 | 0.8 | >=3.6, <=3.9
1.6 | 0.7 | >=3.6, <=3.8
1.5 | 0.6 | >=3.5, <=3.8
1.4 | 0.5 | ==2.7, >=3.5, <=3.8
1.3 | 0.4.2 / 0.4.3 | ==2.7, >=3.5, <=3.7
1.2 | 0.4.1 | ==2.7, >=3.5, <=3.7
1.1 | 0.3 | ==2.7, >=3.5, <=3.7
<=1.0 | 0.2 | ==2.7, >=3.5, <=3.7
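The torch/torchvision pairing in the table can also be checked programmatically. The sketch below is not part of torchvision: `TORCH_TO_TORCHVISION`, `major_minor`, `matching_torchvision`, and `is_compatible_pair` are hypothetical names, and the mapping only covers releases 1.4 through 2.4 as transcribed from the table:

```python
from typing import Optional

# torch -> torchvision minor releases, transcribed from the compatibility table (1.4 onward)
TORCH_TO_TORCHVISION = {
    "2.4": "0.19", "2.3": "0.18", "2.2": "0.17", "2.1": "0.16", "2.0": "0.15",
    "1.13": "0.14", "1.12": "0.13", "1.11": "0.12", "1.10": "0.11", "1.9": "0.10",
    "1.8": "0.9", "1.7": "0.8", "1.6": "0.7", "1.5": "0.6", "1.4": "0.5",
}


def major_minor(version: str) -> str:
    """Reduce a version string such as '2.4.0+cu121' to '2.4'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def matching_torchvision(torch_version: str) -> Optional[str]:
    """Return the torchvision release listed for a torch version, or None if it is not in the table."""
    return TORCH_TO_TORCHVISION.get(major_minor(torch_version))


def is_compatible_pair(torch_version: str, torchvision_version: str) -> bool:
    """True if the two versions appear as a pair in the table."""
    expected = matching_torchvision(torch_version)
    return expected is not None and major_minor(torchvision_version) == expected


print(matching_torchvision("2.4.0+cu121"))    # 0.19
print(is_compatible_pair("2.3.1", "0.18.1"))  # True
```

In an installed environment, `is_compatible_pair(torch.__version__, torchvision.__version__)` would apply the check; releases newer than the table return False rather than a guess.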

Image Backends

Torchvision currently supports the following image backends:

  • torch tensors
  • PIL images

Read more in our docs.

[UNSTABLE] Video Backend

Torchvision currently supports the following video backends:

  • pyav (default) - Pythonic binding for ffmpeg libraries.
  • video_reader - This needs ffmpeg to be installed and torchvision to be built from source. There shouldn't be any conflicting version of ffmpeg installed. Currently, this is only supported on Linux.
conda install -c conda-forge 'ffmpeg<4.3'
python setup.py install

Using the models in C++

Refer to example/cpp.

DISCLAIMER: the libtorchvision library includes the torchvision custom ops as well as most of the C++ torchvision APIs. Those APIs do not come with any backward-compatibility guarantees and may change from one version to the next. Only the Python APIs are stable and come with backward-compatibility guarantees. So, if you need stability within a C++ environment, your best bet is to export the Python APIs via TorchScript.

Documentation

You can find the API documentation on the pytorch website: https://pytorch.org/vision/stable/index.html

Contributing

See the CONTRIBUTING file for how to help out.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Pre-trained Model License

The pre-trained models provided in this library may have their own licenses or terms and conditions derived from the dataset used for training. It is your responsibility to determine whether you have permission to use the models for your use case.

More specifically, SWAG models are released under the CC-BY-NC 4.0 license. See SWAG LICENSE for additional details.

Citing TorchVision

If you find TorchVision useful in your work, please consider citing the following BibTeX entry:

@software{torchvision2016,
    title        = {TorchVision: PyTorch's Computer Vision library},
    author       = {TorchVision maintainers and contributors},
    year         = 2016,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/pytorch/vision}}
}