vision

Datasets, Transforms and Models specific to Computer Vision



Quick Overview

PyTorch Vision (torchvision) is a package of popular datasets, model architectures, and common image transformations for computer vision. It's designed to work seamlessly with PyTorch, providing a comprehensive toolkit for computer vision tasks such as image classification, object detection, and segmentation.

Pros

Extensive collection of pre-trained models and datasets
Seamless integration with PyTorch ecosystem
Easy-to-use data loading and transformation utilities
Regular updates and community support

Cons

Primarily focused on computer vision, limiting its use in other domains
Some advanced features may require additional dependencies
Documentation can be overwhelming for beginners
Performance may vary depending on hardware and specific use cases

Code Examples

Loading a pre-trained model:

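import torchvision.models as models

# Load a ResNet-50 with the best available pre-trained ImageNet weights.
# The `weights` argument replaces `pretrained=True`, deprecated since torchvision 0.13.
resnet50 = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)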

Applying image transformations:

from torchvision import transforms

# Define a series of image transformations
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

Loading a dataset:

from torchvision import datasets

# Load the CIFAR-10 dataset
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)

Getting Started

To get started with torchvision, follow these steps:

Install torchvision:

pip install torchvision

Import the necessary modules:

import torch
import torchvision
import torchvision.transforms as transforms

Load a pre-trained model and dataset:

# Load a pre-trained ResNet-18 model (`weights=` replaces the deprecated `pretrained=True`)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)

# Load the CIFAR-10 dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

With these steps, you'll have a pre-trained model and a dataset ready for further experimentation and fine-tuning.

Competitor Comparisons

models

77,312

Models and examples built with TensorFlow

Pros of models

Larger collection of pre-implemented models and architectures
More comprehensive documentation and tutorials
Better integration with TensorFlow ecosystem and tools

Cons of models

Can be more complex to use and customize
Slower development cycle and updates compared to vision
Less flexibility in model definition and experimentation

Code Comparison

models:

import tensorflow as tf
from official.vision.image_classification import resnet_model

model = resnet_model.resnet50(num_classes=1000)
model.compile(optimizer='adam', loss='categorical_crossentropy')

vision:

import torch
import torchvision.models as models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.CrossEntropyLoss()

Both repositories provide high-level APIs for creating and using pre-trained models. models offers a more structured approach with official implementations, while vision provides a more flexible and Pythonic interface. The choice between them often depends on the user's familiarity with the respective frameworks and specific project requirements.

transformers

136,322

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of Transformers

Broader scope, covering various NLP tasks and models
Extensive documentation and community support
Easier to use pre-trained models and fine-tune for specific tasks

Cons of Transformers

Steeper learning curve for beginners
Larger library size, potentially slower import times
More complex API due to wider range of functionalities

Code Comparison

Transformers:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

Vision:

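import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # the model expects a float tensor, not a PIL image
])
img = transform(Image.open('image.jpg').convert('RGB'))  # placeholder path
output = model(img.unsqueeze(0))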

Both libraries offer powerful tools for their respective domains. Vision focuses on computer vision tasks with a simpler API, while Transformers provides a comprehensive solution for NLP tasks with more flexibility and pre-trained models.

keras

62,199

Deep Learning for humans

Pros of Keras

Higher-level API, making it easier for beginners to get started with deep learning
Supports multiple backend engines (TensorFlow, JAX, and PyTorch as of Keras 3), offering more flexibility
Extensive documentation and a large community, providing ample resources for learning and troubleshooting

Cons of Keras

Less flexible for advanced users who need fine-grained control over model architecture
Slower execution compared to lower-level libraries like PyTorch
Limited support for dynamic computational graphs, which can be restrictive for certain types of models

Code Comparison

Keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax')
])

PyTorch (plain torch.nn, the layer API that torchvision models are built on):

import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        return nn.functional.softmax(self.fc2(x), dim=1)

scikit-learn

60,480

scikit-learn: machine learning in Python

Pros of scikit-learn

Broader range of machine learning algorithms and tools
Easier to use for traditional ML tasks and data analysis
Better documentation and more extensive examples

Cons of scikit-learn

Less suitable for deep learning tasks
Not optimized for GPU acceleration
Limited support for neural network architectures

Code Comparison

scikit-learn:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=4)
clf = RandomForestClassifier()
clf.fit(X, y)

torchvision:

import torchvision.models as models
import torch

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
input_tensor = torch.randn(1, 3, 224, 224)
output = model(input_tensor)

Summary

scikit-learn is a comprehensive library for traditional machine learning tasks, offering a wide range of algorithms and tools. It's user-friendly and well-documented, making it ideal for data analysis and classical ML problems. However, it lacks deep learning capabilities and GPU optimization.

torchvision, part of the PyTorch ecosystem, specializes in computer vision tasks and deep learning. It provides pre-trained models and utilities for image processing, making it more suitable for complex vision tasks and neural network-based solutions.

opencv

80,157

Open Source Computer Vision Library

Pros of OpenCV

Broader scope, covering a wide range of computer vision tasks beyond just deep learning
More mature project with a larger community and extensive documentation
Better performance for traditional computer vision algorithms

Cons of OpenCV

Less integrated with deep learning frameworks
Steeper learning curve for beginners
Slower adoption of cutting-edge deep learning techniques

Code Comparison

OpenCV:

import cv2

img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)

PyTorch Vision:

import torchvision.transforms as T
from PIL import Image

img = Image.open('image.jpg')
transform = T.Compose([T.Grayscale(), T.ToTensor()])
tensor = transform(img)

OpenCV focuses on direct image processing, while PyTorch Vision is designed for deep learning workflows. OpenCV provides lower-level access to image data and algorithms, whereas PyTorch Vision integrates seamlessly with PyTorch's tensor operations and neural network modules.

detectron2

30,784

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

More comprehensive and specialized for object detection and segmentation tasks
Includes pre-trained models and advanced features like panoptic segmentation
Faster training and inference due to optimized CUDA implementations

Cons of Detectron2

Steeper learning curve and more complex API compared to torchvision
Less flexibility for general computer vision tasks outside of detection/segmentation
Requires more computational resources for training and inference

Code Comparison

Detectron2:

from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(image)

torchvision:

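import torchvision.models as models
from torchvision import transforms

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
# `image` is assumed to be a PIL image; ToTensor and a batch dimension are required before the forward pass
transform = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()])
output = model(transform(image).unsqueeze(0))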

Detectron2 focuses on configuring and running object detection models, while torchvision provides a more general-purpose approach to image classification and other vision tasks. Detectron2's code is more specialized and requires more setup, whereas torchvision offers a simpler interface for basic tasks but may require additional code for more advanced use cases.


README

torchvision

The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.

Installation

Please refer to the official instructions to install the stable versions of torch and torchvision on your system.

To build from source, refer to our contributing page.

The following table lists the corresponding torch and torchvision versions and the supported Python versions.

`torch`	`torchvision`	Python
`main` / `nightly`	`main` / `nightly`	`>=3.9`, `<=3.12`
`2.5`	`0.20`	`>=3.9`, `<=3.12`
`2.4`	`0.19`	`>=3.8`, `<=3.12`
`2.3`	`0.18`	`>=3.8`, `<=3.12`
`2.2`	`0.17`	`>=3.8`, `<=3.11`
`2.1`	`0.16`	`>=3.8`, `<=3.11`
`2.0`	`0.15`	`>=3.8`, `<=3.11`
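A mismatched torch and torchvision pair is a common source of import errors, so it can help to check the installed versions against the table above. The sketch below encodes the `torch` to `torchvision` pairs for the 2.x rows; the helper names (`major_minor`, `versions_match`, `check_installed`) are hypothetical and not part of torchvision.

```python
# Pairs copied from the table above (torch release -> torchvision release).
COMPATIBLE = {
    "2.5": "0.20",
    "2.4": "0.19",
    "2.3": "0.18",
    "2.2": "0.17",
    "2.1": "0.16",
    "2.0": "0.15",
}


def major_minor(version: str) -> str:
    """Reduce a version such as '2.5.1+cu121' to '2.5'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def versions_match(torch_version: str, torchvision_version: str) -> bool:
    """Return True if the pair appears in the compatibility table."""
    expected = COMPATIBLE.get(major_minor(torch_version))
    return expected is not None and expected == major_minor(torchvision_version)


def check_installed() -> bool:
    """Check the versions installed in the current environment."""
    import torch
    import torchvision

    return versions_match(torch.__version__, torchvision.__version__)
```

Calling `check_installed()` returns `False` for nightly builds and for releases not listed in `COMPATIBLE`, so a negative result means "not in this table" rather than "definitely broken".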

Older versions

`torch`	`torchvision`	Python
`1.13`	`0.14`	`>=3.7.2`, `<=3.10`
`1.12`	`0.13`	`>=3.7`, `<=3.10`
`1.11`	`0.12`	`>=3.7`, `<=3.10`
`1.10`	`0.11`	`>=3.6`, `<=3.9`
`1.9`	`0.10`	`>=3.6`, `<=3.9`
`1.8`	`0.9`	`>=3.6`, `<=3.9`
`1.7`	`0.8`	`>=3.6`, `<=3.9`
`1.6`	`0.7`	`>=3.6`, `<=3.8`
`1.5`	`0.6`	`>=3.5`, `<=3.8`
`1.4`	`0.5`	`==2.7`, `>=3.5`, `<=3.8`
`1.3`	`0.4.2` / `0.4.3`	`==2.7`, `>=3.5`, `<=3.7`
`1.2`	`0.4.1`	`==2.7`, `>=3.5`, `<=3.7`
`1.1`	`0.3`	`==2.7`, `>=3.5`, `<=3.7`
`<=1.0`	`0.2`	`==2.7`, `>=3.5`, `<=3.7`

Image Backends

Torchvision currently supports the following image backends:

torch tensors
PIL images:
- Pillow
- Pillow-SIMD - a much faster drop-in replacement for Pillow with SIMD.

[UNSTABLE] Video Backend

Torchvision currently supports the following video backends:

pyav (default) - Pythonic binding for ffmpeg libraries.
video_reader - This needs ffmpeg to be installed and torchvision to be built from source. There shouldn't be any conflicting version of ffmpeg installed. Currently, this is only supported on Linux.

conda install -c conda-forge 'ffmpeg<4.3'
python setup.py install

Using the models on C++

Refer to example/cpp.

DISCLAIMER: the libtorchvision library includes the torchvision custom ops as well as most of the C++ torchvision APIs. Those APIs do not come with any backward-compatibility guarantees and may change from one version to the next. Only the Python APIs are stable and with backward-compatibility guarantees. So, if you need stability within a C++ environment, your best bet is to export the Python APIs via torchscript.

Documentation

You can find the API documentation on the PyTorch website: https://pytorch.org/vision/stable/index.html

Contributing

See the CONTRIBUTING file for how to help out.

Disclaimer on Datasets

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Pre-trained Model License

The pre-trained models provided in this library may have their own licenses or terms and conditions derived from the dataset used for training. It is your responsibility to determine whether you have permission to use the models for your use case.

More specifically, SWAG models are released under the CC-BY-NC 4.0 license. See SWAG LICENSE for additional details.

Citing TorchVision

If you find TorchVision useful in your work, please consider citing the following BibTeX entry:

@software{torchvision2016,
    title        = {TorchVision: PyTorch's Computer Vision library},
    author       = {TorchVision maintainers and contributors},
    year         = 2016,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/pytorch/vision}}
}
