Top Related Projects
Models and examples built with TensorFlow
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Deep Learning for humans
scikit-learn: machine learning in Python
Open Source Computer Vision Library
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Quick Overview
PyTorch Vision (torchvision) is a package of popular datasets, model architectures, and common image transformations for computer vision. It's designed to work seamlessly with PyTorch, providing a comprehensive toolkit for computer vision tasks such as image classification, object detection, and segmentation.
Pros
- Extensive collection of pre-trained models and datasets
- Seamless integration with PyTorch ecosystem
- Easy-to-use data loading and transformation utilities
- Regular updates and community support
Cons
- Primarily focused on computer vision, limiting its use in other domains
- Some advanced features may require additional dependencies
- Documentation can be overwhelming for beginners
- Performance may vary depending on hardware and specific use cases
Code Examples
- Loading a pre-trained model:
import torchvision.models as models
# Load a pre-trained ResNet-50 model (the weights argument replaces the deprecated pretrained=True)
resnet50 = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
- Applying image transformations:
from torchvision import transforms
# Define a series of image transformations
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
- Loading a dataset:
from torchvision import datasets
# Load the CIFAR-10 dataset
trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
Getting Started
To get started with torchvision, follow these steps:
- Install torchvision:
pip install torchvision
- Import the necessary modules:
import torch
import torchvision
import torchvision.transforms as transforms
- Load a pre-trained model and dataset:
# Load a pre-trained ResNet-18 model
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
# Load the CIFAR-10 dataset
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
With these steps, you'll have a pre-trained model and a dataset ready for further experimentation and fine-tuning.
Competitor Comparisons
Models and examples built with TensorFlow
Pros of models
- Larger collection of pre-implemented models and architectures
- More comprehensive documentation and tutorials
- Better integration with TensorFlow ecosystem and tools
Cons of models
- Can be more complex to use and customize
- Slower development cycle and updates compared to vision
- Less flexibility in model definition and experimentation
Code Comparison
models:
import tensorflow as tf
from official.vision.image_classification import resnet_model
model = resnet_model.resnet50(num_classes=1000)
model.compile(optimizer='adam', loss='categorical_crossentropy')
vision:
import torch
import torchvision.models as models
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.CrossEntropyLoss()
Both repositories provide high-level APIs for creating and using pre-trained models. models offers a more structured approach with official implementations, while vision provides a more flexible and Pythonic interface. The choice between them often depends on the user's familiarity with the respective frameworks and specific project requirements.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of Transformers
- Broader scope, covering various NLP tasks and models
- Extensive documentation and community support
- Easier to use pre-trained models and fine-tune for specific tasks
Cons of Transformers
- Steeper learning curve for beginners
- Larger library size, potentially slower import times
- More complex API due to wider range of functionalities
Code Comparison
Transformers:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
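Vision:
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()
# ToTensor converts the PIL image to a tensor so it can be batched and passed to the model
transform = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()])
img = transform(Image.open('image.jpg').convert('RGB'))
output = model(img.unsqueeze(0))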
Both libraries offer powerful tools for their respective domains. Vision focuses on computer vision tasks with a simpler API, while Transformers provides a comprehensive solution for NLP tasks with more flexibility and pre-trained models.
Deep Learning for humans
Pros of Keras
- Higher-level API, making it easier for beginners to get started with deep learning
- Supports multiple backend engines (TensorFlow, JAX, and PyTorch as of Keras 3), offering more flexibility
- Extensive documentation and a large community, providing ample resources for learning and troubleshooting
Cons of Keras
- Less flexible for advanced users who need fine-grained control over model architecture
- Slower execution compared to lower-level libraries like PyTorch
- Limited support for dynamic computational graphs, which can be restrictive for certain types of models
Code Comparison
Keras:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential([
    Dense(64, activation='relu', input_shape=(784,)),
    Dense(10, activation='softmax')
])
PyTorch (torch.nn):
import torch.nn as nn
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 64)
        self.fc2 = nn.Linear(64, 10)
    def forward(self, x):
        x = nn.functional.relu(self.fc1(x))
        return nn.functional.softmax(self.fc2(x), dim=1)
scikit-learn: machine learning in Python
Pros of scikit-learn
- Broader range of machine learning algorithms and tools
- Easier to use for traditional ML tasks and data analysis
- Better documentation and more extensive examples
Cons of scikit-learn
- Less suitable for deep learning tasks
- Not optimized for GPU acceleration
- Limited support for neural network architectures
Code Comparison
scikit-learn:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=4)
clf = RandomForestClassifier()
clf.fit(X, y)
torchvision:
import torchvision.models as models
import torch
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
input_tensor = torch.randn(1, 3, 224, 224)
output = model(input_tensor)
Summary
scikit-learn is a comprehensive library for traditional machine learning tasks, offering a wide range of algorithms and tools. It's user-friendly and well-documented, making it ideal for data analysis and classical ML problems. However, it lacks deep learning capabilities and GPU optimization.
torchvision, part of the PyTorch ecosystem, specializes in computer vision tasks and deep learning. It provides pre-trained models and utilities for image processing, making it more suitable for complex vision tasks and neural network-based solutions.
Open Source Computer Vision Library
Pros of OpenCV
- Broader scope, covering a wide range of computer vision tasks beyond just deep learning
- More mature project with a larger community and extensive documentation
- Better performance for traditional computer vision algorithms
Cons of OpenCV
- Less integrated with deep learning frameworks
- Steeper learning curve for beginners
- Slower adoption of cutting-edge deep learning techniques
Code Comparison
OpenCV:
import cv2
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
PyTorch Vision:
import torchvision.transforms as T
from PIL import Image
img = Image.open('image.jpg')
transform = T.Compose([T.Grayscale(), T.ToTensor()])
tensor = transform(img)
OpenCV focuses on direct image processing, while PyTorch Vision is designed for deep learning workflows. OpenCV provides lower-level access to image data and algorithms, whereas PyTorch Vision integrates seamlessly with PyTorch's tensor operations and neural network modules.
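The two libraries are often used together, which requires converting between OpenCV's array layout and the tensor layout PyTorch expects. The sketch below shows one way to do that. It assumes a random NumPy array in place of the result of `cv2.imread`, which returns an H x W x 3 `uint8` array in BGR channel order, so OpenCV itself is not needed to run it.

```python
import numpy as np
import torch

# Assumption: this random array stands in for the output of cv2.imread
# (H x W x 3, uint8, BGR channel order).
bgr = np.random.randint(0, 256, size=(48, 64, 3), dtype=np.uint8)

# Reverse the channel axis (BGR -> RGB). copy() makes the memory contiguous,
# which torch.from_numpy needs because it rejects negative strides.
rgb = bgr[..., ::-1].copy()

# H x W x C uint8 -> C x H x W float32 scaled to [0, 1].
tensor = torch.from_numpy(rgb).permute(2, 0, 1).float().div(255.0)

print(tensor.shape, tensor.dtype)
```

The resulting tensor has the same layout and value range that `transforms.ToTensor()` produces from a PIL image, so it can be passed on to `transforms.Normalize` or a model.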
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Pros of Detectron2
- More comprehensive and specialized for object detection and segmentation tasks
- Includes pre-trained models and advanced features like panoptic segmentation
- Faster training and inference due to optimized CUDA implementations
Cons of Detectron2
- Steeper learning curve and more complex API compared to torchvision
- Less flexibility for general computer vision tasks outside of detection/segmentation
- Requires more computational resources for training and inference
Code Comparison
Detectron2:
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
cfg = get_cfg()
cfg.merge_from_file("config.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(image)
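torchvision:
import torchvision.models as models
from torchvision import transforms
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()
# image is assumed to be a PIL image; ToTensor and unsqueeze(0) produce the batched tensor the model expects
transform = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor()])
output = model(transform(image).unsqueeze(0))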
Detectron2 focuses on configuring and running object detection models, while torchvision provides a more general-purpose approach to image classification and other vision tasks. Detectron2's code is more specialized and requires more setup, whereas torchvision offers a simpler interface for basic tasks but may require additional code for more advanced use cases.
README
torchvision
The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision.
Installation
Please refer to the official instructions to install the stable versions of torch and torchvision on your system.
To build from source, refer to our contributing page.
The following table lists the corresponding torch and torchvision versions and the supported Python versions.
| torch | torchvision | Python |
|---|---|---|
| main / nightly | main / nightly | >=3.9, <=3.12 |
| 2.4 | 0.19 | >=3.8, <=3.12 |
| 2.3 | 0.18 | >=3.8, <=3.12 |
| 2.2 | 0.17 | >=3.8, <=3.11 |
| 2.1 | 0.16 | >=3.8, <=3.11 |
| 2.0 | 0.15 | >=3.8, <=3.11 |
older versions
| torch | torchvision | Python |
|---|---|---|
| 1.13 | 0.14 | >=3.7.2, <=3.10 |
| 1.12 | 0.13 | >=3.7, <=3.10 |
| 1.11 | 0.12 | >=3.7, <=3.10 |
| 1.10 | 0.11 | >=3.6, <=3.9 |
| 1.9 | 0.10 | >=3.6, <=3.9 |
| 1.8 | 0.9 | >=3.6, <=3.9 |
| 1.7 | 0.8 | >=3.6, <=3.9 |
| 1.6 | 0.7 | >=3.6, <=3.8 |
| 1.5 | 0.6 | >=3.5, <=3.8 |
| 1.4 | 0.5 | ==2.7, >=3.5, <=3.8 |
| 1.3 | 0.4.2 / 0.4.3 | ==2.7, >=3.5, <=3.7 |
| 1.2 | 0.4.1 | ==2.7, >=3.5, <=3.7 |
| 1.1 | 0.3 | ==2.7, >=3.5, <=3.7 |
| <=1.0 | 0.2 | ==2.7, >=3.5, <=3.7 |
Image Backends
Torchvision currently supports the following image backends:
- torch tensors
- PIL images:
- Pillow
- Pillow-SIMD - a much faster drop-in replacement for Pillow with SIMD.
Read more in our docs.
[UNSTABLE] Video Backend
Torchvision currently supports the following video backends:
- pyav (default) - Pythonic binding for ffmpeg libraries.
- video_reader - This needs ffmpeg to be installed and torchvision to be built from source. There shouldn't be any conflicting version of ffmpeg installed. Currently, this is only supported on Linux.
conda install -c conda-forge 'ffmpeg<4.3'
python setup.py install
Using the models in C++
Refer to example/cpp.
DISCLAIMER: the libtorchvision library includes the torchvision custom ops as well as most of the C++ torchvision APIs. Those APIs do not come with any backward-compatibility guarantees and may change from one version to the next. Only the Python APIs are stable and come with backward-compatibility guarantees. So, if you need stability within a C++ environment, your best bet is to export the Python APIs via TorchScript.
Documentation
You can find the API documentation on the pytorch website: https://pytorch.org/vision/stable/index.html
Contributing
See the CONTRIBUTING file for how to help out.
Disclaimer on Datasets
This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.
If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!
Pre-trained Model License
The pre-trained models provided in this library may have their own licenses or terms and conditions derived from the dataset used for training. It is your responsibility to determine whether you have permission to use the models for your use case.
More specifically, SWAG models are released under the CC-BY-NC 4.0 license. See SWAG LICENSE for additional details.
Citing TorchVision
If you find TorchVision useful in your work, please consider citing the following BibTeX entry:
@software{torchvision2016,
title = {TorchVision: PyTorch's Computer Vision library},
author = {TorchVision maintainers and contributors},
year = 2016,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/pytorch/vision}}
}