DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Top Related Projects
Models and examples built with TensorFlow
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Quick Overview
NVIDIA/DeepLearningExamples is a GitHub repository that provides state-of-the-art deep learning examples optimized for NVIDIA GPUs. It includes scripts, models, and documentation for various deep learning tasks across different frameworks like PyTorch, TensorFlow, and MXNet. The repository aims to showcase best practices and high-performance implementations for AI researchers and developers.
Pros
- Optimized for NVIDIA GPUs, ensuring high performance and efficiency
- Covers a wide range of deep learning tasks and popular frameworks
- Includes detailed documentation and performance benchmarks
- Regularly updated with new models and techniques
Cons
- Primarily focused on NVIDIA hardware, which may limit usefulness for users with other GPU brands
- Some examples may require significant computational resources
- Learning curve can be steep for beginners in deep learning
- Not all examples are maintained at the same frequency
Code Examples
- Loading a pre-trained BERT model in PyTorch (via the Hugging Face transformers library):
from transformers import BertModel, BertTokenizer

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

# Tokenize a sample sentence and run a forward pass
input_text = "Example sentence for BERT."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs)
- Training a ResNet50 model on ImageNet using TensorFlow:
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.optimizers import SGD

model = ResNet50(weights=None, classes=1000)
optimizer = SGD(learning_rate=0.1, momentum=0.9)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train_dataset and val_dataset are assumed to be prepared tf.data.Dataset objects
model.fit(train_dataset, epochs=90, validation_data=val_dataset)
- Implementing NVIDIA Apex for mixed precision training in PyTorch:
import torch
from apex import amp

# YourModel, criterion, num_epochs, and train_loader are placeholders
model = YourModel().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for epoch in range(num_epochs):
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        optimizer.step()
Getting Started
To get started with NVIDIA/DeepLearningExamples:
- Clone the repository:
  git clone https://github.com/NVIDIA/DeepLearningExamples.git
  cd DeepLearningExamples
- Choose a specific example (e.g., BERT for PyTorch):
  cd PyTorch/LanguageModeling/BERT
- Follow the README instructions for setting up the environment and running the example:
  # Create and activate a new conda environment
  conda env create -f requirements.yml
  conda activate nvidia_bert_pytorch
  # Run the training script
  python run_pretraining.py --input_dir /path/to/your/data
Note: Specific instructions may vary depending on the chosen example and framework.
Competitor Comparisons
Models and examples built with TensorFlow
Pros of TensorFlow Models
- Broader range of models and applications, covering various domains
- More extensive documentation and community support
- Regular updates and contributions from the TensorFlow team
Cons of TensorFlow Models
- Less focus on GPU optimization compared to DeepLearningExamples
- May require more setup and configuration for high-performance scenarios
- Some models might not be as production-ready as those in DeepLearningExamples
Code Comparison
DeepLearningExamples:
import torch
from apex import amp
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
TensorFlow Models:
import tensorflow as tf
with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    loss = loss_function(labels, predictions)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
The DeepLearningExamples code showcases NVIDIA's Apex library for mixed precision training, while TensorFlow Models uses standard TensorFlow operations. DeepLearningExamples focuses on GPU optimization, whereas TensorFlow Models provides a more general approach suitable for various hardware configurations.
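For context, a comparable mixed-precision setup is also possible in TensorFlow through the Keras mixed_precision API. The following minimal sketch is illustrative and not taken from either repository (the model and optimizer choices are placeholders):

import tensorflow as tf

# Request float16 compute while keeping float32 variables for numerically sensitive ops
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.applications.ResNet50(weights=None, classes=1000)
# Loss scaling plays the same role as amp.scale_loss in the Apex snippet above
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9))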
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
Pros of PyTorch Examples
- Simpler, more beginner-friendly implementations
- Wider range of basic deep learning models and tasks
- More frequently updated with community contributions
Cons of PyTorch Examples
- Less focus on performance optimization
- Fewer industry-scale, production-ready examples
- Limited support for distributed training and multi-GPU setups
Code Comparison
DeepLearningExamples (BERT fine-tuning):
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels)
model = model.to(device)
if args.n_gpu > 1:
    model = torch.nn.DataParallel(model)
PyTorch Examples (MNIST classification):
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=args.lr)
for epoch in range(1, args.epochs + 1):
    train(args, model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)
The DeepLearningExamples code shows more advanced features like multi-GPU support, while PyTorch Examples focuses on simplicity and readability for basic tasks.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Extensive library of pre-trained models for various NLP tasks
- Active community and frequent updates
- Easy-to-use API for fine-tuning and inference
Cons of transformers
- Less focus on performance optimization for specific hardware
- May require additional setup for distributed training
- Limited examples for non-NLP tasks
Code Comparison
transformers:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
DeepLearningExamples:
import torch
from model.bert import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
# BertTokenizer is provided by the example's local tokenization utilities
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
The code comparison shows that transformers provides a more streamlined API for loading and using pre-trained models, while DeepLearningExamples may require more manual setup and configuration. However, DeepLearningExamples often includes optimizations for NVIDIA hardware and distributed training scenarios.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Offers more advanced optimization techniques like ZeRO (Zero Redundancy Optimizer)
- Provides better support for distributed training across multiple GPUs and nodes
- Includes a more comprehensive set of tools for large-scale model training
Cons of DeepSpeed
- Steeper learning curve due to more complex features and configurations
- May require more setup and fine-tuning for optimal performance
- Less focus on providing ready-to-use examples for specific models or tasks
Code Comparison
DeepSpeed:
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params
)
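The ZeRO optimizer and other DeepSpeed features mentioned above are typically enabled through a configuration dictionary (or JSON file) passed to deepspeed.initialize. The values below are illustrative placeholders, not settings from either repository:

# Minimal sketch: enable fp16 and ZeRO stage 2 via a config dict
ds_config = {
    "train_batch_size": 256,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)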
DeepLearningExamples:
# assumes torch.distributed.init_process_group() has already been called
model = torch.nn.parallel.DistributedDataParallel(model)
optimizer = optim.SGD(model.parameters(), lr=args.lr)
Summary
DeepSpeed offers more advanced features for large-scale model training and optimization, while DeepLearningExamples provides a simpler approach with ready-to-use examples. DeepSpeed may require more setup but offers better scalability, while DeepLearningExamples is easier to get started with for specific tasks. The choice between them depends on the project's scale and requirements.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- More focused on sequence-to-sequence learning and natural language processing tasks
- Offers a wider range of pre-trained models and benchmarks for NLP
- Provides a flexible and modular architecture for easier customization
Cons of fairseq
- Less emphasis on other deep learning domains (e.g., computer vision, speech recognition)
- May have a steeper learning curve for beginners due to its more specialized nature
- Potentially less optimized for NVIDIA hardware compared to DeepLearningExamples
Code Comparison
fairseq:
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model', 'checkpoint.pt')
model.translate('Hello world!')
DeepLearningExamples:
from nvidia.transformer import TransformerModel
model = TransformerModel.from_pretrained('nvidia_transformer_large')
model.translate('Hello world!')
Both repositories provide high-quality implementations of deep learning models, but they cater to different needs. fairseq is more specialized for NLP tasks, while DeepLearningExamples covers a broader range of deep learning applications with a focus on NVIDIA hardware optimization.
NVIDIA Deep Learning Examples for Tensor Cores
Introduction
This repository provides State-of-the-Art Deep Learning examples that are easy to train and deploy, achieving the best reproducible accuracy and performance with the NVIDIA CUDA-X software stack running on NVIDIA Volta, Turing, and Ampere GPUs.
NVIDIA GPU Cloud (NGC) Container Registry
These examples, along with our NVIDIA deep learning software stack, are provided in a monthly updated Docker container on the NGC container registry (https://ngc.nvidia.com). These containers include:
- The latest NVIDIA examples from this repository
- The latest NVIDIA contributions shared upstream to the respective framework
- The latest NVIDIA Deep Learning software libraries, such as cuDNN, NCCL, and cuBLAS, all of which have been through a rigorous monthly quality assurance process to ensure that they provide the best possible performance
- Monthly release notes for each of the NVIDIA optimized containers
Computer Vision
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
EfficientNet-B0 | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
EfficientNet-B4 | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
EfficientNet-WideSE-B0 | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
EfficientNet-WideSE-B4 | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
EfficientNet v1-B0 | TensorFlow2 | Yes | Yes | Yes | Example | - | Supported | Yes | - |
EfficientNet v1-B4 | TensorFlow2 | Yes | Yes | Yes | Example | - | Supported | Yes | - |
EfficientNet v2-S | TensorFlow2 | Yes | Yes | Yes | Example | - | Supported | Yes | - |
GPUNet | PyTorch | Yes | Yes | - | Example | Yes | Example | Yes | - |
Mask R-CNN | PyTorch | Yes | Yes | - | Example | - | Supported | - | Yes |
Mask R-CNN | TensorFlow2 | Yes | Yes | - | Example | - | Supported | Yes | - |
nnUNet | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
ResNet-50 | MXNet | Yes | Yes | - | Supported | - | Supported | - | - |
ResNet-50 | PaddlePaddle | Yes | Yes | - | Example | - | Supported | - | - |
ResNet-50 | PyTorch | Yes | Yes | - | Example | - | Example | Yes | - |
ResNet-50 | TensorFlow | Yes | Yes | - | Supported | - | Supported | Yes | - |
ResNeXt-101 | PyTorch | Yes | Yes | - | Example | - | Example | Yes | - |
ResNeXt-101 | TensorFlow | Yes | Yes | - | Supported | - | Supported | Yes | - |
SE-ResNeXt-101 | PyTorch | Yes | Yes | - | Example | - | Example | Yes | - |
SE-ResNeXt-101 | TensorFlow | Yes | Yes | - | Supported | - | Supported | Yes | - |
SSD | PyTorch | Yes | Yes | - | Supported | - | Supported | - | Yes |
SSD | TensorFlow | Yes | Yes | - | Supported | - | Supported | Yes | Yes |
U-Net Med | TensorFlow2 | Yes | Yes | - | Example | - | Supported | Yes | - |
Natural Language Processing
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
BERT | PyTorch | Yes | Yes | Yes | Example | - | Example | Yes | - |
GNMT | PyTorch | Yes | Yes | - | Supported | - | Supported | - | - |
ELECTRA | TensorFlow2 | Yes | Yes | Yes | Supported | - | Supported | Yes | - |
BERT | TensorFlow | Yes | Yes | Yes | Example | - | Example | Yes | Yes |
BERT | TensorFlow2 | Yes | Yes | Yes | Supported | - | Supported | Yes | - |
GNMT | TensorFlow | Yes | Yes | - | Supported | - | Supported | - | - |
Faster Transformer | TensorFlow | - | - | - | Example | - | Supported | - | - |
Recommender Systems
Models | Framework | AMP | Multi-GPU | Multi-Node | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|
DLRM | PyTorch | Yes | Yes | - | Yes | Example | Yes | Yes |
DLRM | TensorFlow2 | Yes | Yes | Yes | - | Supported | Yes | - |
NCF | PyTorch | Yes | Yes | - | - | Supported | - | - |
Wide&Deep | TensorFlow | Yes | Yes | - | - | Supported | Yes | - |
Wide&Deep | TensorFlow2 | Yes | Yes | - | - | Supported | Yes | - |
NCF | TensorFlow | Yes | Yes | - | - | Supported | Yes | - |
VAE-CF | TensorFlow | Yes | Yes | - | - | Supported | - | - |
SIM | TensorFlow2 | Yes | Yes | - | - | Supported | Yes | - |
Speech to Text
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
Jasper | PyTorch | Yes | Yes | - | Example | Yes | Example | Yes | Yes |
QuartzNet | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
Text to Speech
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
FastPitch | PyTorch | Yes | Yes | - | Example | - | Example | Yes | Yes |
FastSpeech | PyTorch | Yes | Yes | - | Example | - | Supported | - | - |
Tacotron 2 and WaveGlow | PyTorch | Yes | Yes | - | Example | Yes | Example | Yes | - |
HiFi-GAN | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
Graph Neural Networks
Models | Framework | AMP | Multi-GPU | Multi-Node | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|
SE(3)-Transformer | PyTorch | Yes | Yes | - | - | Supported | - | - |
MoFlow | PyTorch | Yes | Yes | - | - | Supported | - | - |
Time-Series Forecasting
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
Temporal Fusion Transformer | PyTorch | Yes | Yes | - | Example | Yes | Example | Yes | - |
NVIDIA support
In each of the network READMEs, we indicate the level of support that will be provided. The range is from ongoing updates and improvements to a point-in-time release for thought leadership.
Glossary
Multinode Training: Supported on a pyxis/enroot Slurm cluster.
Deep Learning Compiler (DLC): TensorFlow XLA and PyTorch JIT and/or TorchScript.
Accelerated Linear Algebra (XLA): XLA is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes. The results are improvements in speed and memory usage.
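As a rough illustration (not taken from the repository), XLA compilation can be requested per function in TensorFlow 2 via the jit_compile flag:

import tensorflow as tf

@tf.function(jit_compile=True)  # asks XLA to compile this computation
def dense_relu(x, w):
    return tf.nn.relu(tf.matmul(x, w))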
PyTorch JIT and/or TorchScript: TorchScript is a way to create serializable and optimizable models from PyTorch code. It is an intermediate representation of a PyTorch model (a subclass of nn.Module) that can then be run in a high-performance environment such as C++.
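A minimal sketch of scripting a module to TorchScript (illustrative, not taken from the repository):

import torch

module = torch.nn.Linear(16, 4)
scripted = torch.jit.script(module)   # convert the module to a serializable ScriptModule
scripted.save("linear_scripted.pt")   # can later be loaded from C++ via torch::jit::load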
Automatic Mixed Precision (AMP): AMP automatically enables mixed precision training on Volta, Turing, and NVIDIA Ampere GPU architectures.
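Native PyTorch exposes the same capability through torch.cuda.amp; a minimal sketch assuming model, optimizer, criterion, and train_loader are already defined:

import torch

scaler = torch.cuda.amp.GradScaler()
for data, target in train_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # run the forward pass in mixed precision
        loss = criterion(model(data), target)
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()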
TensorFloat-32 (TF32): TensorFloat-32 (TF32) is the new math mode in NVIDIA A100 GPUs for handling the matrix math, also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.
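In PyTorch, the use of TF32 can be controlled explicitly (defaults vary by framework version); a minimal illustration:

import torch

torch.backends.cuda.matmul.allow_tf32 = True  # allow TF32 for matrix multiplications
torch.backends.cudnn.allow_tf32 = True        # allow TF32 inside cuDNN convolutions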
Jupyter Notebooks (NB): The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
Feedback / Contributions
We're posting these examples on GitHub to better support the community, facilitate feedback, as well as collect and implement contributions using GitHub Issues and pull requests. We welcome all contributions!
Known issues
In each of the network READMEs, we indicate any known issues and encourage the community to provide feedback.