DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Top Related Projects
Models and examples built with TensorFlow
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Quick Overview
NVIDIA/DeepLearningExamples is a GitHub repository that provides state-of-the-art deep learning examples optimized for NVIDIA GPUs. It includes scripts, models, and documentation for various deep learning tasks across different frameworks like PyTorch, TensorFlow, and MXNet. The repository aims to showcase best practices and high-performance implementations for AI researchers and developers.
Pros
- Optimized for NVIDIA GPUs, ensuring high performance and efficiency
- Covers a wide range of deep learning tasks and popular frameworks
- Includes detailed documentation and performance benchmarks
- Regularly updated with new models and techniques
Cons
- Primarily focused on NVIDIA hardware, which may limit usefulness for users with other GPU brands
- Some examples may require significant computational resources
- Learning curve can be steep for beginners in deep learning
- Not all examples are maintained at the same frequency
Code Examples
- Loading a pre-trained BERT model in PyTorch (via the Hugging Face transformers library):
from transformers import BertModel, BertTokenizer

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

# Tokenize a sample sentence and run a forward pass
input_text = "Example sentence for BERT."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs)
- Training a ResNet50 model on ImageNet using TensorFlow:
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.optimizers import SGD

model = ResNet50(weights=None, classes=1000)
optimizer = SGD(learning_rate=0.1, momentum=0.9)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train_dataset and val_dataset are assumed to be prepared tf.data.Dataset objects
model.fit(train_dataset, epochs=90, validation_data=val_dataset)
- Implementing NVIDIA Apex for mixed precision training in PyTorch:
import torch
from apex import amp

# YourModel, criterion, num_epochs, and train_loader are placeholders
model = YourModel().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for epoch in range(num_epochs):
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        optimizer.step()
Getting Started
To get started with NVIDIA/DeepLearningExamples:
- Clone the repository:
  git clone https://github.com/NVIDIA/DeepLearningExamples.git
  cd DeepLearningExamples
- Choose a specific example (e.g., BERT for PyTorch):
  cd PyTorch/LanguageModeling/BERT
- Follow the README instructions for setting up the environment and running the example:
  # Create and activate a new conda environment
  conda env create -f requirements.yml
  conda activate nvidia_bert_pytorch
  # Run the training script
  python run_pretraining.py --input_dir /path/to/your/data
Note: Specific instructions may vary depending on the chosen example and framework.
Competitor Comparisons
Models and examples built with TensorFlow
Pros of TensorFlow Models
- Broader range of models and applications, covering various domains
- More extensive documentation and community support
- Regular updates and contributions from the TensorFlow team
Cons of TensorFlow Models
- Less focus on GPU optimization compared to DeepLearningExamples
- May require more setup and configuration for high-performance scenarios
- Some models might not be as production-ready as those in DeepLearningExamples
Code Comparison
DeepLearningExamples:
import torch
from apex import amp
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
TensorFlow Models:
import tensorflow as tf
with tf.GradientTape() as tape:
    predictions = model(inputs, training=True)
    loss = loss_function(labels, predictions)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
The DeepLearningExamples code showcases NVIDIA's Apex library for mixed precision training, while TensorFlow Models uses standard TensorFlow operations. DeepLearningExamples focuses on GPU optimization, whereas TensorFlow Models provides a more general approach suitable for various hardware configurations.
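For context, a comparable mixed-precision setup is also possible in TensorFlow through the Keras mixed_precision API. The following minimal sketch is illustrative and not taken from either repository (the model and optimizer choices are placeholders):

import tensorflow as tf

# Request float16 compute while keeping float32 variables for numerically sensitive ops
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.applications.ResNet50(weights=None, classes=1000)
# Loss scaling plays the same role as amp.scale_loss in the Apex snippet above
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9))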
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
Pros of PyTorch Examples
- Simpler, more beginner-friendly implementations
- Wider range of basic deep learning models and tasks
- More frequently updated with community contributions
Cons of PyTorch Examples
- Less focus on performance optimization
- Fewer industry-scale, production-ready examples
- Limited support for distributed training and multi-GPU setups
Code Comparison
DeepLearningExamples (BERT fine-tuning):
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels)
model = model.to(device)
if args.n_gpu > 1:
    model = torch.nn.DataParallel(model)
PyTorch Examples (MNIST classification):
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=args.lr)
for epoch in range(1, args.epochs + 1):
    train(args, model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)
The DeepLearningExamples code shows more advanced features like multi-GPU support, while PyTorch Examples focuses on simplicity and readability for basic tasks.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Extensive library of pre-trained models for various NLP tasks
- Active community and frequent updates
- Easy-to-use API for fine-tuning and inference
Cons of transformers
- Less focus on performance optimization for specific hardware
- May require additional setup for distributed training
- Limited examples for non-NLP tasks
Code Comparison
transformers:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
DeepLearningExamples:
import torch
from model.bert import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
# BertTokenizer is provided by the example's local tokenization utilities
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
The code comparison shows that transformers provides a more streamlined API for loading and using pre-trained models, while DeepLearningExamples may require more manual setup and configuration. However, DeepLearningExamples often includes optimizations for NVIDIA hardware and distributed training scenarios.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Offers more advanced optimization techniques like ZeRO (Zero Redundancy Optimizer)
- Provides better support for distributed training across multiple GPUs and nodes
- Includes a more comprehensive set of tools for large-scale model training
Cons of DeepSpeed
- Steeper learning curve due to more complex features and configurations
- May require more setup and fine-tuning for optimal performance
- Less focus on providing ready-to-use examples for specific models or tasks
Code Comparison
DeepSpeed:
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params
)
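The ZeRO optimizer and other DeepSpeed features mentioned above are typically enabled through a configuration dictionary (or JSON file) passed to deepspeed.initialize. The values below are illustrative placeholders, not settings from either repository:

# Minimal sketch: enable fp16 and ZeRO stage 2 via a config dict
ds_config = {
    "train_batch_size": 256,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)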
DeepLearningExamples:
# assumes torch.distributed.init_process_group() has already been called
model = torch.nn.parallel.DistributedDataParallel(model)
optimizer = optim.SGD(model.parameters(), lr=args.lr)
Summary
DeepSpeed offers more advanced features for large-scale model training and optimization, while DeepLearningExamples provides a simpler approach with ready-to-use examples. DeepSpeed may require more setup but offers better scalability, while DeepLearningExamples is easier to get started with for specific tasks. The choice between them depends on the project's scale and requirements.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- More focused on sequence-to-sequence learning and natural language processing tasks
- Offers a wider range of pre-trained models and benchmarks for NLP
- Provides a flexible and modular architecture for easier customization
Cons of fairseq
- Less emphasis on other deep learning domains (e.g., computer vision, speech recognition)
- May have a steeper learning curve for beginners due to its more specialized nature
- Potentially less optimized for NVIDIA hardware compared to DeepLearningExamples
Code Comparison
fairseq:
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model', 'checkpoint.pt')
model.translate('Hello world!')
DeepLearningExamples:
from nvidia.transformer import TransformerModel
model = TransformerModel.from_pretrained('nvidia_transformer_large')
model.translate('Hello world!')
Both repositories provide high-quality implementations of deep learning models, but they cater to different needs. fairseq is more specialized for NLP tasks, while DeepLearningExamples covers a broader range of deep learning applications with a focus on NVIDIA hardware optimization.
NVIDIA Deep Learning Examples for Tensor Cores
Introduction
This repository provides State-of-the-Art Deep Learning examples that are easy to train and deploy, achieving the best reproducible accuracy and performance with the NVIDIA CUDA-X software stack running on NVIDIA Volta, Turing, and Ampere GPUs.
NVIDIA GPU Cloud (NGC) Container Registry
These examples, along with our NVIDIA deep learning software stack, are provided in a monthly updated Docker container on the NGC container registry (https://ngc.nvidia.com). These containers include:
- The latest NVIDIA examples from this repository
- The latest NVIDIA contributions shared upstream to the respective framework
- The latest NVIDIA Deep Learning software libraries, such as cuDNN, NCCL, and cuBLAS, all of which have been through a rigorous monthly quality assurance process to ensure that they provide the best possible performance
- Monthly release notes for each of the NVIDIA optimized containers
Computer Vision
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
EfficientNet-B0 | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
EfficientNet-B4 | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
EfficientNet-WideSE-B0 | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
EfficientNet-WideSE-B4 | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
EfficientNet v1-B0 | TensorFlow2 | Yes | Yes | Yes | Example | - | Supported | Yes | - |
EfficientNet v1-B4 | TensorFlow2 | Yes | Yes | Yes | Example | - | Supported | Yes | - |
EfficientNet v2-S | TensorFlow2 | Yes | Yes | Yes | Example | - | Supported | Yes | - |
GPUNet | PyTorch | Yes | Yes | - | Example | Yes | Example | Yes | - |
Mask R-CNN | PyTorch | Yes | Yes | - | Example | - | Supported | - | Yes |
Mask R-CNN | TensorFlow2 | Yes | Yes | - | Example | - | Supported | Yes | - |
nnUNet | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
ResNet-50 | MXNet | Yes | Yes | - | Supported | - | Supported | - | - |
ResNet-50 | PaddlePaddle | Yes | Yes | - | Example | - | Supported | - | - |
ResNet-50 | PyTorch | Yes | Yes | - | Example | - | Example | Yes | - |
ResNet-50 | TensorFlow | Yes | Yes | - | Supported | - | Supported | Yes | - |
ResNeXt-101 | PyTorch | Yes | Yes | - | Example | - | Example | Yes | - |
ResNeXt-101 | TensorFlow | Yes | Yes | - | Supported | - | Supported | Yes | - |
SE-ResNeXt-101 | PyTorch | Yes | Yes | - | Example | - | Example | Yes | - |
SE-ResNeXt-101 | TensorFlow | Yes | Yes | - | Supported | - | Supported | Yes | - |
SSD | PyTorch | Yes | Yes | - | Supported | - | Supported | - | Yes |
SSD | TensorFlow | Yes | Yes | - | Supported | - | Supported | Yes | Yes |
U-Net Med | TensorFlow2 | Yes | Yes | - | Example | - | Supported | Yes | - |
Natural Language Processing
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
BERT | PyTorch | Yes | Yes | Yes | Example | - | Example | Yes | - |
GNMT | PyTorch | Yes | Yes | - | Supported | - | Supported | - | - |
ELECTRA | TensorFlow2 | Yes | Yes | Yes | Supported | - | Supported | Yes | - |
BERT | TensorFlow | Yes | Yes | Yes | Example | - | Example | Yes | Yes |
BERT | TensorFlow2 | Yes | Yes | Yes | Supported | - | Supported | Yes | - |
GNMT | TensorFlow | Yes | Yes | - | Supported | - | Supported | - | - |
Faster Transformer | TensorFlow | - | - | - | Example | - | Supported | - | - |
Recommender Systems
Models | Framework | AMP | Multi-GPU | Multi-Node | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|
DLRM | PyTorch | Yes | Yes | - | Yes | Example | Yes | Yes |
DLRM | TensorFlow2 | Yes | Yes | Yes | - | Supported | Yes | - |
NCF | PyTorch | Yes | Yes | - | - | Supported | - | - |
Wide&Deep | TensorFlow | Yes | Yes | - | - | Supported | Yes | - |
Wide&Deep | TensorFlow2 | Yes | Yes | - | - | Supported | Yes | - |
NCF | TensorFlow | Yes | Yes | - | - | Supported | Yes | - |
VAE-CF | TensorFlow | Yes | Yes | - | - | Supported | - | - |
SIM | TensorFlow2 | Yes | Yes | - | - | Supported | Yes | - |
Speech to Text
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
Jasper | PyTorch | Yes | Yes | - | Example | Yes | Example | Yes | Yes |
QuartzNet | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
Text to Speech
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
FastPitch | PyTorch | Yes | Yes | - | Example | - | Example | Yes | Yes |
FastSpeech | PyTorch | Yes | Yes | - | Example | - | Supported | - | - |
Tacotron 2 and WaveGlow | PyTorch | Yes | Yes | - | Example | Yes | Example | Yes | - |
HiFi-GAN | PyTorch | Yes | Yes | - | Supported | - | Supported | Yes | - |
Graph Neural Networks
Models | Framework | AMP | Multi-GPU | Multi-Node | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|
SE(3)-Transformer | PyTorch | Yes | Yes | - | - | Supported | - | - |
MoFlow | PyTorch | Yes | Yes | - | - | Supported | - | - |
Time-Series Forecasting
Models | Framework | AMP | Multi-GPU | Multi-Node | TensorRT | ONNX | Triton | DLC | NB |
---|---|---|---|---|---|---|---|---|---|
Temporal Fusion Transformer | PyTorch | Yes | Yes | - | Example | Yes | Example | Yes | - |
NVIDIA support
In each of the network READMEs, we indicate the level of support that will be provided. The range is from ongoing updates and improvements to a point-in-time release for thought leadership.
Glossary
Multinode Training: Supported on a pyxis/enroot Slurm cluster.
Deep Learning Compiler (DLC): TensorFlow XLA and PyTorch JIT and/or TorchScript.
Accelerated Linear Algebra (XLA): XLA is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes. The results are improvements in speed and memory usage.
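As a rough illustration (not taken from the repository), XLA compilation can be requested per function in TensorFlow 2 via the jit_compile flag:

import tensorflow as tf

@tf.function(jit_compile=True)  # asks XLA to compile this computation
def dense_relu(x, w):
    return tf.nn.relu(tf.matmul(x, w))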
PyTorch JIT and/or TorchScript: TorchScript is a way to create serializable and optimizable models from PyTorch code. It is an intermediate representation of a PyTorch model (a subclass of nn.Module) that can then be run in a high-performance environment such as C++.
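A minimal sketch of scripting a module to TorchScript (illustrative, not taken from the repository):

import torch

module = torch.nn.Linear(16, 4)
scripted = torch.jit.script(module)   # convert the module to a serializable ScriptModule
scripted.save("linear_scripted.pt")   # can later be loaded from C++ via torch::jit::load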
Automatic Mixed Precision (AMP): AMP automatically enables mixed precision training on Volta, Turing, and NVIDIA Ampere GPU architectures.
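Native PyTorch exposes the same capability through torch.cuda.amp; a minimal sketch assuming model, optimizer, criterion, and train_loader are already defined:

import torch

scaler = torch.cuda.amp.GradScaler()
for data, target in train_loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # run the forward pass in mixed precision
        loss = criterion(model(data), target)
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()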
TensorFloat-32 (TF32): TensorFloat-32 (TF32) is the new math mode in NVIDIA A100 GPUs for handling the matrix math, also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.
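In PyTorch, the use of TF32 can be controlled explicitly (defaults vary by framework version); a minimal illustration:

import torch

torch.backends.cuda.matmul.allow_tf32 = True  # allow TF32 for matrix multiplications
torch.backends.cudnn.allow_tf32 = True        # allow TF32 inside cuDNN convolutions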
Jupyter Notebooks (NB): The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
Feedback / Contributions
We're posting these examples on GitHub to better support the community, facilitate feedback, as well as collect and implement contributions using GitHub Issues and pull requests. We welcome all contributions!
Known issues
In each of the network READMEs, we indicate any known issues and encourage the community to provide feedback.