TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Top Related Projects
- TensorFlow: An Open Source Machine Learning Framework for Everyone
- ONNX: Open standard for machine learning interoperability
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- Apache TVM: Open deep learning compiler stack for CPU, GPU, and specialized accelerators
- ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
- mlpack: a fast, header-only C++ machine learning library
Quick Overview
TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. It's designed to work with models trained in all major frameworks and can be deployed on various NVIDIA GPUs in data centers, embedded platforms, and automotive solutions.
Pros
- Significantly accelerates inference performance on NVIDIA GPUs
- Supports a wide range of deep learning frameworks and model types
- Provides optimization techniques like layer fusion, precision calibration, and dynamic tensor memory
- Offers both C++ and Python APIs for integration flexibility
Cons
- Limited to NVIDIA GPUs, not usable on other hardware platforms
- Can have a steep learning curve for beginners
- Optimization process may require manual tuning for best results
- Limited support for certain custom layers or operations
Code Examples
- Building a TensorRT engine from an ONNX model:
import tensorrt as trt
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
# Parse the ONNX model and surface any parser errors.
with open("model.onnx", "rb") as model:
    if not parser.parse(model.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GB of builder workspace (TensorRT 8.x API)
engine = builder.build_engine(network, config)  # removed in TensorRT 10.x; see the sketch below
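Newer TensorRT releases (10.x) removed max_workspace_size and build_engine. A minimal sketch of the equivalent flow, reusing the logger, network, and config objects above (the plan file name is illustrative):
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # replaces max_workspace_size
serialized_engine = builder.build_serialized_network(network, config)  # returns a serialized plan
with open("model.plan", "wb") as f:
    f.write(serialized_engine)
# Deserialize the plan whenever an engine is needed for inference.
runtime = trt.Runtime(logger)
with open("model.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())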
- Performing inference with TensorRT:
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
# Assuming 'engine' is a pre-built TensorRT engine
context = engine.create_execution_context()
# Prepare input and output buffers
input_shape = (1, 3, 224, 224) # Example shape
output_shape = (1, 1000) # Example shape
h_input = cuda.pagelocked_empty(input_shape, dtype=np.float32)
h_output = cuda.pagelocked_empty(output_shape, dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
stream = cuda.Stream()
# Copy input data to the device (fill h_input with your preprocessed input first)
cuda.memcpy_htod_async(d_input, h_input, stream)
# Run inference
context.execute_async_v2(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
# Transfer predictions back
cuda.memcpy_dtoh_async(h_output, d_output, stream)
# Synchronize the stream
stream.synchronize()
# h_output now contains the inference results
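On TensorRT 10.x, the bindings-list call above is replaced by name-based tensor I/O. A minimal sketch, assuming the engine's I/O tensors are named "input" and "output" (adjust to your model):
# Tensor names can be discovered via engine.get_tensor_name(i) for i in range(engine.num_io_tensors).
context.set_tensor_address("input", int(d_input))
context.set_tensor_address("output", int(d_output))
context.execute_async_v3(stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()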
- INT8 Calibration for quantization:
import tensorrt as trt

# IInt8EntropyCalibrator2 is an abstract class: subclass it and feed calibration batches (device pointers).
class MyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batch_ptrs, cache_file="calibration.cache"):
        super().__init__()
        self.batch_ptrs = iter(batch_ptrs)
        self.cache_file = cache_file
    def get_batch_size(self):
        return 1
    def get_batch(self, names):
        ptr = next(self.batch_ptrs, None)
        return None if ptr is None else [int(ptr)]
    def read_calibration_cache(self):
        return None  # or return previously cached bytes to skip recalibration
    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = MyCalibrator(your_calibration_batch_pointers)
engine = builder.build_engine(network, config)  # build_serialized_network() on TensorRT 10.x
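A hypothetical way to prepare those device-pointer batches with PyCUDA, assuming the calibration samples are available as NumPy arrays (shapes and sample count are illustrative stand-ins):
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit

samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(8)]  # stand-in data
batch_ptrs = []
for s in samples:
    d = cuda.mem_alloc(s.nbytes)
    cuda.memcpy_htod(d, np.ascontiguousarray(s))  # copy each calibration batch to the GPU once
    batch_ptrs.append(d)
config.int8_calibrator = MyCalibrator(batch_ptrs)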
Getting Started
- Install TensorRT:
pip install tensorrt
- Convert your model to ONNX format if not already done.
- Use the TensorRT Python API to build an engine:
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("your_model.onnx", "rb") as model:
    parser.parse(model.read())
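The parsed network can then be optimized into an engine. A minimal sketch (the FP16 flag and 1 GB workspace are illustrative choices; TensorRT 10.x uses set_memory_pool_limit and build_serialized_network instead):
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)   # optional: allow reduced-precision kernels
config.max_workspace_size = 1 << 30     # 1 GB of builder workspace (TensorRT 8.x API)
engine = builder.build_engine(network, config)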
Competitor Comparisons
TensorFlow: An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Broader ecosystem with extensive libraries and tools
- Supports a wider range of hardware platforms
- More flexible and customizable for various ML tasks
Cons of TensorFlow
- Generally slower inference performance than TensorRT
- Steeper learning curve for optimization and deployment
Code Comparison
TensorFlow:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
TensorRT:
import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)
Key Differences
- TensorFlow is a complete ML framework, while TensorRT focuses on optimizing inference
- TensorRT is specifically designed for NVIDIA GPUs, offering better performance on supported hardware
- TensorFlow provides more flexibility in model development, while TensorRT excels in deployment optimization
Use Cases
- TensorFlow: General-purpose ML development, research, and prototyping
- TensorRT: High-performance inference deployment on NVIDIA GPUs, especially in production environments
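A common bridge between the two is exporting a TensorFlow/Keras model to ONNX for TensorRT to consume. A minimal sketch using the tf2onnx package (input shape, tensor name, and opset are illustrative assumptions):
import tensorflow as tf
import tf2onnx

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
spec = (tf.TensorSpec((None, 32), tf.float32, name="input"),)  # assumed input signature
model_proto, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=13)
with open("model.onnx", "wb") as f:
    f.write(model_proto.SerializeToString())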
ONNX: Open standard for machine learning interoperability
Pros of ONNX
- Platform-independent, supporting a wide range of frameworks and hardware
- Open-source with broad industry support and collaboration
- Extensive ecosystem of tools and libraries for model conversion and optimization
Cons of ONNX
- May require additional steps for deployment and optimization
- Performance can vary depending on the target platform and implementation
Code Comparison
ONNX model loading:
import onnx
model = onnx.load("model.onnx")
TensorRT model loading:
import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with trt.Builder(TRT_LOGGER) as builder:
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)
    parser.parse_from_file("model.onnx")
Key Differences
- ONNX focuses on model interoperability and standardization
- TensorRT specializes in NVIDIA GPU optimization and acceleration
- ONNX provides a common format for various frameworks, while TensorRT is tailored for high-performance inference on NVIDIA hardware
- TensorRT offers advanced optimization techniques specific to NVIDIA GPUs
- ONNX has broader ecosystem support, while TensorRT excels in NVIDIA-specific deployments
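Since TensorRT consumes ONNX graphs directly, it can help to validate an exported model with the onnx package before handing it to the TensorRT parser. A minimal sketch:
import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)                  # raises if the graph violates the ONNX spec
print(onnx.helper.printable_graph(model.graph))  # human-readable summary of ops and tensors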
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- More flexible and user-friendly for research and prototyping
- Supports dynamic computational graphs, allowing for easier debugging
- Larger community and ecosystem with more resources and third-party libraries
Cons of PyTorch
- Generally slower inference performance compared to TensorRT
- Less optimized for deployment on NVIDIA hardware
- Requires more manual optimization for production environments
Code Comparison
PyTorch:
import torch
model = torch.nn.Linear(10, 5)
input_tensor = torch.randn(3, 10)
output = model(input_tensor)
TensorRT:
import tensorrt as trt
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)
parser.parse_from_file("model.onnx")
config = builder.create_builder_config()
engine = builder.build_engine(network, config)  # replaces the long-deprecated build_cuda_engine()
The PyTorch example shows a simple linear model creation and inference, while the TensorRT example demonstrates the process of parsing an ONNX model and building an optimized CUDA engine. TensorRT requires more setup but offers better performance for deployment on NVIDIA hardware.
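In practice the two are combined by exporting the PyTorch model to ONNX and then building a TensorRT engine from the exported file. A minimal sketch using torch.onnx.export (tensor names and opset are illustrative):
import torch

model = torch.nn.Linear(10, 5)
dummy_input = torch.randn(3, 10)
torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=13,
)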
Apache TVM: Open deep learning compiler stack for CPU, GPU, and specialized accelerators
Pros of TVM
- Hardware-agnostic: Supports a wide range of hardware platforms, not limited to NVIDIA GPUs
- Open-source community: Larger and more diverse contributor base, potentially leading to faster innovation
- Flexibility: Offers more customization options for optimizing deep learning models
Cons of TVM
- Learning curve: Generally requires more expertise to use effectively compared to TensorRT
- Performance: May not achieve the same level of optimization as TensorRT on NVIDIA hardware
- Maturity: Less mature ecosystem and tooling compared to TensorRT
Code Comparison
TVM example:
import tvm
from tvm import relay
# Define and compile a model
mod, params = relay.testing.resnet.get_workload()
target = tvm.target.cuda()
with tvm.transform.PassContext(opt_level=3):
lib = relay.build(mod, target, params=params)
TensorRT example:
import tensorrt as trt
# Create a TensorRT builder and network
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
parser.parse_from_file(model_path)
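For completeness, the TVM module compiled above would typically be executed through the graph executor. A minimal sketch, assuming the ResNet workload's input tensor is named "data" and that lib comes from the relay.build call above:
import numpy as np
import tvm
from tvm.contrib import graph_executor

dev = tvm.cuda(0)
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))  # stand-in input
module.run()
out = module.get_output(0).numpy()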
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Pros of ONNX Runtime
- Cross-platform compatibility: Supports a wide range of hardware and operating systems
- Broader model format support: Works with models from various frameworks, not just TensorRT
- Active open-source community: More frequent updates and contributions
Cons of ONNX Runtime
- Generally lower performance on NVIDIA GPUs compared to TensorRT
- Less optimized for NVIDIA-specific hardware features
- May require additional steps for model conversion and optimization
Code Comparison
ONNX Runtime:
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
output = session.run(None, {"input": input_data})
TensorRT:
import tensorrt as trt
engine = trt.Runtime(trt.Logger(trt.Logger.WARNING)).deserialize_cuda_engine(engine_bytes)
context = engine.create_execution_context()
context.execute_v2(bindings)  # bindings: device addresses of the input/output buffers; results are read back from the output buffer
Both ONNX Runtime and TensorRT are powerful inference engines, but they cater to different use cases. ONNX Runtime offers greater flexibility and cross-platform support, making it suitable for a wide range of applications. TensorRT, on the other hand, excels in performance optimization for NVIDIA GPUs, making it the preferred choice for high-performance inference on NVIDIA hardware.
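The two can also be combined: ONNX Runtime can delegate supported subgraphs to TensorRT through its TensorRT execution provider. A minimal sketch (provider availability depends on how onnxruntime was built or installed):
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)
output = session.run(None, {"input": input_data})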
mlpack: a fast, header-only C++ machine learning library
Pros of mlpack
- Open-source and platform-independent, allowing for wider accessibility and community contributions
- Extensive collection of machine learning algorithms and utilities
- Supports multiple programming languages (C++, Python, Julia, Go, R)
Cons of mlpack
- Generally slower performance compared to TensorRT's optimized inference
- Less focus on deep learning and neural network acceleration
- Smaller community and ecosystem compared to NVIDIA-backed projects
Code Comparison
mlpack (C++):
#include <mlpack/core.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>
using namespace mlpack;
arma::mat data;
data::Load("dataset.csv", data, true);
NeighborSearch<NearestNeighborSort> nn(data);
TensorRT (C++):
#include "NvInfer.h"
#include "NvOnnxParser.h"
auto builder = nvinfer1::createInferBuilder(logger);
auto network = builder->createNetworkV2(1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
auto parser = nvonnxparser::createParser(*network, logger);
Both repositories offer machine learning capabilities, but mlpack provides a broader range of algorithms and language support, while TensorRT focuses on optimizing deep learning inference for NVIDIA GPUs.
TensorRT Open Source Software
This repository contains the Open Source Software (OSS) components of NVIDIA TensorRT. It includes the sources for TensorRT plugins and ONNX parser, as well as sample applications demonstrating usage and capabilities of the TensorRT platform. These open source software components are a subset of the TensorRT General Availability (GA) release with some extensions and bug-fixes.
- For code contributions to TensorRT-OSS, please see our Contribution Guide and Coding Guidelines.
- For a summary of new additions and updates shipped with TensorRT-OSS releases, please refer to the Changelog.
- For business inquiries, please contact researchinquiries@nvidia.com
- For press and other inquiries, please contact Hector Marinez at hmarinez@nvidia.com
Need enterprise support? NVIDIA global support is available for TensorRT with the NVIDIA AI Enterprise software suite. Check out NVIDIA LaunchPad for free access to a set of hands-on labs with TensorRT hosted on NVIDIA infrastructure.
Join the TensorRT and Triton community and stay current on the latest product updates, bug fixes, content, best practices, and more.
Prebuilt TensorRT Python Package
We provide the TensorRT Python package for an easy installation.
To install:
pip install tensorrt
You can skip the Build section to enjoy TensorRT with Python.
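A quick way to confirm the package is importable and check the installed version:
import tensorrt as trt
print(trt.__version__)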
Build
Prerequisites
To build the TensorRT-OSS components, you will first need the following software packages.
TensorRT GA build
- TensorRT v10.3.0.26
- Available from direct download links listed below
System Packages
- CUDA
- Recommended versions:
- cuda-12.5.0 + cuDNN-8.9
- cuda-11.8.0 + cuDNN-8.9
- GNU make >= v4.1
- cmake >= v3.13
- python >= v3.8, <= v3.10.x
- pip >= v19.0
- Essential utilities
Optional Packages
- Containerized build
- Docker >= 19.03
- NVIDIA Container Toolkit
- PyPI packages (for demo applications/tests)
- onnx
- onnxruntime
- tensorflow-gpu >= 2.5.1
- Pillow >= 9.0.1
- pycuda < 2021.1
- numpy
- pytest
- Code formatting tools (for contributors)
NOTE: onnx-tensorrt, cub, and protobuf packages are downloaded along with TensorRT OSS, and not required to be installed.
Downloading TensorRT Build
- Download TensorRT OSS:
git clone -b main https://github.com/nvidia/TensorRT TensorRT
cd TensorRT
git submodule update --init --recursive
- (Optional - if not using the TensorRT container) Specify the TensorRT GA release build path.
If using the TensorRT OSS build container, TensorRT libraries are preinstalled under /usr/lib/x86_64-linux-gnu and you may skip this step. Otherwise, download and extract the TensorRT GA build from the NVIDIA Developer Zone using the direct links below:
- TensorRT 10.3.0.26 for CUDA 11.8, Linux x86_64
- TensorRT 10.3.0.26 for CUDA 12.5, Linux x86_64
- TensorRT 10.3.0.26 for CUDA 11.8, Windows x86_64
- TensorRT 10.3.0.26 for CUDA 12.5, Windows x86_64
Example: Ubuntu 20.04 on x86-64 with cuda-12.5
cd ~/Downloads
tar -xvzf TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz
export TRT_LIBPATH=`pwd`/TensorRT-10.3.0.26
Example: Windows on x86-64 with cuda-12.5
Expand-Archive -Path TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip
$env:TRT_LIBPATH="$pwd\TensorRT-10.3.0.26\lib"
Setting Up The Build Environment
For Linux platforms, we recommend that you generate a docker container for building TensorRT OSS as described below. For native builds, please install the prerequisite System Packages.
- Generate the TensorRT-OSS build container.
The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build scripts. The build containers are configured for building TensorRT OSS out-of-the-box.
Example: Ubuntu 20.04 on x86-64 with cuda-12.5 (default)
./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.5
Example: Rockylinux8 on x86-64 with cuda-12.5
./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.5
Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.5 (JetPack SDK)
./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.5
Example: Ubuntu 22.04 on aarch64 with cuda-12.5
./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.5
- Launch the TensorRT-OSS build container.
Example: Ubuntu 20.04 build container
./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.5 --gpus all
NOTE:
1. Use the --tag corresponding to the build container generated in Step 1.
2. NVIDIA Container Toolkit is required for GPU access (running TensorRT applications) inside the build container.
3. The sudo password for Ubuntu build containers is 'nvidia'.
4. Specify a port number using --jupyter <port> to launch Jupyter notebooks.
Building TensorRT-OSS
- Generate Makefiles and build.
Example: Linux (x86-64) build with default cuda-12.5
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)
Example: Linux (aarch64) build with default cuda-12.5
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain
make -j$(nproc)
Example: Native build on Jetson (aarch64) with cuda-12.5
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.5
CC=/usr/bin/gcc make -j$(nproc)
NOTE: C compiler must be explicitly specified via CC= for native aarch64 builds of protobuf.
Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.5 (JetPack)
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.5 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
make -j$(nproc)
Example: Native builds on Windows (x86) with cuda-12.5
cd $TRT_OSSPATH
mkdir -p build
cd -p build
cmake .. -DTRT_LIB_DIR="$env:TRT_LIBPATH" -DCUDNN_ROOT_DIR="$env:CUDNN_PATH" -DTRT_OUT_DIR="$pwd\\out"
msbuild TensorRT.sln /property:Configuration=Release -m:$env:NUMBER_OF_PROCESSORS
NOTE: The default CUDA version used by CMake is 12.4.0. To override this, for example to 11.8, append -DCUDA_VERSION=11.8 to the cmake command.
- Required CMake build arguments:
  - TRT_LIB_DIR: Path to the TensorRT installation directory containing libraries.
  - TRT_OUT_DIR: Output directory where generated build artifacts will be copied.
- Optional CMake build arguments:
  - CMAKE_BUILD_TYPE: Specify whether the generated binaries are for release or debug (contain debug symbols). Values consist of [Release] | Debug.
  - CUDA_VERSION: The version of CUDA to target, for example [11.7.1].
  - CUDNN_VERSION: The version of cuDNN to target, for example [8.6].
  - PROTOBUF_VERSION: The version of Protobuf to use, for example [3.0.0]. Note: Changing this will not configure CMake to use a system version of Protobuf; it will configure CMake to download and try building that version.
  - CMAKE_TOOLCHAIN_FILE: The path to a toolchain file for cross compilation.
  - BUILD_PARSERS: Specify if the parsers should be built, for example [ON] | OFF. If turned OFF, CMake will try to find precompiled versions of the parser libraries to use in compiling samples: first in ${TRT_LIB_DIR}, then on the system. If the build type is Debug, debug builds of the libraries are preferred over release versions if available.
  - BUILD_PLUGINS: Specify if the plugins should be built, for example [ON] | OFF. If turned OFF, CMake will try to find a precompiled version of the plugin library to use in compiling samples: first in ${TRT_LIB_DIR}, then on the system. If the build type is Debug, debug builds of the libraries are preferred over release versions if available.
  - BUILD_SAMPLES: Specify if the samples should be built, for example [ON] | OFF.
  - GPU_ARCHS: GPU (SM) architectures to target. By default we generate CUDA code for all major SMs. Specific SM versions can be specified here as a quoted, space-separated list to reduce compilation time and binary size. A table of compute capabilities of NVIDIA GPUs can be found here. Examples:
    - NVIDIA A100: -DGPU_ARCHS="80"
    - Tesla T4, GeForce RTX 2080: -DGPU_ARCHS="75"
    - Titan V, Tesla V100: -DGPU_ARCHS="70"
    - Multiple SMs: -DGPU_ARCHS="80 75"
  - TRT_PLATFORM_ID: Bare-metal build (unlike containerized cross-compilation). Currently supported options: x86_64 (default).
References
TensorRT Resources
- TensorRT Developer Home
- TensorRT QuickStart Guide
- TensorRT Developer Guide
- TensorRT Sample Support Guide
- TensorRT ONNX Tools
- TensorRT Discussion Forums
- TensorRT Release Notes
Known Issues
- Please refer to TensorRT Release Notes