NVIDIA TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Top Related Projects

  • TensorFlow: An Open Source Machine Learning Framework for Everyone
  • ONNX: Open standard for machine learning interoperability
  • PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • TVM: Open deep learning compiler stack for cpu, gpu and specialized accelerators
  • ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
  • mlpack: a fast, header-only C++ machine learning library

Quick Overview

TensorRT is a high-performance deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. It's designed to work with models trained in all major frameworks and can be deployed on various NVIDIA GPUs in data centers, embedded platforms, and automotive solutions.

Pros

  • Significantly accelerates inference performance on NVIDIA GPUs
  • Supports a wide range of deep learning frameworks and model types
  • Provides optimization techniques like layer fusion, precision calibration, and dynamic tensor memory (see the reduced-precision sketch after this list)
  • Offers both C++ and Python APIs for integration flexibility
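
As a small illustration of the precision controls mentioned above, here is a minimal sketch that enables FP16 kernels on a builder config when the GPU supports them. It assumes a builder object created as in the Code Examples section below and is not taken verbatim from the TensorRT documentation.

import tensorrt as trt

# Assumes 'builder' was created as shown in the Code Examples below
config = builder.create_builder_config()
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)  # allow TensorRT to pick FP16 kernels where profitable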

Cons

  • Limited to NVIDIA GPUs, not usable on other hardware platforms
  • Can have a steep learning curve for beginners
  • Optimization process may require manual tuning for best results
  • Limited support for certain custom layers or operations

Code Examples

  1. Building a TensorRT engine from an ONNX model:
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as model:
    if not parser.parse(model.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GB workspace

# build_engine and max_workspace_size were removed in TensorRT 10;
# build a serialized engine and deserialize it with a Runtime instead
serialized_engine = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(serialized_engine)
  2. Performing inference with TensorRT:
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np

# Assuming 'engine' is a pre-built TensorRT engine with a single input and a single
# output (see the save/reload sketch after these examples)
context = engine.create_execution_context()

# In TensorRT 10, I/O tensors are addressed by name rather than by binding index;
# this assumes the first I/O tensor is the input and the second is the output
input_name = engine.get_tensor_name(0)
output_name = engine.get_tensor_name(1)

# Prepare page-locked host buffers and device buffers
input_shape = (1, 3, 224, 224)  # Example shape
output_shape = (1, 1000)  # Example shape

h_input = cuda.pagelocked_empty(input_shape, dtype=np.float32)
h_output = cuda.pagelocked_empty(output_shape, dtype=np.float32)
d_input = cuda.mem_alloc(h_input.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

stream = cuda.Stream()

# Copy input data to the device
cuda.memcpy_htod_async(d_input, h_input, stream)

# Bind the device buffers by tensor name and run inference
context.set_tensor_address(input_name, int(d_input))
context.set_tensor_address(output_name, int(d_output))
context.execute_async_v3(stream.handle)

# Transfer predictions back to the host
cuda.memcpy_dtoh_async(h_output, d_output, stream)

# Wait for all work queued on the stream to finish
stream.synchronize()

# h_output now contains the inference results
  3. INT8 Calibration for quantization:
import tensorrt as trt

# TensorRT's INT8 calibrators are abstract interfaces: they cannot be constructed
# directly from a dataloader. Subclass one (e.g. IInt8EntropyCalibrator2) and
# implement the methods below to feed device pointers to calibration batches.
class MyEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, dataloader, cache_file="calibration.cache"):
        super().__init__()
        self.dataloader = dataloader
        self.cache_file = cache_file

    def get_batch_size(self): ...                  # batch size used during calibration
    def get_batch(self, names): ...                # return a list of device pointers, or None when done
    def read_calibration_cache(self): ...          # return cached bytes if present, else None
    def write_calibration_cache(self, cache): ...  # persist the cache for future builds

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = MyEntropyCalibrator(your_calibration_dataloader)

serialized_engine = builder.build_serialized_network(network, config)
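
The inference example above starts from a ready engine object. Here is a minimal sketch for persisting the serialized engine built in the first example and reloading it later; the file name model.engine is just an example.

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Save the serialized engine produced by build_serialized_network()
with open("model.engine", "wb") as f:
    f.write(serialized_engine)

# Reload it later without rebuilding
with open("model.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())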

Getting Started

  1. Install TensorRT:

    pip install tensorrt
    
  2. Convert your model to ONNX format if not already done (a minimal PyTorch export sketch follows this list).

  3. Use the TensorRT Python API to build an engine:

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("your_model.onnx", "rb") as model:
        if not parser.parse(model.read()):
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    serialized_engine = builder.build_serialized_network(network, config)


Competitor Comparisons

TensorFlow

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Broader ecosystem with extensive libraries and tools
  • Supports a wider range of hardware platforms
  • More flexible and customizable for various ML tasks

Cons of TensorFlow

  • Generally slower inference performance than TensorRT
  • Steeper learning curve for optimization and deployment

Code Comparison

TensorFlow:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

TensorRT:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()
parser = trt.OnnxParser(network, TRT_LOGGER)

Key Differences

  • TensorFlow is a complete ML framework, while TensorRT focuses on optimizing inference
  • TensorRT is specifically designed for NVIDIA GPUs, offering better performance on supported hardware
  • TensorFlow provides more flexibility in model development, while TensorRT excels in deployment optimization

Use Cases

  • TensorFlow: General-purpose ML development, research, and prototyping
  • TensorRT: High-performance inference deployment on NVIDIA GPUs, especially in production environments

ONNX

Open standard for machine learning interoperability

Pros of ONNX

  • Platform-independent, supporting a wide range of frameworks and hardware
  • Open-source with broad industry support and collaboration
  • Extensive ecosystem of tools and libraries for model conversion and optimization

Cons of ONNX

  • May require additional steps for deployment and optimization
  • Performance can vary depending on the target platform and implementation

Code Comparison

ONNX model loading:

import onnx
model = onnx.load("model.onnx")

TensorRT model loading:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with trt.Builder(TRT_LOGGER) as builder:
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)
    parser.parse_from_file("model.onnx")

Key Differences

  • ONNX focuses on model interoperability and standardization (see the small validation sketch after this list)
  • TensorRT specializes in NVIDIA GPU optimization and acceleration
  • ONNX provides a common format for various frameworks, while TensorRT is tailored for high-performance inference on NVIDIA hardware
  • TensorRT offers advanced optimization techniques specific to NVIDIA GPUs
  • ONNX has broader ecosystem support, while TensorRT excels in NVIDIA-specific deployments
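
Since ONNX is first and foremost a common exchange format, here is a short sketch of inspecting and validating a model before handing it to any runtime; the checker call and opset read are standard onnx library APIs.

import onnx

model = onnx.load("model.onnx")
onnx.checker.check_model(model)       # validate the graph against the ONNX spec
print(model.opset_import[0].version)  # opset version the exporter targeted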

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • More flexible and user-friendly for research and prototyping
  • Supports dynamic computational graphs, allowing for easier debugging
  • Larger community and ecosystem with more resources and third-party libraries

Cons of PyTorch

  • Generally slower inference performance compared to TensorRT
  • Less optimized for deployment on NVIDIA hardware
  • Requires more manual optimization for production environments

Code Comparison

PyTorch:

import torch

model = torch.nn.Linear(10, 5)
input_tensor = torch.randn(3, 10)
output = model(input_tensor)

TensorRT:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
parser.parse_from_file("model.onnx")
config = builder.create_builder_config()
# build_cuda_engine was removed in TensorRT 8; build a serialized network instead
serialized_engine = builder.build_serialized_network(network, config)

The PyTorch example shows a simple linear model creation and inference, while the TensorRT example demonstrates the process of parsing an ONNX model and building an optimized CUDA engine. TensorRT requires more setup but offers better performance for deployment on NVIDIA hardware.

TVM

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Pros of TVM

  • Hardware-agnostic: Supports a wide range of hardware platforms, not limited to NVIDIA GPUs
  • Open-source community: Larger and more diverse contributor base, potentially leading to faster innovation
  • Flexibility: Offers more customization options for optimizing deep learning models

Cons of TVM

  • Learning curve: Generally requires more expertise to use effectively compared to TensorRT
  • Performance: May not achieve the same level of optimization as TensorRT on NVIDIA hardware
  • Maturity: Less mature ecosystem and tooling compared to TensorRT

Code Comparison

TVM example:

import tvm
from tvm import relay

# Define and compile a model
mod, params = relay.testing.resnet.get_workload()
target = tvm.target.cuda()
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target, params=params)

TensorRT example:

import tensorrt as trt

# Create a TensorRT builder and network
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
parser.parse_from_file(model_path)

ONNX Runtime

Cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

  • Cross-platform compatibility: Supports a wide range of hardware and operating systems
  • Broader model format support: Works with models from various frameworks, not just TensorRT
  • Active open-source community: More frequent updates and contributions

Cons of ONNX Runtime

  • Generally lower performance on NVIDIA GPUs compared to TensorRT
  • Less optimized for NVIDIA-specific hardware features
  • May require additional steps for model conversion and optimization

Code Comparison

ONNX Runtime:

import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
output = session.run(None, {"input": input_data})

TensorRT:

import tensorrt as trt

# 'engine_bytes' is a previously serialized engine; bindings must be device pointers
engine = trt.Runtime(trt.Logger(trt.Logger.WARNING)).deserialize_cuda_engine(engine_bytes)
context = engine.create_execution_context()
context.execute_v2([int(d_input), int(d_output)])  # outputs are written into d_output

Both ONNX Runtime and TensorRT are powerful inference engines, but they cater to different use cases. ONNX Runtime offers greater flexibility and cross-platform support, making it suitable for a wide range of applications. TensorRT, on the other hand, excels in performance optimization for NVIDIA GPUs, making it the preferred choice for high-performance inference on NVIDIA hardware.
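
The two are not mutually exclusive either. As a hedged illustration, assuming an onnxruntime-gpu build that ships the TensorRT execution provider, ONNX Runtime can delegate supported subgraphs to TensorRT and fall back to CUDA or CPU for the rest:

import onnxruntime as ort

# Providers are tried in order of preference
session = ort.InferenceSession(
    "model.onnx",
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(session.get_providers())  # shows which providers were actually enabled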

mlpack

mlpack: a fast, header-only C++ machine learning library

Pros of mlpack

  • Open-source and platform-independent, allowing for wider accessibility and community contributions
  • Extensive collection of machine learning algorithms and utilities
  • Supports multiple programming languages (C++, Python, Julia, Go, R)

Cons of mlpack

  • Generally slower performance compared to TensorRT's optimized inference
  • Less focus on deep learning and neural network acceleration
  • Smaller community and ecosystem compared to NVIDIA-backed projects

Code Comparison

mlpack (C++):

#include <mlpack/core.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>

using namespace mlpack;

arma::mat data;
data::Load("dataset.csv", data, true);
NeighborSearch<NearestNeighborSort> nn(data);

TensorRT (C++):

#include "NvInfer.h"
#include "NvOnnxParser.h"

auto builder = nvinfer1::createInferBuilder(logger);
auto network = builder->createNetworkV2(1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
auto parser = nvonnxparser::createParser(*network, logger);

Both repositories offer machine learning capabilities, but mlpack provides a broader range of algorithms and language support, while TensorRT focuses on optimizing deep learning inference for NVIDIA GPUs.

README

TensorRT Open Source Software

This repository contains the Open Source Software (OSS) components of NVIDIA TensorRT. It includes the sources for TensorRT plugins and ONNX parser, as well as sample applications demonstrating usage and capabilities of the TensorRT platform. These open source software components are a subset of the TensorRT General Availability (GA) release with some extensions and bug-fixes.

Need enterprise support? NVIDIA global support is available for TensorRT with the NVIDIA AI Enterprise software suite. Check out NVIDIA LaunchPad for free access to a set of hands-on labs with TensorRT hosted on NVIDIA infrastructure.

Join the TensorRT and Triton community and stay current on the latest product updates, bug fixes, content, best practices, and more.

Prebuilt TensorRT Python Package

We provide the TensorRT Python package for an easy installation.
To install:

pip install tensorrt

You can skip the Build section to enjoy TensorRT with Python.
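
A quick way to confirm the wheel installed correctly is to import it and print its version; the exact version string depends on the release you installed.

import tensorrt as trt

print(trt.__version__)  # e.g. a 10.x version string for recent releases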

Build

Prerequisites

To build the TensorRT-OSS components, you will first need the following software packages.

TensorRT GA build

  • TensorRT v10.3.0.26
    • Available from direct download links listed below

System Packages

Optional Packages

Downloading TensorRT Build

  1. Download TensorRT OSS

    git clone -b main https://github.com/nvidia/TensorRT TensorRT
    cd TensorRT
    git submodule update --init --recursive
    
  2. (Optional - if not using TensorRT container) Specify the TensorRT GA release build path

    If using the TensorRT OSS build container, TensorRT libraries are preinstalled under /usr/lib/x86_64-linux-gnu and you may skip this step.

    Else download and extract the TensorRT GA build from NVIDIA Developer Zone with the direct links below:

    Example: Ubuntu 20.04 on x86-64 with cuda-12.5

    cd ~/Downloads
    tar -xvzf TensorRT-10.3.0.26.Linux.x86_64-gnu.cuda-12.5.tar.gz
    export TRT_LIBPATH=`pwd`/TensorRT-10.3.0.26
    

    Example: Windows on x86-64 with cuda-12.5

    Expand-Archive -Path TensorRT-10.3.0.26.Windows.win10.cuda-12.5.zip
    $env:TRT_LIBPATH="$pwd\TensorRT-10.3.0.26\lib"
    

Setting Up The Build Environment

For Linux platforms, we recommend that you generate a docker container for building TensorRT OSS as described below. For native builds, please install the prerequisite System Packages.

  1. Generate the TensorRT-OSS build container.

    The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build scripts. The build containers are configured for building TensorRT OSS out-of-the-box.

    Example: Ubuntu 20.04 on x86-64 with cuda-12.5 (default)

    ./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.5
    

    Example: Rockylinux8 on x86-64 with cuda-12.5

    ./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.5
    

    Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.5 (JetPack SDK)

    ./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.5
    

    Example: Ubuntu 22.04 on aarch64 with cuda-12.5

    ./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.5
    
  2. Launch the TensorRT-OSS build container.

    Example: Ubuntu 20.04 build container

    ./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.5 --gpus all
    

    NOTE:
    1. Use the --tag corresponding to build container generated in Step 1.
    2. NVIDIA Container Toolkit is required for GPU access (running TensorRT applications) inside the build container.
    3. sudo password for Ubuntu build containers is 'nvidia'.
    4. Specify port number using --jupyter <port> for launching Jupyter notebooks.

Building TensorRT-OSS

  • Generate Makefiles and build.

    Example: Linux (x86-64) build with default cuda-12.5

    cd $TRT_OSSPATH
    mkdir -p build && cd build
    cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
    make -j$(nproc)
    

    Example: Linux (aarch64) build with default cuda-12.5

    cd $TRT_OSSPATH
    mkdir -p build && cd build
    cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain
    make -j$(nproc)
    

    Example: Native build on Jetson (aarch64) with cuda-12.5

    cd $TRT_OSSPATH
    mkdir -p build && cd build
    cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.5
    CC=/usr/bin/gcc make -j$(nproc)
    

    NOTE: C compiler must be explicitly specified via CC= for native aarch64 builds of protobuf.

    Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.5 (JetPack)

    cd $TRT_OSSPATH
    mkdir -p build && cd build
    cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.5 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.5/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
    make -j$(nproc)
    

    Example: Native builds on Windows (x86) with cuda-12.5

    cd $TRT_OSSPATH
    mkdir -p build
    cd build
    cmake .. -DTRT_LIB_DIR="$env:TRT_LIBPATH" -DCUDNN_ROOT_DIR="$env:CUDNN_PATH" -DTRT_OUT_DIR="$pwd\\out"
    msbuild TensorRT.sln /property:Configuration=Release -m:$env:NUMBER_OF_PROCESSORS
    

    NOTE:
    1. The default CUDA version used by CMake is 12.4.0. To override this, for example to 11.8, append -DCUDA_VERSION=11.8 to the cmake command.

  • Required CMake build arguments are:

    • TRT_LIB_DIR: Path to the TensorRT installation directory containing libraries.
    • TRT_OUT_DIR: Output directory where generated build artifacts will be copied.
  • Optional CMake build arguments:

    • CMAKE_BUILD_TYPE: Specify whether the generated binaries are release or debug builds (containing debug symbols). Values consist of [Release] | Debug
    • CUDA_VERSION: The version of CUDA to target, for example [11.7.1].
    • CUDNN_VERSION: The version of cuDNN to target, for example [8.6].
    • PROTOBUF_VERSION: The version of Protobuf to use, for example [3.0.0]. Note: Changing this will not configure CMake to use a system version of Protobuf, it will configure CMake to download and try building that version.
    • CMAKE_TOOLCHAIN_FILE: The path to a toolchain file for cross compilation.
    • BUILD_PARSERS: Specify if the parsers should be built, for example [ON] | OFF. If turned OFF, CMake will try to find precompiled versions of the parser libraries to use in compiling samples. First in ${TRT_LIB_DIR}, then on the system. If the build type is Debug, then it will prefer debug builds of the libraries before release versions if available.
    • BUILD_PLUGINS: Specify if the plugins should be built, for example [ON] | OFF. If turned OFF, CMake will try to find a precompiled version of the plugin library to use in compiling samples. First in ${TRT_LIB_DIR}, then on the system. If the build type is Debug, then it will prefer debug builds of the libraries before release versions if available.
    • BUILD_SAMPLES: Specify if the samples should be built, for example [ON] | OFF.
    • GPU_ARCHS: GPU (SM) architectures to target. By default we generate CUDA code for all major SMs. Specific SM versions can be specified here as a quoted space-separated list to reduce compilation time and binary size. Table of compute capabilities of NVIDIA GPUs can be found here. Examples:
      • NVidia A100: -DGPU_ARCHS="80"
      • Tesla T4, GeForce RTX 2080: -DGPU_ARCHS="75"
      • Titan V, Tesla V100: -DGPU_ARCHS="70"
      • Multiple SMs: -DGPU_ARCHS="80 75"
    • TRT_PLATFORM_ID: Bare-metal build (unlike containerized cross-compilation). Currently supported options: x86_64 (default).

References

TensorRT Resources

Known Issues