Top Related Projects
NVIDIA device plugin for Kubernetes
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
An open and reliable container runtime
The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
Production-Grade Container Scheduling and Management
Quick Overview
The NVIDIA Container Toolkit is a set of tools and libraries that enable GPU support in containers. It allows users to build and run GPU-accelerated containers with Docker and other container runtimes, making it easier to deploy and manage GPU-enabled applications in containerized environments.
Pros
- Seamless integration of NVIDIA GPUs with Docker containers
- Supports various NVIDIA GPU architectures and driver versions
- Enables easy deployment of GPU-accelerated applications in containerized environments
- Provides fine-grained control over GPU resource allocation
Cons
- Limited to NVIDIA GPUs only
- Requires additional setup and configuration compared to standard Docker containers
- May introduce compatibility issues with certain applications or frameworks
- Small performance overhead from the additional runtime layer (negligible in most cases)
Code Examples
- Running a GPU-enabled container:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
This command runs a container from the CUDA 11.0 base image and executes nvidia-smi to display GPU information.
- Specifying GPU devices:
docker run --gpus '"device=0,1"' nvidia/cuda:11.0-base nvidia-smi
This example runs a container using only GPUs 0 and 1.
- Requesting specific driver capabilities and a minimum CUDA version:
docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_REQUIRE_CUDA="cuda>=11.0" nvidia/cuda:11.0-base nvidia-smi
This command exposes all GPUs, restricts the container to the compute and utility driver capabilities, and requires a driver that supports CUDA 11.0 or newer. Note that the toolkit controls device visibility and capabilities; it does not enforce per-container GPU memory limits.
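GPUs can also be selected by UUID rather than index, which is stable across reboots and hardware changes. A minimal sketch; GPU-xxxx below is a placeholder for a UUID taken from the listing:
# List the available GPUs along with their UUIDs:
nvidia-smi -L
# Run against a single GPU selected by UUID (substitute a real UUID for GPU-xxxx):
docker run --rm --gpus '"device=GPU-xxxx"' nvidia/cuda:11.0-base nvidia-smi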
Getting Started
- Install the NVIDIA Container Toolkit (the legacy nvidia-docker2 package is deprecated; on Debian-based distributions the current package is nvidia-container-toolkit):
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
- Configure the Docker runtime and restart the daemon:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
- Test the installation:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
This should display information about your NVIDIA GPUs, confirming that the toolkit is working correctly.
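As an additional check, the NVIDIA runtime registered by nvidia-ctk should appear in the Docker daemon's runtime list:
# The output should include nvidia among the available runtimes:
docker info | grep -i runtimes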
Competitor Comparisons
NVIDIA device plugin for Kubernetes
Pros of k8s-device-plugin
- Specifically designed for Kubernetes environments
- Simplifies GPU allocation in Kubernetes clusters
- Supports advanced features like GPU sharing and MIG
Cons of k8s-device-plugin
- Limited to Kubernetes environments
- Requires additional setup compared to nvidia-container-toolkit
- May have a steeper learning curve for non-Kubernetes users
Code Comparison
k8s-device-plugin:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
nvidia-container-toolkit:
sudo nvidia-ctk runtime configure --runtime=docker
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
The k8s-device-plugin snippet shows a Kubernetes DaemonSet configuration for deploying the NVIDIA device plugin, while the nvidia-container-toolkit snippet shows the toolkit being registered with Docker on the host and then used to run a GPU container. Note that the toolkit is installed and configured on the host, not inside the container image.
Both repositories aim to enable GPU support in containerized environments, but k8s-device-plugin is tailored for Kubernetes, offering more advanced features and integration. nvidia-container-toolkit provides a more general-purpose solution that can be used in various container runtimes, including Docker and Kubernetes.
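For reference, deploying the device plugin itself is typically a single manifest apply; a sketch assuming an example release tag (check the k8s-device-plugin releases for the current version):
# Deploy the NVIDIA device plugin as a DaemonSet; v0.14.1 is an assumed example version:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml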
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Pros of gpu-operator
- Provides a complete solution for GPU management in Kubernetes clusters
- Automates driver installation and updates across nodes
- Simplifies GPU resource allocation and monitoring
Cons of gpu-operator
- Requires more resources and overhead compared to nvidia-container-toolkit
- May be overkill for simple GPU setups or single-node environments
- Less flexibility for custom configurations
Code Comparison
gpu-operator (Helm chart values):
operator:
  defaultRuntime: containerd
driver:
  enabled: true
  version: "470.82.01"
toolkit:
  enabled: true
nvidia-container-toolkit (Docker runtime configuration):
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
The gpu-operator uses a Helm chart for deployment and configuration, while nvidia-container-toolkit requires manual runtime configuration. The gpu-operator provides a more comprehensive and automated approach to GPU management in Kubernetes, whereas nvidia-container-toolkit offers a lightweight solution for Docker environments.
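For context, the gpu-operator is normally installed with Helm; a minimal sketch following the project's documented pattern (release name and namespace are illustrative):
# Add NVIDIA's Helm repository and install the operator into its own namespace:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator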
An open and reliable container runtime
Pros of containerd
- More widely adopted and supported across various container ecosystems
- Designed as a general-purpose container runtime, offering broader compatibility
- Actively developed with frequent updates and improvements
Cons of containerd
- Lacks built-in GPU support for NVIDIA hardware
- Requires additional configuration and plugins for GPU-accelerated workloads
- May have a steeper learning curve for users primarily focused on GPU containers
Code Comparison
nvidia-container-toolkit:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
containerd:
sudo ctr run --rm --gpus 0 docker.io/nvidia/cuda:11.0-base cuda nvidia-smi
Summary
containerd is a more versatile and widely adopted container runtime, suitable for various container workloads. However, it requires additional setup for GPU support. The nvidia-container-toolkit is specifically designed for NVIDIA GPU integration, offering a more streamlined experience for GPU-accelerated containers but with a narrower focus.
Both projects serve different purposes, with containerd being a general-purpose runtime and nvidia-container-toolkit specializing in GPU support for container environments.
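When GPU support is needed under containerd, the toolkit's own CLI can register the NVIDIA runtime in containerd's configuration; a short sketch:
# Add the nvidia runtime to /etc/containerd/config.toml, then restart containerd:
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd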
The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
Pros of moby
- Broader scope and functionality as a complete container platform
- Larger community and ecosystem for support and contributions
- More extensive documentation and resources for developers
Cons of moby
- Higher complexity and steeper learning curve
- May include unnecessary features for users only needing basic containerization
- Potentially higher resource overhead due to its comprehensive nature
Code Comparison
nvidia-container-toolkit:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
moby:
docker run -it --rm ubuntu:latest /bin/bash
Key Differences
nvidia-container-toolkit focuses specifically on enabling GPU support for containers, while moby serves as a comprehensive container platform. nvidia-container-toolkit is essential for GPU-accelerated workloads, whereas moby provides a broader range of containerization features.
nvidia-container-toolkit is more specialized and easier to use for GPU-related tasks, while moby offers greater flexibility and a wider range of containerization options. The choice between them depends on whether GPU support is a primary requirement or if a more general-purpose container solution is needed.
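For completeness, once the NVIDIA runtime is registered with the Docker/Moby daemon, it can also be selected explicitly with environment variables instead of the --gpus flag; a sketch of this legacy-style invocation:
# Explicitly select the nvidia runtime and expose all GPUs via environment variables:
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi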
Production-Grade Container Scheduling and Management
Pros of kubernetes
- Comprehensive container orchestration platform for managing large-scale applications
- Extensive ecosystem with wide community support and numerous integrations
- Built-in features for scaling, load balancing, and self-healing
Cons of kubernetes
- Steeper learning curve and more complex setup compared to nvidia-container-toolkit
- Requires more resources and overhead for small-scale deployments
- May be overkill for simple containerized GPU workloads
Code comparison
kubernetes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
      - name: gpu-container
        image: gpu-app:latest
        resources:
          limits:
            nvidia.com/gpu: 1
nvidia-container-toolkit:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
The kubernetes example shows a deployment configuration for a GPU-enabled container, while the nvidia-container-toolkit example demonstrates a simple Docker command to run a GPU-enabled container.
kubernetes offers a more comprehensive solution for orchestrating containerized applications, including those with GPU requirements. nvidia-container-toolkit, on the other hand, provides a simpler approach for running GPU-enabled containers directly with Docker, making it more suitable for smaller-scale or development environments.
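After applying such a manifest, a quick way to confirm that the cluster actually exposes GPUs (this assumes the NVIDIA device plugin is deployed) is to inspect node capacity:
# Nodes with working GPU support advertise an nvidia.com/gpu resource:
kubectl describe nodes | grep -i nvidia.com/gpu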
README
NVIDIA Container Toolkit
Introduction
The NVIDIA Container Toolkit allows users to build and run GPU accelerated containers. The toolkit includes a container runtime library and utilities to automatically configure containers to leverage NVIDIA GPUs.
Product documentation including an architecture overview, platform support, and installation and usage guides can be found in the documentation repository.
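As one example of those utilities, the toolkit's nvidia-ctk CLI can generate a Container Device Interface (CDI) specification so that CDI-aware engines can expose GPUs by name; a brief sketch:
# Generate a CDI specification describing the GPUs on this host:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# CDI-aware engines (for example recent Podman) can then request devices by CDI name:
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi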
Getting Started
Make sure you have installed the NVIDIA driver for your Linux distribution. Note that you do not need to install the CUDA Toolkit on the host system; only the NVIDIA driver is required.
For instructions on getting started with the NVIDIA Container Toolkit, refer to the installation guide.
Usage
The user guide provides information on the configuration and command line options available when running GPU containers with Docker.
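As one illustration of those options (a sketch; see the user guide for the full set), driver capabilities can be restricted directly on the Docker command line:
# Expose all GPUs but only the utility driver capability, which is enough for nvidia-smi:
docker run --rm --gpus all,capabilities=utility nvidia/cuda:11.0-base nvidia-smi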
Issues and Contributing
Check out the Contributing document!
- Please let us know by filing a new issue
- You can contribute by creating a merge request to our public GitLab repository