Top Related Projects
NVIDIA device plugin for Kubernetes
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
An open and reliable container runtime
The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
Production-Grade Container Scheduling and Management
Quick Overview
The NVIDIA Container Toolkit is a set of tools and libraries that enable GPU support in containers. It allows users to build and run GPU-accelerated containers with Docker and other container runtimes, making it easier to deploy and manage GPU-enabled applications in containerized environments.
Pros
- Seamless integration of NVIDIA GPUs with Docker containers
- Supports various NVIDIA GPU architectures and driver versions
- Enables easy deployment of GPU-accelerated applications in containerized environments
- Provides fine-grained control over GPU resource allocation
Cons
- Limited to NVIDIA GPUs only
- Requires additional setup and configuration compared to standard Docker containers
- May introduce compatibility issues with certain applications or frameworks
- Small performance overhead from the additional runtime layer (negligible in most cases)
Code Examples
- Running a GPU-enabled container:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
This command runs a container from the CUDA 11.0 base image and executes nvidia-smi to display GPU information.
- Specifying GPU devices:
docker run --gpus '"device=0,1"' nvidia/cuda:11.0-base nvidia-smi
This example runs a container using only GPUs 0 and 1.
- Requesting specific driver capabilities and a minimum CUDA version:
docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_REQUIRE_CUDA="cuda>=11.0" nvidia/cuda:11.0-base nvidia-smi
This command exposes all GPUs, restricts the container to the compute and utility driver capabilities, and requires a driver that supports CUDA 11.0 or newer. Note that the toolkit controls device visibility and capabilities; it does not enforce per-container GPU memory limits.
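GPUs can also be selected by UUID rather than index, which is stable across reboots and hardware changes. A minimal sketch; GPU-xxxx below is a placeholder for a UUID taken from the listing:
# List the available GPUs along with their UUIDs:
nvidia-smi -L
# Run against a single GPU selected by UUID (substitute a real UUID for GPU-xxxx):
docker run --rm --gpus '"device=GPU-xxxx"' nvidia/cuda:11.0-base nvidia-smi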
Getting Started
- Install the NVIDIA Container Toolkit (the legacy nvidia-docker2 package is deprecated; on Debian-based distributions the current package is nvidia-container-toolkit):
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
- Configure the Docker runtime and restart the daemon:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
- Test the installation:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
This should display information about your NVIDIA GPUs, confirming that the toolkit is working correctly.
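As an additional check, the NVIDIA runtime registered by nvidia-ctk should appear in the Docker daemon's runtime list:
# The output should include nvidia among the available runtimes:
docker info | grep -i runtimes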
Competitor Comparisons
NVIDIA device plugin for Kubernetes
Pros of k8s-device-plugin
- Specifically designed for Kubernetes environments
- Simplifies GPU allocation in Kubernetes clusters
- Supports advanced features like GPU sharing and MIG
Cons of k8s-device-plugin
- Limited to Kubernetes environments
- Requires additional setup compared to nvidia-container-toolkit
- May have a steeper learning curve for non-Kubernetes users
Code Comparison
k8s-device-plugin:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
nvidia-container-toolkit:
sudo nvidia-ctk runtime configure --runtime=docker
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
The k8s-device-plugin snippet shows a Kubernetes DaemonSet configuration for deploying the NVIDIA device plugin, while the nvidia-container-toolkit snippet shows the toolkit being registered with Docker on the host and then used to run a GPU container. Note that the toolkit is installed and configured on the host, not inside the container image.
Both repositories aim to enable GPU support in containerized environments, but k8s-device-plugin is tailored for Kubernetes, offering more advanced features and integration. nvidia-container-toolkit provides a more general-purpose solution that can be used in various container runtimes, including Docker and Kubernetes.
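For reference, deploying the device plugin itself is typically a single manifest apply; a sketch assuming an example release tag (check the k8s-device-plugin releases for the current version):
# Deploy the NVIDIA device plugin as a DaemonSet; v0.14.1 is an assumed example version:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml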
NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Pros of gpu-operator
- Provides a complete solution for GPU management in Kubernetes clusters
- Automates driver installation and updates across nodes
- Simplifies GPU resource allocation and monitoring
Cons of gpu-operator
- Requires more resources and overhead compared to nvidia-container-toolkit
- May be overkill for simple GPU setups or single-node environments
- Less flexibility for custom configurations
Code Comparison
gpu-operator (Helm chart values):
operator:
  defaultRuntime: containerd
driver:
  enabled: true
  version: "470.82.01"
toolkit:
  enabled: true
nvidia-container-toolkit (Docker runtime configuration):
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
The gpu-operator uses a Helm chart for deployment and configuration, while nvidia-container-toolkit requires manual runtime configuration. The gpu-operator provides a more comprehensive and automated approach to GPU management in Kubernetes, whereas nvidia-container-toolkit offers a lightweight solution for Docker environments.
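For context, the gpu-operator is normally installed with Helm; a minimal sketch following the project's documented pattern (release name and namespace are illustrative):
# Add NVIDIA's Helm repository and install the operator into its own namespace:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait gpu-operator -n gpu-operator --create-namespace nvidia/gpu-operator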
An open and reliable container runtime
Pros of containerd
- More widely adopted and supported across various container ecosystems
- Designed as a general-purpose container runtime, offering broader compatibility
- Actively developed with frequent updates and improvements
Cons of containerd
- Lacks built-in GPU support for NVIDIA hardware
- Requires additional configuration and plugins for GPU-accelerated workloads
- May have a steeper learning curve for users primarily focused on GPU containers
Code Comparison
nvidia-container-toolkit:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
containerd:
sudo ctr run --rm --gpus 0 docker.io/nvidia/cuda:11.0-base cuda nvidia-smi
Summary
containerd is a more versatile and widely adopted container runtime, suitable for various container workloads. However, it requires additional setup for GPU support. The nvidia-container-toolkit is specifically designed for NVIDIA GPU integration, offering a more streamlined experience for GPU-accelerated containers but with a narrower focus.
Both projects serve different purposes, with containerd being a general-purpose runtime and nvidia-container-toolkit specializing in GPU support for container environments.
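When GPU support is needed under containerd, the toolkit's own CLI can register the NVIDIA runtime in containerd's configuration; a short sketch:
# Add the nvidia runtime to /etc/containerd/config.toml, then restart containerd:
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd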
The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems
Pros of moby
- Broader scope and functionality as a complete container platform
- Larger community and ecosystem for support and contributions
- More extensive documentation and resources for developers
Cons of moby
- Higher complexity and steeper learning curve
- May include unnecessary features for users only needing basic containerization
- Potentially higher resource overhead due to its comprehensive nature
Code Comparison
nvidia-container-toolkit:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
moby:
docker run -it --rm ubuntu:latest /bin/bash
Key Differences
nvidia-container-toolkit focuses specifically on enabling GPU support for containers, while moby serves as a comprehensive container platform. nvidia-container-toolkit is essential for GPU-accelerated workloads, whereas moby provides a broader range of containerization features.
nvidia-container-toolkit is more specialized and easier to use for GPU-related tasks, while moby offers greater flexibility and a wider range of containerization options. The choice between them depends on whether GPU support is a primary requirement or if a more general-purpose container solution is needed.
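For completeness, once the NVIDIA runtime is registered with the Docker/Moby daemon, it can also be selected explicitly with environment variables instead of the --gpus flag; a sketch of this legacy-style invocation:
# Explicitly select the nvidia runtime and expose all GPUs via environment variables:
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi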
Production-Grade Container Scheduling and Management
Pros of kubernetes
- Comprehensive container orchestration platform for managing large-scale applications
- Extensive ecosystem with wide community support and numerous integrations
- Built-in features for scaling, load balancing, and self-healing
Cons of kubernetes
- Steeper learning curve and more complex setup compared to nvidia-container-toolkit
- Requires more resources and overhead for small-scale deployments
- May be overkill for simple containerized GPU workloads
Code comparison
kubernetes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      containers:
      - name: gpu-container
        image: gpu-app:latest
        resources:
          limits:
            nvidia.com/gpu: 1
nvidia-container-toolkit:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
The kubernetes example shows a deployment configuration for a GPU-enabled container, while the nvidia-container-toolkit example demonstrates a simple Docker command to run a GPU-enabled container.
kubernetes offers a more comprehensive solution for orchestrating containerized applications, including those with GPU requirements. nvidia-container-toolkit, on the other hand, provides a simpler approach for running GPU-enabled containers directly with Docker, making it more suitable for smaller-scale or development environments.
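After applying such a manifest, a quick way to confirm that the cluster actually exposes GPUs (this assumes the NVIDIA device plugin is deployed) is to inspect node capacity:
# Nodes with working GPU support advertise an nvidia.com/gpu resource:
kubectl describe nodes | grep -i nvidia.com/gpu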
README
NVIDIA Container Toolkit
Introduction
The NVIDIA Container Toolkit allows users to build and run GPU accelerated containers. The toolkit includes a container runtime library and utilities to automatically configure containers to leverage NVIDIA GPUs.
Product documentation including an architecture overview, platform support, and installation and usage guides can be found in the documentation repository.
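As one example of those utilities, the toolkit's nvidia-ctk CLI can generate a Container Device Interface (CDI) specification so that CDI-aware engines can expose GPUs by name; a brief sketch:
# Generate a CDI specification describing the GPUs on this host:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# CDI-aware engines (for example recent Podman) can then request devices by CDI name:
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi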
Getting Started
Make sure you have installed the NVIDIA driver for your Linux distribution. Note that you do not need to install the CUDA Toolkit on the host system; only the NVIDIA driver is required.
For instructions on getting started with the NVIDIA Container Toolkit, refer to the installation guide.
Usage
The user guide provides information on the configuration and command line options available when running GPU containers with Docker.
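As one illustration of those options (a sketch; see the user guide for the full set), driver capabilities can be restricted directly on the Docker command line:
# Expose all GPUs but only the utility driver capability, which is enough for nvidia-smi:
docker run --rm --gpus all,capabilities=utility nvidia/cuda:11.0-base nvidia-smi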
Issues and Contributing
Check out the Contributing document!
- Please let us know by filing a new issue
- You can contribute by creating a merge request to our public GitLab repository