Top Related Projects
NVIDIA device plugin for Kubernetes
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
Build and run containers leveraging NVIDIA GPUs
CLI tool for spawning and running containers according to the OCI specification
An open and reliable container runtime
Quick Overview
NVIDIA/nvidia-docker is a project that enables GPU acceleration for Docker containers. It provides a set of tools and runtime libraries that allow Docker containers to leverage NVIDIA GPUs for compute-intensive tasks, such as machine learning and scientific computing.
Pros
- Seamless integration of NVIDIA GPUs with Docker containers
- Improved performance for GPU-accelerated applications in containerized environments
- Simplified deployment and management of GPU-enabled applications
- Supports a wide range of NVIDIA GPU architectures
Cons
- Limited to NVIDIA GPUs only, not compatible with other GPU manufacturers
- Requires additional setup and configuration compared to standard Docker installations
- May introduce compatibility issues with certain Docker features or third-party tools
- Potential performance overhead in some scenarios compared to bare-metal GPU usage
Getting Started
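To get started with NVIDIA Docker, follow these steps. Note that the project is now deprecated (see the README below), so these instructions apply to legacy setups: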
- Install the NVIDIA GPU driver on your host system.
- Install Docker on your system.
- Install the nvidia-docker2 package, which pulls in the NVIDIA Container Toolkit:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
- Restart the Docker daemon:
sudo systemctl restart docker
- Test the installation by running a sample CUDA container:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
This should display information about your NVIDIA GPU(s) if the installation was successful.
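The `distribution=...` line in the install step can look cryptic. The sketch below shows what it computes: it sources an os-release style file and concatenates `ID` and `VERSION_ID`, which is the string used in the repository URL. The function name `distribution_id` and the sample file contents are assumptions made for illustration, not part of the original instructions.

```shell
#!/bin/sh
# distribution_id FILE: print ID followed by VERSION_ID from an os-release style file.
# The file is sourced in a subshell so the caller's variables are left untouched.
distribution_id() {
    (
        . "$1"
        printf '%s%s\n' "$ID" "$VERSION_ID"
    )
}

# Hypothetical sample standing in for /etc/os-release on an Ubuntu 20.04 host.
sample=$(mktemp)
printf 'ID=ubuntu\nVERSION_ID="20.04"\n' > "$sample"

distribution_id "$sample"   # prints ubuntu20.04
rm -f "$sample"
```

On a real host, `distribution_id /etc/os-release` should yield the same value the install commands store in `$distribution`.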
Competitor Comparisons
NVIDIA device plugin for Kubernetes
Pros of k8s-device-plugin
- Native Kubernetes integration for GPU resource management
- Supports multi-node GPU clusters and advanced scheduling features
- Easier to scale and manage in large Kubernetes environments
Cons of k8s-device-plugin
- Limited to Kubernetes environments, less flexible for standalone Docker use
- May require more complex setup and configuration compared to nvidia-docker
- Potential learning curve for teams not familiar with Kubernetes
Code Comparison
k8s-device-plugin:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
nvidia-docker:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
The k8s-device-plugin uses Kubernetes manifests to deploy and manage GPU resources, while nvidia-docker relies on Docker CLI commands with GPU options. This reflects the different approaches and use cases of the two projects, with k8s-device-plugin being more tightly integrated into Kubernetes ecosystems and nvidia-docker offering a simpler solution for Docker-based workflows.
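To make the contrast concrete, here is a minimal sketch of how a workload might ask for GPUs once the device plugin is running: the Pod sets a limit on the `nvidia.com/gpu` resource instead of passing `--gpus` to Docker. The helper name `gpu_pod_manifest` and the Pod name are hypothetical; the image is the one used elsewhere on this page.

```shell
#!/bin/sh
# gpu_pod_manifest NAME COUNT: print a Pod manifest that requests COUNT GPUs
# through the nvidia.com/gpu resource exposed by the device plugin.
gpu_pod_manifest() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.0-base
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: $2
EOF
}

# Print a manifest for a one-GPU smoke test (nothing is sent to a cluster here).
gpu_pod_manifest gpu-smoke-test 1
```

Assuming a cluster with the plugin deployed, the output could be piped to `kubectl apply -f -`.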
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
Pros of gpu-operator
- Provides a more comprehensive Kubernetes-native solution for GPU management
- Automates driver installation and updates across the cluster
- Simplifies GPU resource allocation and monitoring in Kubernetes environments
Cons of gpu-operator
- Requires Kubernetes, which may not be suitable for all deployment scenarios
- Has a steeper learning curve for users unfamiliar with Kubernetes operators
- May introduce additional complexity in simpler GPU setups
Code Comparison
gpu-operator (using Helm):
helm repo add nvidia https://nvidia.github.io/gpu-operator
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator
nvidia-docker:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
The gpu-operator is designed for Kubernetes environments, offering a more integrated solution for GPU management in containerized workloads. It automates many aspects of GPU setup and maintenance across the cluster. However, it requires Kubernetes and may be overkill for simpler setups.
nvidia-docker, on the other hand, is more straightforward to set up and use in non-Kubernetes environments. It's ideal for single-host Docker deployments but lacks the advanced cluster-wide management features of gpu-operator.
Build and run containers leveraging NVIDIA GPUs
Pros of nvidia-container-toolkit
- More lightweight and flexible approach to GPU support in containers
- Better integration with container runtimes like containerd and CRI-O
- Supports a wider range of NVIDIA GPU architectures
Cons of nvidia-container-toolkit
- Requires more manual configuration compared to nvidia-docker
- May have a steeper learning curve for users familiar with nvidia-docker
Code Comparison
nvidia-docker:
FROM nvidia/cuda:11.0-base
COPY --from=nvidia/cuda:11.0-runtime /usr/local/cuda/lib64/libcudart.so.11.0 /usr/local/cuda/lib64/libcudart.so.11.0
nvidia-container-toolkit:
FROM ubuntu:20.04
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
# Assumes the NVIDIA CUDA apt repository has already been added to the image
RUN apt-get update && apt-get install -y --no-install-recommends cuda-toolkit-11-0
The nvidia-container-toolkit approach allows for more granular control over GPU capabilities and doesn't require specific NVIDIA base images. It provides a more flexible setup, especially when working with custom images or non-CUDA workloads that still require GPU access.
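The same two variables can also be supplied at run time rather than baked into the image. The sketch below only prints the resulting command line so it can be inspected first; the helper name `gpu_run_command` is hypothetical, and `--runtime=nvidia` assumes the NVIDIA runtime has already been registered with Docker.

```shell
#!/bin/sh
# gpu_run_command DEVICES CAPABILITIES IMAGE [CMD...]
# Print (do not execute) a docker command line that selects GPUs and driver
# capabilities through the NVIDIA_* environment variables.
gpu_run_command() {
    devices=$1
    caps=$2
    shift 2
    echo "docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$devices -e NVIDIA_DRIVER_CAPABILITIES=$caps $*"
}

# Expose GPUs 0 and 1 with compute and utility capabilities to a plain Ubuntu image.
gpu_run_command 0,1 compute,utility ubuntu:20.04 nvidia-smi
```

Printing the command keeps the example safe to run on machines without Docker or a GPU; on a configured host the printed line can be executed as-is.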
CLI tool for spawning and running containers according to the OCI specification
Pros of runc
- More widely adopted and supported across different container runtimes
- Lightweight and focused on core container execution functionality
- Part of the Open Container Initiative (OCI), ensuring standardization
Cons of runc
- Lacks native GPU support for NVIDIA hardware
- Requires additional configuration for GPU-accelerated workloads
- May not provide optimal performance for GPU-intensive applications
Code Comparison
runc:
spec, err := loadSpec(context)
if err != nil {
    return err
}
status, err := startContainer(context, spec)
nvidia-docker:
docker run --runtime=nvidia \
--gpus all \
nvidia/cuda:11.0-base \
nvidia-smi
Summary
runc is a general-purpose container runtime that adheres to OCI standards, while nvidia-docker targets GPU-accelerated containers on NVIDIA hardware. The two are not strict alternatives: the NVIDIA Container Runtime is a thin wrapper around runc that injects GPU devices and driver libraries before the container starts. Use runc alone when broad compatibility and standardization are all you need, and add the NVIDIA tooling when your containers require GPU access.
An open and reliable container runtime
Pros of containerd
- More general-purpose container runtime, supporting multiple container formats
- Serves as the underlying runtime for Docker and is widely used as the container runtime in Kubernetes clusters
- Active development with frequent updates and improvements
Cons of containerd
- Lacks built-in GPU support for NVIDIA hardware
- Requires additional configuration for GPU-accelerated containers
- May have a steeper learning curve for users familiar with Docker
Code Comparison
nvidia-docker:
docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
containerd:
ctr run --rm --gpus 0 docker.io/nvidia/cuda:11.0-base cuda nvidia-smi
Summary
containerd is a more versatile container runtime with broader industry adoption, while nvidia-docker provides a simpler solution for GPU-accelerated containers on NVIDIA hardware. containerd requires additional setup for GPU support, but offers greater flexibility and is better suited for complex container orchestration scenarios. nvidia-docker, on the other hand, provides out-of-the-box GPU support for Docker containers, making it easier to use for GPU-intensive workloads on NVIDIA GPUs.
README
DEPRECATION NOTICE
This project has been superseded by the NVIDIA Container Toolkit.
The tooling provided by this repository has been deprecated and the repository archived.
The nvidia-docker wrapper is no longer supported, and the NVIDIA Container Toolkit has been extended to allow users to configure Docker to use the NVIDIA Container Runtime.
For further instructions, see the NVIDIA Container Toolkit documentation and specifically the install guide.
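As a rough illustration of what "configuring Docker to use the NVIDIA Container Runtime" produces, the sketch below checks whether a `daemon.json` file declares a runtime named `nvidia`. The toolkit's install guide describes `nvidia-ctk runtime configure --runtime=docker` as the way to write this entry; the helper name `has_nvidia_runtime` and the sample file are assumptions for illustration, and the check is a text match rather than a JSON parse.

```shell
#!/bin/sh
# has_nvidia_runtime FILE: succeed if FILE (a Docker daemon.json) appears to
# declare a runtime named "nvidia". Plain text match, so treat it as a heuristic.
has_nvidia_runtime() {
    grep -q '"nvidia"[[:space:]]*:' "$1"
}

# Hypothetical sample of a daemon.json with the NVIDIA runtime registered.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

if has_nvidia_runtime "$sample"; then
    echo "nvidia runtime configured"
else
    echo "nvidia runtime missing"
fi
rm -f "$sample"
```

On a real host the file to inspect is typically `/etc/docker/daemon.json`, and Docker needs a restart after the entry is added.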
Issues and Contributing
Check out the Contributing document!
- For questions, feature requests, or bugs, open an issue against the nvidia-container-toolkit repository.