CUDALibrarySamples

CUDA Library Samples

Quick Overview

NVIDIA/CUDALibrarySamples is a GitHub repository containing sample code demonstrating the usage of various CUDA libraries. It provides developers with practical examples and best practices for leveraging NVIDIA's GPU-accelerated libraries in their applications, covering areas such as linear algebra, signal processing, and machine learning.

Pros

Comprehensive collection of samples for multiple CUDA libraries
Well-documented code with explanations and comments
Regularly updated to reflect the latest CUDA library versions
Serves as a valuable learning resource for GPU programming

Cons

Requires NVIDIA GPU hardware for execution
Some samples may be complex for beginners
Limited to CUDA-specific implementations, not applicable to other GPU platforms
May require significant computational resources for certain examples

Code Examples

cuBLAS matrix multiplication:

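#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    cublasHandle_t handle;
    cublasCreate(&handle);

    float *d_A, *d_B, *d_C;
    int m = 1000, n = 1000, k = 1000;

    // Device buffers for column-major A (m x k), B (k x n), and C (m x n).
    // They are left uninitialized here; real code copies input data in first.
    cudaMalloc(&d_A, m * k * sizeof(float));
    cudaMalloc(&d_B, k * n * sizeof(float));
    cudaMalloc(&d_C, m * n * sizeof(float));

    // C = alpha * A * B + beta * C; the leading dimensions are m, k, and m.
    float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha, d_A, m, d_B, k, &beta, d_C, m);

    // Wait for the GEMM to finish, then release device memory and the handle.
    cudaDeviceSynchronize();
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);
    cublasDestroy(handle);
    return 0;
}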
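The `cublasSgemm` call above assumes column-major storage, which is easy to get wrong when coming from C-style row-major arrays. As a rough illustration, a CPU reference like the one below can be used to check a GPU result on small inputs. The name `sgemm_reference` is hypothetical and not part of cuBLAS; the sketch covers only the no-transpose case used above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not a cuBLAS function) for
// C = alpha * A * B + beta * C with no transposes.
// All matrices are column-major: element (row, col) of A is A[row + col * lda].
void sgemm_reference(int m, int n, int k, float alpha,
                     const std::vector<float>& A, int lda,
                     const std::vector<float>& B, int ldb,
                     float beta, std::vector<float>& C, int ldc) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int col = 0; col < n; ++col) {
        for (int row = 0; row < m; ++row) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += A[row + static_cast<std::size_t>(p) * lda] *
                       B[p + static_cast<std::size_t>(col) * ldb];
            }
            const std::size_t idx = row + static_cast<std::size_t>(col) * ldc;
            // When beta is zero the previous contents of C are ignored.
            C[idx] = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * C[idx];
        }
    }
}
```

For a 2 x 2 case, `A = {1, 2, 3, 4}` in column-major order represents the matrix with rows (1, 3) and (2, 4), so the stride between columns is the leading dimension rather than the row length.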

cuFFT 1D FFT:

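#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    cufftHandle plan;
    cufftComplex *d_input, *d_output;
    int n = 1024;

    // Device buffers for n complex samples; real code copies input data in first.
    cudaMalloc(&d_input, n * sizeof(cufftComplex));
    cudaMalloc(&d_output, n * sizeof(cufftComplex));

    // Plan a single (batch = 1) complex-to-complex transform of length n.
    cufftPlan1d(&plan, n, CUFFT_C2C, 1);
    cufftExecC2C(plan, d_input, d_output, CUFFT_FORWARD);

    // Wait for the transform to finish, then release the plan and buffers.
    cudaDeviceSynchronize();
    cufftDestroy(plan);
    cudaFree(d_input);
    cudaFree(d_output);
    return 0;
}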
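To make the forward transform above concrete, the following is a minimal CPU sketch of the same mathematical operation: an unnormalized discrete Fourier transform with a negative exponent, which is what `CUFFT_FORWARD` selects. The function name `dft_forward_reference` is hypothetical, and the O(n²) loop is intended only for checking small inputs, not as a description of how cuFFT computes the result.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not a cuFFT function):
// X[k] = sum over j of x[j] * exp(-2*pi*i * j * k / n), with no scaling.
std::vector<std::complex<float>> dft_forward_reference(
        const std::vector<std::complex<float>>& x) {
    const std::size_t n = x.size();
    assert(n > 0);
    const double two_pi = 6.283185307179586;
    std::vector<std::complex<float>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j * k modulo n to keep the angle small and accurate.
            const double angle = -two_pi * static_cast<double>((j * k) % n) /
                                 static_cast<double>(n);
            const std::complex<double> sample(x[j].real(), x[j].imag());
            acc += sample * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        X[k] = std::complex<float>(static_cast<float>(acc.real()),
                                   static_cast<float>(acc.imag()));
    }
    return X;
}
```

Because the transform is unnormalized, a forward transform followed by an inverse one returns the input scaled by `n`, so callers that need a round trip divide by `n` themselves.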

cuDNN convolution:

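#include <cudnn.h>

int main() {
    cudnnHandle_t cudnn;
    cudnnCreate(&cudnn);

    cudnnTensorDescriptor_t input_descriptor;
    cudnnFilterDescriptor_t kernel_descriptor;
    cudnnConvolutionDescriptor_t convolution_descriptor;
    cudnnTensorDescriptor_t output_descriptor;

    cudnnCreateTensorDescriptor(&input_descriptor);
    cudnnCreateFilterDescriptor(&kernel_descriptor);
    cudnnCreateConvolutionDescriptor(&convolution_descriptor);
    cudnnCreateTensorDescriptor(&output_descriptor);

    // Set up descriptors...

    // Scaling factors and device buffers. Allocation is omitted in this outline,
    // so the call below only shows the argument order and is not runnable as-is.
    float alpha = 1.0f, beta = 0.0f;
    float *d_input = nullptr, *d_kernel = nullptr, *d_output = nullptr;
    void *d_workspace = nullptr;
    size_t workspace_size = 0;

    cudnnConvolutionForward(cudnn, &alpha, input_descriptor, d_input, kernel_descriptor, d_kernel,
                            convolution_descriptor, CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM,
                            d_workspace, workspace_size, &beta, output_descriptor, d_output);

    // Release the descriptors before destroying the handle.
    cudnnDestroyTensorDescriptor(output_descriptor);
    cudnnDestroyConvolutionDescriptor(convolution_descriptor);
    cudnnDestroyFilterDescriptor(kernel_descriptor);
    cudnnDestroyTensorDescriptor(input_descriptor);
    cudnnDestroy(cudnn);
    return 0;
}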
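The descriptor setup elided above has to agree on the output tensor size. As a hedged sketch, the helper below mirrors the per-dimension formula given in the cuDNN documentation for forward convolution output dimensions. The name `conv_output_dim` is hypothetical; in real code, `cudnnGetConvolution2dForwardOutputDim` reports these values directly from the descriptors.

```cpp
#include <cassert>

// Hypothetical helper (not a cuDNN function): output extent along one spatial
// dimension for a forward convolution with the given padding, stride, and dilation.
int conv_output_dim(int input, int filter, int pad, int stride, int dilation) {
    assert(input > 0 && filter > 0 && pad >= 0 && stride > 0 && dilation > 0);
    // A dilated filter spans (filter - 1) * dilation + 1 input elements.
    const int effective_filter = (filter - 1) * dilation + 1;
    assert(input + 2 * pad >= effective_filter);
    return 1 + (input + 2 * pad - effective_filter) / stride;
}
```

For example, a 3-wide filter with padding 1 and stride 1 preserves the input size, while a 7-wide filter with padding 3 and stride 2 roughly halves it.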

Getting Started

Clone the repository:

git clone https://github.com/NVIDIA/CUDALibrarySamples.git

Install CUDA Toolkit and required libraries (cuBLAS, cuFFT, cuDNN, etc.)
Navigate to a specific sample directory:
```
cd CUDALibrarySamples/cuBLAS/Level-1
```
Build and run the sample by following the README in that directory.

Competitor Comparisons

cuda-samples

8,186

Samples for CUDA Developers which demonstrates features in CUDA Toolkit

Pros of cuda-samples

More comprehensive coverage of CUDA features and techniques
Includes a wider range of application domains and use cases
Better suited for beginners learning CUDA programming

Cons of cuda-samples

Larger repository size, potentially overwhelming for some users
Some samples may be outdated or less relevant for modern CUDA development
Less focused on specific CUDA libraries compared to CUDALibrarySamples

Code Comparison

CUDALibrarySamples (cuBLAS example):

cublasHandle_t handle;
cublasCreate(&handle);
cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k, &alpha, d_A, m, d_B, k, &beta, d_C, m);
cublasDestroy(handle);

cuda-samples (vector addition example):

__global__ void vectorAdd(const float *A, const float *B, float *C, int numElements) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements) {
        C[i] = A[i] + B[i];
    }
}

CUDALibrarySamples focuses on demonstrating the usage of specific CUDA libraries, while cuda-samples provides a broader range of CUDA programming examples and techniques.

oneAPI-samples

1,080

Samples for Intel® oneAPI Toolkits

Pros of oneAPI-samples

Cross-platform support for various hardware accelerators (CPUs, GPUs, FPGAs)
Broader range of sample applications covering diverse domains
Extensive documentation and tutorials for learning oneAPI

Cons of oneAPI-samples

Less mature ecosystem compared to CUDA
Potentially lower performance on NVIDIA GPUs
Smaller community and fewer third-party libraries

Code Comparison

oneAPI-samples (DPC++ example):

sycl::queue q;
sycl::buffer<int> buf(data, sycl::range<1>(n));
q.submit([&](sycl::handler& h) {
  auto acc = buf.get_access<sycl::access::mode::read_write>(h);
  h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
    acc[i] *= 2;
  });
});

CUDALibrarySamples (CUDA example):

int* d_data;
cudaMalloc(&d_data, n * sizeof(int));
cudaMemcpy(d_data, data, n * sizeof(int), cudaMemcpyHostToDevice);
multiplyByTwo<<<(n + 255) / 256, 256>>>(d_data, n);
cudaMemcpy(data, d_data, n * sizeof(int), cudaMemcpyDeviceToHost);
cudaFree(d_data);

Both examples demonstrate basic data parallel operations, but oneAPI uses a higher-level abstraction while CUDA requires more explicit memory management.
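The launch configuration `(n + 255) / 256` in the CUDA snippet rounds the block count up so that every element gets a thread, and the kernel is then expected to guard against indices past `n`. The CPU-only sketch below illustrates that arithmetic. The `multiplyByTwo` kernel is not shown in the snippet, so its behavior (doubling each element, with a bounds check) is an assumption, and both helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: smallest block count with blocks * threads_per_block >= n.
int grid_size(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// CPU emulation of a one-dimensional launch of the assumed multiplyByTwo kernel.
// Each (block, thread) pair maps to one global index, as blockIdx.x and
// threadIdx.x would on the device.
void multiply_by_two_emulated(std::vector<int>& data, int threads_per_block) {
    const int n = static_cast<int>(data.size());
    const int blocks = grid_size(n, threads_per_block);
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            const int i = block * threads_per_block + thread;
            // The last block may overshoot n, so out-of-range threads do nothing.
            if (i < n) {
                data[i] *= 2;
            }
        }
    }
}
```

With 1000 elements and 256 threads per block this gives 4 blocks, so 24 threads in the final block fall outside the array and are skipped by the bounds check.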

DirectX-Graphics-Samples

6,544

This repo contains the DirectX Graphics samples that demonstrate how to build graphics intensive applications on Windows.

Pros of DirectX-Graphics-Samples

More comprehensive documentation and tutorials for beginners
Broader range of graphics techniques and effects demonstrated
Better integration with Windows ecosystem and tools

Cons of DirectX-Graphics-Samples

Limited to DirectX and Windows platforms
Potentially steeper learning curve for those new to graphics programming
Less focus on high-performance computing compared to CUDA samples

Code Comparison

DirectX-Graphics-Samples (D3D12HelloTriangle.cpp):

ComPtr<ID3D12Resource> m_vertexBuffer;
D3D12_VERTEX_BUFFER_VIEW m_vertexBufferView;

const UINT vertexBufferSize = sizeof(triangleVertices);
ThrowIfFailed(m_device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
    D3D12_HEAP_FLAG_NONE,
    &CD3DX12_RESOURCE_DESC::Buffer(vertexBufferSize),
    D3D12_RESOURCE_STATE_GENERIC_READ,
    nullptr,
    IID_PPV_ARGS(&m_vertexBuffer)));

CUDA runtime API (allocation and copy pattern as in vectorAdd.cu from cuda-samples):

float *h_A, *h_B, *h_C;
float *d_A, *d_B, *d_C;
size_t size = N * sizeof(float);

cudaMalloc((void **)&d_A, size);
cudaMalloc((void **)&d_B, size);
cudaMalloc((void **)&d_C, size);

cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

README

CUDA Library Samples

The CUDA Library Samples repository contains various examples that demonstrate the use of GPU-accelerated libraries in CUDA. These libraries enable high-performance computing in a wide range of applications, including math operations, image processing, signal processing, linear algebra, and compression. The samples included cover:

Math and Image Processing Libraries
cuBLAS (Basic Linear Algebra Subprograms)
cuTENSOR (Tensor Linear Algebra)
cuSPARSE (Sparse Matrix Operations)
cuSOLVER (Dense and Sparse Solvers)
cuFFT (Fast Fourier Transform)
cuRAND (Random Number Generation)
NPP (Image and Video Processing)
nvJPEG (JPEG Encode/Decode)
nvCOMP (Data Compression)
and more...

About

The CUDA Library Samples are provided by NVIDIA Corporation as Open Source software, released under the 3-clause "New" BSD license. These examples showcase how to leverage GPU-accelerated libraries for efficient computation across various fields.

For more information on the available libraries and their uses, visit GPU Accelerated Libraries.

Library Examples

Explore the examples of each CUDA library included in this repository:

Each sample provides a practical use case for how to apply these libraries in real-world scenarios, showcasing the power and flexibility of CUDA for a wide variety of computational needs.

Additional Resources

For more information and documentation on CUDA libraries, please visit:

License

The CUDA Library Samples are distributed under the 3-clause "New" BSD license. For more details, refer to the license terms below:

Copyright

  Redistribution and use in source and binary forms, with or without modification, are permitted
  provided that the following conditions are met:
      * Redistributions of source code must retain the above copyright notice, this list of
        conditions and the following disclaimer.
      * Redistributions in binary form must reproduce the above copyright notice, this list of
        conditions and the following disclaimer in the documentation and/or other materials
        provided with the distribution.
      * Neither the name of the NVIDIA CORPORATION nor the names of its contributors may be used
        to endorse or promote products derived from this software without specific prior written
        permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
  IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
  FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE
  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
  BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
  OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
  STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
