HIP: C++ Heterogeneous-Compute Interface for Portability

Top Related Projects

Samples for Intel® oneAPI Toolkits

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver

AMD ROCm™ Software - GitHub Home

Quick Overview

HIP (Heterogeneous-Compute Interface for Portability) is an open-source C++ runtime API and kernel language that allows developers to create portable applications for AMD and NVIDIA GPUs. It provides a way to write code that can run on both AMD ROCm and NVIDIA CUDA platforms, enabling easier migration between GPU architectures.

Pros

  • Portability between AMD and NVIDIA GPUs
  • Simplified code migration from CUDA to HIP
  • Open-source and actively maintained by AMD
  • Supports a wide range of GPU computing applications

Cons

  • Ported code may not always match the performance of code hand-tuned for a single platform
  • Limited support for some advanced CUDA features
  • Learning curve for developers familiar with only one platform
  • Ecosystem and community support still growing compared to CUDA

Code Examples

  1. Vector Addition:
__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Host code
hipLaunchKernelGGL(vectorAdd, dim3(gridSize), dim3(blockSize), 0, 0, d_a, d_b, d_c, n);
  2. Matrix Multiplication:
__global__ void matrixMul(float* A, float* B, float* C, int width) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    // Skip threads that fall outside the matrix when width is not a multiple of the block size
    if (row >= width || col >= width) return;
    float sum = 0.0f;
    for (int i = 0; i < width; ++i) {
        sum += A[row * width + i] * B[i * width + col];
    }
    C[row * width + col] = sum;
}

// Host code: gridSize and blockSize must be 2D dim3 values that cover the matrix in x and y
hipLaunchKernelGGL(matrixMul, gridSize, blockSize, 0, 0, d_A, d_B, d_C, width);
  3. Device Memory Allocation and Copy:
float* h_data = new float[size]();  // host buffer, zero-initialized
float* d_data = nullptr;
// Each HIP runtime call returns a hipError_t; real code should compare it against hipSuccess
hipMalloc(&d_data, size * sizeof(float));
hipMemcpy(d_data, h_data, size * sizeof(float), hipMemcpyHostToDevice);

// After computation
hipMemcpy(h_data, d_data, size * sizeof(float), hipMemcpyDeviceToHost);
hipFree(d_data);
delete[] h_data;
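
The launch calls above pass gridSize and blockSize without showing how they are chosen. A common convention is to pick a block size and round the grid size up so that every element is covered, which is why the kernel needs its `i < n` guard. The sketch below emulates that indexing on the CPU in plain C++ so it can run without a GPU. The names `gridSizeFor` and `emulateVectorAdd` are illustrative assumptions and not part of the HIP API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: number of blocks needed to cover n elements (ceiling division).
int gridSizeFor(int n, int blockSize) {
    assert(n >= 0 && blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// CPU emulation of the vectorAdd launch: the two loops stand in for
// blockIdx.x and threadIdx.x. Returns how many threads the guard skipped.
int emulateVectorAdd(const std::vector<float>& a, const std::vector<float>& b,
                     std::vector<float>& c, int blockSize) {
    const int n = static_cast<int>(a.size());
    assert(b.size() == a.size() && c.size() == a.size());
    const int gridSize = gridSizeFor(n, blockSize);
    int skipped = 0;
    for (int blockIdx = 0; blockIdx < gridSize; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockSize; ++threadIdx) {
            const int i = blockSize * blockIdx + threadIdx;  // same formula as the kernel
            if (i < n) {
                c[i] = a[i] + b[i];
            } else {
                ++skipped;  // thread in the last block with no element to process
            }
        }
    }
    return skipped;
}
```

With 1000 elements and a block size of 256, this launches four blocks, and the guard filters out the 24 threads in the last block that fall past the end of the arrays.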

Getting Started

  1. Install ROCm (for AMD GPUs) or CUDA (for NVIDIA GPUs)
  2. Clone the HIP repository:
    git clone https://github.com/ROCm-Developer-Tools/HIP.git
    
  3. Build and install HIP:
    cd HIP
    mkdir build && cd build
    cmake ..
    make -j$(nproc)
    sudo make install
    
  4. Set the platform environment variable (choose one; older releases used the values hcc and nvcc):
    export HIP_PLATFORM=amd    # For AMD GPUs
    export HIP_PLATFORM=nvidia # For NVIDIA GPUs
    
  5. Compile your HIP program:
    hipcc your_program.cpp -o your_program
    

Competitor Comparisons

Samples for Intel® oneAPI Toolkits

Pros of oneAPI-samples

  • Broader scope covering multiple hardware architectures (CPU, GPU, FPGA)
  • More comprehensive examples and tutorials for various domains
  • Active development with frequent updates and community engagement

Cons of oneAPI-samples

  • Steeper learning curve due to the wide range of topics covered
  • Potentially overwhelming for developers focused solely on GPU programming
  • Less specialized for specific GPU architectures compared to HIP

Code Comparison

HIP (ROCm/hip):

#include <hip/hip_runtime.h>

__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

oneAPI (oneAPI-samples):

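#include <sycl/sycl.hpp>  // SYCL 2020 header; older code used <CL/sycl.hpp>
using namespace sycl;

// a, b, and c must be device-accessible (USM) pointers
void vectorAdd(queue& q, float* a, float* b, float* c, int n) {
    q.parallel_for(range<1>(n), [=](id<1> i) {
        c[i] = a[i] + b[i];
    });
}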

The HIP code uses CUDA-like syntax, while oneAPI uses SYCL for cross-platform compatibility. HIP is more GPU-specific, whereas oneAPI abstracts hardware details for broader compatibility across different architectures.

Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver

Pros of compute-runtime

  • Broader hardware support for Intel GPUs and integrated graphics
  • More extensive documentation and developer resources
  • Tighter integration with Intel's oneAPI toolkit

Cons of compute-runtime

  • Limited to Intel hardware, less cross-platform compatibility
  • Smaller community and ecosystem compared to HIP
  • Less mature for high-performance computing workloads

Code Comparison

HIP code example:

#include <hip/hip_runtime.h>

__global__ void vectorAdd(float *a, float *b, float *c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

compute-runtime (OpenCL) code example:

#include <CL/cl.h>

const char* kernelSource = 
"__kernel void vectorAdd(__global float *a, __global float *b, __global float *c, int n) {"
"    int i = get_global_id(0);"
"    if (i < n) c[i] = a[i] + b[i];"
"}";

Both repositories provide GPU acceleration capabilities, but they target different hardware ecosystems. HIP provides a CUDA-like programming model that is portable between AMD and NVIDIA GPUs, whereas compute-runtime is Intel's driver implementing Level Zero and OpenCL for Intel GPUs, with deeper integration into Intel's hardware and software stack.

AMD ROCm™ Software - GitHub Home

Pros of ROCm

  • Comprehensive GPU computing ecosystem with drivers, libraries, and tools
  • Supports a wider range of AMD GPUs and provides more extensive functionality
  • Offers better integration with machine learning frameworks and HPC applications

Cons of ROCm

  • Larger and more complex codebase, potentially harder to navigate
  • May have a steeper learning curve for developers new to GPU computing
  • Requires more system resources and setup time compared to HIP alone

Code Comparison

ROCm (using rocBLAS):

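#include <rocblas/rocblas.h>  // older ROCm releases used <rocblas.h>

// A, B, and C are device pointers to column-major matrices; alpha and beta are doubles
rocblas_handle handle;
rocblas_create_handle(&handle);
rocblas_dgemm(handle, rocblas_operation_none, rocblas_operation_none,
              m, n, k, &alpha, A, lda, B, ldb, &beta, C, ldc);
rocblas_destroy_handle(handle);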

HIP:

#include <hip/hip_runtime.h>

hipLaunchKernelGGL(matrixMultiply, dim3(gridSize), dim3(blockSize), 0, 0,
                   A, B, C, m, n, k);
hipDeviceSynchronize();

The ROCm example showcases the use of a high-level library (rocBLAS) for matrix multiplication, while the HIP example demonstrates a lower-level kernel launch for a custom matrix multiplication implementation.
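
To make the parameters of the rocBLAS call above concrete: rocBLAS follows the BLAS convention of column-major storage, and a GEMM with both operations set to rocblas_operation_none computes C = alpha * A * B + beta * C, where A is m x k, B is k x n, and C is m x n. The CPU sketch below spells out that arithmetic, including the leading dimensions lda, ldb, and ldc. `referenceDgemm` is a hypothetical helper for checking results on small inputs, not a rocBLAS function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU reference: C = alpha * A * B + beta * C with column-major storage.
// Element (row, col) of a matrix with leading dimension ld lives at index row + col * ld.
void referenceDgemm(int m, int n, int k, double alpha,
                    const std::vector<double>& A, int lda,
                    const std::vector<double>& B, int ldb,
                    double beta, std::vector<double>& C, int ldc) {
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int col = 0; col < n; ++col) {
        for (int row = 0; row < m; ++row) {
            double sum = 0.0;
            for (int i = 0; i < k; ++i) {
                sum += A[row + i * lda] * B[i + col * ldb];
            }
            C[row + col * ldc] = alpha * sum + beta * C[row + col * ldc];
        }
    }
}
```

A reference like this is only practical for small matrices, but it gives a baseline against which the output of either the library call or a custom kernel can be compared.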

README

HIP

[!CAUTION] The hip repository is retired. Please use the ROCm/rocm-systems repository for development. This develop branch only accepts patch updates from a bot that mirrors hip-specific changes from rocm-systems into this repository.

HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from a single source.