Top Related Projects
Samples for Intel® oneAPI Toolkits
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
AMD ROCm™ Software - GitHub Home
Quick Overview
HIP (Heterogeneous-Compute Interface for Portability) is an open-source C++ runtime API and kernel language that allows developers to create portable applications for AMD and NVIDIA GPUs. It provides a way to write code that can run on both AMD ROCm and NVIDIA CUDA platforms, enabling easier migration between GPU architectures.
Pros
- Portability between AMD and NVIDIA GPUs
- Simplified code migration from CUDA to HIP
- Open-source and actively maintained by AMD
- Supports a wide range of GPU computing applications
Cons
- Performance may not always match native CUDA or ROCm implementations
- Limited support for some advanced CUDA features
- Learning curve for developers familiar with only one platform
- Ecosystem and community support still growing compared to CUDA
Code Examples
- Vector Addition:
__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
// Host code
hipLaunchKernelGGL(vectorAdd, dim3(gridSize), dim3(blockSize), 0, 0, d_a, d_b, d_c, n);
- Matrix Multiplication:
__global__ void matrixMul(float* A, float* B, float* C, int width) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < width && col < width) {  // guard against out-of-bounds threads
        float sum = 0.0f;
        for (int i = 0; i < width; ++i) {
            sum += A[row * width + i] * B[i * width + col];
        }
        C[row * width + col] = sum;
    }
}
// Host code
hipLaunchKernelGGL(matrixMul, dim3(gridSize), dim3(blockSize), 0, 0, d_A, d_B, d_C, width);
- Device Memory Allocation and Copy:
float* h_data = new float[size];
float* d_data;
hipMalloc(&d_data, size * sizeof(float));
hipMemcpy(d_data, h_data, size * sizeof(float), hipMemcpyHostToDevice);
// After computation
hipMemcpy(h_data, d_data, size * sizeof(float), hipMemcpyDeviceToHost);
hipFree(d_data);
delete[] h_data;
Getting Started
- Install ROCm (for AMD GPUs) or CUDA (for NVIDIA GPUs)
- Clone the HIP repository:
git clone https://github.com/ROCm-Developer-Tools/HIP.git
- Build and install HIP:
cd HIP
mkdir build && cd build
cmake ..
make -j$(nproc)
sudo make install
- Set up environment variables:
export HIP_PLATFORM=hcc   # For AMD GPUs (newer ROCm releases use HIP_PLATFORM=amd)
export HIP_PLATFORM=nvcc  # For NVIDIA GPUs (newer ROCm releases use HIP_PLATFORM=nvidia)
- Compile your HIP program:
hipcc your_program.cpp -o your_program
Competitor Comparisons
Samples for Intel® oneAPI Toolkits
Pros of oneAPI-samples
- Broader scope covering multiple hardware architectures (CPU, GPU, FPGA)
- More comprehensive examples and tutorials for various domains
- Active development with frequent updates and community engagement
Cons of oneAPI-samples
- Steeper learning curve due to the wide range of topics covered
- Potentially overwhelming for developers focused solely on GPU programming
- Less specialized for specific GPU architectures compared to HIP
Code Comparison
HIP (ROCm/hip):
#include <hip/hip_runtime.h>
__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
oneAPI (oneAPI-samples):
#include <CL/sycl.hpp>
void vectorAdd(queue& q, float* a, float* b, float* c, int n) {
    q.parallel_for(range<1>(n), [=](id<1> i) {
        c[i] = a[i] + b[i];
    });
}
The HIP code uses CUDA-like syntax, while oneAPI uses SYCL for cross-platform compatibility. HIP is more GPU-specific, whereas oneAPI abstracts hardware details for broader compatibility across different architectures.
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
Pros of compute-runtime
- Broader hardware support for Intel GPUs and integrated graphics
- More extensive documentation and developer resources
- Tighter integration with Intel's oneAPI toolkit
Cons of compute-runtime
- Limited to Intel hardware, less cross-platform compatibility
- Smaller community and ecosystem compared to HIP
- Less mature for high-performance computing workloads
Code Comparison
HIP code example:
#include <hip/hip_runtime.h>
__global__ void vectorAdd(float *a, float *b, float *c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
compute-runtime (OpenCL) code example:
#include <CL/cl.h>
const char* kernelSource =
    "__kernel void vectorAdd(__global float *a, __global float *b, __global float *c, int n) {"
    "    int i = get_global_id(0);"
    "    if (i < n) c[i] = a[i] + b[i];"
    "}";
Both repositories aim to provide GPU acceleration capabilities, but they target different hardware ecosystems. HIP focuses on AMD GPUs and provides a CUDA-like programming model, while compute-runtime is tailored for Intel GPUs and uses OpenCL. HIP offers better cross-platform compatibility between AMD and NVIDIA GPUs, whereas compute-runtime provides deeper integration with Intel's hardware and software stack.
AMD ROCm™ Software - GitHub Home
Pros of ROCm
- Comprehensive GPU computing ecosystem with drivers, libraries, and tools
- Supports a wider range of AMD GPUs and provides more extensive functionality
- Offers better integration with machine learning frameworks and HPC applications
Cons of ROCm
- Larger and more complex codebase, potentially harder to navigate
- May have a steeper learning curve for developers new to GPU computing
- Requires more system resources and setup time compared to HIP alone
Code Comparison
ROCm (using rocBLAS):
#include <rocblas.h>
rocblas_handle handle;
rocblas_create_handle(&handle);
rocblas_dgemm(handle, rocblas_operation_none, rocblas_operation_none,
m, n, k, &alpha, A, lda, B, ldb, &beta, C, ldc);
rocblas_destroy_handle(handle);
HIP:
#include <hip/hip_runtime.h>
hipLaunchKernelGGL(matrixMultiply, dim3(gridSize), dim3(blockSize), 0, 0,
A, B, C, m, n, k);
hipDeviceSynchronize();
The ROCm example showcases the use of a high-level library (rocBLAS) for matrix multiplication, while the HIP example demonstrates a lower-level kernel launch for a custom matrix multiplication implementation.
README
What is this repository for?
HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code.
Key features include:
- HIP is very thin and has little or no performance impact over coding directly in CUDA mode.
- HIP allows coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.
- HIP allows developers to use the "best" development environment and tools on each target platform.
- The HIPIFY tools automatically convert source from CUDA to HIP.
- Developers can specialize for the platform (CUDA or AMD) to tune for performance or handle tricky cases.
New projects can be developed directly in the portable HIP C++ language and can run on either NVIDIA or AMD platforms. Additionally, HIP provides porting tools which make it easy to port existing CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.
[!NOTE] The published documentation is available at HIP documentation in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the HIP/docs folder of this GitHub repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see Contribute to ROCm documentation.
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
© 2023 Advanced Micro Devices, Inc. All Rights Reserved.
Repository branches
The HIP repository maintains several branches. The branches that are of importance are:
- develop branch: This is the default branch, where new features are under active development. While this may be of interest to many, note that this branch and the features on it might not be stable.
- main branch: This is the stable branch. It is up to date with the latest release branch; for example, if the latest HIP release is rocm-4.3, the main branch is based on this release.
- Release branches: These are branches corresponding to each ROCm release, listed with release tags such as rocm-4.2, rocm-4.3, etc.
Release tagging
HIP releases typically follow the naming convention of the corresponding ROCm release to help differentiate them.
- rocm x.yy: These are the stable releases based on the corresponding ROCm release. This type of release is typically made once a month.
More Info
- Installation
- HIP FAQ
- HIP C++ Language Extensions
- HIP Porting Guide
- HIP Porting Driver Guide
- HIP Programming Guide
- HIP Logging
- Building HIP From Source
- HIP Debugging
- HIP RTC
- HIP Terminology (including Rosetta Stone of GPU computing terms across CUDA/HIP/OpenCL)
- HIPIFY
- Supported CUDA APIs:
- Developer/CONTRIBUTING Info
- Release Notes
How do I get set up?
See the Installation notes.
Simple Example
The HIP API includes functions such as hipMalloc, hipMemcpy, and hipFree. Programmers familiar with CUDA will be able to quickly learn and start coding with the HIP API. Compute kernels are launched with the hipLaunchKernelGGL macro call. Here is a simple example showing a snippet of HIP API code:
hipMalloc(&A_d, Nbytes);
hipMalloc(&C_d, Nbytes);
hipMemcpy(A_d, A_h, Nbytes, hipMemcpyHostToDevice);
const unsigned blocks = 512;
const unsigned threadsPerBlock = 256;
hipLaunchKernelGGL(vector_square,                       /* compute kernel */
                   dim3(blocks), dim3(threadsPerBlock), /* launch config */
                   0 /*dynamic shared*/, 0 /*stream*/,
                   C_d, A_d, N);                        /* arguments to the compute kernel */
hipMemcpy(C_h, C_d, Nbytes, hipMemcpyDeviceToHost);
The HIP kernel language defines builtins for determining grid and block coordinates, math functions, short vectors, atomics, and timer functions. It also specifies additional defines and keywords for function types, address spaces, and optimization controls (See the HIP C++ Language Extensions for a full description). Here's an example of defining a simple 'vector_square' kernel.
template <typename T>
__global__ void
vector_square(T *C_d, const T *A_d, size_t N)
{
    size_t offset = blockIdx.x * blockDim.x + threadIdx.x;
    size_t stride = blockDim.x * gridDim.x;
    for (size_t i = offset; i < N; i += stride) {
        C_d[i] = A_d[i] * A_d[i];
    }
}
The HIP Runtime API code and compute kernel definition can exist in the same source file - HIP takes care of generating host and device code appropriately.
HIP Portability and Compiler Technology
HIP C++ code can be compiled with either of the following toolchains:
- On the NVIDIA CUDA platform, HIP provides header files in the hipother repository which translate the HIP runtime APIs to CUDA runtime APIs. The headers contain mostly inlined functions and thus have very low overhead; developers coding in HIP should expect the same performance as coding in native CUDA. The code is then compiled with nvcc, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK, including the CUDA profiler and debugger.
- On the AMD ROCm platform, HIP provides a header and runtime library built on top of the HIP-Clang compiler in the Compute Language Runtime (CLR) repository. The HIP runtime implements HIP streams, events, and memory APIs, and is an object library that is linked with the application. The source code for all headers and the library implementation is available on GitHub. HIP developers on ROCm can use AMD's ROCgdb for debugging and profiling.
HIP source code can therefore be compiled to run on either platform. Platform-specific features can be isolated using conditional compilation, so HIP provides source portability to both platforms. HIP also provides the hipcc compiler driver, which calls the appropriate toolchain for the desired platform.
Examples and Getting Started
- The ROCm-examples repository includes many examples with explanations that help users get started with HIP, as well as advanced examples for HIP and its libraries.
- HIP's documentation includes a guide for Porting a New CUDA Project.
Tour of the HIP Directories
- include:
- hip_runtime_api.h : Defines HIP runtime APIs and can be compiled with many standard Linux compilers (GCC, ICC, Clang, etc.), in either C or C++ mode.
- hip_runtime.h : Includes everything in hip_runtime_api.h PLUS hipLaunchKernelGGL and syntax for writing device kernels and device functions. hip_runtime.h can be compiled using a standard C++ compiler but will expose a subset of the available functions.
- amd_detail/** , nvidia_detail/** : Implementation details for specific platforms. HIP applications should not include these files directly.
- bin: Tools and scripts to help with HIP porting
- hipcc : Compiler driver that can be used to replace nvcc in existing CUDA code. hipcc will call nvcc or HIP-Clang depending on platform and include appropriate platform-specific headers and libraries.
- hipconfig : Print HIP configuration (HIP_PATH, HIP_PLATFORM, HIP_COMPILER, HIP_RUNTIME, CXX config flags, etc.)
- docs: Documentation - markdown and doxygen info.
Reporting an issue
Use the GitHub issue tracker. If reporting a bug, include the output of "hipconfig --full" and samples/1_hipInfo/hipInfo (if possible).