Top Related Projects
Samples for Intel® oneAPI Toolkits
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
AMD ROCm™ Software - GitHub Home
Quick Overview
HIP (Heterogeneous-Compute Interface for Portability) is an open-source C++ runtime API and kernel language that allows developers to create portable applications for AMD and NVIDIA GPUs. It provides a way to write code that can run on both AMD ROCm and NVIDIA CUDA platforms, enabling easier migration between GPU architectures.
Pros
- Portability between AMD and NVIDIA GPUs
- Simplified code migration from CUDA to HIP
- Open-source and actively maintained by AMD
- Supports a wide range of GPU computing applications
Cons
- Performance may not always match native CUDA or ROCm implementations
- Limited support for some advanced CUDA features
- Learning curve for developers familiar with only one platform
- Ecosystem and community support still growing compared to CUDA
Code Examples
- Vector Addition:
__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
// Host code
hipLaunchKernelGGL(vectorAdd, dim3(gridSize), dim3(blockSize), 0, 0, d_a, d_b, d_c, n);
- Matrix Multiplication:
__global__ void matrixMul(float* A, float* B, float* C, int width) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < width && col < width) {  // guard against out-of-bounds threads
        float sum = 0.0f;
        for (int i = 0; i < width; ++i) {
            sum += A[row * width + i] * B[i * width + col];
        }
        C[row * width + col] = sum;
    }
}
// Host code
hipLaunchKernelGGL(matrixMul, dim3(gridSize), dim3(blockSize), 0, 0, d_A, d_B, d_C, width);
- Device Memory Allocation and Copy:
float* h_data = new float[size];
float* d_data;
hipMalloc(&d_data, size * sizeof(float));
hipMemcpy(d_data, h_data, size * sizeof(float), hipMemcpyHostToDevice);
// After computation
hipMemcpy(h_data, d_data, size * sizeof(float), hipMemcpyDeviceToHost);
hipFree(d_data);
delete[] h_data;
Getting Started
- Install ROCm (for AMD GPUs) or CUDA (for NVIDIA GPUs)
- Clone the HIP repository:
git clone https://github.com/ROCm-Developer-Tools/HIP.git
- Build and install HIP:
cd HIP
mkdir build && cd build
cmake ..
make -j$(nproc)
sudo make install
- Set up environment variables:
export HIP_PLATFORM=hcc   # For AMD GPUs (newer ROCm releases use HIP_PLATFORM=amd)
export HIP_PLATFORM=nvcc  # For NVIDIA GPUs (newer ROCm releases use HIP_PLATFORM=nvidia)
- Compile your HIP program:
hipcc your_program.cpp -o your_program
Competitor Comparisons
Samples for Intel® oneAPI Toolkits
Pros of oneAPI-samples
- Broader scope covering multiple hardware architectures (CPU, GPU, FPGA)
- More comprehensive examples and tutorials for various domains
- Active development with frequent updates and community engagement
Cons of oneAPI-samples
- Steeper learning curve due to the wide range of topics covered
- Potentially overwhelming for developers focused solely on GPU programming
- Less specialized for specific GPU architectures compared to HIP
Code Comparison
HIP (ROCm/hip):
#include <hip/hip_runtime.h>
__global__ void vectorAdd(float* a, float* b, float* c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
oneAPI (oneAPI-samples):
#include <CL/sycl.hpp>
void vectorAdd(queue& q, float* a, float* b, float* c, int n) {
    q.parallel_for(range<1>(n), [=](id<1> i) {
        c[i] = a[i] + b[i];
    });
}
The HIP code uses CUDA-like syntax, while oneAPI uses SYCL for cross-platform compatibility. HIP is more GPU-specific, whereas oneAPI abstracts hardware details for broader compatibility across different architectures.
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
Pros of compute-runtime
- Broader hardware support for Intel GPUs and integrated graphics
- More extensive documentation and developer resources
- Tighter integration with Intel's oneAPI toolkit
Cons of compute-runtime
- Limited to Intel hardware, less cross-platform compatibility
- Smaller community and ecosystem compared to HIP
- Less mature for high-performance computing workloads
Code Comparison
HIP code example:
#include <hip/hip_runtime.h>
__global__ void vectorAdd(float *a, float *b, float *c, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
compute-runtime (OpenCL) code example:
#include <CL/cl.h>
const char* kernelSource =
    "__kernel void vectorAdd(__global float *a, __global float *b, __global float *c, int n) {"
    "    int i = get_global_id(0);"
    "    if (i < n) c[i] = a[i] + b[i];"
    "}";
Both repositories aim to provide GPU acceleration capabilities, but they target different hardware ecosystems. HIP focuses on AMD GPUs and provides a CUDA-like programming model, while compute-runtime is tailored for Intel GPUs and uses OpenCL. HIP offers better cross-platform compatibility between AMD and NVIDIA GPUs, whereas compute-runtime provides deeper integration with Intel's hardware and software stack.
AMD ROCm™ Software - GitHub Home
Pros of ROCm
- Comprehensive GPU computing ecosystem with drivers, libraries, and tools
- Supports a wider range of AMD GPUs and provides more extensive functionality
- Offers better integration with machine learning frameworks and HPC applications
Cons of ROCm
- Larger and more complex codebase, potentially harder to navigate
- May have a steeper learning curve for developers new to GPU computing
- Requires more system resources and setup time compared to HIP alone
Code Comparison
ROCm (using rocBLAS):
#include <rocblas.h>
rocblas_handle handle;
rocblas_create_handle(&handle);
rocblas_dgemm(handle, rocblas_operation_none, rocblas_operation_none,
m, n, k, &alpha, A, lda, B, ldb, &beta, C, ldc);
rocblas_destroy_handle(handle);
HIP:
#include <hip/hip_runtime.h>
hipLaunchKernelGGL(matrixMultiply, dim3(gridSize), dim3(blockSize), 0, 0,
A, B, C, m, n, k);
hipDeviceSynchronize();
The ROCm example showcases the use of a high-level library (rocBLAS) for matrix multiplication, while the HIP example demonstrates a lower-level kernel launch for a custom matrix multiplication implementation.
README
What is this repository for?
HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code.
Key features include:
- HIP is very thin and has little or no performance impact over coding directly in CUDA mode.
- HIP allows coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.
- HIP allows developers to use the "best" development environment and tools on each target platform.
- The HIPIFY tools automatically convert source from CUDA to HIP.
- Developers can specialize for the platform (CUDA or AMD) to tune for performance or handle tricky cases.
New projects can be developed directly in the portable HIP C++ language and can run on either NVIDIA or AMD platforms. Additionally, HIP provides porting tools which make it easy to port existing CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.
[!NOTE] The published documentation is available at HIP documentation in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the HIP/docs folder of this GitHub repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see Contribute to ROCm documentation.
DISCLAIMER
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED "AS IS." AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
© 2023 Advanced Micro Devices, Inc. All Rights Reserved.
Repository branches
The HIP repository maintains several branches. The branches that are of importance are:
- develop branch: This is the default branch, where new features are under active development. While this may be of interest to many, note that this branch and the features on it might not be stable.
- main branch: This is the stable branch. It is up to date with the latest release branch; for example, if the latest HIP release is rocm-4.3, the main branch is based on this release.
- Release branches: These are branches corresponding to each ROCm release, listed with release tags such as rocm-4.2, rocm-4.3, etc.
Release tagging
HIP releases typically follow the naming convention of the corresponding ROCm release to help differentiate them.
- rocm x.yy: These are the stable releases based on the corresponding ROCm release. This type of release is typically made once a month.
More Info
- Installation
- HIP FAQ
- HIP C++ Language Extensions
- HIP Porting Guide
- HIP Porting Driver Guide
- HIP Programming Guide
- HIP Logging
- Building HIP From Source
- HIP Debugging
- HIP RTC
- HIP Terminology (including Rosetta Stone of GPU computing terms across CUDA/HIP/OpenCL)
- HIPIFY
- Supported CUDA APIs:
- Developer/CONTRIBUTING Info
- Release Notes
How do I get set up?
See the Installation notes.
Simple Example
The HIP API includes functions such as hipMalloc, hipMemcpy, and hipFree. Programmers familiar with CUDA will be able to quickly learn and start coding with the HIP API. Compute kernels are launched with the hipLaunchKernelGGL macro call. Here is a simple example showing a snippet of HIP API code:
hipMalloc(&A_d, Nbytes);
hipMalloc(&C_d, Nbytes);
hipMemcpy(A_d, A_h, Nbytes, hipMemcpyHostToDevice);
const unsigned blocks = 512;
const unsigned threadsPerBlock = 256;
hipLaunchKernelGGL(vector_square,                       /* compute kernel */
                   dim3(blocks), dim3(threadsPerBlock), /* launch config */
                   0 /*dynamic shared*/, 0 /*stream*/,
                   C_d, A_d, N);                        /* arguments to the compute kernel */
hipMemcpy(C_h, C_d, Nbytes, hipMemcpyDeviceToHost);
The HIP kernel language defines builtins for determining grid and block coordinates, math functions, short vectors, atomics, and timer functions. It also specifies additional defines and keywords for function types, address spaces, and optimization controls (See the HIP C++ Language Extensions for a full description). Here's an example of defining a simple 'vector_square' kernel.
template <typename T>
__global__ void
vector_square(T *C_d, const T *A_d, size_t N)
{
    size_t offset = blockIdx.x * blockDim.x + threadIdx.x;
    size_t stride = blockDim.x * gridDim.x;
    for (size_t i = offset; i < N; i += stride) {
        C_d[i] = A_d[i] * A_d[i];
    }
}
The HIP Runtime API code and compute kernel definition can exist in the same source file - HIP takes care of generating host and device code appropriately.
HIP Portability and Compiler Technology
HIP C++ code can be compiled with either of the following toolchains:
- On the NVIDIA CUDA platform, HIP provides header files in the hipother repository which translate the HIP runtime APIs to CUDA runtime APIs. The headers contain mostly inlined functions and thus have very low overhead; developers coding in HIP should expect the same performance as coding in native CUDA. The code is then compiled with nvcc, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK, including the CUDA profiler and debugger.
- On the AMD ROCm platform, HIP provides a header and runtime library built on top of the HIP-Clang compiler in the Compute Language Runtime (CLR) repository. The HIP runtime implements HIP streams, events, and memory APIs, and is an object library that is linked with the application. The source code for all headers and the library implementation is available on GitHub. HIP developers on ROCm can use AMD's ROCgdb for debugging and profiling.
HIP source code can therefore be compiled to run on either platform. Platform-specific features can be isolated using conditional compilation, so HIP provides source portability to both platforms. HIP also provides the hipcc compiler driver, which calls the appropriate toolchain for the desired platform.
Examples and Getting Started
- The ROCm-examples repository includes many examples with explanations that help users get started with HIP, as well as advanced examples for HIP and its libraries.
- HIP's documentation includes a guide for Porting a New CUDA Project.
Tour of the HIP Directories
- include:
- hip_runtime_api.h : Defines HIP runtime APIs and can be compiled with many standard Linux compilers (GCC, ICC, Clang, etc.), in either C or C++ mode.
- hip_runtime.h : Includes everything in hip_runtime_api.h PLUS hipLaunchKernelGGL and syntax for writing device kernels and device functions. hip_runtime.h can be compiled using a standard C++ compiler but will expose a subset of the available functions.
- amd_detail/** , nvidia_detail/** : Implementation details for specific platforms. HIP applications should not include these files directly.
- bin: Tools and scripts to help with HIP porting
- hipcc : Compiler driver that can be used to replace nvcc in existing CUDA code. hipcc will call nvcc or HIP-Clang depending on platform and include appropriate platform-specific headers and libraries.
- hipconfig : Print HIP configuration (HIP_PATH, HIP_PLATFORM, HIP_COMPILER, HIP_RUNTIME, CXX config flags, etc.)
- docs: Documentation - markdown and doxygen info.
Reporting an issue
Use the GitHub issue tracker. If reporting a bug, include the output of "hipconfig --full" and samples/1_hipInfo/hipInfo (if possible).