compute-runtime
Intel® Graphics Compute Runtime for oneAPI Level Zero and OpenCL™ Driver
Top Related Projects
- DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
- HIP: C++ Heterogeneous-Compute Interface for Portability
- Samples for CUDA developers that demonstrate features in the CUDA Toolkit (cuda-samples)
- An Open Source Machine Learning Framework for Everyone (TensorFlow)
- Tensors and Dynamic neural networks in Python with strong GPU acceleration (PyTorch)
Quick Overview
The intel/compute-runtime repository is an open-source project that provides the Intel Graphics Compute Runtime for OpenCL and oneAPI Level Zero. It enables developers to leverage Intel's integrated and discrete GPUs for general-purpose computing tasks, supporting a wide range of Intel processors.
Pros
- Supports both OpenCL and oneAPI Level Zero, providing flexibility for developers
- Optimized for Intel GPUs, offering high performance for compatible hardware
- Regularly updated with new features and improvements
- Open-source nature allows for community contributions and customizations
Cons
- Limited to Intel GPUs, not compatible with other manufacturers' hardware
- May require specific driver versions, which can lead to compatibility issues
- Learning curve for developers new to GPU computing or Intel's ecosystem
- Performance may vary depending on the specific Intel GPU model
Getting Started
To get started with the Intel Graphics Compute Runtime:
- Ensure you have a compatible Intel GPU and supported Linux distribution.
- Install the necessary dependencies:
sudo apt-get update
sudo apt-get install ocl-icd-libopencl1 opencl-headers clinfo
- Alternatively, clone the repository and build from source (note that a full build also depends on GmmLib and the Intel Graphics Compiler; the prebuilt packages from the release page are usually simpler):
git clone https://github.com/intel/compute-runtime.git
cd compute-runtime
mkdir build && cd build
cmake ..
make
sudo make install
- Verify the installation:
clinfo
This should display information about the available OpenCL platforms and devices, including your Intel GPU.
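If you prefer a programmatic check, a minimal OpenCL host program can confirm that the Intel GPU is visible through the ICD loader. This is an illustrative sketch, not code from the repository; it assumes the headers and loader installed in the step above and is built with something like g++ check.cpp -lOpenCL:
#include <CL/cl.h>
#include <cstdio>
int main() {
    // Query the first OpenCL platform exposed through the ICD loader
    cl_platform_id platform;
    cl_uint numPlatforms = 0;
    if (clGetPlatformIDs(1, &platform, &numPlatforms) != CL_SUCCESS || numPlatforms == 0) {
        std::printf("No OpenCL platform found\n");
        return 1;
    }
    // Ask that platform for a GPU device and print its name
    cl_device_id device;
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
        std::printf("No GPU device found on the first platform\n");
        return 1;
    }
    char name[256] = {0};
    clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
    std::printf("GPU device: %s\n", name);
    return 0;
}
If this prints the name of your Intel GPU, the runtime is installed and visible to the ICD loader. Note that if other OpenCL platforms (for example a CPU runtime) are installed, the Intel GPU may not be on the first platform.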
Competitor Comparisons
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
Pros of DirectML
- Broader hardware support across multiple vendors (not limited to Intel)
- Designed for machine learning workloads, potentially offering better performance for AI/ML tasks
- Integration with DirectX ecosystem for graphics and compute
Cons of DirectML
- Windows-centric, less cross-platform support
- May have a steeper learning curve for developers not familiar with DirectX
Code Comparison
DirectML (simplified example):
dml::Expression input = dml::InputTensor(graph, 0, inputDesc);
dml::Expression weights = dml::InputTensor(graph, 1, weightsDesc);
dml::Expression output = dml::Convolution(input, weights);
Compute Runtime (OpenCL-based):
cl_kernel kernel = clCreateKernel(program, "convolution", NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputBuffer);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &weightsBuffer);
clEnqueueNDRangeKernel(queue, kernel, work_dim, NULL, global_work_size, local_work_size, 0, NULL, NULL);
Both repositories aim to provide efficient compute capabilities, but they target different ecosystems and use cases. DirectML focuses on machine learning acceleration within the Microsoft ecosystem, while Compute Runtime provides a more general-purpose compute solution for Intel hardware using OpenCL.
HIP: C++ Heterogeneous-Compute Interface for Portability
Pros of HIP
- Open-source and vendor-neutral, supporting multiple GPU architectures
- Easier porting of CUDA code to run on AMD GPUs
- Active community development and frequent updates
Cons of HIP
- No Intel GPU backend; HIP primarily targets AMD hardware, with a CUDA backend for NVIDIA
- Cannot target compute-runtime directly; HIP code must be ported (for example to SYCL) to run on Intel GPUs
- Steeper learning curve for developers new to GPU programming
Code Comparison
HIP:
#include <hip/hip_runtime.h>
__global__ void vectorAdd(float *a, float *b, float *c, int n) {
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < n) c[i] = a[i] + b[i];
}
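// Hypothetical launch for the kernel above: hipLaunchKernelGGL(vectorAdd, dim3(gridSize), dim3(blockSize), 0, 0, a, b, c, n);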
compute-runtime (SYCL, layered on the Level Zero/OpenCL APIs it implements):
#include <CL/sycl.hpp>
void vectorAdd(sycl::queue& q, float* a, float* b, float* c, int n) {
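// a, b and c are assumed to be USM allocations (e.g. sycl::malloc_shared) so the device can access them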
q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
c[i] = a[i] + b[i];
});
}
The HIP code uses CUDA-like syntax, while the second example uses SYCL, which reaches compute-runtime through the Level Zero or OpenCL APIs it implements. HIP's syntax will feel familiar to CUDA developers, while SYCL offers a more abstracted programming model that is portable across hardware vendors.
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Pros of cuda-samples
- Extensive collection of CUDA code examples covering various GPU computing topics
- Well-documented samples with detailed explanations and performance tips
- Regular updates to support the latest CUDA features and best practices
Cons of cuda-samples
- Limited to NVIDIA GPUs and CUDA framework
- Samples may require specific NVIDIA hardware or driver versions
Code Comparison
compute-runtime (OpenCL):
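// Minimal device discovery; status should be checked against CL_SUCCESS after each call (omitted for brevity)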
cl_int status;
cl_platform_id platform;
status = clGetPlatformIDs(1, &platform, NULL);
cl_device_id device;
status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
cuda-samples (CUDA):
int deviceCount;
cudaError_t error = cudaGetDeviceCount(&deviceCount);
if (error != cudaSuccess) {
printf("Error: %s\n", cudaGetErrorString(error));
exit(EXIT_FAILURE);
}
The compute-runtime repository focuses on Intel's OpenCL implementation for their GPUs, while cuda-samples provides CUDA examples for NVIDIA GPUs. compute-runtime is more of a runtime implementation, whereas cuda-samples is a collection of educational examples. The code snippets show the different APIs used for device initialization in OpenCL and CUDA.
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Broader ecosystem and community support
- More extensive documentation and learning resources
- Supports a wider range of hardware platforms
Cons of TensorFlow
- Larger codebase and potentially steeper learning curve
- May have higher overhead for simple tasks
- Less optimized for Intel-specific hardware
Code Comparison
TensorFlow example:
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(10, activation='softmax')
])
Compute Runtime example:
#include <CL/cl.h>
cl_platform_id platform;
cl_device_id device;
clGetPlatformIDs(1, &platform, NULL);
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
Summary
TensorFlow is a more versatile and widely-used machine learning framework with extensive community support. Compute Runtime, on the other hand, is specifically designed for Intel hardware and may offer better performance optimization for Intel GPUs. TensorFlow provides higher-level abstractions for machine learning tasks, while Compute Runtime offers lower-level control over compute operations on Intel devices.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Broader machine learning framework with extensive ecosystem
- More active community and frequent updates
- Supports dynamic computational graphs for flexible model development
Cons of PyTorch
- Larger codebase and potentially steeper learning curve
- May have higher resource requirements for basic operations
- Less focused on specific hardware optimizations
Code Comparison
PyTorch example (tensor creation and basic operation):
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y
print(z)
compute-runtime example (OpenCL kernel execution):
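// Assumes program, queue and the three cl_mem buffers were created earlier; global_size and local_size are size_t work sizes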
cl_int err;
cl_kernel kernel = clCreateKernel(program, "vector_add", &err);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer_A);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &buffer_B);
clSetKernelArg(kernel, 2, sizeof(cl_mem), &buffer_C);
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, &local_size, 0, NULL, NULL);
Summary
PyTorch is a comprehensive machine learning framework with a large community and ecosystem, while compute-runtime focuses on low-level GPU compute capabilities for Intel hardware. PyTorch offers more flexibility and ease of use for general machine learning tasks, but compute-runtime may provide better performance for specific Intel-based applications.
README
Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver
Introduction
The Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver is an open source project providing compute API support (Level Zero, OpenCL) for Intel graphics hardware architectures (HD Graphics, Xe).
What is NEO?
NEO is the shorthand name for Compute Runtime contained within this repository. It is also a development mindset that we adopted when we first started the implementation effort for OpenCL.
The project evolved beyond a single API and NEO no longer implies a specific API. When talking about a specific API, we will mention it by name (e.g. Level Zero, OpenCL).
License
The Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM) Driver is distributed under the MIT License.
You may obtain a copy of the License at: https://opensource.org/licenses/MIT
Supported Platforms
Platform | OpenCL | Level Zero |
---|---|---|
Intel Core Processors with Gen8 graphics devices (formerly Broadwell) | 3.0 | - |
Intel Core Processors with Gen9 graphics devices (formerly Skylake, Kaby Lake, Coffee Lake) | 3.0 | Y |
Intel Atom Processors with Gen9 graphics devices (formerly Apollo Lake, Gemini Lake) | 3.0 | - |
Intel Core Processors with Gen11 graphics devices (formerly Ice Lake) | 3.0 | Y |
Intel Atom Processors with Gen11 graphics devices (formerly Elkhart Lake) | 3.0 | - |
Intel Core Processors with Gen12 graphics devices (formerly Tiger Lake, Rocket Lake, Alder Lake) | 3.0 | Y |
Release cadence
The release cadence changed from weekly to monthly in late 2022.
- At the beginning of each calendar month, we identify a well-tested driver version from the previous month as a release candidate for our monthly release.
- We create a release branch and apply selected fixes for significant issues.
- The branch naming convention is releases/yy.ww (yy - year, ww - work week of release candidate)
- The builds are tagged using the following format: yy.ww.bbbbb.hh (yy - year, ww - work week, bbbbb - incremental build number from the master branch, hh - incremental commit number on release branch).
- We publish and document a monthly release from the tip of that branch.
- During subsequent weeks of a given month, we continue to cherry-pick fixes to that branch and may publish a hotfix release.
- Quality level of the driver (per platform) will be provided in the Release Notes.
Installation Options
To allow NEO to access the GPU device, make sure the user has permission to access the /dev/dri/renderD* files (typically granted by membership in the render or video group).
Via system package manager
NEO is available for installation on a variety of Linux distributions and can be installed via the distro's package manager.
For example on Ubuntu* 22.04:
apt-get install intel-opencl-icd
Manual download
.deb packages for Ubuntu are provided along with installation instructions and Release Notes on the release page
Linking applications
Directly linking to the runtime library is not supported:
- Level Zero applications should link with the Level Zero loader (see the sketch after this list)
- OpenCL applications should link with ICD loader library (ocl-icd)
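For illustration, a minimal Level Zero host program that goes through the loader might look like the following; it assumes the loader development package is installed and links with -lze_loader (a sketch, not code from this repository):
#include <level_zero/ze_api.h>
#include <cstdio>
#include <vector>
int main() {
    // Initialize Level Zero through the loader, asking for GPU drivers only
    if (zeInit(ZE_INIT_FLAG_GPU_ONLY) != ZE_RESULT_SUCCESS) {
        std::printf("zeInit failed\n");
        return 1;
    }
    // Enumerate the driver handles the loader has discovered (the compute runtime provides one)
    uint32_t driverCount = 0;
    zeDriverGet(&driverCount, nullptr);
    std::vector<ze_driver_handle_t> drivers(driverCount);
    zeDriverGet(&driverCount, drivers.data());
    std::printf("Level Zero drivers found: %u\n", driverCount);
    return 0;
}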
Dependencies
- GmmLib - https://github.com/intel/gmmlib
- Intel Graphics Compiler - https://github.com/intel/intel-graphics-compiler
In addition, to enable performance counters support, the following packages are needed:
- Intel(R) Metrics Discovery (MDAPI) - https://github.com/intel/metrics-discovery
- Intel(R) Metrics Library for MDAPI - https://github.com/intel/metrics-library
How to provide feedback
Please submit an issue using the native github.com interface.
How to contribute
Create a pull request on github.com with your patch. Make sure your change builds cleanly and passes the unit-level tests (ULTs). A maintainer will contact you if there are questions or concerns. See the contribution guidelines for more details.
See also
Level Zero specific
- oneAPI Level Zero specification
- Intel(R) oneAPI Level Zero Specification API C/C++ header files
- oneAPI Level Zero tests
OpenCL specific
- OpenCL on Linux guide
- Intel(R) GPU Compute Samples
- Frequently Asked Questions
- Interoperability with VTune
- OpenCL Conformance Tests
(*) Other names and brands may be claimed as property of others.